Relaxing Knuth’s Restrictions: Break-Dependent Widths in Optimal Line Breaking
In 1981, Knuth and Plass published “Breaking Paragraphs into Lines,” the algorithm behind TeX’s paragraph layout. It models a paragraph as a sequence of boxes, glue, and penalties, then uses dynamic programming to choose the globally optimal set of line breaks. More than forty years later, it remains the canonical algorithm for optimal line breaking, and its global view of the whole paragraph still gives TeX a beauty that most modern layout engines never quite match.
Yet for all its elegance, the algorithm does not cover every case. Knuth and Plass imposed two restrictions that make the bookkeeping and pruning efficient. While these restrictions suit ordinary text well, they do not suit every domain in which we might want optimal line breaking.
We ran into these restrictions while building line breaking for music notation with lyrics. There, a break can change geometry on both sides of the breakpoint: if two notes stay on the same line, their lyric syllables may tuck under each other and consume less space, while a break between them forces each side to take its full width. The layout engine therefore has to reason about geometry that depends on exactly the breakpoint it is trying to choose.
We found that we could still handle this within the traditional box, glue, and penalty algebra, but only by relaxing both of Knuth’s restrictions together. The result stays optimal and, just as importantly, preserves pruning, so it stays efficient. Knuth expected this sort of extension:
I figure the next generation of systems will have a lot more complicated mechanisms to handle the general cases of everything, where I’ve considered only the cases that I thought were the 99%.
– Donald Knuth, TUGboat, 1996
The restrictions
Knuth and Plass observed that negative widths are possible in principle, but awkward in practice. To keep the algorithm efficient, they imposed two restrictions.
Restriction 1: The minimum possible line length up to successive breakpoints never decreases. This is the assumption behind TeX’s most important pruning step. If a candidate line is already overfull even after using all available shrink, then every later breakpoint will also be overfull, so the algorithm can safely prune that active node. In other words, once a line becomes overfull, it remains overfull. This logic breaks down as soon as a later negative width can pull the line back into range, because then a node that looks hopeless now may still be rescued later.
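The classical pruning test can be sketched in a few lines of TypeScript (names and shapes here are illustrative, not taken from any particular implementation):

```typescript
// Sketch of TeX's classic pruning test. With only non-negative widths, a
// line that is overfull even after spending all of its shrink can never
// recover, so the active node that starts it can be safely deactivated.
interface LineMetrics {
  width: number;  // natural width of the candidate line
  shrink: number; // total shrink available on the line
}

function canPruneClassic(line: LineMetrics, targetWidth: number): boolean {
  // Overfull even at maximum shrink, i.e. adjustment ratio below -1.
  return line.width - line.shrink > targetWidth;
}
```

Under Restriction 1 this check is final: no later material can reduce the line's minimum width, so a `true` here is a permanent verdict.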
Restriction 2: Every pair of consecutive legal breakpoints must be separated by a box. The original algorithm tracks running width, stretch, and shrink totals as it scans forward. Simple subtraction gives the metrics for a line segment, but only if there is at least one box between consecutive breakpoints. If two breakpoints are separated only by glue and penalties, the subtraction includes glue that should have been pruned at the start of the new line, producing incorrect results.
For ordinary text, both restrictions are natural, because words are positive-width boxes separated by positive-width spaces. As soon as we move beyond text into domains like music notation or CSS layout, neither restriction is guaranteed to hold.
When a break changes width
The simplest example is a ligature. Consider the word “efficient.” When no break occurs, “ffi” can form a ligature, which is narrower than the split letters. If we break inside it and write “ef-ficient,” we pay for the hyphen and also lose the narrow joined form, so the width near the breakpoint depends on whether we break there.
TeX handles cases like this with discretionary nodes, a special-case mechanism that sits outside the basic box, glue, and penalty algebra. A discretionary holds pre-break, post-break, and no-break replacement content, and TeX’s line breaker has dedicated machinery to evaluate each alternative without introducing negative widths into the running totals. This works well for text, where discretionaries are occasional and self-contained.
Discretionary content in TeX is restricted to fixed-width material, such as simple ligatures. It cannot contain glue. More complicated use cases, like music notation and CSS layout, can produce encodings that rely on break-dependent glue with stretch and shrink, which falls outside what discretionary nodes can express.
Encoding the difference with negative widths
We do not need a new primitive to represent break-dependent width. Suppose a discretionary ligature has pre-break width w_pre, post-break width w_post, and no-break width w_nobreak. We charge w_pre and w_post to the break path, and we add a cancellation term to the no-break path:

w_cancel = w_nobreak − (w_pre + w_post)
If the joined form is narrower than the split forms, this cancellation term is negative. For the “ffi” example in Source Serif 4 at 10pt, the joined ligature measures 9.11pt while “f-” followed by “fi” measures 6.66 + 6.07 = 12.73pt, so the cancellation term is 9.11 - 12.73 = -3.62pt.
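As a concrete check of that arithmetic, here is a minimal TypeScript sketch (the helper name is hypothetical, not from the tex-linebreak codebase):

```typescript
// Hypothetical helper: the term charged to the no-break path so that the
// break path can pay the full pre-break and post-break widths.
function cancellationTerm(
  noBreakWidth: number,
  preBreakWidth: number,
  postBreakWidth: number
): number {
  return noBreakWidth - (preBreakWidth + postBreakWidth);
}

// "ffi" in Source Serif 4 at 10pt, using the measurements from the text:
const ffiCancellation = cancellationTerm(9.11, 6.66, 6.07);
// ≈ -3.62pt: the no-break path is credited for the narrower joined form.
```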
The same encoding works beyond ligatures. It handles any case where we want the break path to pay one amount and the no-break path to pay another, even when the effect is split across the end of one line and the start of the next.
The difficulty is that negative widths violate Restriction 1. A later negative width can rescue an earlier overfull line, which means the algorithm can no longer safely prune a node once its line becomes overfull. If we keep the original rule, we can prune nodes that belong to the optimal solution. If we disable pruning, we recover correctness but fall back to quadratic time.
Relaxing Restriction 2 with prefix sums
The first step is to replace local running totals with global prefix sums. We precompute four arrays, one each for box width, glue width, glue stretch, and glue shrink, and we also precompute the next content-bearing item after each position. Then, instead of carrying local totals forward inside each active node, we compute the metrics for any candidate line by subtraction from the appropriate content start.
This shift buys us two things at once. It relaxes Restriction 2, because consecutive breakpoints no longer need an intervening box, and it puts every line metric into a single coördinate system. The prefix-sum subtraction starts at the first content-bearing item after the breakpoint, so it naturally excludes leading glue and penalties from the next line, and this common coördinate system is exactly what makes the next relaxation possible.
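The construction can be sketched in TypeScript (the item shapes and names are illustrative assumptions, not the published implementation):

```typescript
// Sketch of the prefix-sum construction. Cumulative totals let us measure
// any candidate line with two array lookups and a subtraction, starting
// from the first content-bearing item after the line's breakpoint.
type Item =
  | { type: 'box'; width: number }
  | { type: 'glue'; width: number; stretch: number; shrink: number }
  | { type: 'penalty'; width: number };

interface PrefixSums {
  width: number[];       // cumulative box + glue width before index i
  stretch: number[];     // cumulative glue stretch before index i
  shrink: number[];      // cumulative glue shrink before index i
  nextContent: number[]; // first box at or after index i
}

function buildPrefixSums(items: Item[]): PrefixSums {
  const n = items.length;
  const width: number[] = new Array(n + 1).fill(0);
  const stretch: number[] = new Array(n + 1).fill(0);
  const shrink: number[] = new Array(n + 1).fill(0);
  for (let i = 0; i < n; i++) {
    const it = items[i];
    // Penalty widths are only paid when a break occurs there, so they are
    // excluded from the running totals.
    width[i + 1] = width[i] + (it.type === 'penalty' ? 0 : it.width);
    stretch[i + 1] = stretch[i] + (it.type === 'glue' ? it.stretch : 0);
    shrink[i + 1] = shrink[i] + (it.type === 'glue' ? it.shrink : 0);
  }
  // nextContent[i]: where a line starting at breakpoint i is measured from,
  // skipping the leading glue and penalties that would be discarded.
  const nextContent: number[] = new Array(n + 1).fill(n);
  for (let i = n - 1; i >= 0; i--) {
    nextContent[i] = items[i].type === 'box' ? i : nextContent[i + 1];
  }
  return { width, stretch, shrink, nextContent };
}

// Natural width of the line from breakpoint `from` to breakpoint `to`.
function lineWidth(p: PrefixSums, from: number, to: number): number {
  return p.width[to] - p.width[p.nextContent[from]];
}
```

Because every measurement is a subtraction between two absolute positions, two consecutive breakpoints with no box between them are handled by the same code path as any other pair.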
Relaxing Restriction 1 with a suffix-minimum floor
The original pruning question is too local. It asks whether the current line is overfull, but the real question is whether any later breakpoint could still rescue it.
To answer that, we define the floor at breakpoint b:

floor(b) = W(b) − Z(b)

Here W(b) is the prefix width up to breakpoint b, and Z(b) is the prefix shrink up to b. The floor is the smallest width we can reach at that breakpoint after spending all available shrink. Under Restriction 1, the floor can only increase as we move right, and this is exactly why TeX can prune when the adjustment ratio drops below -1.
With negative widths, the floor can dip, so a later breakpoint may have a smaller floor than an earlier one. The key insight is that we can precompute the suffix minimum of the floor over all later legal breakpoints. This gives us a direct test: for an active node a and current breakpoint b, we prune only if both statements are true:
- The current line from a to b is overfull.
- The smallest floor anywhere to the right of b is still too large to fit a line starting at a.
Because we express both the floor and the line-start position in the same prefix-sum coördinates, the second check becomes a single comparison. If that comparison passes, no future breakpoint can rescue node a, so pruning is safe; if it fails, we must keep the node alive.
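A minimal TypeScript sketch of the two precomputed passes and the pruning test (array and parameter names are illustrative, not from the published implementation):

```typescript
// floor[b] is the smallest width reachable at breakpoint b after spending
// all available shrink, in prefix-sum coordinates.
function computeFloors(prefixWidth: number[], prefixShrink: number[]): number[] {
  return prefixWidth.map((w, b) => w - prefixShrink[b]);
}

// suffixMin[b] is the minimum floor over breakpoint b and everything after
// it, computed in one backward pass.
function computeSuffixMins(floor: number[]): number[] {
  const n = floor.length;
  const suffixMin: number[] = new Array(n).fill(Infinity);
  suffixMin[n - 1] = floor[n - 1];
  for (let b = n - 2; b >= 0; b--) {
    suffixMin[b] = Math.min(floor[b], suffixMin[b + 1]);
  }
  return suffixMin;
}

// Prune an active node only if the current line is overfull AND no later
// breakpoint can bring a line starting at this node back within `target`.
// `startFloor` is W − Z at the node's content start, so the floor and the
// line start share coordinates and the test is a single comparison.
function canPrune(
  currentMinWidth: number, // width of the current line at maximum shrink
  suffixMinAfter: number,  // min floor over breakpoints after the current one
  startFloor: number,
  target: number
): boolean {
  const overfullNow = currentMinWidth > target;
  const noFutureRescue = suffixMinAfter > target + startFloor;
  return overfullNow && noFutureRescue;
}
```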
Here is the ligature example again, with a target line width of 350pt:
| Breakpoint | Width expression | Floor | Suffix min after | Decision |
|---|---|---|---|---|
| b₁ (before ligature) | 353.62 | 353.62 | 350.00 | Keep it. A later breakpoint can rescue it |
| b₂ (ligature split) | 353.62 + 0.00 − 3.62 | 350.00 | 380.00 | It fits |
| b₃ (after next word) | 350.00 + 30.00 | 380.00 | — | Prune it. No rescue remains |
At b₁ the line looks overfull, but the suffix minimum tells us that a later breakpoint reaches 350.00pt, so we must not prune. At b₂ the cancellation term takes effect, and the line fits. At b₃ there is no later rescue, so pruning is sound. The extra work is small: one forward pass computes the breakpoint floors in O(n), one backward pass computes the suffix minimums in O(n), and the main search answers the rescue question in O(1) per check. As long as stretch and shrink stay non-negative, negative widths therefore do not destroy pruning.
Without pruning, the Knuth-Plass algorithm is O(n²). Pruning is what makes it fast: once a line is overfull, the node is pruned, keeping the active set small. Negative widths threaten this, because a later negative width can rescue an overfull line, keeping nodes alive that the original rule would have pruned. The suffix-minimum floor restores pruning: a node is pruned only when no future breakpoint can rescue it. In our music notation tests, the active set stayed small and the algorithm ran in linear time, at the cost of O(n) preprocessing time and O(n) extra space for the prefix sums and suffix minimums.
Scope and limitations
These are not two isolated changes. The suffix-minimum pruning rule works only because the prefix-sum construction puts every quantity into the same coördinate system: we compute the floor in prefix coördinates, we compute each node’s overfull cutoff in prefix coördinates, and we compare them directly. Without this alignment, the proof does not quite land, because we would be reasoning about one notion of where a line starts in the pruning argument and a different notion in the measurement code. Taken together, the two relaxations replace a patchwork of restrictions with one framework that handles break-dependent width cleanly from start to finish.
The argument does assume that shrink and stretch stay non-negative. Negative shrink breaks the floor: W(b) − Z(b) no longer represents the smallest reachable width, so the suffix-minimum test is no longer sound. Negative stretch also breaks the pruning precondition: an adjustment ratio below -1 no longer implies that the line is overfull even at maximum shrink. In either case, we fall back to unpruned search. For our use case, this is an acceptable limitation, because negative widths are how we encode break-dependent geometry, while stretch and shrink still represent genuine spacing flexibility and in practice remain non-negative.
Future work
For us, pruning was the difference between an idea that looked good on paper and one that worked on real scores. In our music notation tests, the number of active nodes grew linearly, which made the method usable on long real-world passages.
More broadly, we can now represent break-dependent widths inside the standard Knuth-Plass algebra without giving up optimality or settling for exhaustive search. This matters anywhere a break changes the geometry of neighboring content. We have validated the method in music notation, and we suspect that it is relevant to CSS layout problems that reduce to break-dependent widths.
We have published the implementation as pull request #122 against tex-linebreak.