Describing the layout of a document

15 April 2004, 10:55 UTC

I'm still thinking about document formatting, which is a bit of a diversion from the ultimate editor. But I'll get back there.

The language needs to be able to describe what a page looks like, what parts it has, and how content is placed in those parts. It also needs to indicate how overflow content spills onto subsequent pages.

Some of the ideas here are inspired by the Lout typesetting system, a system that I thought held a lot of promise which, for some reason, it never quite realised (maybe this system will follow in its footsteps :-( ).

First, some definitions.

A "document" is a stream of content (words, paragraphs, tables, images, figures, stuff like that) which needs to be "filled" into some "space". When the space is filled with a document, you have a publication (though I probably need a better word there).

This entry is all about describing "space" and some general information on how it is "filled".

The first observation to make is that processing markup often produces multiple documents. Not only is there the main, or "body", document, but there might be a foot-note document, an end-note document, a table of contents, an index, and other things. An interesting possible document is the "running-head", which may not immediately look like a document, but is usefully treated as one.

The second observation to make is that it helps to think of the filling process as happening one page at a time. We start with some number of documents, and a page with space in it. The process of filling turns the page of space into a page of publication, consumes some documents and produces some documents, which may be the unpublished tail of some of the original documents.
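
As a rough illustration of that one-page step, here is a minimal Python sketch. It assumes every item of content takes exactly one line of space, and it ignores the way filling can append to other documents. The names `fill_one_page`, `order` and `capacity` are made up for the example.

```python
from typing import Dict, List, Tuple

Documents = Dict[str, List[str]]

def fill_one_page(documents: Documents, order: List[str],
                  capacity: int) -> Tuple[List[str], Documents]:
    """Fill one page of `capacity` lines; return (page content, unpublished tails)."""
    page: List[str] = []
    tails: Documents = {}
    # Documents that this page does not fill carry over untouched.
    for name, items in documents.items():
        if name not in order:
            tails[name] = list(items)
    # Fill the named documents in order until the page is full.
    for name in order:
        items = documents.get(name, [])
        take = min(capacity - len(page), len(items))
        page.extend(items[:take])
        if items[take:]:
            tails[name] = items[take:]   # the unpublished tail
    return page, tails

page, tails = fill_one_page({"body": ["p1", "p2", "p3"], "index": ["i1"]},
                            order=["body"], capacity=2)
print(page)   # the page of publication
print(tails)  # what is left for later pages
```

The returned tails are exactly the documents that the next page starts with.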

Finally, publications are made of sections, not just pages. We move from one page to the next when the first page is full. We move from one section to the next when the content of the one section is exhausted. Both of these triggers need to be recognised.

The process of filling a document can generate content that is appended to other documents. For example, filling a foot-note reference should append the footnote content to the footnote document. Filling a section heading could append to the table-of-contents and the running-head documents.

It would be nice if we could simply take one document, fill it in the page until there was no more space, then move on to the next document, continuing until all documents had been filled as much as possible. However, this is not sufficient for common usage.

For example, as footnotes tend to steal space from the body document, filling the footnote document after the body would mean there was never enough space for footnotes until the end of the section (making them end-notes) and filling the footnote document before the body document would mean that footnotes always appear on the page after they are referenced.

Instead we need a multipass process. Each pass fills what it can from each document in order and appends to other documents as appropriate. If, at the end of a pass, the documents are the same as at the beginning, we are done. If not, we re-run the pass with all documents pre-filled with whatever the previous pass appended to them. Obviously when filling causes content to be added to a document, it is not added a second time.

This process could oscillate indefinitely. If a page has a footnote reference near the end, then filling that footnote on that page could push the reference on to the next page. If we then move the footnote off that page, the reference comes back, and the footnote must be brought back with it. Clearly we would want this oscillation to terminate with the footnote on the page after the reference. This is probably best done by making a special case of documents that, when filled, take space from a document which, when filled, extends the first document. These special-case documents are not permitted to grow after an oscillation has been detected.
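
A minimal Python sketch of the multipass idea and the no-grow rule, under heavy simplifying assumptions: a single page, every body item and every footnote takes exactly one line, and the footnote document is rebuilt from scratch on each pass (one possible way of making sure nothing is added twice). The names `one_pass`, `settle_page` and `max_passes` are invented for illustration and are not part of any existing system.

```python
from typing import List, Optional, Tuple

Item = Tuple[str, Optional[str]]   # (body line, footnote text or None)

def one_pass(body: List[Item], notes: List[str],
             capacity: int) -> Tuple[List[str], List[str]]:
    """Fill the footnote space first, then the body in what is left.

    Returns (body lines placed, footnotes generated by those lines)."""
    room = max(capacity - len(notes), 0)
    placed = body[:room]
    generated = [note for _, note in placed if note is not None]
    return [text for text, _ in placed], generated

def settle_page(body: List[Item], capacity: int, max_passes: int = 20):
    """Repeat passes until the footnote document stops changing.

    Returns (body lines, footnotes on this page, footnotes deferred)."""
    notes: List[str] = []
    seen: List[List[str]] = []   # footnote documents tried by earlier passes
    frozen = False               # True once an oscillation has been detected
    for _ in range(max_passes):
        placed, generated = one_pass(body, notes, capacity)
        if generated == notes:
            return placed, notes, []          # stable: we are done
        if generated in seen:
            frozen = True                     # we have been here before
        if frozen and len(generated) > len(notes):
            # No-grow rule: the footnote document may not grow any more,
            # so the extra footnotes wait for the following page.
            deferred = [n for n in generated if n not in notes]
            return placed, notes, deferred
        seen.append(notes)
        notes = generated
    raise RuntimeError("page did not settle")

body = [("a", None), ("b", "note 1"), ("c", None), ("d", "note 2"), ("e", None)]
print(settle_page(body, capacity=5))
```

With five lines of space, the second footnote alternately pushes its own reference off the page and is then dropped again; the sketch detects the repeat and leaves that footnote for the page after its reference.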

A note on running heads is needed at this point. A running head involves including the title of a near-by section in the header (or footer) of a page. If there is any section heading on this page, the first such should go in the running head. If not, the last such from the previous page, or the most recent previous page with a section heading, should be used.
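
The selection rule itself is easy to state in code. This is only a sketch of the rule, not of how the filling process would implement it; it assumes we already know which section headings landed on which page, and the function name `running_heads` is hypothetical.

```python
from typing import List, Optional

def running_heads(pages: List[List[str]]) -> List[Optional[str]]:
    """Pick a running head for each page.

    `pages[i]` lists the section headings that appear on page i, in order."""
    heads: List[Optional[str]] = []
    last_seen: Optional[str] = None   # last heading on any earlier page
    for headings in pages:
        if headings:
            heads.append(headings[0])   # first heading on this page
            last_seen = headings[-1]    # remembered for later pages
        else:
            heads.append(last_seen)     # fall back to the most recent one
    return heads

print(running_heads([["Intro", "Scope"], [], ["Method"], []]))
```

Note that the page after "Intro" and "Scope" shows "Scope", not "Intro", because the fallback uses the last heading of the earlier page.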

To achieve this:

So, each page must list what document(s) must be present and non-empty to justify using the page, and must describe (boiler-plate) content and empty spaces on the page. Each empty space identifies what document should be filled there and may specify context for the filling. The empty spaces also have an ordering (which is filled first) and a no-grow flag to indicate that the document filled here cannot grow after an oscillation is detected.
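
One way to picture such a page description is as a small data structure. The following Python sketch is an assumption about how it might look; the class and field names (`Space`, `Page`, `requires`, `no_grow`, and so on) are invented here, and geometry is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Space:
    document: str                 # which document is filled here
    order: int                    # lower numbers are filled first
    no_grow: bool = False         # may not grow once an oscillation is detected
    context: Dict[str, str] = field(default_factory=dict)  # optional filling context

@dataclass
class Page:
    name: str
    requires: List[str]           # documents that must be present and non-empty
    boilerplate: List[str]        # fixed content on the page
    spaces: List[Space]

    def is_justified(self, documents: Dict[str, List[str]]) -> bool:
        """True if every required document is present and non-empty."""
        return all(documents.get(name) for name in self.requires)

    def fill_order(self) -> List[Space]:
        """Spaces in the order in which they should be filled."""
        return sorted(self.spaces, key=lambda space: space.order)

oddpage = Page(
    name="oddpage",
    requires=["chapter"],
    boilerplate=[],
    spaces=[Space("chapter", order=1),
            Space("footnote", order=0, no_grow=True)],
)
print(oddpage.is_justified({"chapter": ["some text"]}))
print([space.document for space in oddpage.fill_order()])
```

A page with an empty `requires` list is always justified, which suits pages that may be used even when they have nothing to show.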

Pages are grouped into lists. When a page is filled, we need to determine which page to fill next:

With these rules, something like:

[ titlepage copyrightpage [ chapterstart evenpage [oddpage evenpage]]]

would allow a simple book layout. chapterstart would require a 'book' document, and filling the chapter heading would move the chapter content into a 'chapter' document. evenpage wouldn't require any document, while oddpage would require a 'chapter' document. Both would fill with the 'chapter' document. [copyrightpage] would not require any document but would display "This page intentionally left blank" if no 'copyright' document were present.

Spaces in pages are primarily rectangles, but we cannot expect them always to be so. A good example is two-column text with some sort of insert in the centre of the page which overlaps and therefore shrinks the two columns at that point.

A simple approach could divide each of the two columns into three sections: a full-width, a part-width, and another full-width. However, if the line-heights do not divide evenly into the first rectangle, there could be an unsightly gap between it and the next rectangle.
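
To put a number on that gap, here is a tiny Python sketch. The function name `leftover_gap` and the figures are purely illustrative.

```python
def leftover_gap(rect_height, line_height):
    """Vertical space left in a rectangle after fitting whole lines."""
    whole_lines = rect_height // line_height
    return rect_height - whole_lines * line_height

# A 100-unit rectangle holds eight 12-unit lines, leaving 4 units unused.
print(leftover_gap(100, 12))
# A 96-unit rectangle holds the same eight lines with nothing left over.
print(leftover_gap(96, 12))
```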

There may also be a desire for spaces with sloping or even non-linear boundaries. If the insert mentioned above were circular, it might be nice to fill the text close to that circle.

As a first approximation, a space shall be a rectangle, with lengths derived from style parameters and the sizes of spaces that are earlier in the filling list. Sequential rectangles for the same document can be declared to share an edge, which effectively creates a union of rectangles. From such a union, an already-filled space can be subtracted. If this space has non-linear boundaries, that creates non-linear boundaries for the space.

Spaces are normally filled in a well-defined order from a well-defined point onwards (typically left to right and top to bottom). A space's final position is determined after it has been filled, and so this position can be a function of the final size of the space. The apparent size will not include any area of the space in which filling did not end up placing content.

So, a page is described as an ordered list of rectangles, each with a name, with dimensions, with location and with content.

The dimensions may be a function of style parameters and of the dimensions and positions of earlier rectangles.

The location may be a function of style parameters, locations of earlier rectangles, and the size of this or earlier rectangles.

The content may be literal and may include a reference to one or more documents.

A rectangle may be conditional on the presence or non-presence of a particular document.

When a space is filled, the remainder of any documents can be explicitly moved to some other document, or discarded. By default they remain where they are.
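
To make the description of a page as an ordered list of rectangles a little more concrete, here is a hedged Python sketch. It assumes sizes and positions can be computed without actually filling anything (so heights are fixed style values rather than the result of filling), and it only models conditions on the presence of a document, not the absence. All names (`Rect`, `Placed`, `resolve`, `only_if`, the style keys) are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Placed:
    x: float
    y: float
    width: float
    height: float

Style = Dict[str, float]
Earlier = Dict[str, Placed]
Pair = Tuple[float, float]

@dataclass
class Rect:
    name: str
    size: Callable[[Style, Earlier], Pair]            # returns (width, height)
    position: Callable[[Style, Earlier, Pair], Pair]  # returns (x, y); may use own size
    content: str = ""                                 # literal text or a document name
    only_if: Optional[str] = None                     # document that must be non-empty

def resolve(rects: List[Rect], style: Style, documents: Dict[str, list]) -> Earlier:
    """Work through the rectangles in order, each seeing only earlier ones."""
    placed: Earlier = {}
    for rect in rects:
        if rect.only_if is not None and not documents.get(rect.only_if):
            continue                                  # condition not met: skip
        size = rect.size(style, placed)
        x, y = rect.position(style, placed, size)
        placed[rect.name] = Placed(x, y, size[0], size[1])
    return placed

def text_width(style: Style) -> float:
    return style["page_width"] - 2 * style["margin"]

def body_size(style: Style, earlier: Earlier) -> Pair:
    head = earlier["header"]
    notes = earlier.get("footnotes")
    # The body stops at the footnotes if there are any, else at the margin.
    bottom = notes.y if notes is not None else style["page_height"] - style["margin"]
    return (text_width(style), bottom - (head.y + head.height))

STYLE = {"page_width": 100, "page_height": 150, "margin": 10,
         "head_height": 5, "notes_height": 20}

PAGE = [
    Rect("header", lambda s, e: (text_width(s), s["head_height"]),
         lambda s, e, size: (s["margin"], s["margin"]), content="running-head"),
    Rect("footnotes", lambda s, e: (text_width(s), s["notes_height"]),
         lambda s, e, size: (s["margin"], s["page_height"] - s["margin"] - size[1]),
         content="footnote", only_if="footnote"),
    Rect("body", body_size,
         lambda s, e, size: (s["margin"], e["header"].y + e["header"].height),
         content="chapter"),
]

print(resolve(PAGE, STYLE, {"chapter": ["text"], "footnote": ["a note"]})["body"])
print(resolve(PAGE, STYLE, {"chapter": ["text"]})["body"])
```

The footnote rectangle is listed before the body so that the body's height can depend on it, which mirrors the idea that later rectangles are functions of earlier ones.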



