Re: Architectural Forms, separation of formatting and loose-leaf management

Rick Jelliffe (ricko@allette.com.au)
Thu, 7 May 1998 00:53:31 +1000


Here are three random things which may be useful to consider.

1) The first is that DSSSL allows you to have external functions. So
even though DSSSL itself has no way to query the pagination system, it
does allow you to plug in your own queries or functions. You can do all
sorts of tricks with these. I don't know to what extent JADE supports
this, though. One trouble with stream-based SGML processors is that they
often have an output buffer (or are in a pipe), so unless you can flush
the output buffers, your SGML processor may be left stranded if it waits
for some feedback from a downstream program.

A DSSSL system built on top of a general purpose Scheme would be most
likely to cope with feedback from layout engines. Tony Graham of the
DSSSL list would be a good contact in this regard.
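
To make the buffering problem concrete, here is a minimal Python sketch
(not DSSSL, and not anything JADE provides). The child process is a
hypothetical stand-in for a layout engine that answers each block with a
made-up page number; the function name and the reply format are
assumptions for illustration only. The point is the explicit flush after
each write: without it the parent can sit waiting for an answer to a
question that is still in its own output buffer.

```python
import subprocess
import sys

# Hypothetical downstream "layout engine": reads one block per line and
# answers with a fake page number, flushing its own output each time.
CHILD = """
import sys
n = 0
for line in iter(sys.stdin.readline, ''):
    n += 1
    print('page', n, flush=True)
"""

def ask_layout_engine(blocks):
    """Send each block downstream and wait for feedback before continuing."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    answers = []
    for block in blocks:
        proc.stdin.write(block + "\n")
        # Without this flush the line may stay in our buffer, and the
        # readline() below would then wait forever.
        proc.stdin.flush()
        answers.append(proc.stdout.readline().strip())
    proc.stdin.close()
    proc.wait()
    return answers

if __name__ == "__main__":
    print(ask_layout_engine(["first paragraph", "second paragraph"]))
```

Both ends have to flush: the child does so as well, otherwise its reply
would be the part left stranded.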

2) People often put pagination information in processing instructions.
Or the information can be kept in an external database with, for
example, HyTime locators. If you can decide in advance to only break
pages on paragraph boundaries, then you can piggyback the pagination
information on top of element markup.
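
As a rough illustration of the processing-instruction approach, the
Python sketch below splits a document's text into pages wherever a page
break PI occurs. The PI target "page-break", the sample document and the
function name are all assumptions made up for this example; real systems
use their own conventions.

```python
from xml.dom import minidom, Node

# Sample document; the <?page-break?> PI is a hypothetical convention.
DOC = "<doc><p>here is some <?page-break?>data </p><p>of no interest.</p></doc>"

def pages_from_pis(xml_text, target="page-break"):
    """Return the character data of the document, split at each page-break PI."""
    pages = [""]

    def walk(node):
        for child in node.childNodes:
            if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                    and child.target == target):
                pages.append("")          # start a new page here
            elif child.nodeType == Node.TEXT_NODE:
                pages[-1] += child.data   # text belongs to the current page
            else:
                walk(child)               # descend into elements

    walk(minidom.parseString(xml_text).documentElement)
    return pages

if __name__ == "__main__":
    for number, page in enumerate(pages_from_pis(DOC), 1):
        print(number, repr(page))
```

Because the PI sits outside the element structure, the page break can
fall in the middle of a paragraph without disturbing the element markup.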

3) If you find you have many of these concurrent structures, you may opt
for "point markup", which is rather extreme, and would be an interesting
challenge for some stream-based processors. In point markup, your main
text is just marked up using:
<!DOCTYPE document [
<!ELEMENT text ( #PCDATA | point)*>
<!ELEMENT point EMPTY>
<!ATTLIST point id ID #REQUIRED >

Then you have separate element trees for each kind of structure: these
trees probably contain no character data of their own, just IDREFs to
the start and end of their range. In this way you can represent
concurrent, overlapping hierarchies in SGML. For example:

<!ELEMENT document (tree+, text)>
<!ELEMENT tree (start, tree*, end)>
<!ELEMENT ( start | end ) EMPTY >
<!ATTLIST tree name NMTOKEN #IMPLIED >
<!ATTLIST (start | end ) refid IDREF #REQUIRED >
]>
<document>
<tree name="pages">
  <start refid="x1"/>
  <tree name="page1">
    <start refid="x1"/>
    <end refid="x4"/>
  </tree>
  <tree name="page2">
    <start refid="x4"/>
    <end refid="x5"/>
  </tree>
  <end refid="x5"/>
</tree>
<tree name="p">
  <start refid="x1"/>
  <tree name="b">
    <start refid="x2"/>
    <end refid="x3"/>
  </tree>
  <end refid="x5"/>
</tree>
<text><point id="x1"/>here is <point id="x2"/>some<point id="x3"/>
data <point id="x4"/>of no interest.<point id="x5"/></text>
</document>

This structure has the advantage of neatness, and provides a lot of
modeling power for just one extra level of indirection. If you use HREF
rather than REFID, you can use external point markup too.

The effect, of course, is to have concurrently
<pages><page1>here is some
data </page1><page2>of no interest.</page2></pages>
and
<p>here is <b>some</b>
data of no interest.</p>
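
The two views above can be recovered mechanically from the point
markup. The following Python sketch is one possible way to do it, not
part of any SGML tool: it records the character offset of each point,
then rebuilds each tree as ordinary nested markup. It assumes a
well-formed XML rendering of the example without the DTD, and it uses
the ranges implied by the stated effect (p running from x1 to x5, b from
x2 to x3). The function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<document>
<tree name="pages">
  <start refid="x1"/>
  <tree name="page1"><start refid="x1"/><end refid="x4"/></tree>
  <tree name="page2"><start refid="x4"/><end refid="x5"/></tree>
  <end refid="x5"/>
</tree>
<tree name="p">
  <start refid="x1"/>
  <tree name="b"><start refid="x2"/><end refid="x3"/></tree>
  <end refid="x5"/>
</tree>
<text><point id="x1"/>here is <point id="x2"/>some<point id="x3"/>
data <point id="x4"/>of no interest.<point id="x5"/></text>
</document>"""

def point_offsets(text_elem):
    """Return the plain text and a map from point id to character offset."""
    chars = text_elem.text or ""
    offsets = {}
    for point in text_elem:
        offsets[point.get("id")] = len(chars)
        chars += point.tail or ""
    return chars, offsets

def serialize(tree, chars, offsets):
    """Rebuild one hierarchy as conventional nested markup."""
    name = tree.get("name")
    pos = offsets[tree.find("start").get("refid")]
    end = offsets[tree.find("end").get("refid")]
    out = ["<%s>" % name]
    for sub in tree.findall("tree"):
        # Text between the current position and the subtree's start point.
        out.append(chars[pos:offsets[sub.find("start").get("refid")]])
        out.append(serialize(sub, chars, offsets))
        pos = offsets[sub.find("end").get("refid")]
    out.append(chars[pos:end])
    out.append("</%s>" % name)
    return "".join(out)

def views(xml_text):
    """Map each top-level tree name to its reconstructed markup."""
    root = ET.fromstring(xml_text)
    chars, offsets = point_offsets(root.find("text"))
    return {t.get("name"): serialize(t, chars, offsets)
            for t in root.findall("tree")}

if __name__ == "__main__":
    for name, markup in views(DOC).items():
        print(name, "->", markup)
```

Note that the sketch needs the whole text in memory before it can
resolve any range, which is one reason point markup is awkward for
stream-based processors.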

Rick Jelliffe

Author, "The XML & SGML Cookbook", out in May from Prentice Hall.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)