My recommendation would be to do a dumb translation of LaTeX into XML. By
doing so, you defer all the critical decisions which, if made
prematurely, could lose or distort information.
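
To make "dumb translation" concrete, here is a minimal sketch in Python. It only tokenizes: every command, brace group, and run of text becomes an XML element, and nothing is interpreted. The element names (latex, cmd, group, text) and the function name latex_to_xml are my own placeholders, not an existing vocabulary, and the sketch ignores comments, optional arguments, and math mode.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <latex>, <cmd name="...">, <group>, <text>.
# A token is a command (\name or \ plus one character), a brace,
# or a run of ordinary text.
TOKEN = re.compile(r"\\([A-Za-z]+|.)|([{}])|([^\\{}]+)", re.DOTALL)

def latex_to_xml(source):
    """Translate LaTeX source into XML without interpreting any command."""
    root = ET.Element("latex")
    stack = [root]  # currently open elements, innermost last
    for match in TOKEN.finditer(source):
        command, brace, text = match.groups()
        if command is not None:
            # Record the command name only; its meaning is decided later.
            ET.SubElement(stack[-1], "cmd", name=command)
        elif brace == "{":
            stack.append(ET.SubElement(stack[-1], "group"))
        elif brace == "}":
            if len(stack) > 1:  # ignore an unbalanced closing brace
                stack.pop()
        else:
            ET.SubElement(stack[-1], "text").text = text
    return root

if __name__ == "__main__":
    core = latex_to_xml(r"\section{Intro} Hello \emph{world}.")
    print(ET.tostring(core, encoding="unicode"))
```

Note that the translator does not know that \section takes an argument; it simply records a command followed by a group. That is the point: the decision about what the pair means is left to whoever consumes the core document.
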
Once you have the LaTeX document in XML form, you have a core document from
which to create more application-oriented XML documents. For example, if you
are interested in duplicating the layout of the original LaTeX document, you
could extract the layout information and create a PGML document. If you are
interested in an indexable XML document, you can extract the contents and
structural elements and massage them into an easily indexable format.
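
As a rough sketch of the indexing case, assume a core document that uses hypothetical cmd, group, and text elements, where a section heading appears as a cmd named "section" followed by a group. The output vocabulary (index, section with a title attribute) and the function names are likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the core document (hypothetical element names).
CORE = (
    '<latex><cmd name="section"/><group><text>Intro</text></group>'
    '<text> Hello </text><cmd name="emph"/><group><text>world</text></group>'
    '<text>.</text></latex>'
)

def flatten(node):
    """Concatenate all text found in node, including node itself."""
    return "".join(t.text or "" for t in node.iter("text"))

def build_index(core):
    """Derive a flat, indexable document: one <section> per heading."""
    index = ET.Element("index")
    current = None  # section receiving text; created on demand
    children = list(core)
    i = 0
    while i < len(children):
        child = children[i]
        is_heading = child.tag == "cmd" and child.get("name") == "section"
        if is_heading and i + 1 < len(children) and children[i + 1].tag == "group":
            # The group after \section supplies the title.
            title = flatten(children[i + 1])
            current = ET.SubElement(index, "section", title=title)
            i += 2
            continue
        if child.tag != "cmd":
            if current is None:  # text before the first heading
                current = ET.SubElement(index, "section", title="")
            # Layout commands such as \emph are dropped; their text is kept.
            current.text = (current.text or "") + flatten(child)
        i += 1
    return index

if __name__ == "__main__":
    index = build_index(ET.fromstring(CORE))
    print(ET.tostring(index, encoding="unicode"))
```

The core document is left untouched, so a layout-oriented extraction can be run over the same input later.
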
At a later point, you can inject elements representing the author's intent as
well as some other content expert's interpretation (each such element should
have an attribute indicating the point of view).
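
One way this could look, again only as a sketch: interpretations are added as extra child elements carrying a pov attribute, so that several points of view can coexist on the same piece of content. The element name interpretation, the attribute name pov, and the function annotate are my assumptions, as are the cmd, group, and text elements of the core document.

```python
import xml.etree.ElementTree as ET

def annotate(target, label, point_of_view):
    """Attach an interpretation to target without altering its content."""
    note = ET.SubElement(target, "interpretation", pov=point_of_view)
    note.text = label
    return note

if __name__ == "__main__":
    core = ET.fromstring(
        '<latex><cmd name="emph"/><group><text>world</text></group></latex>'
    )
    group = core.find("group")
    annotate(group, "emphasis", "author")    # what the author meant
    annotate(group, "key term", "reviewer")  # what another expert reads into it
    # A consumer selects the point of view it trusts.
    for note in group.findall("interpretation[@pov='author']"):
        print(note.get("pov"), note.text)
```

Because the annotations are additions, a consumer that does not care about them can ignore the interpretation elements and still see the original translation.
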
Regards,
Don Park
http://www.docuverse.com/personal/index.html