small update

Wed Sep 8 04:26:58 PDT 2004

And one more long rambling: :-)

Many export formats that are XML-based or plain-text-based can, IMO, be
generated from a reasonably complete, proprietary, accessible XML output
format, even if it is only an intermediate step. Therefore, I think this
is where you should put your effort. Then provide an easy way to apply
some custom XSLT to that OO format ("OOML") and have that setting
saved as a named "Custom Export format". This way, people can easily
create new export formats or modify existing ones (make the XSLT source
available for modification).

I have been in a similar situation with our product upCast, a Word to XML
transformer. Whichever features and output filters we offered, people
wanted a tweak here and there to match their specific situation. So
we added CSS for exporting style info, letting people do whatever
they wanted upon seeing certain style combinations. However, parsing CSS
is not easily done from within XSLT. Finally, we created a proprietary
XML export which is rather verbose but also complete, with all style info
available as XML attributes on the respective elements, where it can be
easily accessed with XSLT. This filter is based on the idea that it is
much easier to discard unwanted info than to create info that is
not there.

** Why a proprietary output format? **
OO's internal structure is unique. OPML is a _very_ limited format and
probably not suitable to express the richness of the internal OO data
model. Export all you know, so people can later decide what they need
from the complete info set. Remember that discarding is easy, but
re-creating nonexistent info is impossible.

** Don't write embedded RTF **
For rich text contents, please do not store RTF code as PCDATA. RTF
is extremely difficult to access using plain XSLT. Make it XML, with
style added e.g. as attributes named like the corresponding CSS properties. So

<richtext>{\i italic} text and {\b bold}</richtext>

would be better exported as

<richtext><span font-style="italic">italic</span> text and <span font-weight="bold">bold</span></richtext>

This allows easy translation to HTML, RTF, LaTeX and probably many more.
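
To illustrate how little work such a translation needs once the styles are plain attributes, here is a minimal sketch in Python (standing in for what an XSLT would do). The `STYLE_TO_TAG` table and the function names are my own assumptions, not part of any existing format, and nested spans are not handled.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from CSS-like attribute values to HTML tags.
STYLE_TO_TAG = {
    ("font-style", "italic"): "i",
    ("font-weight", "bold"): "b",
}

def span_to_html(span):
    """Wrap the span's text in one HTML tag per recognized style attribute."""
    html = escape(span.text or "")
    for key, value in span.attrib.items():
        tag = STYLE_TO_TAG.get((key, value))
        if tag:  # unknown style info is simply discarded
            html = f"<{tag}>{html}</{tag}>"
    return html

def richtext_to_html(xml_source):
    """Convert a flat <richtext> element (text plus <span> children) to HTML."""
    root = ET.fromstring(xml_source)
    parts = [escape(root.text or "")]
    for child in root:
        parts.append(span_to_html(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)

SAMPLE = ('<richtext><span font-style="italic">italic</span> text and '
          '<span font-weight="bold">bold</span></richtext>')
html_output = richtext_to_html(SAMPLE)
```

Style attributes the converter does not know about are silently dropped, which is exactly the "discarding is easy" case.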

** Add meta-data **
Add all the meta-data you keep in internal storage, too. In other words:
make sure there is an unambiguous mapping between the native OO file
format and the proprietary XML format. However, make sure that the
XML format expresses the logical structure of the document through element
nesting and not (only) through implicit or ID/IDREF techniques.
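
As a rough sketch of the nesting point: if the internal model stores items as flat records with parent references, the exporter can resolve those references itself and emit real nesting. The record layout and the `outline`/`item` element names below are assumptions for illustration only, and the sketch assumes parents are listed before their children.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat records: (id, parent id or None, title).
ROWS = [
    ("n1", None, "Chapter"),
    ("n2", "n1", "Section"),
    ("n3", "n2", "Note"),
    ("n4", "n1", "Another section"),
]

def build_nested(rows):
    """Return an <outline> element whose nesting mirrors the parent links."""
    root = ET.Element("outline")
    by_id = {}
    for item_id, parent_id, title in rows:
        # Top-level items hang off the root, all others off their parent.
        parent = root if parent_id is None else by_id[parent_id]
        by_id[item_id] = ET.SubElement(parent, "item", id=item_id, title=title)
    return root

nested = build_nested(ROWS)
xml_text = ET.tostring(nested, encoding="unicode")
```

The ids are kept as attributes, so the mapping back to the native model stays unambiguous, while an XSLT can walk the hierarchy with ordinary child and descendant axes.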

** User interface **
Create a dialog that lets you set: filter name, XSLT file (or sequence
of transformations) to apply, and XSLT parameters to pass to each processing
sheet (with variables defined for document base name, current working
dir, the destination file name chosen by the user in the save dialog,
current date and time, ...). Save this information under the filter name
and make it available under that name as an export filter. Create a folder
for each export filter that may hold additional resources for that
filter (boilerplate graphics, document templates, additional
commands accessed from the XSLT, etc.). Doing this ensures that default
resources like images are in a known place relative to the processing
XSLT. (Or make it a bundle for easily sharing a filter?)
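
What the dialog would save could look roughly like the following sketch. The class name, the `$variable` names, and the file names are all hypothetical; the point is only that a named filter is a list of stylesheets plus parameters whose values may reference predefined variables.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from string import Template

@dataclass
class ExportFilter:
    name: str
    stylesheets: list                               # XSLT files, applied in order
    parameters: dict = field(default_factory=dict)  # values may contain $variables

    def resolve_parameters(self, document, destination, now=None):
        """Expand $variables in the parameter values for one export run."""
        now = now or datetime.now()
        variables = {
            "basename": Path(document).stem,
            "cwd": str(Path.cwd()),
            "destination": str(destination),
            "date": now.strftime("%Y-%m-%d"),
            "time": now.strftime("%H:%M:%S"),
        }
        # safe_substitute leaves unknown $names untouched instead of failing.
        return {key: Template(value).safe_substitute(variables)
                for key, value in self.parameters.items()}

html_filter = ExportFilter(
    name="Simple HTML",
    stylesheets=["ooml-to-html.xsl"],
    parameters={"title": "$basename", "generated": "$date"},
)
resolved = html_filter.resolve_parameters(
    "notes/plan.oo", "out/plan.html", now=datetime(2004, 9, 8, 4, 26, 58))
```

The resolved dictionary is what would be handed to the XSLT processor as stylesheet parameters.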

** Images **
Things start to get a little complicated when images (or any other data
types like movies, sound, ...) get involved. We chose to
externalize these to files and keep only a link to them in the XML, but
another way would be to inline the data as e.g. base64 or some compressed
format.
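
The two options can be sketched side by side like this. The `image` element and its `href`, `media-type` and `encoding` attributes are made up for the example; nothing here is an existing format.

```python
import base64
import xml.etree.ElementTree as ET

def image_as_link(href):
    """Externalized variant: the XML only points at a file stored elsewhere."""
    return ET.Element("image", href=href)

def image_inline(data, media_type):
    """Inlined variant: the bytes travel inside the XML as base64 text."""
    element = ET.Element("image", {"media-type": media_type, "encoding": "base64"})
    element.text = base64.b64encode(data).decode("ascii")
    return element

def read_inline(element):
    """Recover the original bytes from an inlined image element."""
    return base64.b64decode(element.text)

PAYLOAD = b"\x89PNG not a real image"  # placeholder bytes, not a valid PNG
linked = image_as_link("images/figure1.png")
inlined = image_inline(PAYLOAD, "image/png")
roundtrip = ET.fromstring(ET.tostring(inlined))
```

Links keep the XML small and the media directly usable by other tools; inlining keeps the export a single file at the cost of size and an extra decoding step.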

** Provide access to all installed XSLT processors **
Ship with a default XSLT processor (probably native; xsltproc comes to
mind). But also allow the use of other installed XSLT processors like
(Java-) Xalan, SAXON, etc. This may be useful when the export filter
requires certain extension functions that are only available in one
processor. You may even make it possible to ship the filter with all
required binary extensions, e.g. by putting the required jar file into
the filter bundle or folder, along with utility jars. I know that the
distribution size might get large, but then if you need that export
format and want it to work out-of-the-box with no additional installs
necessary, this may be worth it. Such a generalized export filter
architecture would even make it possible to include third-party
tools like FOP or XFC in the exporting process, giving maximum
flexibility to third-party filter developers.
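
One way to keep the processor pluggable is to hide each one behind a function that builds its command line. The sketch below only constructs the argument lists and does not run anything; the function names and file names are my own, and the flags shown (`-o` and `--stringparam` for xsltproc, `-IN`/`-XSL`/`-OUT`/`-PARAM` for Xalan-Java's `org.apache.xalan.xslt.Process`) should be checked against the installed versions.

```python
def xsltproc_command(stylesheet, source, output, params):
    """Argument list for xsltproc (libxslt)."""
    command = ["xsltproc", "-o", output]
    for name, value in params.items():
        command += ["--stringparam", name, value]
    return command + [stylesheet, source]

def xalan_command(stylesheet, source, output, params, classpath=None):
    """Argument list for the Xalan-Java command-line utility."""
    command = ["java"]
    if classpath:
        # E.g. jars shipped inside the filter's own folder or bundle.
        command += ["-cp", classpath]
    command += ["org.apache.xalan.xslt.Process",
                "-IN", source, "-XSL", stylesheet, "-OUT", output]
    for name, value in params.items():
        command += ["-PARAM", name, value]
    return command

# Hypothetical registry: the filter settings would store one of these keys.
PROCESSORS = {"xsltproc": xsltproc_command, "xalan": xalan_command}

def build_command(processor, stylesheet, source, output, params):
    """Look up the chosen processor and build its command line."""
    return PROCESSORS[processor](stylesheet, source, output, params)

command = build_command("xsltproc", "ooml-to-html.xsl", "plan.xml",
                        "plan.html", {"title": "plan"})
```

The resulting list can be passed to `subprocess.run`, and a further processor is just one more entry in the registry.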

** Outsource exporting, concentrate on OO functionality and UI **
Ask on the list for authors of export formats you wish to support
"natively" with the application. I'm sure there are several who have deep
knowledge of a certain format because they have to work with it daily.
These might be perfect candidates for creating an export filter. Put
them under NDA and provide an OO alpha seed to them so they can
create an export filter based on the above "OOML". This might get you a
broader base of export formats to start with, freeing resources for things
like adding one more feature to OO that wouldn't have made it otherwise.

** Importing is a completely different story **
Well, the title says it. Creating import filters is probably more
difficult (especially for binary and/or non-XML-based formats). If the
target is also OOML, then you may need to be extremely forgiving in
reading such a file and assume default values for all info elements that
could not be generated from the original format's source. I'm leaving out
the many other issues regarding importing (or even roundtripping) here.
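
To make "extremely forgiving" a bit more concrete, here is a small sketch of a reader that ignores anything it does not know and falls back to defaults for anything that is missing. The `item` element, the attribute names, and the default values are assumptions for the example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults for info the source format could not supply.
DEFAULTS = {"title": "Untitled", "collapsed": "false", "font-style": "normal"}

def read_item(element):
    """Return all known fields, ignoring unknown attributes and filling gaps."""
    item = dict(DEFAULTS)
    for key in DEFAULTS:
        if key in element.attrib:
            item[key] = element.attrib[key]
    # Only <item> children are followed; unknown child elements are skipped.
    item["children"] = [read_item(child) for child in element.findall("item")]
    return item

SPARSE = '<item title="Imported"><item unknown="x"/><note>ignored</note></item>'
imported = read_item(ET.fromstring(SPARSE))
```

An import filter written by a third party then only has to produce the parts of the format it can actually derive from its source.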

Well, these are my ideas, partially backed by experience with our own
tool. Maybe this is useful to someone.

Regards, Christian.