Table of Contents

DocZilla SGML/XML Components and Browser
DocZilla Technology
How to read this book
1. Getting started
Version Changelog
Acquiring DocZilla Browser
The non-commercial version of DocZilla Browser
Installation on Windows
Installation on Linux
Viewing Documents
Migrating from MultiDoc Pro
Portability of documents
Notes for Windows users
Notes for non-Windows users
2. Reference manual
Core components
User interface
URLs and locations
Table of Contents (TOC)
Message logger
Annotations
User Hyperlinks
Link target dialog
DocZilla Search
Image viewer
CSS Checker
Under the hood
DocZilla processing instructions
Styling documents
External unparsed entities
XLink and HyTime links
XML base
Scripting documents
Special features
CATALOG file
ENTITYRC file
SDATA maps
HTTP server configuration
TOC open-to-depth parameter
Advanced image decoder
NDATA handling in DocZilla
CGM hotspots
A. DocZilla TOC DTD
B. User Interface Reference
C. SGML and XML declarations

DocZilla SGML/XML Components and Browser

Reference Manual (for version 2.7)

Citec Information


List of Tables

B.1. Key to menus

DocZilla Technology

DocZilla is a set of software components for processing and viewing data in the SGML and XML formats. The main goal of the DocZilla project is to design a solid technology platform that provides flexible SGML and XML support for enterprise-level document management systems. DocZilla Browser is our proof-of-concept product: a sophisticated and robust SGML/XML browser built on top of the Mozilla core. The DocZilla Browser itself is a standalone tool for viewing SGML/XML documents, but it can easily be adapted and modified to provide different views of all enterprise data. DocZilla can also be deployed as a dedicated viewer or an electronic manual application.

For more information about DocZilla products, see our website at http://www.doczilla.com/.

This book is a reference manual to the DocZilla features and the DocZilla-specific configuration of your documentation environment. It assumes that DocZilla is available to you as the DocZilla Browser. For customized DocZilla-based applications, only part of this documentation may apply, and additional features that are not in the standard browser are not described here.

This book does not extensively cover SGML or XML itself, the various standards to which DocZilla conforms, or Mozilla's features. For further reference on those subjects, see the bibliography.

How to read this book

Keywords, parameter names, file and path names, computer output, menus and similar names are printed in this style. Footnotes are shown like this [1]. The text may refer to "left-clicking" and "right-clicking". On systems where the number of mouse buttons is not equal to two, which are configured for left-handed persons, or whose mouse buttons are arranged differently (e.g. some laptops), these terms should be understood as clicking the primary mouse button and the secondary mouse button, respectively. Finally, examples and syntax declarations are rendered like this:

Example 1. A sample example

The example content goes here.

Typically computer output or some other
example text shown in this style.



[1] This is an example footnote. CSS2 does not transparently provide support for traditional footnotes.

Chapter 1. Getting started

Version Changelog

Here's a list of major changes between DocZilla versions 1.0 and 2.7:

LinkManager plugins

DocZilla was refactored into a modular architecture for implementing new hyperlinking languages. A software component providing a new hyperlink implementation can be as small as a few dozen lines of code. In 2.7, both DocZilla's HyTime support and the user hyperlink feature were rewritten as LinkManager plugins;

DocZilla Search Server

This new, separately available DocZilla application serves DocZilla Search index files over a TCP/IP network connection. It means you can use DocZilla's structured searches and fast index lookups everywhere, not only on your local hard drive. All DocZilla Browsers support network searches: all you need to do is give DocZilla the URL of the index resource (file or network service) and start searching. Access to the index is completely transparent to the user;

New Mozilla version

DocZilla 2.7 is now based on the stable Mozilla version 1.7.8, which is the latest at the time of writing, June 2005. It means a huge improvement in speed, stability, standards compliance and robustness;

New SGML parser

DocZilla 2.7 now integrates the industry's de facto standard SGML parser, OpenSP 1.5.1. This means DocZilla is now a validating SGML/XML browser, with quite complete parser support for all the constructs in the SGML standard;

User hyperlinks

DocZilla 2.7 users can now maintain a database of their own hyperlinks. The user can create unidirectional and bidirectional links from any location in any SGML/XML document to another. This brings personal navigation to a new level in a document set;

CGM image plugin

DocZilla has built-in support for the SDI CGM Reader plugin, which comes bundled with DocZilla. Targeting and highlighting hotspots in CGM images is possible both from JavaScript and by using regular URL references. User hyperlinks also support hotspots in CGM images, allowing the creation of links from and to a CGM image;

Extended Table of Contents

TOCs can now have multiple columns, with label, target, options and target window name specified for each tree cell. The width of each TOC column can be specified. Also, each cell in the TOC tree now supports alternate URL targets: if clicking on a TOC item is ambiguous, a menu pops up, asking the user to choose one URL target. Targets can be further divided into icon targets and text targets: the former are only available when the icon of a TOC tree cell is clicked, the latter are available elsewhere in a TOC tree cell.

Acquiring DocZilla Browser

You may have received your DocZilla Browser on a CD, or you can download the full version from the customer downloads page on the DocZilla website. These installation instructions also apply to the evaluation version of the DocZilla Browser, which is available on the website as well.

As DocZilla is based on the Mozilla application development framework, it is highly portable to a number of platforms. Currently, there are DocZilla versions available for Win32 API (Windows 95 and newer) and for Linux (glibc2 or newer and GTK+ 2.0).

The non-commercial version of DocZilla Browser

The non-commercial version is a fully working DocZilla distribution except that it is only licensed for non-commercial, personal, or evaluation use. It comes with no official support, but questions can be asked on the DocZilla forum at http://www.doczilla.com/cgi-bin/threads/wwwthreads.pl.

Note

In version 1.0, the non-commercial version had a number of features disabled. In 2.7, these are enabled for everybody:

  • Table of Contents trees can be generated dynamically using XSLT, in addition to static TOCs;
  • The professional SGML/XML indexing feature is enabled in DocZilla Search;
  • In addition to user annotations, document authors themselves can now attach relative annotation sources to documents;
  • In addition to the regular image formats on the web (JPEG, GIF, PNG), DocZilla's advanced image support is enabled. Currently this means support for TIFF raster images in the default release. For more information about the DocZilla image support, see the Advanced image decoder section;
  • SSL (Secure Socket Layer) is enabled, allowing encrypted network connections.

Installation on Windows

The Windows version comes with an InstallShield™ installation program. To install the program, execute DOCZILLA.EXE [2] and follow the instructions on the screen. Besides copying the binaries to your disk, the installation program creates shortcuts and, if the user allows it, adds MIME type mappings for certain file extensions to the registry (*.sgml, *.sgm, *.xml, *.xsl).

To uninstall the browser, use the Add/Remove Programs application in the Control Panel.

Installation on Linux

The Linux release of DocZilla is currently distributed in tarball format (.tar.gz) for compatibility. Unpack the tarball file to a suitable directory with the tar command [3] or alternatively with a graphical archiving utility, like Gnome FileRoller or KDE Ark. The destination directory should be in your home directory, or for system-wide installations, under /usr/local/. Start the browser by running the file doczilla in the DocZilla directory.

DocZilla does not need write access to other system directories, like /etc. It stores its user preferences and data in the ~/.doczilla/ directory.

Viewing Documents

DocZilla's main objective is to read and display any valid SGML or XML document without additional pre- or postprocessing stages. The markup parser that DocZilla 2.7 uses is OpenSP 1.5.1, which has fairly complete SGML conformance and is very strict about the syntactic and semantic validity of documents and DTD subsets. Invalid documents may generate warning and error messages, or even critical errors that keep DocZilla from finishing the document load. A valid SGML document should raise no errors.

In general, the documentation environment should be set up so that all the required DTD subsets can be found and accessed. Typically this includes proper catalog files to map public identifiers to formal system identifiers and storage object identifiers. XML files are an exception in that they do not necessarily require a DTD to be loaded in DocZilla. For advice on writing portable documents, see the Portability of documents section.

All documents, however, need styling information in order to be sensibly laid out on the screen or printer, as opposed to the plain text view that shows up when no layout style is given. DocZilla supports Cascading Style Sheets (CSS) and XSL Transformations (XSLT). The former is the primary styling language, whereas the latter can be used to reorganise or translate the document into a new structure for viewing. An example would be converting an XML document to elements in the HTML namespace, where styling information is implied for each element.

See the chapter Styling Documents for more information about stylesheets. Here's a small quick-start example of how to attach a CSS stylesheet to a document.

Example 1.1. Adding a sample stylesheet to an SGML/XML document

In the document, use the xml-stylesheet processing instruction to tell DocZilla about the stylesheets. In SGML documents, use the mde-stylesheet[4] PI instead.

<?xml version="1.0"?>
<?xml-stylesheet href="meeting.css" type="text/css" title="normal"?>
<doc>
  <title>Example</title>
</doc>

Migrating from MultiDoc Pro

Users of MultiDoc Pro are used to ViewPort™ stylesheets. DocZilla does not support the ViewPort format by default, so a tool to convert the .SSH stylesheets to CSS was written. The tool is called SSH2CSS.EXE. It is included in the DocZilla distribution for Windows, and it is also available on the DocZilla website as a plain executable file.

The format of ENTITYRC has changed a little since the days of MultiDoc Pro. First, all arguments to keywords must now be quoted. Second, the syntax of some keywords has been extended to support the wider range of functionality in DocZilla. See the section ENTITYRC file for more information.

Portability of documents

The issue that is most likely to restrict the portability of your documents is the file naming convention. There are many file systems on many computer platforms and operating systems, and they behave differently when referring to files. For example, the Windows file system is case-insensitive in the sense that document.sgml, Document.sgml and DOCUMENT.SGML refer to the same file. A Unix system, on the other hand, traditionally considers these to be different files. An old Apple Macintosh uses case-insensitive addressing, whereas the new Mac OS X can, in some cases, work case-sensitively. In general, documents written for a case-sensitive system also work on case-insensitive systems, but not vice versa.
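
One way to spot such problems before moving a document set to another platform is to compare the names used in references with the names of the files that actually exist. The following Python sketch is only an illustration and is not part of DocZilla; the helper name find_case_mismatches and the sample file names are made up for this example.

```python
def find_case_mismatches(references, existing_files):
    """Return (reference, actual_name) pairs that match only if case is ignored.

    Hypothetical helper, not a DocZilla tool. Such references resolve on a
    case-insensitive file system but fail on a case-sensitive one.
    """
    existing = set(existing_files)
    by_lower = {name.lower(): name for name in existing_files}
    mismatches = []
    for ref in references:
        if ref in existing:
            continue  # exact match: portable
        actual = by_lower.get(ref.lower())
        if actual is not None:
            mismatches.append((ref, actual))
    return mismatches


if __name__ == "__main__":
    files_on_disk = ["document.sgml", "figure1.png", "meeting.css"]
    system_ids = ["document.sgml", "Figure1.PNG", "meeting.css"]
    for ref, actual in find_case_mismatches(system_ids, files_on_disk):
        print(f"{ref!r} should be written as {actual!r}")
```

References reported by such a check work on Windows but break as soon as the same files are served from a case-sensitive system.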

Notes for Windows users

Tip

When typing system IDs in DTD subsets, ENTITYRC, CATALOG etc. files, use the appropriate and correct name case;

Tip

The CATALOG and ENTITYRC files should be named catalog and entityrc, all in lower-case letters. This helps DocZilla find them more easily on a case-sensitive file system;

Tip

Use the forward slash (/) character as the path separator, not a backslash (\): DocZilla works with the notion of URLs and it can translate the relative URL directory/file.sgml to point to DIRECTORY\FILE.SGML on a Windows system. On non-Windows platforms, however, the backslash is not treated specially, so DocZilla considers a backslashed URL to point to a file literally named "DIRECTORY\FILE.SGML".

Tip

If possible, try to configure your SGML/XML authoring tools to avoid these pitfalls, too. A non-portable SGML documentation environment is as useful as a proprietary binary-only document format.
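
The path separator tip can be illustrated with Python's standard path modules, which implement Windows and POSIX naming rules side by side. This is a sketch of the general platform behaviour, not of DocZilla's own code; to_forward_slashes is a hypothetical clean-up helper.

```python
import ntpath
import posixpath

WINDOWS_STYLE = "DIRECTORY\\FILE.SGML"


def to_forward_slashes(system_id):
    """Hypothetical clean-up helper: rewrite backslashes as forward slashes."""
    return system_id.replace("\\", "/")


if __name__ == "__main__":
    # Windows rules treat the backslash as a separator ...
    print(ntpath.basename(WINDOWS_STYLE))
    # ... but POSIX rules see one file name that contains a backslash.
    print(posixpath.basename(WINDOWS_STYLE))
    # After the rewrite, both rule sets agree on the file name.
    print(posixpath.basename(to_forward_slashes(WINDOWS_STYLE)))
```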

Notes for non-Windows users

Tip

Name your files using the traditional MS-DOS/Windows convention of file name suffixes: append a dot to the name, then the suffix itself, like .sgml or .xml. Windows does not support explicit file metadata or automatic recognition based on file contents and thus relies heavily on the suffix to determine the MIME types of files.



[2] In some distributions, the installation program is named SETUP.EXE

[3] See the manual page of tar for more information: type man tar

[4] A historical anecdote: the acronym "mde" that pops up regularly in the DocZilla terminology stands for "MultiDoc Engine", referring to MultiDoc Pro, Citec's previous SGML browser.

Chapter 2. Reference manual

This chapter explains the functionality and features of DocZilla in detail. The first section covers the features that are visible to the user of DocZilla Browser or other applications based on DocZilla. It explains the relevant user interface and, where applicable, gives instructions for implementing and using each feature in your documents. The second section lists features that are related to document authoring or to building a DocZilla documentation environment, and the last section contains information on miscellaneous features and DocZilla-specific issues.

Core components

User interface

The user interface of DocZilla Browser is intended to feel familiar and to be as simple and easy to use as possible. This section describes the DocZilla-specific user interface items and dialog windows. For a general reference to the user interface and menus, see the User Interface Reference appendix.

Preferences

The preferences window is mostly the same as in the Mozilla Browser. There is a new preferences section for DocZilla at the very bottom of the panel list.

The "Table of Contents" panel contains preference items related to the Table of Contents component. It provides controls to set the default value of the open-to-depth expansion level. It also lets you change the priority of the default value over the value in the TOC file or TOC invocation.

The second panel, "Extended XLinks", controls the actuation rules and the user interface of the XLinks component. You may give DocZilla much or little control over how links are automatically followed and when a target selection dialog should be popped up for you.

The last panel, "NDATA", provides default handlers for displaying NDATA (non-SGML data) entities, based on the name of their notation. These values are used if they are not overridden in the DTD. NDATA types identified by their names can be set to default to being displayed as an image, handled by a plugin, shown in a view embedded in the document, not displayed but automatically linked to, or simply ignored. You can also choose to ignore any referenced entities of an unknown NDATA type; otherwise DocZilla auto-generates a hyperlink that describes the entity and points to the target system ID.

URLs and locations

DocZilla addresses files and resources using standard Uniform Resource Locators (URLs), either relative or absolute. The URL schemes supported by DocZilla are the same as those supported by Mozilla, including http, https, ftp, file and javascript. This enables DocZilla to operate over networks just as transparently as over local files.

Table of Contents (TOC)

The Table of Contents (or DocZilla TOC) component maintains trees of TOC items and can display them in an interactive pane in the graphical user interface. TOCs can be used to navigate within a document or a set of documents. TOCs can be static or dynamic, document-specific or permanent, and nested, and they understand various parameters that control how they are processed.

The TOC pane

You can click on the tree items to activate the targets of the tocitems. Some items contain sub-items, in which case you can open the lower tree branch by single-clicking on the "+" sign (also known as “twisty”) or by double-clicking the whole item.

TOC items come in two further variants. Some contain no targets, and nothing happens when you single-click them; this is common when the tocitem is just a placeholder for sub-items or when it is the root of a TOC. TOC root items are displayed on a darker background, so they stand out a bit from regular tocitems.

Others contain more than one target, in which case a popup menu is displayed, prompting the user to pick one of the targets.

There is also an advanced mode of display: a TOC may have more than one column, in which case one column (usually the first) contains the branch lines and twisties of the tree structure. There can be zero or more targets not just for each row but for each cell, that is, each combination of row and column. The tree cells are activated by clicking on the row in the appropriate column.

There is yet another feature that allows icons to be added to TOC items and makes the icons separate, additional targets. This means that, besides each TOC item having multiple columns (with zero or more targets on each), each column of an item can also have an icon with zero or more additional targets that are activated by clicking on the icon image. However, as good as it sounds, this feature does not have a general user interface yet. While it is otherwise quite simple to use from the TOC XML document, it also involves writing a custom “chrome JAR” package that provides a “XUL overlay” to define the actual icon. Commercial users who wish to use this feature should ask our product support for more detailed instructions.

A context menu is available in the TOC pane, reachable by right-clicking a tocitem. The context menu contains different items depending on whether you opened it over a regular tocitem or over one that represents the root of a TOC. The tocitem menu contains an option to open the target of the tocitem in a new window, as opposed to replacing the currently shown document with the target. The TOC menu contains two items for removing complete TOCs from the in-memory tree.

Static TOCs

Static TOCs are XML files following the DocZilla TOC DTD. They are loaded by the DocZilla TOC subsystem when the document is loaded, and added to the internal TOC item tree. See the appendix on the TOC DTD for detailed information about the TOC structure and about defining link targets with element attributes. A TOC document is simply a toc element that contains any number of tocitem elements, each of which represents a tree item in the interactive pane. Each tocitem begins with one or more link elements, followed by zero or more tocitem elements. Each consecutive link element denotes the target for one TOC item column. Each link element must contain a title element as its first child element. The text content of the title is displayed in the visual TOC pane as the content of the tocitem.

The link elements contain the title of the tocitem and optionally a reference to the target URL of the tocitem. The reference may be a URL fragment (e.g. #mychapter or #element(/1/1/2)), in which case it is resolved against the displayed document. If it is a relative URL, it is resolved against the URL of the TOC file itself, or against the XML base URL if available. The URLs are loaded into the primary browser view in the same window where the TOC pane is located. The actuation is processed as if the URL had been typed in the browser location bar or loaded as a result of clicking an ordinary link. There is no restriction on what a tocitem may point to: an image file, a PDF document or even a snippet of JavaScript, using the javascript: URL handler. Therefore, a TOC can act as the table of contents of a single document or of a set of documents and resource files.
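
The resolution rules for tocitem targets can be sketched with Python's standard URL handling. This is an illustration of the rules as described here, not DocZilla's implementation; the function name, the parameter names and the example.com URLs are assumptions.

```python
from urllib.parse import urljoin


def resolve_toc_target(target, toc_url, document_url, xml_base=None):
    """Resolve a tocitem target reference to an absolute URL.

    Illustrative only: all names here are assumptions, not DocZilla APIs.
    """
    if target.startswith("#"):
        # A bare fragment refers to the currently displayed document.
        return urljoin(document_url, target)
    # Relative URLs resolve against the XML base URL when one is available,
    # otherwise against the URL of the TOC file. Absolute URLs pass through.
    return urljoin(xml_base or toc_url, target)


if __name__ == "__main__":
    toc = "http://example.com/docs/toc.xml"
    doc = "http://example.com/docs/manual.xml"
    print(resolve_toc_target("#mychapter", toc, doc))
    print(resolve_toc_target("figures/pump.png", toc, doc))
    print(resolve_toc_target("ch2.xml", toc, doc, xml_base="http://example.com/base/"))
```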

Attaching TOCs to a document

A document can be told of a TOC in two ways: by using a special processing instruction or via an entityrc file. The first option is discussed here. The ENTITYRC method is feature-wise analogous to the PI method except for the syntax which is explained in the ENTITYRC section.

The syntax of the TOC PI is:

Example 2.1.  Syntax of the TOC processing instruction

<?mde-toc
  href="URL"
  type="[text/xml|text/xsl]"
  title="string"
  persist="[sticky|permanent]"
  open-to-depth="a number"
  name="string"
?>

The pseudo-attributes are defined as:

href

A URL to the TOC XML file

type

Type of the TOC, either static (XML: text/xml) or dynamic (XSLT: text/xsl). Note that the type here is merely an indicator of the type of the upcoming TOC and is not related to the official content type of the TOC file. See the section about HTTP content types for more information.

title

The title of the TOC that is shown in the root tocitem in the visual display pane.

persist

Defines the persistence of the TOC. By default, a TOC is tied to a document: when another document is loaded over that document, the TOC disappears from the internal TOC tree and the visual TOC pane. If this pseudo-attribute is set to "sticky", the TOC remains in memory until another sticky TOC is loaded. The URL targets contained in a sticky TOC, both relative and absolute, are loaded and displayed in the normal fashion. Sticky TOCs may be handy if you have split your document files into a number of groups and you want your TOCs to be displayed per group instead of per document. If the pseudo-attribute is set to "permanent", the TOC stays present until the browser is quit or the user manually removes the TOC.

open-to-depth

The value of this pseudo-attribute is a number greater than or equal to zero. It determines the number of nested TOC levels to be initially opened, starting from the root of the TOC tree. A value of zero opens no levels, displaying only the TOCs. A value of 2 opens the TOCs by default and also the two highest levels of tocitems, leaving the levels lower than 2 closed. See the TOC open-to-depth parameter section for details on how its final value is derived.

name

This pseudo-attribute is used to attach a name for the TOC being loaded. The name will be used in creating sub TOCs, as described in the following section.
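
To make the pseudo-attributes concrete, the following Python sketch embeds a filled-in mde-toc processing instruction in a small XML document and reads its pseudo-attributes back. It only demonstrates the syntax: the file names and values are invented, and DocZilla's own parsing code is not shown here.

```python
import re
from xml.dom import Node, minidom

# Hypothetical document: every value in the PI is made up for illustration.
SAMPLE = (
    '<?mde-toc href="manual-toc.xml" type="text/xml" title="Service manual" '
    'persist="sticky" open-to-depth="2" name="manual"?>\n'
    "<doc><title>Example</title></doc>"
)

# name="value" or name='value' pairs inside the PI data.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def toc_pseudo_attributes(xml_text):
    """Return the pseudo-attributes of the first mde-toc PI as a dict."""
    document = minidom.parseString(xml_text)
    for node in document.childNodes:
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE and node.target == "mde-toc":
            return {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(node.data)}
    return {}


if __name__ == "__main__":
    for name, value in toc_pseudo_attributes(SAMPLE).items():
        print(f"{name} = {value}")
```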

Generating TOCs on the fly at server side

The TOC file is just plain XML. This means that if your documentation environment resides on a web server, the TOC file can also be generated on the fly as the browser makes the request to load it. Popular server-side technologies such as Perl, PHP and JavaServer Pages all have code libraries for handling and outputting XML, so this can be a feasible choice if such behaviour is needed. It could be desirable if the TOC is generated dynamically but compiled from a larger set of data than is sent to the browser.
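
As a rough sketch of server-side generation, the following Python code builds a TOC tree from a list of chapters and serialises it as XML. Python is used here instead of the languages named in the text purely for illustration. The element names toc, tocitem, link and title come from the Static TOCs section; the attribute that carries the target URL is an assumption (written as href here), so check the TOC DTD appendix for the real name.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the attribute name for the link target is hypothetical;
# the real one is defined by the DocZilla TOC DTD.
TARGET_ATTRIBUTE = "href"


def add_tocitem(parent, title, url):
    """Append a tocitem whose first child is a link containing a title."""
    item = ET.SubElement(parent, "tocitem")
    link = ET.SubElement(item, "link", {TARGET_ATTRIBUTE: url})
    ET.SubElement(link, "title").text = title
    return item


def build_toc(chapters):
    """Build a toc element from (title, url, sections) tuples.

    sections is a list of (title, url) pairs nested under the chapter.
    """
    toc = ET.Element("toc")
    for title, url, sections in chapters:
        chapter_item = add_tocitem(toc, title, url)
        for section_title, section_url in sections:
            add_tocitem(chapter_item, section_title, section_url)
    return toc


if __name__ == "__main__":
    toc = build_toc([
        ("Introduction", "intro.xml", []),
        ("Maintenance", "maintenance.xml",
         [("Lubrication", "maintenance.xml#lubrication")]),
    ])
    print(ET.tostring(toc, encoding="unicode"))
```

A web application would return the serialised string as the response body for the URL named in the TOC processing instruction or ENTITYRC entry.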

Using sub TOCs

A so-called "subtoc" is a whole TOC inside another TOC. If a new TOC is given a name when it is loaded, either from the TOC processing instruction or the ENTITYRC file, it becomes a potential subtoc. If any of the existing tocitems in memory has a subtoc attribute whose value matches the name of the new TOC, that tocitem acts as a placeholder for the new TOC. If the value of the subtoc attribute is _all, then any named TOC will be located inside the placeholder element. An example of subtocs in use is the front page of the DocZilla Demokit: the first view contains a number of tocitems pointing to different demonstration documents, but when some of these documents are loaded, a new, document-specific TOC is added under the originating tocitem. Subtocs are no different from normal, top-level TOCs except for their position in the in-memory TOC tree. They are also shaded a bit differently in the visual TOC pane.
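
The placeholder matching rule can be written down as a short Python sketch. It models tocitems as plain dictionaries, which is a stand-in for the in-memory TOC tree rather than DocZilla's real data structure. Picking the first matching tocitem is an assumption, since the text does not say what happens when several placeholders match.

```python
def find_subtoc_placeholder(tocitems, new_toc_name):
    """Return the tocitem that should host a newly loaded, named TOC.

    tocitems: iterable of dicts with an optional "subtoc" key (hypothetical
    model). Returns None when the new TOC has no name or nothing matches,
    in which case the TOC stays at the top level.
    """
    if not new_toc_name:
        return None
    for item in tocitems:
        subtoc = item.get("subtoc")
        # "_all" accepts any named TOC; otherwise the names must be equal.
        if subtoc == "_all" or subtoc == new_toc_name:
            return item  # ASSUMPTION: first match wins
    return None


if __name__ == "__main__":
    tree = [
        {"title": "Welcome"},
        {"title": "Pump manual", "subtoc": "pump"},
        {"title": "Other documents", "subtoc": "_all"},
    ]
    print(find_subtoc_placeholder(tree, "pump")["title"])
    print(find_subtoc_placeholder(tree, "valve")["title"])
```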

Dynamic TOCs

The term "dynamic TOC" has two different meanings. Their common denominator is that part or all of the TOC is not hard-coded in the TOC file but generated later, based on the document contents.

Using title references

This first form of dynamic contents uses the titleref element rather than the title element. The titlerefs should contain URL fragments[5] like XPointers to refer to the elements containing the title texts, and they won't be resolved until the document is loaded. This can be useful if you have a huge number of documents of exactly the same form (e.g. a series of spare part descriptions) and you don't want to spend computing power in doing transformations or server-side TOC generation.

Using the built-in XSLT TOC feature

This is the premier TOC generation method. When invoking the TOC from the processing instruction or via the ENTITYRC, the type of the TOC must be "text/xsl". The TOC file must be a proper XSL stylesheet, which is then fed to an XSLT transformer as the stylesheet of the actual document. The result of the transformation is used as the final TOC file. This way, it's easy to gather all significant chapter and section titles into one TOC and have that generated on the fly for each instance of the same DTD.

Message logger

The message logger facility collects informational, warning, error and critical error messages from all other DocZilla components. The message logger window displays a list of messages that have been raised since the last start of a document load.

The message logger window has a View > Auto-show message logger menu item that controls the message severity level at which the message logger should pop up. By default, it pops up automatically if any errors or critical errors are raised. The first time that happens, a small dialog window is shown before the message logger, asking the user for their preferred severity level. The user can then leave the dialog either by proceeding directly to the message logger or by skipping the message logger that time.

Annotations

DocZilla supports annotating documents. In DocZilla terminology, annotating means attaching content to documents externally, that is, without modifying the document. The precision, or granularity, of a single annotation is currently one element. Users can maintain their own base of annotations, and document authors can publish additional sets of annotations with documents. The collection of annotations in an annotation file is called an “annotation context”. The standard annotations contain a textual message and some metadata, and they can be edited with a built-in annotation manager.

How annotations work

Technically, the annotation facility is layered into modules, including interchangeable backends for loading annotations. The default backend loads and writes XML-based RDF files, but it would be easy to extend the annotation facility by writing backends that operate with MultiDoc Pro annotations or with annotations compliant with the W3C Annotea project.

Because the annotation editor is merely a client of the annotation subsystem, it could also be programmed into a write-only feedback machine or a sophisticated bug report tool that sends new annotations to a specified central server or emails them to the documentation coordinator. The new annotations could then be ratified and published in a central location, available to all document viewers.

For now, only local annotation files can be written to; network annotation contexts are read-only. If another backend module were written that supported updating annotations over the network, e.g. using WebDAV or some other protocol, the change would be transparent to the current annotation system: networked annotation contexts using the new backend would simply become read-write instead of read-only.

User annotations

The user annotation context points to elements of documents addressed by absolute URLs. It is saved in the user's profile directory, in a file called dzuserannotations.rdf. (It should never be necessary to modify or move that file by hand.) The user annotations are loaded and become available automatically when DocZilla is started.

Author's annotations

Author annotation contexts are relative, as opposed to the user annotations. They are loaded via the ENTITYRC file, and they contain relative URLs to their targets. The URLs are resolved against the ENTITYRC URL; in other words, what matters is where they are loaded from, not where the author annotation file is located or what the document URL is. This is useful, for example, if you want to maintain a global annotation file on a network drive or at an HTTP URL: you can use an absolute URL to fetch the annotation file, but the annotations still point to their relative targets.
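
The following Python sketch illustrates the resolution rule for author annotations with invented URLs. It shows that the location of the annotation file plays no part in resolving a target; the function name and all URLs are assumptions made for this example.

```python
from urllib.parse import urljoin

# Hypothetical locations, for illustration only.
ENTITYRC_URL = "http://docs.example.com/pump/entityrc"
ANNOTATION_FILE_URL = "http://annotations.example.com/shared/pump.rdf"


def resolve_annotation_target(relative_target, entityrc_url):
    """Resolve an author annotation target against the ENTITYRC URL."""
    return urljoin(entityrc_url, relative_target)


if __name__ == "__main__":
    # The annotation file lives on another server, yet the target is found
    # next to the ENTITYRC file that referred to it.
    print(resolve_annotation_target("chapter1.xml#intro", ENTITYRC_URL))
```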

Using the annotation editor

The annotation editor can be launched in two ways: either via the menu item View > Annotations… or by making a selection in the document text, right-clicking to pop up the context menu and choosing Annotate….

The editor window is divided into two parts. On the left there is a tree of annotation sources, including the user annotation file, with single annotations as child items. The right side is reserved for editing whatever is selected in the left part. The right part is in turn divided into upper and lower sections. The upper frame contains information related to an annotation context. The lower frame contains information related to a single annotation.

Basic functionality

The tree on the left of the window contains annotation contexts and annotations, which can be viewed by clicking on them. If the clicked item is an annotation, the context it belongs to is shown as well. The menu item File > New > Annotation context creates a new annotation context, and File > New > Annotation creates a new annotation in the currently selected context. The new annotation is not added to the list until the fields are filled in and the changes are saved.

The editor provides standard cut, copy and paste functionality for easy copying, moving and deleting of annotations. These functions work on annotations only; annotation contexts can only be created and deleted. If an annotation is selected, cut and copy work on the selected annotation. If an annotation context is selected, they work on all annotations in the selected context. The paste function always works in the current context, provided that it is writable. As the user context works with absolute addressing and other (author) contexts are relative, DocZilla does its best to keep the URLs of annotation targets consistent when copying to and from the user context.

Opening an annotation context is functionally the same as loading one via ENTITYRC: the appropriate annotation context is loaded into memory and affects all subsequent document loads. The File > Save context menu item is essentially the same as clicking on Save changes in the annotation editing frame. The File > Save context as menu item saves the current annotation context to a new location. Note that the original context seems to disappear from the list, but it has only changed location, as "save as" implies, and hence it looks different in the list. The "old" context will be loaded from the original URL and added to the list the next time it is referred to in an ENTITYRC file.

Annotation contexts

The first item in this frame is the URL from which the annotation context was loaded. The URL is not part of user contexts. The next item is available only for author contexts and it contains the URL that the annotations are relative to. Normally, it's more useful to just see the URL than to modify it, but changing it is not disabled since it may be useful when reorganizing or merging annotation contexts.

Next, the number of annotations in the selected context is shown. Next to it, there's a checkbox that can be used to (temporarily) disable and enable the whole context. Finally, there's a button that can be used to remove the annotation context from memory. It neither erases the corresponding file nor unlinks the annotations from the document. If you need to remove an annotation context, simply remove the reference to it from the ENTITYRC file.

Annotations

When an annotation is clicked in the left-side tree, its content is displayed in the window. The metadata for an annotation includes the author name, the target URL with an XPointer fragment to the target element, and a boolean indicating whether this annotation should be popped up automatically when the document loads or only at user request. The message itself can be edited in the large textbox. The Save changes button updates the data and tries to save the annotation context to disk, if possible.[6] The little hyperlink, Go to target, opens a small window and loads the target of the current annotation into the view. This is useful for locating the annotation or simply checking what it looks like on a real page.

User Hyperlinks

User hyperlinks is a new feature that consists of three technical parts: a database of user-owned (third-party) links, the LinkManager plugin that injects links from that database into documents, and a user interface to create and delete link arcs. The first two parts are only mentioned for technical reference, and the last part is discussed here.

Creating links

Links are built by making a textual selection in the document and adding the selected location as a source end or a target end to the link being created. A floating window represents the link being built, maintaining one global state of that link accessible all the time from any window. The first time a location is added, the window opens up automatically. All selected locations are listed in that window until either they are approved as a new hyperlink or the dialog is simply dismissed.

Select Edit > Set link start… (or the same item in the context menu) to mark the start of the link. Only one start point can be specified; a later selection replaces the previous one.

Select Edit > Add link end… (or the same item in the context menu) to append a new end of the link. At least one link end must be added in order to finalize creating the hyperlink. In the floating window, link ends can be removed or made unidirectional/bidirectional while still editing the link.

Deleting links

The window for deleting links can be invoked from Edit > Delete links…. It can list all the link arcs currently in the database. Note that for one bidirectional link, two arcs are listed and you can delete either one of them, or both.

On top of the window there's a text field for entering a URL prefix to restrict the number of links listed. A link will be listed if one of its start or end points matches the URL prefix. For example, the prefix file:// would list all links that point to, or from, a local SGML/XML file. The prefix is the URL of the current document by sensible default, which effectively selects all the links that originate from or target the current document.
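The prefix rule described above can be sketched as follows. This is only an illustration, assuming that each arc is stored as a (start URL, end URL) pair; the function name filter_arcs and the sample URLs are hypothetical and not part of DocZilla.

```python
def filter_arcs(arcs, prefix):
    """Keep the arcs whose start or end URL begins with the given prefix."""
    return [
        (start, end)
        for start, end in arcs
        if start.startswith(prefix) or end.startswith(prefix)
    ]


# Hypothetical link database; a bidirectional link is stored as two arcs.
ARCS = [
    ("file:///docs/a.xml#p1", "file:///docs/b.xml#p7"),
    ("file:///docs/b.xml#p7", "file:///docs/a.xml#p1"),
    ("http://example.org/x.xml#s1", "http://example.org/y.xml#s2"),
]

# The prefix "file://" selects every arc that touches a local file.
local_arcs = filter_arcs(ARCS, "file://")
```

Using the URL of the current document as the prefix narrows the list to the arcs that start from or point to that document.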

Link target dialog

The link target dialog appears when there is more than one possible XLink target to choose from. Extended XLinks will remain in memory as they may predefine links from some arbitrary documents to others. The dialog allows double-clicking on a single target, carefully selecting a number of targets to be opened at once or simply cancelling all link actuation requests.

DocZilla Search

DocZilla search is available in the Tools > DocZilla Search menu. In the search window, on top of the window there is the search query text field and below that, there are three alternate search methods: file, directory and index search. Most of the controls are common between the different search methods. The file and directory search both have a text field where you can add paths to files and directories, separated with spaces. (A path containing a space character may be enclosed within double-quotes.) The index search requires the path of one index file in the text field.

The file search and directory search both have checkboxes to change the case-sensitivity of the search and whether the search words should be recognized as whole words in the document or as substrings. The directory search has an additional control of whether to recurse into subdirectories. If you enable recursive search and start from the root directory of your document tree (or the root directory of your hard drive) the search might take a while as it goes through every directory under the root directory. You may want to use the indexing capability to reduce search times on large sets of documents. The directory search will scan all SGML and XML files found in the given directories. So far, the files are recognized by the suffix after the last dot character in the file name; in other words, the search covers files that match *.sgml, *.sgm and *.xml.

Query language

The query language is simple. It contains search words, separated by whitespace. Within all the files that were searched, any element whose name matches one of the search words will be returned. Optionally, a search word may be prefixed with the "+" character or the "-" character to make the presence or absence of an element a requirement, respectively. When the index search is not used, a search word may end in the "*" character to match only the start of the element name, or it may contain the "?" character, which stands for any single printable character. A word that spans several elements with no white space breaks in it is treated as a single word.

Example 2.2.  The sample query "tie??k*" will match

<para>
The Finnish word for "computer" has the syllables:
<syl>tie</syl><syl>to</syl><syl>ko</syl><syl>ne</syl>
</para>
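Why the sample query matches can be checked with a small sketch. It assumes that "?" stands for exactly one character and "*" for any run of characters, and that text from adjacent elements is joined before it is split into words. The helper names are hypothetical, and this is not how DocZilla Search is implemented.

```python
import re
import xml.etree.ElementTree as ET


def pattern_to_regex(word):
    """Translate a search word with '?' and '*' wildcards into a regex."""
    parts = []
    for ch in word:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # any run of characters
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")


def words_in(fragment):
    """Join all text in the fragment, ignoring tags, then split on whitespace."""
    root = ET.fromstring(fragment)
    return "".join(root.itertext()).split()


def matching_words(query, fragment):
    rx = pattern_to_regex(query)
    return [w for w in words_in(fragment) if rx.match(w)]


SAMPLE = """<para>
The Finnish word for "computer" has the syllables:
<syl>tie</syl><syl>to</syl><syl>ko</syl><syl>ne</syl>
</para>"""

hits = matching_words("tie??k*", SAMPLE)
```

Because there is no whitespace between the syl elements, their contents form the single word "tietokone", which is the only word in the sample that fits the pattern.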

Indexing files

The third search method, index search, requires an index in which to look up the search word. On the index search tab, there's a button for an alternative view which enables the user to create indices of selected files. Click on the Create new index… button to enter this view. The available options, in left-to-right, top-to-bottom order, are:

  1. The first text field must contain the path to the file in which the resulting index will be saved. For convenience, you can use the Find file... button to help picking the correct path.

  2. The next field can contain a path to a file containing a list of words, one per line, that should be rejected from or exclusively included in the index. A list of the most common English words is included in the DocZilla distribution (see this resource file for more information), and the field can be set to point to that file by clicking on the Use default button.

  3. DocZilla considers any length of continuous word characters as one word which is then written to the index. The set of word characters defaults to the base and ideographic characters defined in the XML specification. This field allows defining additional such characters: typing the characters “0123456789” or the range “0-9” will make DocZilla think of e.g. CIRCLIP42 as one word. Otherwise only CIRCLIP would have been indexed from that text.

  4. The next item is a checkbox that determines whether the index will be case-sensitive or not. In a case-sensitive index, only case-sensitive searches can be done and vice versa. Case-insensitive indices are also a bit smaller in size, since the number of distinct words is smaller than when words that differ only in case are treated as unique.

  5. A reverse index is a rather special mode in which for each word and attribute value two index entries will be written. First, the word or value itself as usual and then the same but lexically reversed. This allows for building user interfaces for searching that can do postfix search in addition to prefix search, that is e.g. “foo*” and “*foo”. Note that with the standard DocZilla Search user interface, queries for reversed words are not automatically available. This option is generally needed only when creating indices for a custom search interface that handles reverse queries properly.

  6. The next choice is about whether the index should point to the files in an absolute manner or relatively. Absolute indices are tied to the user's configuration of hard disk partitioning and directory structure on the disk. If the files are moved, the search hits can't be reached anymore. This is generally only suitable for personal indices on data that will remain in the same place and not moved to other computers. The relative indexing model refers to the files in relative manner, starting from the directory that will be indexed. The resulting index file can be moved around along with the indexed documents, even to other computers or to a compact disk.

  7. The last configuration item defines the directories to make an index of. It's slightly different for absolute and relative indices in that you can index several directories into an absolute index and only one into a relative index[7].

    In the relative mode, you can explicitly list the files to be indexed. DocZilla will ask you for a text file that contains the desired paths to the files, relative to the directory to be indexed. The text file should contain valid pathnames, one per line. After loading the text file, DocZilla will inform you how many pathnames it managed to read from the file.

    Finally, click Create index. If the amount of data to index is large, a progress bar will show up, listing the number of files and kilobytes processed so far.
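The effect of the additional word characters (item 3) and of the reverse index (item 5) can be illustrated with a short sketch. It is not DocZilla's indexer: the default word characters are approximated with Python's str.isalpha(), ranges such as “0-9” are not expanded, and the function names are hypothetical.

```python
def index_words(text, extra_chars=""):
    """Split text into index words made of letters plus any extra characters."""
    extra = set(extra_chars)
    words, current = [], []
    for ch in text:
        if ch.isalpha() or ch in extra:
            current.append(ch)
        elif current:
            # A non-word character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def index_entries(word, reverse_index=False):
    """Entries written for one word; a reverse index adds the reversed word."""
    return [word, word[::-1]] if reverse_index else [word]


plain = index_words("Replace CIRCLIP42 now")
with_digits = index_words("Replace CIRCLIP42 now", extra_chars="0123456789")
```

With a reverse index, a custom search interface could presumably answer a postfix query such as “*foo” by running a prefix lookup for “oof” against the reversed entries.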

Image viewer

DocZilla provides a separate view for bitmap images. Using the right mouse button, pop up the context menu over an image in the document, then select Zoom image…. A new window will appear with the image displayed in a viewport. The user interface is simple: the toolbar contains magnification controls and you can move around the image by using the scrollbars. The zooming window is useful with large images that wouldn't otherwise fit in the browser window.

CSS Checker

This little tool is designed around helping to style SGML and XML documents. As we know, there are no implicit style rules in effect for any particular SGML element. It means that basic style rules must be explicitly applied to every element. Combined with the gradually expanding document as it's being authored, this often results in inconsistent or undefined style rules in the CSS stylesheet.

One particularly common pitfall is the display property: it defaults to inline for every element, but inline elements can only contain other inline elements or elements with no display. This means that practically any container element must be explicitly set to display as a block element (or e.g. table). Putting a block element inside an inline element doesn't make sense and thus results in undefined behaviour in Mozilla's Gecko rendering engine. With large DTDs that are extensively used by a document, this becomes notoriously painful to debug and often hinders resolving other, more obvious CSS problems.

Currently the CSS Checker makes sure that the display property of each element is legal in the context of the parent element. With a correctly displayed element tree, it'll be much easier to spot other rendering anomalies and to tune the CSS rules accordingly.

You can find the CSS Checker in the Tools menu.
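The kind of check involved can be sketched as follows. This is a simplified illustration rather than the CSS Checker itself: it assumes the computed display value of each element is already known by element name, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_display(xml_text, display, default="inline"):
    """Report children that an inline parent is not allowed to contain.

    `display` maps element names to display values; names that are
    missing from the map fall back to the default, inline.
    """
    problems = []

    def walk(elem):
        parent_display = display.get(elem.tag, default)
        for child in elem:
            child_display = display.get(child.tag, default)
            # Inline elements may only hold inline or undisplayed children.
            if parent_display == "inline" and child_display not in ("inline", "none"):
                problems.append((elem.tag, child.tag, child_display))
            walk(child)

    walk(ET.fromstring(xml_text))
    return problems


DOC = "<doc><para>Some <em>text</em></para></doc>"

# Only para has a rule, so doc is still inline and must not hold a block.
incomplete = check_display(DOC, {"para": "block"})
complete = check_display(DOC, {"doc": "block", "para": "block"})
```

The first call reports the doc/para pair because doc has no rule of its own, which is exactly the situation that arises when a stylesheet lags behind a growing document.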

Under the hood

DocZilla processing instructions

DocZilla recognizes a number of processing instructions that control various aspects of DocZilla's functionality.

xml-stylesheet, mde-stylesheet

The first PI is for XML documents as specified in the standard, the latter is a DocZilla specific PI that is functionally equal to the XML PI but is available for SGML documents too. See the section Styling documents for detailed information and examples.

mde-javascript

This is used to load JavaScript files to be evaluated after the document load has completed. See the section Attaching scripts to documents for detailed information and examples.

mde-toc

Tells DocZilla to include the TOC specified in the processing instruction to the containing document. See Attaching TOCs to a document for detailed information and examples.

mde-doctitle

This processing instruction will allow setting the title for the document. Without the title, DocZilla will not show anything particular in the window title bar. The meaning of document title is effectively the same as the TITLE element in HTML DTD.

Example 2.3.  Syntax of mde-doctitle processing instruction

<?mde-doctitle cdata="string"?>

The value of the minimized pseudo-attribute cdata, enclosed in double quotes, will be shown as the document title. If the string begins with the "+" (plus) character, it will be appended — the "+" character excluded — to whatever the current document title was before this processing instruction. Earlier definitions could occur in other mde-doctitle processing instructions (possibly set in another file if the document is compiled from several external parsed entities) or from the ENTITYRC file.

For example, the DOCTITLE initially set in the ENTITYRC could contain the manufacturer name, each documentation module file would add the name of the module as its own sub-title and each reusable component document (referred from the documentation module file as an external parsed entity) would finally add the name of the particular component to the title string that would be shown in the window.
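The "+" rule can be expressed in a few lines. The sketch assumes plain string concatenation with no separator added, which is how the description above reads; the function names and the sample titles are hypothetical.

```python
def apply_doctitle(current, value):
    """Apply one title definition to the current document title."""
    if value.startswith("+"):
        # Append the value, without the leading "+", to the existing title.
        return current + value[1:]
    # Otherwise the new value replaces the title.
    return value


def resolve_title(definitions):
    """Apply definitions in processing order, ENTITYRC entries first."""
    title = ""
    for value in definitions:
        title = apply_doctitle(title, value)
    return title


title = resolve_title(["Acme", "+ / Engine module", "+ / Fuel pump"])
```

Here the first value stands for the DOCTITLE in the ENTITYRC, and the other two for mde-doctitle processing instructions in a module file and in a component document.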

Styling documents

DocZilla supports CSS up to version 2 and XSLT styling rules. The stylesheets are attached to documents either by using one of the appropriate processing instructions or via the ENTITYRC. The ENTITYRC method is feature-wise analogous to using the PI except for its syntax, which is explained in the ENTITYRC chapter.

As explained in the document Associating Style Sheets with XML documents, we can use the following processing instruction to attach stylesheets to XML documents:

Example 2.4.  Syntax of the XML stylesheet processing instruction

<?xml-stylesheet href="URL" type="text/css" title="string" alternate="[yes]"?>

Since this is defined for XML only, a variant of the PI is used in SGML files:

Example 2.5.  Syntax of the SGML (MDE) stylesheet processing instruction

<?mde-stylesheet href="URL" type="text/css" title="string" alternate="[yes]"?>

The pseudo-attributes are as follows:

href

A URL to the stylesheet file.

type

Type of the stylesheet file, in this case text/css. Note that the type here is merely an indicator of the type of the upcoming stylesheet and is not related to the official content type of the stylesheet file. See the section about HTTP content types for more information. XSL stylesheets require the value text/xsl for this pseudo-attribute.

title

The value of this pseudo-attribute is used as the title of the stylesheet, should it be displayed in the View > Use style menu entry in the browser. Default stylesheets shouldn't have a title set.

alternate

If the value of this pseudo-attribute is set to yes, the stylesheet will be regarded as an alternate style and displayed in the View > Use style menu entry in the browser. The user can dynamically choose different styles to be viewed. See the HTML stylesheet reference for a more detailed explanation about preferred and alternate stylesheets.

Cascading stylesheets (CSS)

DocZilla supports CSS version 2 almost completely. For general information about Cascading Stylesheets, see the appropriate W3 Consortium recommendation, as well as an article about using CSS with XML. DocZilla applies CSS to SGML documents as well, which is no different from writing CSS for HTML documents, since HTML is itself an SGML application and CSS was originally designed to work with HTML.

Case-sensitivity

DocZilla treats SGML documents as case-insensitive and XML documents as case-sensitive. This must be taken into account when writing the CSS rules, too.

CSS1 and CSS2 compliance

DocZilla supports CSS version 1, and much of CSS version 2. Because DocZilla uses the Mozilla Gecko rendering engine, the compliance to those standards is effectively the same as in Mozilla 1.0 browser.

XSL transformations

DocZilla supports a one-phase transformation of documents after they are loaded but before they're displayed. An XSL stylesheet must be invoked from a processing instruction or via ENTITYRC, and only the first one is processed. CSS stylesheets are applied as usual, but on the document that resulted from the transformation. The title and alternate pseudo-attributes are ignored in the case of an XSL stylesheet.

Note

Using XSLT only affects the document in memory, the document object model (DOM). This means that e.g. search hits from DocZilla Search point to the original, non-transformed document. If the transformation severely changes the document structure, the search hits pointing to a certain location in the document may target wrong or non-existing elements.

External unparsed entities

DocZilla will try to display external unparsed entities in the most sophisticated way possible. Images and plugins will be displayed embedded on the page, HTML documents in an embedded frame, and a hyperlink will be generated for all those entities whose type was unknown or which DocZilla wasn't able to display. Document style rules apply as usual to the elements containing those entities for greater control over how the non-SGML data is rendered.

For more information about displaying external unparsed entities, see the section about NDATA type handling in DocZilla.

Supported formats

Regular image formats are supported off-the-shelf, including GIF, PNG, TIFF and JPEG. More professional image formats (such as CGM) can be licensed separately. Also note that most of the Netscape 4, Netscape 6 and Mozilla plugins such as Flash work in DocZilla.

Images as file references

Some DTDs (e.g. DocBook) use so-called file references instead of entity references to display graphics. This means that the container element has an attribute of type CDATA whose value is the system identifier of the image. This is also supported in DocZilla; see the IMAGEREF keyword in the ENTITYRC section for more information.

XLink and HyTime links

DocZilla supports the XLink standard, including extended XLinks in addition to simple XLinks. The only feature that is not yet supported is showing the link targets embedded in the document. For more information about XLinks, see the XLink standards page. DocZilla supports multiple linkends and bidirectional links where possible.

Extended XLink control dialog

To control the memory-resident extended XLink arcs, choose View > XLinks…. The control window will list each XLink rule currently in memory and allows disabling and enabling them. If a large number of XLinks are in memory that possibly overlap or simply have the same start points, browsing may become confusing as they tend to request actuation all the time.

HyTime support

A widely used subset of the HyTime standard is supported: contextual links (CLINK) with nameloc, treeloc and queryloc addressing included. Also, the HyNames attribute is supported to activate hyperlinks in DocZilla for any DTD.

XML base

DocZilla supports the XML Base feature. Any XML element can contain the special XML namespace attribute xml:base. The value of that attribute will be used as the base URL of any relative links located in the DOM tree below the element that contains the XML base URL. See the XML Base page on the W3C site for more information.
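How a base URL propagates down the tree can be sketched with the Python standard library. The example only illustrates the general XML Base resolution rule and is not DocZilla code; the element names, the plain href link attribute and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_links(xml_text, document_url, link_attr="href"):
    """Resolve every link attribute against the nearest xml:base in scope."""
    resolved = []

    def walk(elem, base):
        own_base = elem.get(XML_BASE)
        if own_base is not None:
            # xml:base may itself be relative to the base already in effect.
            base = urljoin(base, own_base)
        link = elem.get(link_attr)
        if link is not None:
            resolved.append(urljoin(base, link))
        for child in elem:
            walk(child, base)

    walk(ET.fromstring(xml_text), document_url)
    return resolved


SAMPLE = (
    "<doc>"
    '<ref href="one.xml"/>'
    '<part xml:base="http://example.org/parts/">'
    '<ref href="two.xml"/>'
    "</part>"
    "</doc>"
)

links = resolve_links(SAMPLE, "http://example.org/docs/main.xml")
```

The first link is resolved against the document URL, while the second one picks up the base set on its part ancestor.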

Scripting documents

DocZilla provides support for scripting languages to run external program code in a restricted environment with real-time access to the document's contents and the DTD. The scripting framework is based on Mozilla's XPConnect facility, which enables scripting of theoretically all of the native code. So far, JavaScript support is complete and included in DocZilla.[8]

Attaching scripts to documents

There are two ways to attach external scripts to documents. One is a processing instruction and the other is via ENTITYRC. The latter is feature-wise analogous to the first method and its syntax is described in the ENTITYRC chapter. Also, it's possible to include snippets of JavaScript inside the document.

The syntax of the processing instruction is:

Example 2.6.  JavaScript processing instruction

<?mde-javascript href="URL"?>

The value of the pseudo-attribute href is a URL to the text file containing the JavaScript to be evaluated.

If you include the HTML namespace in your XML document, you can use the HTML SCRIPT element to contain your code. Also, an easy way to trigger evaluation of short JavaScript code when the user presses a button or clicks on a link is to make the element a link pointing to a javascript pseudo-URL. The URL looks like javascript:code here; and the JavaScript code following the first colon is executed when the link is triggered.

The environment and interfaces

As for the scripting environment, DocZilla should be familiar to all those who know JavaScript and DOM as implemented in regular browsers like Mozilla, Opera and mostly in Internet Explorer too. There are plenty of references and tutorials on the subject.

Special features

CATALOG file

The catalog parser in DocZilla implements the most common entries in the catalog file format, as described on the OASIS Open entity management page. The supported entries are as listed on the SP website. All entries that affect the direct loading of SGML entities and other catalog files are in effect. Entities and notations are also resolved with the help of any additional information in the catalogs.

Only the most important rules are listed here:

PUBLIC "publicid" "systemid"

Maps a given public identifier to a system identifier.

ENTITY "name" "publicid" ["systemid"]

Maps the entity name to public identifier and possibly also to a system identifier.

CATALOG "href"

Loads a subsequent catalog from the URL “href” after this catalog has been processed.

DOCTYPE "name" "systemid"

Maps the public identifier and possibly a system identifier for the document type name.

OVERRIDE "yes|no"

While processing the catalog file, DocZilla maintains an OVERRIDE mode setting. The override mode is initially false, and may be changed with this keyword in the catalog file. When the override mode is true (yes), ENTITY, DOCTYPE and NOTATION entries will be resolved by looking up their public ID whether or not an explicit system ID is present. If the override mode is false, a system ID will always be used if given.
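The effect of the override mode on resolution can be sketched as follows. This is an illustration of the rule as described here, not DocZilla's catalog parser: the function name and the mapped path are hypothetical, and falling back to the explicit system ID when the public ID is not in the catalog is an assumption.

```python
def resolve_system_id(public_map, public_id, system_id, override):
    """Pick the system identifier to load for an external identifier."""
    # Override off: an explicit system ID is always used if given.
    if system_id and not override:
        return system_id
    # Otherwise look the public ID up in the catalog.
    mapped = public_map.get(public_id) if public_id else None
    if mapped is not None:
        return mapped
    # Assumed fallback: use the explicit system ID, which may be None.
    return system_id


# Hypothetical catalog content, as a PUBLIC entry would define it.
CATALOG = {"-//OASIS//DTD DocBook XML V4.2//EN": "dtd/docbookx.dtd"}
PUBID = "-//OASIS//DTD DocBook XML V4.2//EN"
SYSID = "http://example.org/docbookx.dtd"

without_override = resolve_system_id(CATALOG, PUBID, SYSID, override=False)
with_override = resolve_system_id(CATALOG, PUBID, SYSID, override=True)
```

With OVERRIDE "yes" the catalog mapping wins even though the document supplies its own system identifier.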

SGMLDECL "sysid"

For SGML documents, uses the SGML declaration defined in sysid. See SGML and XML declarations for more information.

ENTITYRC file

The ENTITYRC file contains DocZilla specific instructions and resources to control how the document is interpreted and viewed. It is thus analogous to the semantics of a processing instruction in the document markup. The format of the DocZilla ENTITYRC is loosely based on the Synex Viewport™ ENTITYRC format used by MultiDoc Pro, so converting MDP entityrcs for DocZilla should not require more than a few tweaks.

The ENTITYRC file consists of scope keywords and instruction keywords. The scope keywords declare the context to which the following instruction keywords are attached. Currently there are two scope keywords, PUBLIC and DOCTYPE. The following keywords until the next occurrence of either of those two will affect documents that have that particular public identifier or doctype as defined in the arguments to the scope. For example:

Example 2.7.  A sample ENTITYRC file

PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    DOCTITLE "DocBook example"
    JAVASCRIPT "hello.js"

will set the document title that is shown in the window title bar to “DocBook example” and load and evaluate the JavaScript code in the file hello.js only for all documents whose public ID is the one of DocBook-XML version 4.2.
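The way scope keywords collect the instruction keywords that follow them can be sketched with a small parser. It is only an illustration under simplifying assumptions: keywords are bare words, arguments are enclosed in single or double quotes, and comments or error handling are ignored. The function name and the result layout are hypothetical.

```python
import re

# A token is a double-quoted string, a single-quoted string or a bare word.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')
SCOPE_KEYWORDS = {"PUBLIC", "DOCTYPE"}


def parse_entityrc(text):
    """Return (scope, keyword, arguments) for every instruction keyword."""
    entries = []
    scope = None
    keyword, args = None, []

    def flush():
        nonlocal scope, keyword, args
        if keyword is None:
            return
        if keyword in SCOPE_KEYWORDS:
            # A scope keyword changes the context for what follows.
            scope = (keyword, args[0])
        else:
            entries.append((scope, keyword, args))
        keyword, args = None, []

    for match in TOKEN.finditer(text):
        double_quoted, single_quoted, bare = match.groups()
        if bare is not None:
            flush()                  # a bare word starts the next keyword
            keyword = bare
        else:
            args.append(double_quoted if double_quoted is not None else single_quoted)
    flush()
    return entries


SAMPLE = '''PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    DOCTITLE "DocBook example"
    JAVASCRIPT "hello.js"
'''

entries = parse_entityrc(SAMPLE)
```

Both instructions end up attached to the same PUBLIC scope, so they would apply only to documents with that public identifier.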

Keywords

Each of the keywords accepts one or more mandatory arguments and sometimes, one or more optional arguments. Each keyword must be unquoted whereas each argument must be enclosed in single or double quotes.

ENTITYRC "href"

Load a subsequent ENTITYRC file located at “href” after this file has been processed.

PUBLIC "publicid"

Sets the context to the specified public identifier. The following instruction keywords will be taken into account for documents that match the given public identifier.

DOCTYPE "doctype"

Sets the context to the specified doctype. The following instruction keywords will be taken into account for documents whose doctype matches the given “doctype” argument.

SDATAMAP "href"

For documents in the current context, load the SDATA definition file from the URL “href”. The file will be merged to the in-memory map of SDATA entities and won't be loaded twice until DocZilla is restarted. Note that while SDATA maps can be loaded along with any document, SDATA entities are only available in SGML documents. The SDATA entity definitions in SDATA map file must follow the syntax defined in section SDATA maps.

JAVASCRIPT "href"

Load JavaScript code from URL “href” and evaluate it when the document has finished loading.

DOCTITLE "string"

Use “string” as the document title which will be shown in the window title bar. If “string” begins with the '+' character (plus sign), it will be concatenated to whatever the document title was set before. Doctitle definitions in ENTITYRC are processed before mde-doctitle processing instructions in the markup document.

ANNOTATION "href" "type"

Start loading a new annotation file from the URL “href”. The “type” is the type of the file at the URL “href”: for now, it should be “rdf” or “localdir”. In the former case, “href” points to an RDF file created by DocZilla that contains the annotations. In the latter case, “href” points to a local URL that is a directory: this directory is searched for files whose content type is text/rdf, which are loaded and finally merged into one annotation context. This is useful if you need to include a varying set of RDF annotation files in a document. The context represents the directory and is read-only, because the annotations get merged together. Each of the files can still be edited separately if opened in the annotation editor.

The annotations — assuming the file is in a format that is recognized by DocZilla — will be added to DocZilla's annotation facility and if any annotations point to the upcoming markup document, they will be displayed in it. See Annotations for more information.

IMAGEREF "spec1 [spec2 [… specN]]"

This sets up image reference specifications for the document. Image reference specifications are element-attribute tuples which are treated specially in DocZilla. The syntax of a single spec is:

Example 2.8.  Imageref tuple specification syntax

element-name[attribute-name]

Any number of specs may be defined in a single IMAGEREF entry, separated with space characters. Given an IMAGEREF spec, if the element in question has the appropriate attribute, the value of that attribute is treated as if it were the system identifier of an external unparsed entity, an image, attached to the element.

Example 2.9.  These two markups produce the same display

<!-- In DTD subset: -->
<!NOTATION jpeg SYSTEM "doczilla:image">
<!ENTITY foobar SYSTEM "image.jpg" NDATA jpeg>
<!ATTLIST graphic image ENTITY #IMPLIED>

<graphic image=foobar>

versus

<!-- In ENTITYRC: -->
IMAGEREF "graphic[image]"

<graphic image="image.jpg">

Note that using image references instead of entities is conceptually wrong: it hardcodes direct references to data files into the document itself where the SGML-way — that is, using entities — enables a multi-layer, platform independent resolution of the final location of the data file in question.
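Splitting an IMAGEREF argument into its element-attribute tuples can be sketched as follows. The function name and the second spec in the sample are hypothetical, and the sketch assumes that names contain neither whitespace nor square brackets.

```python
import re

# One spec has the form element-name[attribute-name].
SPEC = re.compile(r"^([^\s\[\]]+)\[([^\s\[\]]+)\]$")


def parse_imageref(value):
    """Split an IMAGEREF argument into (element, attribute) tuples."""
    specs = []
    for token in value.split():
        match = SPEC.match(token)
        if match is None:
            raise ValueError("not an element[attribute] spec: " + token)
        specs.append((match.group(1), match.group(2)))
    return specs


specs = parse_imageref("graphic[image] figure[src]")
```

Each resulting tuple names an element and the attribute whose value is then treated like the system identifier of an image entity.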

STYLESPEC "name" "href" "type" ["alternate"]

This keyword allows attaching stylesheets to documents. The parameters are semantically similar to the xml-stylesheet or mde-stylesheet processing instructions except that since the leftmost arguments are always required, an empty title must be denoted as "" (a double-quoted zero-length string). Also, the alternativeness of a stylesheet is implied by using the optional argument “alternate” whose value is always the string literal “alternate”.

Example 2.10.  Stylesheets in ENTITYRC

DOCTYPE "doc"
STYLESPEC "" "basic.css" "text/css"
STYLESPEC "Fancy" "another.css" "text/css" "alternate"

TOC "title" "href" ["type" ["persist" ["name" ["open-to-depth"]]]]

This entry will attach the given Table of Contents file (TOC) to the document. The arguments are semantically similar to the TOC processing instruction. The value of the “persist” argument should be either the string literal “sticky” or “permanent”. The “name” argument is the name for sub TOC construction and the value of the “open-to-depth” argument should be a number greater than or equal to zero. If any of the arguments should be left unset, just use an empty string ("") instead.

Example 2.11.  A sample ENTITYRC for loading TOCs

PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    TOC "List of Figures" "doc2figures.xml" "text/xsl" "" "" "1"

In this example, all DocBook XML documents will get a TOC named “List of Figures”. The TOC uses XSLT to presumably compile a list of tocitems that point to all the images found in the document.

SDATA maps

SDATA maps define SGML SDATA entities. An initial sdata.map file is automatically loaded when DocZilla needs to resolve an SDATA entity for the first time, and the initial map can be augmented by loading further SDATA entity definitions from a custom map file via ENTITYRC.

The format of an SDATA map file as supported by DocZilla is described as follows:

Example 2.12.  SDATA map syntax

SDATA "[name]" "value"

At any point on a line, a "#" (hash sign) begins a comment which ends at the end of the line. The “name” is the name of the entity, enclosed first in square brackets and then in double quotes. The “value” is either a decimal or a hexadecimal number, denoted in C-language fashion where hexadecimal values begin with "0x". Here are excerpts from DocZilla's default SDATA map file, which contains the ISO character entity character codes for SGML documents:

SDATA "[aacute]" "225" #     a acute
SDATA "[Aacute]" "193" #     A acute
SDATA "[Alpha ]" "0x0391" #
SDATA "[Beta  ]" "0x0392" #

Note that e.g. the FDATA definitions are ignored. DocZilla is based on Unicode and the SDATA entities are mapped directly to Unicode letters, based on their values. It's then up to the operating system to load and enable the right font to make available the glyphs requested by the character codes.
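Reading such a map file can be sketched in a few lines. This is an illustration of the format as described here, not DocZilla's own parser: the function name is hypothetical, stripping the padding spaces inside the brackets is an assumption, and a "#" is treated as a comment start wherever it occurs.

```python
import re

# SDATA "[name]" "value", with the value in decimal or 0x-prefixed hex.
ENTRY = re.compile(r'^SDATA\s+"\[([^\]]*)\]"\s+"([^"]+)"$')


def parse_sdata_map(text):
    """Map SDATA entity names to the Unicode characters they stand for."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the comment, if any
        if not line:
            continue
        match = ENTRY.match(line)
        if match is None:
            continue                            # other definitions are skipped
        name = match.group(1).strip()
        # Base 0 accepts both "225" and "0x0391".
        mapping[name] = chr(int(match.group(2), 0))
    return mapping


SAMPLE = '''SDATA "[aacute]" "225" #     a acute
SDATA "[Aacute]" "193" #     A acute
SDATA "[Alpha ]" "0x0391" #
SDATA "[Beta  ]" "0x0392" #
'''

entities = parse_sdata_map(SAMPLE)
```

Passing base 0 to int() lets the same call handle the decimal and the C-style hexadecimal notation.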

HTTP server configuration

For HTTP connections, DocZilla relies on the server to announce a proper content type for each file that it downloads. If the server is misconfigured, it may return false defaults for known file types, e.g. the type text/plain or application/octet-stream for CSS files, which should be of the type text/css. As a result, DocZilla sees that no text/css content is available when it tries to load a CSS stylesheet and ignores the request.

The server should be set to return the content types for different files as follows:

text/sgml

for SGML files

text/xml

for XML files, XML TOC files, XSL stylesheets, RDF files (the default annotation format) etc. Note that the content type text/xsl does not exist, XSL is the semantic file type but the primary type is still text/xml which is needed for DocZilla to know that the file should be fed to its XML parser.

text/css

for CSS files. Some HTTP servers send CSS files as text/plain, which is reserved for generic text files.
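The content type list above can be turned into a quick check and a local test server. The following is a minimal sketch using Python's standard library, not a configuration recipe for any particular production server. The content types come from the list above; the file extensions, the names DOCZILLA_TYPES, expected_content_type, is_served_correctly and DocZillaFriendlyHandler are assumptions made for this example.

```python
import posixpath
from http.server import SimpleHTTPRequestHandler

# Content types are taken from the list above. The file extensions are
# assumptions for this sketch; adjust them to your own naming scheme.
DOCZILLA_TYPES = {
    ".sgml": "text/sgml",
    ".sgm": "text/sgml",
    ".xml": "text/xml",
    ".xsl": "text/xml",
    ".rdf": "text/xml",
    ".css": "text/css",
}


def expected_content_type(path):
    """Return the content type expected for path, or None if unknown."""
    extension = posixpath.splitext(path)[1].lower()
    return DOCZILLA_TYPES.get(extension)


def is_served_correctly(path, content_type_header):
    """Compare a Content-Type header value with the expected type.

    Parameters such as "; charset=utf-8" are ignored in the comparison.
    """
    expected = expected_content_type(path)
    if expected is None:
        return True  # no expectation for this kind of file
    actual = content_type_header.split(";", 1)[0].strip().lower()
    return actual == expected


class DocZillaFriendlyHandler(SimpleHTTPRequestHandler):
    """Static file handler for local testing that announces the types above."""

    extensions_map = {**SimpleHTTPRequestHandler.extensions_map,
                      **DOCZILLA_TYPES}
```

For a local test, the handler class can be passed to http.server.HTTPServer together with an address such as ("", 8000); the port number is arbitrary.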

TOC open-to-depth parameter

The default value for this parameter comes initially from the user preferences. That value will be overridden by the value of an open-to-depth attribute in the toc element of the TOC file. The resulting value will ultimately be overridden by an open-to-depth value defined in the TOC processing instruction or in the equivalent ENTITYRC entry, if either is present.

A notable exception is that the user can give the open-to-depth user preference the highest possible priority, thus overriding both the PI/ENTITYRC value and the attribute value in the TOC file.
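The precedence rules above can be summarised in a few lines of code. This is a sketch of the described order only, not DocZilla's implementation; the function name and all parameter names are hypothetical.

```python
def resolve_open_to_depth(preference, toc_attribute=None,
                          pi_or_entityrc=None,
                          preference_has_priority=False):
    """Return the effective open-to-depth value.

    preference              value from the user preferences (always present)
    toc_attribute           open-to-depth attribute of the toc element, or None
    pi_or_entityrc          value from the TOC PI or ENTITYRC entry, or None
    preference_has_priority True if the user gave the preference top priority
    """
    if preference_has_priority:
        # The exception: the user preference overrides everything else.
        return preference
    depth = preference
    if toc_attribute is not None:
        depth = toc_attribute      # the TOC file overrides the preference
    if pi_or_entityrc is not None:
        depth = pi_or_entityrc     # the PI/ENTITYRC value overrides both
    return depth
```

For instance, with a preference of 1, a toc attribute of 2 and a processing instruction value of 3, the sketch returns 3, unless the preference has been given priority, in which case it returns 1.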

Advanced image decoder

The DocZilla advanced image decoder provides support for bitmap image decoding modules. Currently the libtiff TIFF library is implemented as one such module, both because TIFF is a widely used format in technical documentation and as a proof-of-concept implementation. We also have a test implementation that uses a generic commercial image decoder library, which supports dozens of formats and makes them all available in DocZilla. That library, or any other for that matter, can be integrated into DocZilla on customer request.

NDATA handling in DocZilla

The way DocZilla handles non-SGML data in external unparsed entities is configurable. It comes with reasonable defaults that can be changed by the user. The rules in the document environment override the default values.

DocZilla supports a few internal system identifiers for notations. If an entity has the NDATA set to a notation name, then one of the following possibilities will occur based on the system identifier of the notation:

For doczilla:image:

Tries to display the entity as an image: if the image format is not supported or the entity is not an image, nothing will be displayed.

For doczilla:embed:

Tries to display the entity in a frame that is embedded in the document. The frame works similarly to the view area in the DocZilla Browser's window and should be able to display any file that DocZilla itself is able to display.

For doczilla:plugin:

DocZilla will search for a plugin library that would be capable of displaying the entity. If one is found, the plugin will show the entity file inside the element.

For doczilla:autolink:

DocZilla will generate a little icon to represent the entity. Clicking on the icon will trigger a hyperlink to the entity file which will then replace the current document, provided that DocZilla is capable of displaying the entity in the first place.

For explicit doczilla:ignore:

DocZilla will silently ignore all occurrences of this entity when it is referenced in entity attributes.

For doczilla:default or "" (empty string or unset value):

DocZilla will use the mappings from notation names to system identifiers in the preferences file to determine the correct system identifier. If none is found, DocZilla will fall back to the unknown system identifier, which is described in the next entry.

Anything else

will be considered unknown, and explanatory content is automatically generated in the document. The content will also act as a hyperlink to the entity in question. This behaviour can be changed to silently ignoring the entity by setting the appropriate preference item in the Preferences window.
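The selection logic in the list above can be sketched as a small lookup. This is an illustration of the described rules, not DocZilla's code: the function resolve_ndata_action, the action labels and the shape of the preferences dictionary are all hypothetical, and the sketch assumes that a preference which itself maps to doczilla:default is treated as unknown.

```python
# Action labels are made up for this sketch; only the system identifiers
# on the left come from the list above.
KNOWN_ACTIONS = {
    "doczilla:image": "image",
    "doczilla:embed": "embed",
    "doczilla:plugin": "plugin",
    "doczilla:autolink": "autolink",
    "doczilla:ignore": "ignore",
}


def resolve_ndata_action(system_id, notation_name, preferences):
    """Return the handling chosen for an NDATA entity.

    system_id     system identifier of the notation ("" or None if unset)
    notation_name name of the notation, e.g. "tiff"
    preferences   dict mapping notation names to system identifiers,
                  standing in for the user's preferences file
    """
    if system_id in (None, "", "doczilla:default"):
        # Defer to the user's mapping for this notation name.
        system_id = preferences.get(notation_name)
    # Anything that is not a known identifier ends up as "unknown".
    return KNOWN_ACTIONS.get(system_id, "unknown")
```

With a preferences dictionary that maps "tiff" to doczilla:image, a notation declared with doczilla:default resolves to the image handling, while a notation declared with doczilla:autolink keeps its own handling regardless of the preferences.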

It's possible to set the dimensions of an image, a plugin or an embedded frame. To do this, define the notation attributes width and/or height, and set their default values to the desired size of the displayed NDATA entity. You can override those values by setting the attribute values for an individual entity in its declaration. The values can be given in pixels or as percentages.

Example 2.13.  Setting the dimensions of displayed NDATA entities

<!NOTATION jpeg SYSTEM "doczilla:default">
<!ATTLIST #notation jpeg
    width CDATA "100%"
    height CDATA #IMPLIED
>
<!ENTITY first SYSTEM "first.jpg" NDATA jpeg>
<!ENTITY foobar SYSTEM "image.jpg" NDATA jpeg [ width="300" height="300" ]>

This code will display the image first.jpg page-wide, depending on the width of the window. (The width is always 100% of the available space, and the height adjusts according to the aspect ratio because it is not fixed.) The second image will be fixed at a size of 300 by 300 pixels.
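How the notation defaults and the per-entity values combine can be sketched as follows. This is a reading of the rules above, not DocZilla's implementation. The function names are hypothetical, and the sketch assumes that a bare number means pixels (as in the example) and that an attribute without a value is simply left out of the result.

```python
def parse_dimension(value):
    """Split a value such as "300" or "100%" into (number, unit).

    The unit is "%" for percentages and "px" otherwise.
    Assumption: a bare number is a pixel count.
    """
    text = value.strip()
    if text.endswith("%"):
        return float(text[:-1]), "%"
    return float(text), "px"


def effective_dimensions(notation_defaults, entity_attributes):
    """Merge notation defaults with the values set on one entity.

    Both arguments are dicts with optional "width" and "height" keys.
    Values given on the entity override the notation defaults; a value
    of None stands for an attribute with no value (#IMPLIED).
    """
    merged = {**notation_defaults, **entity_attributes}
    return {
        name: parse_dimension(merged[name])
        for name in ("width", "height")
        if merged.get(name) is not None
    }
```

Applied to the example, the first entity ends up with a width of 100 percent and no fixed height, and the second with a width and a height of 300 pixels each.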

Configuring NDATA handling as a user

The DocZilla preferences contain mappings of notation names to the special system identifiers. See the appropriate preferences panel.

Configuring NDATA handling as the author

The above was a user-side configuration. You can of course set the system identifier in a markup declaration, too. Suppose that you want to display smaller TIFF images as usual but only link to the huge ones. You can then use a separate notation for huge TIFFs:

Example 2.14.  NDATA authoring in DTD

<!NOTATION hugetiff SYSTEM "doczilla:autolink">
<!NOTATION tiff SYSTEM "doczilla:default">
<!ENTITY mapofworld SYSTEM "huge/map.tiff" NDATA hugetiff>

DocZilla will then use the user's preference for the notation "tiff", which defaults to doczilla:image, and display icons for those entities that were declared to be of the type hugetiff.

Note that the system identifier of the notation may well be referenced using a public ID which would then be resolved using catalog files. In fact, this is the recommended way. Using system IDs directly in notation declarations reduces flexibility.

CGM hotspots

The SDI CGM Reader plugin supports dynamic hotspots with DocZilla. Besides viewing CGM images with existing hotspots defined in the image data, it's possible to dynamically point to a hotspot region inside the CGM image.

The documentation for this feature has not been written yet, as of 2.7pre1.



[5] Note that references to external documents are not yet supported.

[6] Note that when creating a new annotation from scratch, simply saving the changes won't dynamically update the target document, if it's open in any browser view. A reload of the document is needed to display the annotation button on the page.

[7] Note that this limitation only affects the number of top-level directories. If you need to make a relative index of two directories, simply create one extra directory, move the two directories in there and index that one directory. It doesn't make sense to create an index relative to several directories without a common parent.

[8] As an example of the flexibility of Mozilla's XPConnect framework, there is e.g. a third-party project to enable XPConnect bindings for the Python language. (The developers are not affiliated with Citec and the bindings are not, to our knowledge, production-ready.)

Appendix A.  DocZilla TOC DTD

The DocZilla TOC DTD consists of three main elements, toc, tocitem and link. The markup for tocitem targets is a subset of the markup specified in the XLink standard[9]. The attribute xml:base is supported for TOC files.

<!ELEMENT toc (tocitem*)>
<!ATTLIST toc
          title          CDATA   #IMPLIED
          xmlns:xlink    CDATA   #FIXED  "http://www.w3.org/1999/xlink"
          <!-- A number greater-than-or-equal to 1, denoting the primary --
            -- column of the TOC tree -->
          primary-column NUMBER  #IMPLIED
          <!-- Space-separated width specifications for each column. Each --
            -- specification consists of "<number><unit><fixed>", where   --
            -- "number" is the integer width, "unit" is                   --
            -- {"px","em","ex","pt","*"} and "fixed" is either "!" or ""  --
            -- (empty string). The unit "*" denotes a relative portion of --
            -- the available width, in CALS style. The default width for  --
            -- each column is "1*". A special width "*" denotes the       --
            -- smallest possible width for the column, which usually is   --
            -- the width of the column title. A column with a fixed width --
            -- is not resizable by the user.                              -->
          column-widths  CDATA   #IMPLIED
>

<!ELEMENT tocitem    (link+, tocitem*)>

<!ELEMENT link    ((title|titleref),target*)>
<!ATTLIST link
          xlink:type     (simple)                    #FIXED  "simple"
          xlink:href     CDATA                       #IMPLIED
          xlink:actuate  (onRequest)                 #FIXED  "onRequest"
          xlink:title    CDATA                       #IMPLIED
          xlink:show     (new|replace|embed|ignored) #FIXED "ignored"
          target         CDATA                       #IMPLIED
          subtoc         CDATA                       #IMPLIED
          subtocposition (first|last)                #IMPLIED
>

<!ELEMENT title (#PCDATA)>

<!ELEMENT titleref EMPTY>
<!ATTLIST titleref
          xlink:type    (simple) #FIXED "simple"
          xlink:show    (embed)  #FIXED "embed"
          xlink:actuate (onLoad) #FIXED "onLoad"
          xlink:href    CDATA    #REQUIRED
>

<!ELEMENT target (title|titleref)>
<!ATTLIST target
          xlink:type     (simple)                    #FIXED  "simple"
          xlink:href     CDATA                       #IMPLIED
          xlink:actuate  (onRequest)                 #FIXED  "onRequest"
          xlink:title    CDATA                       #IMPLIED
          xlink:show     (new|replace|embed|ignored) #FIXED "ignored"
          target         CDATA                       #IMPLIED
          for            (icon|text|any)             "any"
>
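The column-widths format described in the comments of the toc attribute list can be sketched as a small parser. This is an illustration only, not the parser used by DocZilla; the names ColumnWidth and parse_column_widths are hypothetical, and the sketch assumes that the special width "*" is written on its own, without a number or a "!" suffix.

```python
import re
from typing import NamedTuple, Optional


class ColumnWidth(NamedTuple):
    number: Optional[int]  # None for the special minimal width "*"
    unit: str              # "px", "em", "ex", "pt" or "*"
    fixed: bool            # True if the user cannot resize the column


# "<number><unit><fixed>", e.g. "120px!" or "2*"
_SPEC = re.compile(r"^(\d+)(px|em|ex|pt|\*)(!?)$")


def parse_column_widths(value):
    """Parse a space-separated column-widths attribute value."""
    widths = []
    for spec in value.split():
        if spec == "*":
            # Special width: the smallest possible width for the column.
            widths.append(ColumnWidth(None, "*", False))
            continue
        match = _SPEC.match(spec)
        if match is None:
            raise ValueError("invalid column width: %r" % spec)
        number, unit, fixed = match.groups()
        widths.append(ColumnWidth(int(number), unit, fixed == "!"))
    return widths
```

A value such as "2* 120px! *" would describe three columns: one taking two relative portions of the available width, one fixed at 120 pixels and one as narrow as possible.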


[9] As a technical note: It was one of the early ideas in development of DocZilla to accept any XML document containing XLinks as the TOC file. This feature won't be implemented until an implementation of modular TOC loader backends is written.

Appendix B.  User Interface Reference

Table B.1.  Key to menus

Menu item                              Description
File > Annotation export wizard        Export annotations found in the current document to a file
Edit > Set link start...               Used in creating a user hyperlink: sets the current selection as the start point of the link
Edit > Add link end...                 Used in creating a user hyperlink: adds the current selection to the end points of the link
Edit > Delete links...                 Manipulate user hyperlink database: allows deletion of links that point to/from the current document
View > Annotations                     View the annotation editor
View > Message logger                  View the DocZilla warning/error message log
View > Extended XLinks                 View and manipulate properties of memory-resident XLinks
View > Toggle TOC panel                Shows and hides the Table of Contents pane on the left side of the browser view
Tools > DocZilla Search                View DocZilla Search window to search in SGML/XML files on disk or manage SGML indices
Tools > Web Development > CSS Checker  Run the elementary CSS checker on the current document
Help > DocZilla Reference Manual       Display this document

Appendix C.  SGML and XML declarations

When loading SGML documents, if no SGML declaration is defined in the catalogs via the SGMLDECL keyword, a default declaration similar to the one used in OpenSP is in effect.

When loading an XML document, the following SGML declaration is automatically applied before any declarations in the catalogs. This is because XML is well defined in terms of an SGML declaration, which need not be changed.