Majix Light 1.1
Majix input format

Majix converts the RTF file into an intermediate format, and then converts this format into XML. The structure of this intermediate format is predetermined in Majix, while Majix Pro will allow the user to extend or adapt it. Below we describe the Word formatting instructions that Majix uses by default and their mapping to the Majix intermediate format. The conversion into this intermediate format is driven by:
You can extend the set of style names Majix can process by modifying the input format. You can also customize the XML tags generated by Majix by associating XML tags with the intermediate format: see modifying XML tag names.

Information block

Word attaches a set of properties to the document, such as a title, subject, author, manager and company (you can view and modify these properties with the File->Properties menu). By default, Majix extracts this information and constructs an <info> element in the XML file, just after the first tag of the generated document. This <info> element contains the following sub-elements:
If you don't want the <info> element to appear in the XML file, you can disable this functionality by clicking the "Edit tag" button in the main Majix window. Then choose "Include info block" in the "info element" section of the list. A check box lets you enable or disable the info block.

Sections

A document normally starts with a document title, identified by default by the style Title.
The document contains the various text body elements described below and at most six levels of sections, named as follows:
All the sections start with a section title:
Word does not have the concept of a section, only that of a heading. Therefore, a paragraph of style Heading n, where n goes from 1 to 6, is translated into a section title at the top of a new section of level n. A section of level n includes sections of lower levels (from n+1 to 6) and is included in sections of higher levels (from 1 to n-1). When Majix encounters another heading of the same or a higher level, the current section is terminated and a new one started. For example, a paragraph with the "Heading 1" style containing the words "Majix input format", followed by a normal paragraph containing the text "Majix input format is .", is converted into XML as follows:

    <H1><HT>Majix input format</HT>
    <P>Majix input format is </P>
    </H1>

Body of text

The body of text contains ordinary paragraphs:
Styles such as Normal, Body Text, etc. will normally be translated into paragraphs.

Lists

The body of text can contain simple, unordered (bulleted) and ordered (numbered) lists with at most three levels of nesting. The lists are composed of list items that themselves contain paragraphs or nested lists:
List item continuations are described in Word using the styles List Continue, List Continue 2 and List Continue 3. They are represented in XML by additional paragraphs included in the list item element. Lists in Word are normally represented by styles such as List, List Bullet and List Numbered. Word represents list nesting by using different styles (such as List Bullet 2 and List Bullet 3). Majix instead generates recursive XML structures (a list item can contain another list). List items cannot directly contain character data; they contain paragraphs that themselves contain character data:

    <li>
    <it><p>first item</p></it>
    <it><p>second item</p></it>
    </li>

Definition lists

Items of definition lists are composed of a term and its definition:
While this structure is rather common, there is no predefined Word style to define it. By default, Majix uses the style Definition List to represent that structure, and uses a tab character to separate the defined term from the definition.

User defined styles

User defined styles may be mapped to abstract structures named block type 1 to block type 9. These structures are user-defined paragraph-like elements. Users map these abstract structures to their own XML elements in the tag editor. The complete list of user defined styles is:
Note: as they are expected to be mapped to user-defined tags, these block elements are not defined in the provided DTD. The "block type" intermediate format is to be used with user-defined paragraph styles. User-defined character styles may be mapped to inline text elements.

Table elements

Tables are composed of rows, themselves composed of cells:
XML tables are produced from Word tables. Majix supports only regular tables (that is, tables where each row has the same number of cells). Merged cells are not supported.

In-line text elements

The concept of "in-line elements" corresponds to character properties and character styles in Word.

Character properties

The following character properties are by default translated into XML elements:
Majix predefined character styles

The character styles Emphasis and Strong are predefined in Word. The following character styles are predefined in Majix. The correspondence between Word style, intermediate Majix format and XML tag name is:
User defined character styles

You can define your own character styles in Word. The "inline element" intermediate format is provided to map your own character styles to XML elements.
Note: as they are expected to be mapped to user-defined tags, these inline elements are not defined in the provided DTD. The "inline element" intermediate format is to be used with user-defined character styles. User-defined paragraph styles may be mapped to block type elements.
Colors

Majix can process each of the sixteen colors supported by Word. By default, they are converted into XML as a <c> element with the attribute "color". You can change the name of the XML tags (see modifying XML tag names). The list of supported colors is:
Note: the color names used are the standard HTML names; the names used by Word are sometimes different.

Pictures

In Word, pictures can be embedded or linked (Insert->Picture->From file with the check box "Link to file" checked). When converting a linked picture, Majix generates by default a <graphic> element with a url attribute containing the file name of the picture. When converting an embedded picture, Majix also extracts the picture data and writes it to a WMF (Windows Metafile) file. The name of the picture file is built by adding a numeric suffix to the name of the generated XML file, and prefixing the name with a customizable directory name (the default directory is images). For instance, let us assume that we convert a file named myreport.doc into XML, using the default output name: myreport.xml. If myreport.doc contains embedded pictures, they will be extracted into the files images\myreport-001.wmf, images\myreport_002.wmf, etc.

Note: WMF is not a common format for raster images on the Web. You are therefore encouraged to use linked images in a more common format such as GIF or JPEG.

Customizing the graphic element lets you generate an attribute with the file name of the graphic, or with its entity name (or both). Just specify an attribute name for the type of attribute that interests you.
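The naming rule for extracted pictures can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Majix code: the function name picture_path and its parameters are hypothetical. The examples in the text show both a hyphen and an underscore before the numeric suffix, so the separator is left as a parameter, and its default ("-") is an assumption.

```python
from pathlib import PureWindowsPath


def picture_path(xml_name, index, directory="images", sep="-"):
    """Build the file name for the index-th embedded picture (hypothetical helper).

    xml_name:  name of the generated XML file, e.g. "myreport.xml"
    index:     1-based position of the picture in the document
    directory: customizable directory prefix (the documented default is "images")
    sep:       separator before the numeric suffix (assumed; see the lead-in)
    """
    stem = PureWindowsPath(xml_name).stem      # "myreport.xml" -> "myreport"
    file_name = f"{stem}{sep}{index:03d}.wmf"  # three-digit, zero-padded suffix
    return str(PureWindowsPath(directory) / file_name)
```

Under these assumptions, picture_path("myreport.xml", 1) gives images\myreport-001.wmf, and passing directory="pictures" would change only the prefix.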
Copyright TetraSix, 1999 - info@tetrasix.com