DASHES XTENSION INTRODUCTION

What the Dashes XTension is For

• The Dashes XTension™ provides high-quality hyphenation for your QuarkXPress® text. Its aim is to improve the appearance and readability of your pages by inserting inconspicuous hyphens that do not disturb the flow of meaning in the story.

• The internal hyphenation code used in this XTension™ has been licensed from Circle Noetic Services, so you are guaranteed the same high standard of hyphen breaks as is provided by their renowned Dashes DA™.

• Allows you to insert discretionary hyphens into a selection of text, the current text box, or the complete document.

• When inserting hyphens, the Auto Hyphenation Rules that you specify from within QuarkXPress (Edit > H&Js) are fully obeyed.

• Allows you to remove discretionary hyphens from a selection of text, the current text box, or the complete document.

• If you are unhappy with the way the XTension™ hyphenates a word, you can enter it into your own Dashes exception dictionary.

• Each Dashes hyphen has a stylistic ranking associated with it. This allows you to exclude hyphens that fall below the stylistic quality you specify.

• Allows you to import and export hyphenation exception lists into and out of the XTension.

• Will correctly hyphenate words that contain ligatures.

• All the hyphens inserted are stored as discretionary hyphens within the document, so the document will not reflow because of hyphenation exceptions when it is opened or printed on another machine.

• Will correctly hyphenate the document, regardless of the System or Program language.

• Dashes XTension™ is a linguistically based algorithm. It puts in over 99% of the possible hyphens with over 99% accuracy in all the languages offered.

General Features

ABOUT HYPHENATION

People have intuitions about where it makes sense to break a word, and where it doesn't. The principles behind these intuitions can be defined to a large extent, and dictionaries base their hyphenation suggestions on them. The hyphenations suggested by different dictionaries vary. The variations depend to some extent on the dialect of the dictionary, but most of the variations are due to ambiguities in our intuitions about where to break a word. We can see this in the fact that all American English dictionaries hyphenate idiosyncratically, as do all British English dictionaries.

Principles of Word Breaking

There are two principles of word breaking that no dictionary in any language that we know of violates. They are:

A. In a compound word, break the compound between the component words. In other words, breaks like 'an-teater' are universally unacceptable.

B. The unit occurring between two hyphens must be a possible syllable in that language. For example, breaks like:
co-mpatibl-e
whi-tehouse
overdr-essed
are also universally unacceptable.

In many languages, the break does not occur on the actual pronounced syllable boundary. For example, if we were to analyze the pronunciation of the word 'anteater', we would find that most people pronounce the syllables as 'an/tea/ter'. However, the hyphenation accepted by most dictionaries, namely 'ant-eat-er', still conforms to principle (B), because 'ant', 'eat' and 'er' are all legal pronounceable syllables in English.

In the example above, we can see that there are two forces affecting our intuitions about where to break a word. One is that we wish to break words on syllable boundaries. The other is that we want to hyphenate words between what linguists call morphemes. Morphemes are meaning-bearing elements of the word, such as each of the component words of a compound, prefixes, suffixes and roots. A conflict arises when the morpheme boundary is not the same as the syllable boundary. This conflict is solved differently in different languages and by different dictionaries within the same language, and it is the main cause of the discrepancies among and within dictionaries.

Let us then add a third principle, which is applied without exception in no language that we know of, but which most dictionaries for most languages apply some of the time.

C. Break words on prefix and suffix boundaries.

Portuguese is an exception to this. In Portuguese, you never hyphenate on prefix or suffix boundaries unless they coincide with syllable boundaries. The English solution to this conflict of syllable and morpheme boundaries is unnecessarily complex, but certain rules have been established by convention, and must be followed for English.

The last universal rule is:
D. Avoid stylistically poor hyphens, such as:

1. Hyphens too close to a word boundary: a-mygdaloid, e-vocative, an-tidisestablishmentarianism

2. Hyphens that fall in the middle of a prefix or suffix: pol-yethylene, Armeni-an

Types of Hyphenation Programs

There are two basic approaches to hyphenation. One is dictionary-based, and the other is algorithm-based. Any algorithm can be made to match any dictionary by adding to the algorithm an exception dictionary that contains all the words that the dictionary and the algorithm hyphenate differently. Thus the real difference between algorithm- and dictionary-based hyphenation lies in what they do with words that do not appear in the dictionary. A dictionary-based program does not hyphenate any words that do not appear in the dictionary. The advantage of this is that you are guaranteed no wrong or unreasonable hyphenations. The disadvantage is that even in English, a relatively large percentage of the words that actually appear in text are not in the dictionary.
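The relationship between an algorithm and its exception dictionary can be sketched in a few lines of Python. This is only an illustration, not the Dashes code: `toy_algorithm`, `build_exceptions`, `hyphenate` and the small reference list are all hypothetical names and data invented for the example.

```python
def toy_algorithm(word):
    # Hypothetical stand-in for a real algorithm: break before a final "ing".
    if word.endswith("ing") and len(word) > 5:
        return word[:-3] + "-" + word[-3:]
    return word


def build_exceptions(reference, algorithm):
    """Collect every word the reference dictionary and the algorithm
    hyphenate differently, keyed by the unhyphenated word."""
    return {word: hyphenated
            for word, hyphenated in reference.items()
            if algorithm(word) != hyphenated}


def hyphenate(word, algorithm, exceptions):
    """Prefer the exception dictionary; fall back to the algorithm."""
    return exceptions.get(word, algorithm(word))


# A tiny reference "dictionary" (illustrative data only).
reference = {"walking": "walk-ing", "running": "run-ning", "master": "mas-ter"}
exceptions = build_exceptions(reference, toy_algorithm)

print(exceptions)                                        # only the disagreements
print(hyphenate("running", toy_algorithm, exceptions))   # taken from exceptions
print(hyphenate("printing", toy_algorithm, exceptions))  # not in the dictionary
```

With the exceptions in place, the combination reproduces the reference list exactly, and still hyphenates a word such as 'printing' that the list does not contain. A purely dictionary-based program would leave that word unbroken.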

In languages such as German, Dutch, and the Scandinavian languages, this is even more crucial, because of word compounding. The compounds in these languages are the longest words, and therefore those one would most like to hyphenate. But since they are freely generated, no dictionary can contain them all.

The two types of algorithms available are what we call statistically and linguistically based algorithms. Statistically based algorithms treat hyphenation purely as an information/dictionary compaction problem. The intention is to use various methods to create the most compact data representation that will enable the reproduction of hyphens present in the original list. Linguistically based algorithms are those which make extensive use of information we have about how words and syllables are structured. Both methods have advantages; however, we do not know of a statistically based algorithm that is capable of hyphenating Germanic languages with a high degree of accuracy.

It is trivial to write an algorithm that only finds 40%-50% of the possible hyphens in text accurately, and that may be all you need. Such algorithms are available at essentially no cost, and they work very fast. However, they tend to put in hyphens at the beginnings and ends of words, and not in the middle, where they are most useful. It is easy to imagine how one might write an algorithm that puts in 1% of the possible hyphens with 100% accuracy. You tell the computer that when it finds 'ment', 'ments', 'tion', or 'tions' at the end of a word, it should put a hyphen in front of it. However, it gets exponentially more difficult to maintain a high degree of accuracy as you put in a higher percentage of hyphens. In fact, it is impossible for either dictionaries or algorithms to put in 100% of the possible hyphens correctly, because there are a number of words that are spelled the same but hyphenated differently, such as 'rec-ord' and 're-cord'.
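The suffix rule just described is small enough to write out. The following Python sketch is our own illustration of that rule and nothing more; the function name `suffix_hyphenate` and the requirement that at least two letters remain in front of the hyphen are assumptions added for the example.

```python
SUFFIXES = ("ment", "ments", "tion", "tions")


def suffix_hyphenate(word):
    """Put one hyphen in front of a listed suffix at the end of the word.

    Assumption for this sketch: at least two letters must remain
    in front of the hyphen, otherwise the word is left alone.
    """
    lower = word.lower()
    for suffix in SUFFIXES:
        stem_length = len(word) - len(suffix)
        if lower.endswith(suffix) and stem_length >= 2:
            return word[:stem_length] + "-" + word[stem_length:]
    return word


for sample in ("establishment", "relations", "master"):
    print(sample, "->", suffix_hyphenate(sample))
```

A rule like this finds only the last possible break in a small number of words and never finds the breaks in the middle of a word, which is why it covers so small a share of the possible hyphens.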

Dashes XTension™ is a linguistically based algorithm. It puts in over 99% of the possible hyphens with over 99% accuracy in all of the languages offered.

We recommend that you make a backup of your original XTension disk, and keep the original in a safe place.

Like all CompuSense XTensions, Dashes is easy to install and use: simply place the XTension file into the QuarkXPress program folder and launch QuarkXPress. New menus, commands and dialog boxes are seamlessly integrated into the program. If, for some reason, you wish to remove the XTension's functionality from XPress, just move the XTension file to a folder other than the one containing the XPress program.

Each CompuSense XTension that is sold has been configured to run with an individual copy of QuarkXPress® (i.e., it is tied to the serial number). If the XTension is launched with a copy of XPress having a different serial number, the XTension's functionality will not be available.

The main menu contains seven items related to Dashes XTension.

It is very important to note that the Dashes XTension™ is tightly integrated with XPress. It obeys any Auto Hyphenation rules that you specify and apply.

The Auto Hyphenation area enables you to specify the way in which Dashes hyphenation is applied. You can specify different hyphenation rules for each H&J specification that you create. You can specify the number of characters a word must contain for it to be hyphenated, how many characters must fall before and after a hyphen, whether capitalized words are hyphenated, and how many lines of text can end with a hyphenated word.
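To make the effect of these settings concrete, here is a minimal Python sketch of how such rules could filter a list of candidate breaks. It is an illustration under stated assumptions, not QuarkXPress or Dashes code: the class `AutoHyphenation`, its field names and default values, and the function `allowed_breaks` are hypothetical. The limit on consecutive hyphenated lines depends on line layout and is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class AutoHyphenation:
    # Hypothetical field names and defaults mirroring the settings above.
    smallest_word: int = 6        # characters a word must contain
    minimum_before: int = 3       # characters required before a hyphen
    minimum_after: int = 2        # characters required after a hyphen
    break_capitalized: bool = False


def allowed_breaks(word, candidates, rules):
    """Filter candidate breaks; index i means a break between
    word[:i] and word[i:]."""
    if len(word) < rules.smallest_word:
        return []
    if word[:1].isupper() and not rules.break_capitalized:
        return []
    return [i for i in candidates
            if i >= rules.minimum_before
            and len(word) - i >= rules.minimum_after]


rules = AutoHyphenation()
# Candidate breaks for a-myg-da-loid; the break after one letter is rejected.
print(allowed_breaks("amygdaloid", [1, 4, 6], rules))
# Capitalized words are skipped unless break_capitalized is turned on.
print(allowed_breaks("Armenian", [2, 4, 6], rules))
```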

The configuration menu item presents the following dialog:

Each Dashes hyphen has a number associated with it which represents its stylistic quality. Rankings are defined (best to worst) as follows:

0 word boundary hyphen, as in: white-house

1 prefix or suffix boundary hyphen, as in: under-estimate, relation-ship

2 hyphen in the middle of a root, as in: estab-lishment, mas-ter

3 hyphen in the middle of a prefix or suffix, as in: hy-peractive, un-derestimate

4 hyphen one letter from a word or affix boundary, as in: paraphernali-a, o-verreact, turno-ver, mon-onucleosis

You can exclude hyphens with a ranking higher than the number you specify. For example, if you want no level 4 hyphens to appear in your text, type 3 after 'Maximum Ranking allowed' in the Configuration… dialog box.
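The effect of the maximum ranking can be pictured as a simple filter. The Python sketch below is only an illustration: the functions `filter_by_ranking` and `apply_breaks` and the (position, ranking) data layout are hypothetical, and the two rankings used are the ones given above for 'under-estimate' (1) and 'un-derestimate' (3).

```python
def filter_by_ranking(ranked_breaks, maximum_ranking):
    """Keep only breaks whose ranking does not exceed the maximum.

    ranked_breaks is a list of (position, ranking) pairs, where a
    position i means a break between word[:i] and word[i:].
    """
    return [(pos, rank) for pos, rank in ranked_breaks
            if rank <= maximum_ranking]


def apply_breaks(word, positions):
    """Show the word with a visible hyphen at each position."""
    pieces, previous = [], 0
    for pos in sorted(positions):
        pieces.append(word[previous:pos])
        previous = pos
    pieces.append(word[previous:])
    return "-".join(pieces)


word = "underestimate"
ranked = [(2, 3), (5, 1)]   # un-derestimate is rank 3, under-estimate is rank 1

for maximum in (3, 2):
    kept = filter_by_ranking(ranked, maximum)
    print(maximum, apply_breaks(word, [pos for pos, _ in kept]))
```

With a maximum of 3 both breaks survive; with a maximum of 2 only the prefix boundary break remains.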

In the Configuration… dialog box, there is a check box titled Rehyphenate. If this check box is checked, the XTension™ will rehyphenate words that already contain discretionary hyphens. Otherwise, Dashes will not rehyphenate words that are already hyphenated.

The CopyFlow check box specifies whether or not Dashes is to work in conjunction with the CopyFlow XTension from North Atlantic Publishing Systems. If you do not have the CopyFlow XTension, turn this option off. If it is turned on and CopyFlow is loaded, all text that is imported into QuarkXPress using CopyFlow will be automatically hyphenated.

Dashes works by inserting soft or discretionary hyphens into words. If it finds a word that should not be broken, then it normally inserts a soft hyphen at the start of the word to prevent QuarkXPress's internal algorithm from possibly breaking the word. This method can lead to one undesired side effect. If the word that is not to be broken has a punctuation or bracket character immediately preceding it, then the soft hyphen is inserted between the preceding character and the word itself. This can occasionally cause highly undesirable breaks such as '(-'.

FUNCTIONS

To prevent this from ever happening, you can turn off the Hyphen at Start of Word option.

The Words with hard hyphen check box allows you to specify whether words that already contain 'Hard' hyphens should have discretionary hyphens inserted into them or not during the hyphenation process.

Hyphens can be inserted into a word, a selection of text, the current story, or all of the text within a document.

If Word is chosen, a soft or discretionary hyphen will be inserted into a word at the currently selected insertion point, while also removing any soft hyphens that may already be in the word. This ensures that the word will only break at the selected point.
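The behaviour of the Word command can be sketched as follows. This Python fragment is an illustration only: it uses the Unicode soft hyphen (U+00AD) as a stand-in for a discretionary hyphen, which is an assumption and says nothing about how QuarkXPress stores the character, and the function name `set_single_break` is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # stand-in for a discretionary hyphen (assumption)


def set_single_break(word, insertion_point):
    """Remove any soft hyphens already in the word, then insert one.

    insertion_point counts the visible characters in front of the
    break. If it is not strictly inside the word, no hyphen is added.
    """
    clean = word.replace(SOFT_HYPHEN, "")
    if not 0 < insertion_point < len(clean):
        return clean
    return clean[:insertion_point] + SOFT_HYPHEN + clean[insertion_point:]


# A word that already carries two soft hyphens: es|tab|lishment.
before = "es" + SOFT_HYPHEN + "tab" + SOFT_HYPHEN + "lishment"
after = set_single_break(before, 9)

print(after.count(SOFT_HYPHEN))          # number of soft hyphens left
print(after.replace(SOFT_HYPHEN, "-"))   # shown with a visible hyphen
```

Because the existing soft hyphens are removed first, the word can afterwards break only at the chosen point.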

If Selection… is chosen, hyphens will be inserted into the highlighted text in the current text story.

If Story… is chosen, hyphens will be inserted into all of the text in the current story according to the Auto Hyphenation settings. Because H&J is a paragraph attribute, you can apply an H&J specification with Auto Hyphenation enabled to some paragraphs, and another H&J specification with Auto Hyphenation disabled to other paragraphs with different hyphenation needs.

If Document… is chosen, hyphens will be inserted according to the Auto Hyphenation settings into all of the text in the document.

If you wish to remove all the hyphens in a selection of text, a text story, or the complete document, you can use the Remove Hyphens… submenu.

To display suggested breaks for a word, place the Text insertion cursor within the word and choose Suggested Hyphenation… from the XTension™ menu.

This is useful as a pre-check before you enter a word in the Hyphenation Exceptions dictionary, or if you want to manually hyphenate a word in a paragraph that is formatted with Auto Hyphenation turned off.

The Discretionary Hyphenation dialog box shows the hyphenated word.

If you are unhappy with the way Dashes hyphenates a word, you can enter it in your own exception dictionary.

You can use the exception dictionary either to change the hyphens that the XTension™ assigns, or to keep it from hyphenating a word altogether. You may want to do this with certain proper names, for example.

To enter a word in the exception dictionary:

1. Pull up the Edit Hyphenation Exceptions dialog box by selecting it from the menu.

2. Type the word you are entering into the box with hard hyphens inserted. For example: bly-thop-erp-us

If you wish to prevent a word from being hyphenated, enter it with no hyphens. For example:

3. Click on New to insert the word.

If you make a mistake while entering a word into the exception dictionary, you can remove it by clicking on the word in the exceptions list and then clicking on the Delete button.

Once you have completed editing your exceptions list, click the Exit button.

The XTension™ has the capability of exporting the Exceptions dictionary to a text file. If you select Export Exceptions… from the main menu, the export dialog will be displayed. Use the controls in the dialog box to name the text file that will be created.

Note. This option is not available in the Windows version of the XTension as the ‘Exception dictionary’ is stored in a text file on this platform.

If you already have a list of words which you would like hyphenated in a particular way, and you would like to enter these into the exception dictionary all at once without checking whether the XTension™ hyphenates each one in the same way, the ‘Import Exceptions’ option can be used.

The text file to be imported must have the following format:

1). Each word must be followed by a carriage return.

2). The hyphens must appear as hard hyphens in the text.

Note! The import function enters words into the exception dictionary if and only if Dashes hyphenates the word differently from what is specified in the import list.
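The import rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the XTension's own code: the function `import_exceptions`, the dictionary layout, and the stand-in hyphenator `toy_hyphenate` are hypothetical.

```python
def import_exceptions(text, exceptions, hyphenate):
    """Add entries from an import list to the exception dictionary.

    text holds one word per line with hard hyphens at the breaks.
    An entry is stored only if the hyphenator breaks the word
    differently from the entry. Returns the words that were added.
    """
    added = []
    for line in text.splitlines():   # also splits on bare carriage returns
        entry = line.strip()
        if not entry:
            continue
        word = entry.replace("-", "")
        if hyphenate(word) != entry:
            exceptions[word] = entry
            added.append(word)
    return added


def toy_hyphenate(word):
    # Hypothetical stand-in for the Dashes algorithm.
    return {"master": "mas-ter"}.get(word, word)


exceptions = {}
added = import_exceptions("bly-thop-erp-us\rmas-ter\r", exceptions, toy_hyphenate)
print(added)
print(exceptions)
```

In this run 'mas-ter' is skipped because the stand-in hyphenator already produces it, so only 'bly-thop-erp-us' is stored.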

Available Languages

Croatian Czech Danish

Dutch English Finnish

French (European and Canadian)

German Greek Hebrew

Hungarian Icelandic Italian

Norwegian (Bokmål and Nynorsk)

Polish Portuguese Russian

Spanish Slovenian Swahili

Swedish Turkish

Note. We have encountered a number of differences of opinion among typographers and other specialists concerning the hyphenation of French. People differ primarily on whether to hyphenate words such as 'instruire' as 'in-struire' or 'ins-truire'. The Canadians in general prefer the latter, but those in France seem to be divided, and feel very strongly one way or the other. For this reason we have two versions of French, which differ along these lines.

Some examples:

French1          French2

in-strui-re      ins-trui-re
cons-trui-re     cons-trui-re
conta-mi-ner     con-ta-mi-ner
trans-ac-tion    tran-sac-tion

QuarkXPress® and XTension™ are trademarks of Quark Inc. Macintosh™ is a trademark of Apple Computer Corporation.

Dashes™, Dashes DA™ are trademarks of Circle Noetic Services.

QUARK, INC. MAKES NO WARRANTIES, EITHER EXPRESS OR IMPLIED, REGARDING THE ENCLOSED COMPUTER SOFTWARE PACKAGE, ITS MERCHANTABILITY, OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. QUARK, INC. DISCLAIMS ALL WARRANTIES INCLUDING, BUT NOT LIMITED TO THE WARRANTIES OF THE DISTRIBUTORS, RETAILERS AND DEVELOPERS OF THE ENCLOSED SOFTWARE.

WITHOUT LIMITING THE FOREGOING, IN NO EVENT SHALL QUARK, INC. BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES IN ANY WAY RELATING TO THE USE OR ARISING OUT OF THE USE OF THE ENCLOSED SOFTWARE. QUARK, INC'S LIABILITY SHALL IN NO EVENT EXCEED THE TOTAL AMOUNT OF THE PURCHASE PRICE/LICENSE FEE ACTUALLY PAID FOR THE USE OF THE ENCLOSED SOFTWARE.

SOME STATES DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES AND/OR THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THESE EXCLUSIONS AND LIMITATIONS MAY NOT APPLY IN PARTICULAR COUNTRIES.

The Dashes™ Hyphenation XTension™ was developed by CompuSense Ltd.

The internal hyphenation dictionaries and algorithms have been licensed from Circle Noetic Services. Many thanks to Sasha and Margaret for their help.

The ideas for the features in this product are a result of feedback from many QuarkXPress® users around the world (too numerous to mention). We continue to welcome suggestions on how the functionality can be further improved and enhanced.

This manual and the Dashes™ XTension™ software are copyrighted, with all rights reserved. Under the copyright laws, this manual and software may not be copied or reverse engineered, in whole or in part, without the written consent of CompuSense Ltd.

Published by: CompuSense Ltd.