fontwrap: a Perl(Xe(La))TeX package

THIS PACKAGE IS OBSOLETE AND HAS BEEN REPLACED BY THE NATIVE-XeTeX UCHARCLASSES PACKAGE.

This is a rudimentary (well, of sorts) page for the fontwrap LaTeX package. Or, more precisely, for the PerlTeX package that can be processed using XeLaTeX, which in effect is the LaTeX flavour of TeX running on XeTeX, the Unicode version of TeX. Confused yet? As long as you use "TeX Live" or "MiKTeX", you will be able to use this package pretty much automatically. If not, you may need to download the dependency packages manually.

In case you found this page through a search engine: it was written by me, Michiel Kamermans, and my homepage is at pomax.nihongoresources.com, a subdomain of Nihongoresources, my main website of resources for Japanese (you can contact me through the "contact" link on that website, too).

What does it do?

fontwrap lets you type multilingual text (in UTF-8) without having to worry about which font tags to insert where. Instead, you tell fontwrap which fonts to use for which Unicode blocks, and wrap your text in a simple command:

\fontwrap{}

It adds a font tag between two adjacent characters whenever the first belongs to one Unicode block and the next belongs to another. For instance, say you're writing a mixed-language document in English and Hebrew. The font you like best for English has no Hebrew in it, and the font you like best for Hebrew simply looks crap for English text. The solution: you tell fontwrap that you want font X for the 'Latin' Unicode blocks, and font Y for the Hebrew Unicode block. Finally, you tell it which macros it is allowed to go into to add font tags (useful for allowing things like \emph and \mbox).
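To make the rule concrete, here is a minimal Python sketch of the idea. It is not fontwrap's actual Perl code: the three-entry block table, the helper names block_of and wrap, and the use of fontspec's \fontspec{...} command as the emitted font tag are all assumptions made for illustration. The sketch emits a tag whenever the font needed for the next character changes, so two neighbouring blocks bound to the same font produce no extra tag.

```python
# Minimal sketch of block-boundary font switching (hypothetical, not fontwrap's code).

# Tiny excerpt of a Unicode block table: (first code point, last code point, name).
BLOCKS = [
    (0x0000, 0x007F, "BasicLatin"),
    (0x0080, 0x00FF, "LatinSupplement"),
    (0x0590, 0x05FF, "Hebrew"),
]


def block_of(ch: str) -> str:
    """Return the name of the block containing ch, or 'Other' if not listed."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "Other"


def wrap(text: str, block_fonts: dict, default_font: str) -> str:
    """Insert a font command wherever the font needed for the next character
    differs from the font currently in effect."""
    out = []
    current = None  # font most recently switched to
    for ch in text:
        font = block_fonts.get(block_of(ch), default_font)
        if font != current:
            # Assumption: fontspec's \fontspec{...} serves as the font tag.
            out.append("\\fontspec{" + font + "}")
            current = font
        out.append(ch)
    return "".join(out)


print(wrap("Shalom \u05e9\u05dc\u05d5\u05dd!", {"Hebrew": "My Hebrew Font"}, "Palatino Linotype"))
```

With Hebrew bound to a hypothetical "My Hebrew Font" and everything else falling back to Palatino Linotype, the printed result switches to the Hebrew font right before שלום and back to Palatino Linotype right before the "!".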

Basic commands

\fontwrap{
  Your text goes here
}

Wraps around text and tells fontwrap to process that text.

Normally, (xe)(la)tex will strip whitespace from arguments passed to macros, because it doesn't use them for spacing purposes. However, sometimes whitespace matters a lot, like when you're running fontwrap on a mix of normal text and verbatim environments. In these cases, you can wrap the relevant text in fontwrap's flavour of the verbatim environment:

\begin{verbatimfontwrap}
...
\end{verbatimfontwrap}

Basically, this wraps the text in a new group in which \obeyspaces and \obeylines are in effect, forcing (xe)(la)tex to keep spaces and line breaks. You still need to issue the fontwrap command, though:

\begin{verbatimfontwrap}
\fontwrap{
  Your text goes here
}
\end{verbatimfontwrap}

You will almost never need this environment, but if you run into "Overfull \hbox" warnings, chances are that either fontwrap generated bad code (in which case, contact me!) or spacing mattered and you forgot to wrap the relevant text in the verbatimfontwrap environment. Try that first; if it still fails, contact me and we can have a lovely quick debugging session.

\setfontwrapallowedmacros{macro1,
                           ...
                          macron}

By default, fontwrap doesn't process text inside macros: when it encounters a macro whose text it has not been told it may analyse, it leaves that text alone. To tell it which macros it may parse the inner text of, pass all macro names as a comma-delimited list. For example, in the fontwrap_example.tex file the list is just 'ruby'. With it set to just ruby, fontwrap doesn't touch text inside \emph macros, even though normally you would want it to. Changing the list from 'ruby' to 'ruby,emph' makes it do so.
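A macro whitelist like this can be pictured with a short Python sketch. The regular expression only recognises \name{argument} with no nested braces, and the names parse_name_list and arguments_to_process are hypothetical; fontwrap's own parsing of TeX input may well differ.

```python
import re
from typing import List, Set

# \name{argument} with no nested braces: a deliberate simplification.
MACRO_CALL = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def parse_name_list(spec: str) -> Set[str]:
    """Split a comma-delimited list such as 'ruby, emph' into names,
    ignoring surrounding whitespace, line breaks and empty entries."""
    return {name.strip() for name in spec.split(",") if name.strip()}


def arguments_to_process(text: str, allowed: Set[str]) -> List[str]:
    """Return the arguments of whitelisted macros; all others are skipped."""
    return [m.group(2) for m in MACRO_CALL.finditer(text) if m.group(1) in allowed]


text = r"\emph{one} \mbox{two}"
print(arguments_to_process(text, parse_name_list("ruby")))       # []
print(arguments_to_process(text, parse_name_list("ruby,emph")))  # ['one']
```

Splitting on commas and stripping each entry means a list spread over several indented lines, as in the command layout shown earlier, still yields clean names.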

\setfontwrapallowedenvironments{environment1,
                                    ...
                                environmentn}

In addition to macros, fontwrap needs to be told which environments it is allowed to work in. For instance, it's typically a bad idea to start adding font tags inside a {verbatim} environment, so again you pass it a comma-delimited list of environments that fontwrap may process (obviously, verbatim should usually not be in that list, unless you want to see the effects of fontwrap on it, of course...)
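The environment whitelist can be sketched the same way: track which environments are open, and only hand a chunk of text over for processing if every enclosing environment is listed. This Python sketch rests on assumptions throughout: the function name segments is hypothetical, text outside any environment is treated as processable, and \begin/\end markers inside a verbatim body are still parsed, which real TeX would not do.

```python
import re
from typing import Iterator, Set, Tuple

# Matches \begin{name} and \end{name}.
ENV_MARKER = re.compile(r"\\(begin|end)\{([^}]*)\}")


def segments(text: str, allowed: Set[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (chunk, processable) pairs. A chunk may be processed only if
    every environment enclosing it is in the allowed set."""
    stack = []  # currently open environments, innermost last
    pos = 0
    for m in ENV_MARKER.finditer(text):
        if m.start() > pos:
            yield text[pos:m.start()], all(env in allowed for env in stack)
        yield m.group(0), False  # never touch the \begin/\end markup itself
        kind, name = m.groups()
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        pos = m.end()
    if pos < len(text):
        yield text[pos:], all(env in allowed for env in stack)


for chunk, processable in segments(r"A \begin{verbatim}B\end{verbatim} C", {"itemize"}):
    print(repr(chunk), processable)
```

Because verbatim is not in the allowed set here, only the text before and after the environment is marked as processable.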

\setunicodeblockfont{Block name}{Font name}

This command tells fontwrap to use a particular font for a particular Unicode block. The font name should be the name you see in word processors or font properties, not the filename, so good font names are "Ume Mincho", "Times New Roman" and "Palatino Linotype", while "MyriadPro-It.otf" or "pala.ttf" are not. As for which Unicode block to change, there are quite a few blocks to pick from:

  1. AegeanNumbers
  2. AlphabeticPresentationForms
  3. AncientGreekMusicalNotation
  4. AncientGreekNumbers
  5. Arabic
  6. ArabicPresentationFormsA
  7. ArabicPresentationFormsB
  8. ArabicSupplement
  9. Armenian
  10. Arrows
  11. Balinese
  12. BasicLatin
  13. Bengali
  14. BlockElements
  15. Bopomofo
  16. BopomofoExtended
  17. BoxDrawing
  18. BraillePatterns
  19. Buginese
  20. Buhid
  21. ByzantineMusicalSymbols
  22. Cherokee
  23. CJKCompatibility
  24. CJKCompatibilityForms
  25. CJKCompatibilityIdeographs
  26. CJKCompatibilityIdeographsSupplement
  27. CJKRadicalsSupplement
  28. CJKStrokes
  29. CJKSymbolsandPunctuation
  30. CJKUnifiedIdeographs
  31. CJKUnifiedIdeographsExtensionA
  32. CJKUnifiedIdeographsExtensionB
  33. CombiningDiacriticalMarks
  34. CombiningDiacriticalMarksforSymbols
  35. CombiningDiacriticalMarksSupplement
  36. CombiningHalfMarks
  37. ControlPictures
  38. Coptic
  39. CountingRodNumerals
  40. Cuneiform
  41. CuneiformNumbersandPunctuation
  42. CurrencySymbols
  43. CypriotSyllabary
  44. Cyrillic
  45. CyrillicExtendedA
  46. CyrillicExtendedB
  47. CyrillicSupplement
  48. Deseret
  49. Devanagari
  50. Dingbats
  51. DominoTiles
  52. EnclosedAlphanumerics
  53. EnclosedCJKLettersandMonths
  54. Ethiopic
  55. EthiopicExtended
  56. EthiopicSupplement
  57. GeneralPunctuation
  58. GeometricShapes
  59. Georgian
  60. GeorgianSupplement
  61. Glagolitic
  62. Gothic
  63. GreekandCoptic
  64. GreekExtended
  65. Gujarati
  66. Gurmukhi
  67. HalfwidthandFullwidthForms
  68. HangulCompatibilityJamo
  69. HangulJamo
  70. HangulSyllables
  71. Hanunoo
  72. Hebrew
  73. HighPrivateUseSurrogates
  74. HighSurrogates
  75. Hiragana
  76. IdeographicDescriptionCharacters
  77. IPAExtensions
  78. Kanbun
  79. KangxiRadicals
  80. Kannada
  81. Katakana
  82. KatakanaPhoneticExtensions
  83. Kharoshthi
  84. Khmer
  85. KhmerSymbols
  86. Lao
  87. LatinExtendedAdditional
  88. LatinExtendedA
  89. LatinExtendedB
  90. LatinExtendedC
  91. LatinExtendedD
  92. LatinSupplement
  93. LetterlikeSymbols
  94. Limbu
  95. LinearBIdeograms
  96. LinearBSyllabary
  97. LowSurrogates
  98. MahjongTiles
  99. Malayalam
  100. MathematicalAlphanumericSymbols
  101. MathematicalOperators
  102. MiscellaneousMathematicalSymbolsA
  103. MiscellaneousMathematicalSymbolsB
  104. MiscellaneousSymbols
  105. MiscellaneousSymbolsandArrows
  106. MiscellaneousTechnical
  107. ModifierToneLetters
  108. Mongolian
  109. MusicalSymbols
  110. Myanmar
  111. NewTaiLue
  112. NKo
  113. NumberForms
  114. Ogham
  115. OldItalic
  116. OldPersian
  117. OpticalCharacterRecognition
  118. Oriya
  119. Osmanya
  120. PhagsPa
  121. Phoenician
  122. PhoneticExtensions
  123. PhoneticExtensionsSupplement
  124. PrivateUseArea
  125. Runic
  126. Shavian
  127. Sinhala
  128. SmallFormVariants
  129. SpacingModifierLetters
  130. Specials
  131. SuperscriptsandSubscripts
  132. SupplementalArrowsA
  133. SupplementalArrowsB
  134. SupplementalMathematicalOperators
  135. SupplementalPunctuation
  136. SupplementaryPrivateUseAreaA
  137. SupplementaryPrivateUseAreaB
  138. SylotiNagri
  139. Syriac
  140. Tagalog
  141. Tagbanwa
  142. Tags
  143. TaiLe
  144. TaiXuanJingSymbols
  145. Tamil
  146. Telugu
  147. Thaana
  148. Thai
  149. Tibetan
  150. Tifinagh
  151. Ugaritic
  152. UnifiedCanadianAboriginalSyllabics
  153. VariationSelectors
  154. VariationSelectorsSupplement
  155. VerticalForms
  156. YiRadicals
  157. YiSyllables
  158. YijingHexagramSymbols
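Looking up a character's block is a range search over a sorted table. The Python sketch below uses a small excerpt of the blocks listed above (ranges as in the Unicode Character Database's Blocks.txt) and binary search via the standard bisect module. set_unicode_block_font is a hypothetical stand-in for \setunicodeblockfont, and rejecting unknown block names is an assumption, not documented fontwrap behaviour.

```python
import bisect
from typing import Dict, Optional

# Hypothetical excerpt of the block table: (first, last, name), sorted by first.
BLOCKS = [
    (0x0000, 0x007F, "BasicLatin"),
    (0x0080, 0x00FF, "LatinSupplement"),
    (0x0100, 0x017F, "LatinExtendedA"),
    (0x0180, 0x024F, "LatinExtendedB"),
    (0x0370, 0x03FF, "GreekandCoptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJKUnifiedIdeographs"),
    (0xAC00, 0xD7AF, "HangulSyllables"),
]
STARTS = [first for first, _, _ in BLOCKS]  # sorted starts for binary search
BLOCK_NAMES = {name for _, _, name in BLOCKS}

block_fonts: Dict[str, str] = {}


def block_of(ch: str) -> Optional[str]:
    """Binary-search the block containing ch; None if not in this excerpt."""
    cp = ord(ch)
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


def set_unicode_block_font(block: str, font: str) -> None:
    r"""Rough analogue of \setunicodeblockfont{Block name}{Font name}."""
    if block not in BLOCK_NAMES:
        raise KeyError("unknown block: " + block)
    block_fonts[block] = font


set_unicode_block_font("Hiragana", "Ume Mincho")
print(block_of("\u3072"), block_fonts[block_of("\u3072")])  # Hiragana Ume Mincho
```

Code points falling in gaps between the listed ranges, such as U+0500 (Cyrillic Supplement, not in this excerpt), come back as None rather than being misattributed to the neighbouring block.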

This is a cumbersome way to change fonts, mostly because you typically need to change many blocks at once. As the intention of this package is to be a convenient set-and-forget system, there is a more convenient set of commands for font binding, based on informal Unicode groups.

\setunicodegroupfont{Group name}{Font name}

This tells fontwrap to bind a particular font to a particular group of Unicode blocks. For instance, if you want to change the font for all the Latin blocks in one go, you'd use \setunicodegroupfont{Latin}{Best Latin Font}, and fontwrap will change the font for all the separate Latin blocks in the previous list. Available informal groups are:

  1. Arabic
  2. Chinese
  3. CJK (combines all Chinese, Japanese and Korean blocks)
  4. Cyrillic
  5. Diacritics
  6. Greek (including some Coptic)
  7. Korean (individual blocks called 'Hangul')
  8. Japanese
  9. Latin
  10. Mathematics
  11. Phonetics
  12. Punctuation
  13. Symbols
  14. Yi
  15. Other - a lump group, which I hope to have unlumped entirely eventually
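A group is essentially a named list of blocks, so binding a font to a group amounts to binding it to each member block. In this Python sketch the membership of the "Latin" and "Japanese" groups is an assumption and deliberately incomplete, and set_unicode_group_font is a hypothetical stand-in for \setunicodegroupfont.

```python
from typing import Dict, List

# Hypothetical group table; real group membership in fontwrap may differ.
GROUPS: Dict[str, List[str]] = {
    "Latin": ["BasicLatin", "LatinSupplement", "LatinExtendedA",
              "LatinExtendedB", "LatinExtendedC", "LatinExtendedD",
              "LatinExtendedAdditional"],
    "Japanese": ["Hiragana", "Katakana", "KatakanaPhoneticExtensions"],
}

block_fonts: Dict[str, str] = {}


def set_unicode_group_font(group: str, font: str) -> None:
    r"""Rough analogue of \setunicodegroupfont{Group name}{Font name}:
    one call binds the font to every block in the group."""
    for block in GROUPS[group]:
        block_fonts[block] = font


set_unicode_group_font("Latin", "Palatino Linotype")
set_unicode_group_font("Japanese", "Ume Mincho")
print(block_fonts["LatinExtendedA"], block_fonts["Katakana"])
```

Two calls here bind ten blocks, which is the set-and-forget convenience the group commands are meant to provide over setting each block individually.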

I'm not a fan of lump groups, but I wanted to get this package off the ground in a usable way first. I shall progressively un-lump the "Other" group, and probably add a language command too, so you can use something like \setLanguage(yourfonthere) without having to rely explicitly on the Unicode blocks (of course, behind the scenes a language would still be linked to Unicode blocks, but the less you need to know, the more useful a package becomes, right?)

\setfontwrapdefaultfont{fontname}

This tells fontwrap to bind this particular font to each and every Unicode block. Until you change the font for some Unicode block, this means that fontwrap will only insert a font command for this font at the start of the text you're wrapping, and then do absolutely nothing else, because everything uses the same font anyway.
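The effect of a default font can be sketched by counting the font commands a fontwrap-style pass would emit. In this Python sketch, set_default_font and count_font_commands are hypothetical names, the block list is a tiny excerpt, and it is assumed that a later per-block binding simply overrides the default for that block.

```python
from typing import Dict, List

# Hypothetical excerpt; a real table would list every block named above.
ALL_BLOCKS = ["BasicLatin", "LatinSupplement", "Hebrew", "Hiragana"]

block_fonts: Dict[str, str] = {}


def set_default_font(font: str) -> None:
    r"""Rough analogue of \setfontwrapdefaultfont{fontname}: bind every block."""
    for block in ALL_BLOCKS:
        block_fonts[block] = font


def count_font_commands(blocks_in_text: List[str]) -> int:
    """Given the block of each character in order, count the font commands a
    fontwrap-style pass would emit: one at the start, then one per font change."""
    count, current = 0, None
    for block in blocks_in_text:
        font = block_fonts[block]
        if font != current:
            count += 1
            current = font
    return count


text_blocks = ["BasicLatin", "Hebrew", "Hebrew", "BasicLatin"]
set_default_font("Palatino Linotype")
print(count_font_commands(text_blocks))  # 1: everything shares one font
block_fonts["Hebrew"] = "My Hebrew Font"  # hypothetical font name
print(count_font_commands(text_blocks))  # 3: Latin, Hebrew, Latin again
```

With only the default font set, a Latin/Hebrew/Latin run needs a single command at the start; binding Hebrew to its own font raises that to three.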

How to run

Running perltex instead of (Xe)LaTeX is fairly straightforward, because perltex still uses (Xe)LaTeX to do the typesetting. I rely on XeLaTeX because of fontspec, so I use:

perltex --latex=xelatex mytexfile.tex

Here, the --latex= option lets you specify which TeX engine to use, which is nice.

One thing you might run into is that the Safe module that perltex uses causes trouble when multiple threads are involved. In this case, the following error will pop up, and it will seem like the end of the world:

Free to wrong pool 2e76f70 not 2fb6c8c at .../lib/Safe.pm line 125.

The numbers and the Perl location will probably differ on your system. This is an annoying quirk, and was supposed to have been fixed in some earlier version of Perl. If you encounter it, ignore it. The content will still generate fine; you will just have a memory leak, because Perl tried to free memory in completely the wrong place and stopped itself. To minimise the effect of this quirk, I made sure to clear most of the variables Perl uses while it's doing its thing, so that what memory does leak is very little.

** WARNING ** No Unicode mapping available

You get this warning for text in which the font you've chosen does not support the characters you've written. A good example is the {verbatim} environment, where fontwrap cannot add any tags, so the default font fails on mixed-language text.

Downloads

There are several files currently available for download:

Fonts

This package makes use of fontspec, a XeTeX package (how many versions of TeX are there? Apparently it doesn't matter; they pretty much nest into each other, it would seem) which lets you use system fonts without having to be ridiculously knowledgeable about making TeX font definitions, which, trust me, you don't want to learn. Use fontspec.

I mention this because the example .tex file calls several fonts which you may not all have. Arial (which seems to come with every conceivable MS product, and is also downloadable from SourceForge) and Palatino Linotype (which comes with MS Office 2003, Windows 2000 and Windows XP) I kind of assume you already have, but the file also uses きろ字, from the people at Second Wave, and the Ume Mincho font, which is a nice, clean and, if you can believe it, open source font, maintained through its SourceForge page.