Tuesday 23rd of September 2008, 01:22:51 am
A while ago I wrote a TeX package for automatically adding font markup to a multilingual text, or more specifically a multi-Unicode-block text, so that I could simply write my Japanese grammar book in English and Japanese without having to manually add font codes every time I used the two together. Because I needed to do low-level evaluation of character byte codes, I ended up writing it for PerlTeX, which is just a Perl wrapper for 'whichever flavour of TeX you are using' (it literally wraps whatever TeX engine you use in a Perl thread and intercepts piped data), and more specifically for the XeTeX version of TeX. XeTeX is basically TeX dragged into the 21st century: it expects all its input to be UTF-8 encoded and is able to work with Unicode in general. This is unlike plain TeX, which was designed in an era when internationalisation was a non-issue.
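The core idea (work out which Unicode block each character belongs to, then wrap each run of, say, Japanese text in a font command) can be sketched in a few lines. This is a minimal Python illustration of the technique, not the package's actual code, which runs under PerlTeX; the particular block ranges and the `\japanesefont` macro name are assumptions made up for the example.

```python
import itertools

# Assumed set of Unicode blocks treated as "Japanese" (inclusive code point ranges).
JAPANESE_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_japanese(ch: str) -> bool:
    """True if the character's code point falls in one of the assumed blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def add_font_markup(text: str, macro: str = r"\japanesefont") -> str:
    """Wrap every maximal run of Japanese characters as {<macro> run}.

    The macro name is hypothetical; a real document would use whatever
    font-switching command its preamble defines.
    """
    out = []
    # groupby splits the text into consecutive runs with the same is_japanese value.
    for japanese, run in itertools.groupby(text, key=is_japanese):
        chunk = "".join(run)
        out.append("{" + macro + " " + chunk + "}" if japanese else chunk)
    return "".join(out)


if __name__ == "__main__":
    print(add_font_markup("The verb 食べる means 'to eat'."))
```

Running it prints `The verb {\japanesefont 食べる} means 'to eat'.`. A real implementation would need more blocks (CJK extensions, other scripts) and would have to avoid touching TeX control sequences, both of which this sketch ignores.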
This package has its own web page, which can be found here, and if you have any good ideas on porting it to a flavour of TeX that is genuinely Unicode-enabled rather than relying on Perl, do let me know; I would prefer not to depend on external technology.
I know, they are trying to migrate PDFTeX to LuaTeX, which would be pretty good because it would give access to a uniform 8-bit clean programming language from within TeX, but that has two problems:
1) Lua is 8-bit clean, but that only means its strings can hold any byte value: it does not understand the concept of a 'Unicode string', or even of a 'character'. It only knows about byte strings. This is very last century, and means the LuaTeX people have probably had fun already, and will have some more, getting Lua to understand that text consists of a series of characters, not bytes.
2) More importantly though, LuaTeX is not slated for any form of public release until 2010. Technological advances are a slow thing.
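To make point 1 concrete: in a byte-oriented language like Lua, where a string's length is its number of bytes, a UTF-8 encoded Japanese word reports three times as many units as it has characters, and slicing by position can cut a character in half. The sketch below uses Python rather than Lua only because Python can show the character view and the byte view side by side; the function names are made up for the illustration.

```python
def char_and_byte_lengths(s: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes) in s."""
    return len(s), len(s.encode("utf-8"))


def naive_byte_slice(s: str, n: int) -> str:
    """Take the first n UTF-8 bytes of s, as a byte-oriented language would,
    then decode them; a character cut in half becomes U+FFFD."""
    return s.encode("utf-8")[:n].decode("utf-8", errors="replace")


if __name__ == "__main__":
    word = "日本語"
    print(char_and_byte_lengths(word))  # (3, 9): each character is 3 bytes in UTF-8
    # Four bytes is one whole character plus the first byte of the next one.
    print(naive_byte_slice(word, 4))
```

The second call yields `日` followed by the replacement character, which is exactly the kind of breakage a byte-only string type invites when it is handed multilingual text.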