My latest Flickr photograph

Unicode strings in java

Sunday 3rd of May 2009, 07:54:12 am

I got fed up with Java's inability to properly handle unicode strings due to it relying on a 16 bit "char" primitive. Unicode (at the time of writing) basically covers a space that requires 4 bytes, and char is only 2 bytes 'wide', so it just ... doesn't work. Strings report the wrong length, substring selection can chop up unicode glyphs mid-byte-sequence, it's a mess.

So I wrote a uString object instead. It's downloadable as a convenient jarchive for use in your own projects (with the source compiled into the jar), and hopefully it will become obsolete soon enough as the java String starts using a variable length encoded character primitive (I'm thinking a 7-bit-with-has-more-bit byte, but who can say what's optimal)

The javadoc can be found here

- Pomax

0 comments - view/add comments