org.nihongoresources.utf8
Class uString

java.lang.Object
  extended by org.nihongoresources.utf8.uString
All Implemented Interfaces:
java.io.Serializable, java.lang.Comparable<uString>

public final class uString
extends java.lang.Object
implements java.io.Serializable, java.lang.Comparable<uString>

The uString object is a replacement for the native Java String object, written to properly handle text strings containing unicode characters whose code points are wider than 16 bits. Each character is stored in its own string, which makes this object considerably more memory-intensive than the traditional String, but that is the tradeoff for correct data representation: string length and substring selection actually do what one would want them to do.

On the unconventional class name: normally I wouldn't do this, but this is not a "definitive" class. It is a stopgap measure until the native Java String handles all of unicode correctly. As such, I want to type as little additional text as possible, and the almost universal convention of "u" for "unicode" is pretty much as short as it gets.

On a technical note, this class does not implement CharSequence, since it does not model its data using char primitives. As an alternative, use the uString.toUnicodeCharacterArray() method to gain access to the sequence of characters in the string.
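
The length problem described above can be reproduced with the native String alone. The following is a minimal sketch that uses only the JDK, not the uString class itself; the class and method names are made up for illustration.

```java
// Sketch using only java.lang.String; SurrogateLengthDemo is a hypothetical name.
final class SurrogateLengthDemo {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of unicode characters (code points) in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20BB7 does not fit in 16 bits, so Java stores it
        // as the surrogate pair D842 DFB7.
        String text = "a\uD842\uDFB7b";
        System.out.println(codeUnits(text));  // 4
        System.out.println(codePoints(text)); // 3
    }
}
```

The second number is the one a character-oriented string class would be expected to report as its length.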

See Also:
Serialized Form

Constructor Summary
uString(char[] chars)
          Builds a unicode string based on an array of char.
uString(java.lang.String string)
          Builds a unicode string based on a normal UTF-16 encoded String object.
uString(java.lang.String[] string)
          Builds a unicode string based on a UTF-16 String array.
 
Method Summary
 uString append(java.lang.String suffix)
          Appends the given suffix. Java has no operator overloading (such as "+"), so concatenation goes through append.
 uString append(uString suffix)
          Appends the given suffix. Java has no operator overloading (such as "+"), so concatenation goes through append.
 char charAt(int index)
          Deprecated. 
 int compareTo(uString other)
          Compares this string with another by content, not by length.
 boolean equals(java.lang.Object o)
          Tests this string for equality with another object.
static java.lang.String getUnicodeHex(char[] charArray)
          Static method for getting the unicode hexadecimal value of the given char array.
 int length()
           
static void main(java.lang.String[] args)
           
 uString replaceCharacter(java.lang.String find, java.lang.String replace)
          Replaces one character with another. The replacement may be the empty string ""; the search target may not.
 uString[] split(java.lang.String regexp)
          Splits the string based on the specified regular expression.
 uString substring(int begin_inclusive, int end_exclusive)
          Extracts a substring by building a new vector of characters and passing it to the private vector-based constructor.
 java.lang.String toString()
          Returns the string as a native String; effectively an implode/join over the individual unicode characters.
 java.lang.String[] toStringArray()
          Turns the string into an array of strings, one string for each unicode character.
 uCharArray toUnicodeCharacterArray()
          Turns the string into an array of unicode characters.
 uString trim()
          Removes whitespace at the start and end of this unicode string.
 java.lang.String unicodeCharacterAt(int index)
          Returns the unicode character at the specified index.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

uString

public uString(java.lang.String[] string)
Builds a unicode string based on a UTF-16 String array.

Parameters:
string - the String array from which a uString will be built

uString

public uString(java.lang.String string)
Builds a unicode string based on a normal UTF-16 encoded String object.

Parameters:
string - the String from which a uString will be built

uString

public uString(char[] chars)
Builds a unicode string based on an array of char. The resultant uString will have a length equal to new String(chars).codePointCount(0, chars.length): any surrogate char will be combined with its pair into a single unicode character.

Parameters:
chars - The char[] from which the unicode string will be built.
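
As an illustration of how surrogate chars pair up into single characters, here is a minimal sketch built on the JDK's Character helpers. It is not the uString constructor itself, whose internals are not shown here; CharArrayGrouping and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: groups a char[] into one String per code point.
final class CharArrayGrouping {

    static List<String> group(char[] chars) {
        List<String> characters = new ArrayList<>();
        int i = 0;
        while (i < chars.length) {
            // codePointAt joins a high surrogate with the low surrogate after it;
            // any other char is returned as-is.
            int codePoint = Character.codePointAt(chars, i);
            int width = Character.charCount(codePoint); // 1 or 2 chars
            characters.add(new String(chars, i, width));
            i += width;
        }
        return characters;
    }

    public static void main(String[] args) {
        char[] chars = {'a', '\uD842', '\uDFB7', 'b'};
        System.out.println(group(chars).size()); // 3
    }
}
```

The number of groups matches new String(chars).codePointCount(0, chars.length) for the same input.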
Method Detail

length

public int length()
Returns the number of unicode characters in this string. A surrogate pair counts as one character.

charAt

@Deprecated
public char charAt(int index)
Deprecated. 

Because the String method charAt(int index) pulls a single char from the internal char array, it returns half a character, or a character at a shifted position, whenever surrogate pairs are involved. As such, the uString class does not implement this method, and flags it as deprecated so that you get a compiler warning when you use it in your code, and a runtime error if you ignore that warning.

Parameters:
index - do not use this method.
Returns:
do not use this method.
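
The failure mode is easy to show with the native String. This sketch uses only the JDK; CharAtPitfall and returnsHalfCharacter are hypothetical names chosen for the example.

```java
// Sketch only: shows String.charAt handing back half of a surrogate pair.
final class CharAtPitfall {

    /** True when charAt(index) yields a lone surrogate rather than a whole character. */
    static boolean returnsHalfCharacter(String s, int index) {
        return Character.isSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        // One character outside the 16-bit range (U+20BB7), followed by 'a'.
        String text = "\uD842\uDFB7a";
        System.out.println(returnsHalfCharacter(text, 0)); // true
        System.out.println(returnsHalfCharacter(text, 1)); // true
        System.out.println(returnsHalfCharacter(text, 2)); // false
    }
}
```

Note that index 1, which a reader would expect to hold 'a' as the second character, holds the low surrogate instead.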

unicodeCharacterAt

public java.lang.String unicodeCharacterAt(int index)
Returns the unicode character at the specified index. This method is surrogate-safe: it never returns half of a surrogate pair, only the entire unicode character. It acts as the semantic equivalent of String.charAt(index).

Parameters:
index - the index of the unicode character in the internal array. Numbering starts at 0.
Returns:
a String object representing a single unicode character.
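
For comparison, the same kind of surrogate-safe lookup can be sketched on a native String with offsetByCodePoints. This is not how uString does it (uString keeps one string per character); CodePointAccess and characterAt are hypothetical names.

```java
// Sketch only: character lookup by code point index on a plain String.
final class CodePointAccess {

    static String characterAt(String s, int index) {
        // Translate the character index into a char offset.
        int start = s.offsetByCodePoints(0, index);
        // Read the whole code point, which may span two chars.
        int codePoint = s.codePointAt(start);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String text = "a\uD842\uDFB7b";
        System.out.println(characterAt(text, 1).length()); // 2, the full surrogate pair
        System.out.println(characterAt(text, 2));          // b
    }
}
```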

append

public uString append(java.lang.String suffix)
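Appends the given suffix to this string. Java has no operator overloading (such as "+"), so concatenation goes through append.

Parameters:
suffix - the String to append
Returns:
the uString with the suffix appended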

append

public uString append(uString suffix)
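Appends the given suffix to this string. Java has no operator overloading (such as "+"), so concatenation goes through append.

Parameters:
suffix - the uString to append
Returns:
the uString with the suffix appended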

substring

public uString substring(int begin_inclusive,
                         int end_exclusive)
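Extracts a substring by building a new vector of characters and passing it to the private vector-based constructor.

Parameters:
begin_inclusive - index of the first unicode character to include
end_exclusive - index just past the last unicode character to include
Returns:
a uString containing the selected characters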
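
To show what character-indexed substring selection means in practice, here is a minimal sketch on a native String. It maps character indices to char offsets rather than building a vector as uString does; CodePointSubstring is a hypothetical name.

```java
// Sketch only: substring with indices counted in unicode characters.
final class CodePointSubstring {

    static String substring(String s, int beginInclusive, int endExclusive) {
        // Convert both character indices into char offsets.
        int from = s.offsetByCodePoints(0, beginInclusive);
        int to = s.offsetByCodePoints(from, endExclusive - beginInclusive);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "a\uD842\uDFB7bc";
        // Characters 1 and 2: the supplementary character and 'b'.
        String part = substring(text, 1, 3);
        System.out.println(part.codePointCount(0, part.length())); // 2
    }
}
```

Calling text.substring(1, 3) on the native String instead would return only the surrogate pair and miss 'b'.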

equals

public boolean equals(java.lang.Object o)
Tests this string for equality with another object.

Overrides:
equals in class java.lang.Object

compareTo

public int compareTo(uString other)
Compares this string with another by content, not by length: "a" and "aaaaa" both come before "bcd".

Specified by:
compareTo in interface java.lang.Comparable<uString>
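
One way to get the ordering described above is to compare code point by code point. Whether uString orders strings exactly this way is an assumption; the sketch below uses only the JDK, and CodePointCompare is a hypothetical name.

```java
// Sketch only: lexicographic comparison over code points instead of chars.
final class CodePointCompare {

    static int compare(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int shared = Math.min(x.length, y.length);
        for (int i = 0; i < shared; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // Length only breaks ties when one string is a prefix of the other.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        System.out.println(compare("a", "bcd") < 0);     // true
        System.out.println(compare("aaaaa", "bcd") < 0); // true
    }
}
```

Comparing code points also avoids a quirk of the native String.compareTo, which compares chars: it sorts U+FF5E after U+1F600, because the latter starts with the surrogate D83D.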

split

public uString[] split(java.lang.String regexp)
Splits the string based on the specified regular expression.

Parameters:
regexp - the regular expression to split on
Returns:
the resulting pieces, as an array of uString

trim

public uString trim()
Removes whitespace at the start and end of this unicode string.

Returns:
the trimmed uString

toUnicodeCharacterArray

public uCharArray toUnicodeCharacterArray()
Turns the string into an array of unicode characters.

Returns:
a uCharArray holding the unicode characters of this string

toStringArray

public java.lang.String[] toStringArray()
Turns the string into an array of strings, one string for each unicode character.

Returns:
a String array with one element per unicode character
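
The one-string-per-character layout can be sketched on a native String with the codePoints() stream. This is an illustration using only the JDK, not the uString implementation; CodePointStrings and split are hypothetical names.

```java
// Sketch only: one String per unicode character, and the join back.
final class CodePointStrings {

    static String[] split(String s) {
        return s.codePoints()
                .mapToObj(codePoint -> new String(Character.toChars(codePoint)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "a\uD842\uDFB7b";
        String[] characters = split(text);
        System.out.println(characters.length);                      // 3
        System.out.println(String.join("", characters).equals(text)); // true
    }
}
```

Joining the array back with an empty separator is the implode/join step that turns such an array into a native String again.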

toString

public java.lang.String toString()
Returns the string as a native String; effectively an implode/join over the individual unicode characters.

Overrides:
toString in class java.lang.Object

replaceCharacter

public uString replaceCharacter(java.lang.String find,
                                java.lang.String replace)
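Replaces one character with another. The replacement may be the empty string ""; the search target may not.

Parameters:
find - the unicode character to look for; must not be ""
replace - the replacement, which may be ""
Returns:
the uString with the replacement applied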

getUnicodeHex

public static java.lang.String getUnicodeHex(char[] charArray)
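Static method for getting the unicode hexadecimal value of the given char array.

Parameters:
charArray - the chars whose hexadecimal value is wanted
Returns:
the hexadecimal value as a String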
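
The exact output format of getUnicodeHex is not documented here, so the following is only a sketch of one plausible reading: the code point of the first character, printed in uppercase hexadecimal. It uses only the JDK; UnicodeHex and hex are hypothetical names, and the input is assumed to be non-empty.

```java
// Sketch only: hexadecimal code point of the first character in a char[].
// The uppercase, unpadded format is an assumption, not uString's documented output.
final class UnicodeHex {

    static String hex(char[] chars) {
        // Combines a leading surrogate pair into one code point if present.
        int codePoint = Character.codePointAt(chars, 0);
        return String.format("%X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(hex(new char[] {'a'}));                // 61
        System.out.println(hex(new char[] {'\uD842', '\uDFB7'})); // 20BB7
    }
}
```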

main

public static void main(java.lang.String[] args)