Where's that sequence, Poms?! in nihongoresources

Sunday 26th of October 2008, 01:18:20 pm

To answer that question: intangibly located in hypothesis space. Basically, decomposing kanji is the single most boring job I can think of doing, and I had to do a lot of it.

4,299 glyphs later, the decomposition job is done. At least, you'd think so, but secretly it turns out this is not the case. Yes, it is interesting data and I'd love to see it turn out to be valuable to other people too, but the bottom line is this: I manually decomposed over four thousand glyphs. This data will have mistakes.

And it won't just have the silly typo mistakes; it probably has some of those really nasty, very-hard-to-track errors too: all the right subcomponents, all the right decomposition markers, and then not quite in the right order. Short of running through each kanji/glyph all over again, I am not going to catch these myself.
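To make that last kind of error concrete, here is a minimal Python sketch of how a second, independent pass over the data could be compared against the first. Everything in it is an assumption for illustration: the real indigo file format is not shown in this post, so the entries below are made-up placeholder strings, and `classify_mismatch` and `find_suspects` are hypothetical helper names. The idea is only that two decompositions containing exactly the same symbols in a different order are the errors a plain typo check would miss.

```python
from collections import Counter

def classify_mismatch(first: str, second: str) -> str:
    """Compare two independent decompositions of the same glyph."""
    if first == second:
        return "match"
    # Same symbols with the same counts, but a different sequence:
    # the "right parts, wrong order" case.
    if Counter(first) == Counter(second):
        return "order-only"
    return "different"

def find_suspects(original: dict, recheck: dict) -> dict:
    """Return every glyph whose two decompositions disagree, with the kind of disagreement."""
    return {
        glyph: classify_mismatch(original[glyph], recheck[glyph])
        for glyph in original
        if glyph in recheck and original[glyph] != recheck[glyph]
    }

# Hypothetical placeholder entries, NOT the real indigo format.
original = {"A": "+xy", "B": "+x-yz", "C": "-pq"}
recheck = {"A": "+xy", "B": "+x-zy", "C": "-pr"}

for glyph, kind in find_suspects(original, recheck).items():
    print(glyph, kind)
```

A check like this cannot say which of the two versions is right; it only narrows down which glyphs a human needs to look at again.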

So a request: got free time? Need a bit of mind numbing?

HELP OUT!

To see what I'm talking about, I've put the "indigo" data (which is what the decomposition data is known as on my computers) online for downloading. Note that while it is released under a Creative Commons license (v3.0, by/nc), this file is considered a "beta" version of sorts.

I can probably use the data as it is now to sequence the kanji in a pretty useful way, and the errors that *are* left in it are unlikely to have a massive impact on the sequencing (and of course, the later in the sequence an error sits, the less of an impact it has), but I'd be far happier if I could guarantee - with help from some other people =) - that this data is correct.

You know where to contact me! =D

On a functional note: the download is a zip file with three files in it:

So, enjoy, and I hope to post the sequenced kanji series soon (I am in the process of moving at the moment too, so that might frustrate a timely delivery a bit more. Have to be out of my current house before November >.>;)

- Pomax, signing off for the day