A simple website link mapper

Friday 14th of May 2010, 02:31:16 pm

Ever had the problem where you needed to do a forward and backlink analysis of a website? Sure, for a simple homepage it's pretty easy, and some blog software will let you do it, and if you like Google Analytics (why do you like Google Analytics?) that will conceivably be able to help you do it too. But sometimes you're like me, and you just want a tool to do the job for you. You're thinking of using wget, but it doesn't actually do link mapping, just recursive web mirroring, so what are you to do...

Well, you download this script, rename it from getlinks.php.txt to getlinks.php and run it after you've run wget for the site you need to profile:

shell> wget --output-file=result.log --no-cache --html-extension --no-cookies --recursive --domains=mywebsite.com --accept '*.html' http://www.mywebsite.com

shell> php getlinks.php > result.txt

Job's a good'n.

And yes, I could have written it in Perl (heck, I already have several directory-traversing Perl scripts for the various odds and ends jobs I do) but honestly, finding people who can actually READ Perl code has become a rarity in and of itself. PHP is just as functional (oh, get off it, yes it is. I said functional. As in: it gets the bloody job done) and easier to read for most people who are passingly familiar with programming.
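For the curious, the general idea behind a link mapper can be sketched in a few lines. This is not the getlinks.php script, just a minimal illustration in Java under some loud assumptions: the class name LinkMapper, its method names, and the deliberately naive href regex are all made up for this example, and the page contents are handed in as strings rather than read from the wget mirror.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of forward/backlink mapping; not the original script.
class LinkMapper {
    // Naive assumption: links look like href="..." or href='...'.
    // Anything from '#' onwards is dropped, so pure in-page anchors are skipped.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Forward links: page name -> the set of targets that page links to. */
    static Map<String, Set<String>> forwardLinks(Map<String, String> pages) {
        Map<String, Set<String>> forward = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            Set<String> targets = new TreeSet<>();
            Matcher m = HREF.matcher(page.getValue());
            while (m.find()) {
                targets.add(m.group(1));
            }
            forward.put(page.getKey(), targets);
        }
        return forward;
    }

    /** Backlinks: target -> the set of pages linking to it (the forward map, inverted). */
    static Map<String, Set<String>> backLinks(Map<String, Set<String>> forward) {
        Map<String, Set<String>> back = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : forward.entrySet()) {
            for (String target : entry.getValue()) {
                back.computeIfAbsent(target, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return back;
    }

    public static void main(String[] args) {
        // Made-up example pages standing in for the mirrored .html files.
        Map<String, String> pages = new HashMap<>();
        pages.put("index.html", "<a href=\"about.html\">About</a> <a href='blog.html'>Blog</a>");
        pages.put("blog.html", "<a href=\"index.html\">Home</a> <a href=\"about.html\">About</a>");

        Map<String, Set<String>> forward = forwardLinks(pages);
        System.out.println("forward: " + forward);
        System.out.println("back:    " + backLinks(forward));
    }
}
```

A real version would walk the mirrored directory, resolve relative URLs, and use a proper HTML parser instead of a regex (this one will happily pick up stylesheet links too), but the forward-map-then-invert structure stays the same.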

Need SQLite 2.8.17?

Friday 10th of April 2009, 04:52:49 am

Need SQLite 2.8.17 but can't find it on sqlite.org? Then fret not, because it's still there, just not linked from the download page.

Click http://www.sqlite.org/sqlite-2_8_17.zip to download it, or copy-paste that link if you don't trust direct file links (javascript might be making it look like a safe file O_o!)

As always, scan any files you download. You wouldn't want to be one of those millions of people who are responsible for spam, trojans and viruses infecting us all, would you?

I hate Opera

Thursday 9th of April 2009, 03:52:16 am

The browser, mind, not the artistic music and singing.

Why? Because as a developer I like to be able to write cross-browser material, and Opera doesn't let me. For instance, have some fun looking at mouse handling in js across different browsers, and notice what Opera allows.

That's right, just one mouse button. I'm sorry, was internet use moving towards online applications, with their own context menus? Well too bad, Opera won't let you use those.

It also doesn't *tell* you it blocks things in its error console.

Thanks, Opera. Thanks to your idiocy I have to figure out some kind of idiotic button bar system now. I hope you fossilise soon (or experience a rebirth as a browser willing to move along with the internet. Even IE's trying, why aren't you?)

Formatting plain text in html?

Thursday 12th of March 2009, 05:55:09 am

What happens when you combine dynamic content distribution with dynamic CSS manipulation?

The simple answer is "whatever you make it do", but the more interesting answer is "the basis for a dynamically adjustable text formatting system", which lets you set up print pages in terms of page dimensions and print dimensions, and have it automatically generate the right number of pages as divs on the page.

Which is exactly what I need for evaluating my book rewrite. Paired with a bit of nifty php processing so my 漢字(ふり) notation becomes 漢字 with ふり set as furigana above it instead, and it's almost a work of magic.

Putting it all together has led to this, which works in Firefox 3, Webkit (Safari 4 and Chrome 1), Opera (9.5x) and IE7. So consider me quite happy. This was actually really fun to write.

More java strangeness

Tuesday 24th of February 2009, 01:43:37 am

Want to perform event based behaviour based on arrow keys?

I hope you didn't try onKeyTyped(), because that doesn't work. Arrow keys apparently don't count as typing, because they don't produce output.

Go use onKeyPressed() instead.

PHP5 and SQLite3

Sunday 8th of February 2009, 01:33:14 pm

Some people like being able to use SQLite databases in PHP. That's great, except PHP only supports sqlite2.x databases, which haven't been supported by the "latest" SQLite for years now. The standard is SQLite3, has been for a while, and there's no native support a la the sqlite_open functions in PHP5. But fret not - while the documentation for SQLite3 on php.net is USELESS, the simple solution is to use PDO instead of using a dedicated SQLite object. PHP can talk to SQLite3 just fine, someone just needs to update the bloody documentation for it:

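$dbh = new PDO('sqlite:yourdatabase.db');
// PDO::FETCH_ASSOC makes every row an associative array (column name => value),
// so there are no duplicate numeric keys to filter out.
// "mytable", "mycolumn" and 'criterium' are placeholders ("table" itself is a reserved word in SQL).
foreach($dbh->query("SELECT * FROM mytable WHERE mycolumn = 'criterium'", PDO::FETCH_ASSOC) as $row)
{
 foreach($row as $key=>$val)
 {
  echo "$key: $val\n";
 }
}
// dropping the last reference closes the connection
$dbh = null;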

Java doesn't understand the concept of 'virtual key'

Sunday 8th of February 2009, 01:13:13 pm

For your enjoyment I offer the following bug report on the sun bugtracker for java.

The short of it: you can't rely on java's VK_xxxx keymappings. Even though the whole reason those codes exist is so you don't have to match on character.

Goddamnit, sun.

Java does not support unicode

Sunday 2nd of November 2008, 02:59:16 am

If you've ever programmed in Java, and have some knowledge of UTF-8, you're probably going "eh? yes it does?" at this claim. However, I can assure you it is quite true. And well documented, in fact.

Let me quote the javadoc for Character, from the most recent version on the JRE 6.0 API:

"Character information is based on the Unicode Standard, version 4.0."

Unicode 4.0 was released in April of 2003. That's over five years ago. After that, 4.1 was released March 2005, 5.0 in July 2006, and 5.1 in April 2008. To allow for the vast number of characters beyond the first 65,536, UTF-16 (love it or leave it) has special reserved codepoints, the surrogates, that say "if you see me, I am to be considered an offset for the character encoded in the next 2 bytes, because it has a codepoint that is so high that it cannot be represented in only 2 bytes".

The Java crew never upgraded the char primitive to hold such a codepoint in one piece, which is a problem, as can be easily illustrated:

Clearly, ⿱𠂉乙 is three characters: the ideographic description character for top-down component arrangement (⿱), the 'radical' 𠂉, and then the kanji 乙. The first and the last are not a problem for Java, since each fits in a single 16-bit char - the middle one is a severe problem, though, since it lies outside the Basic Multilingual Plane and needs a surrogate pair. Let's look at the underlying cause:

"The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes." "A char value [...] represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points."

(both cited from the javadoc for Character mentioned above)

That's awesome, but for one thing: String and StringBuffer rely on char arrays, not int arrays, which means that String and StringBuffer get a hell of a lot wrong: length and substring count in chars rather than in characters, so they won't give the correct result, and split behaves incorrectly because it relies on finite state evaluation of char primitives.
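To make the complaint concrete, here is a small sketch using only standard java.lang methods. The class name SurrogateDemo and the helper sample() are made up for this illustration; the string is built from codepoints U+2FF1, U+20089 and U+4E59 (my reading of the three characters in the example above) so the source file's encoding doesn't matter.

```java
// Hypothetical demo class; only standard java.lang calls are used.
class SurrogateDemo {
    /** Builds the three-character example string from its codepoints. */
    static String sample() {
        return new StringBuilder()
            .appendCodePoint(0x2FF1)   // top-down description character
            .appendCodePoint(0x20089)  // supplementary character, needs two chars
            .appendCodePoint(0x4E59)   // plain BMP kanji
            .toString();
    }

    public static void main(String[] args) {
        String s = sample();

        // length() counts 16-bit chars, not characters
        System.out.println(s.length());                       // prints 4

        // codePointCount() does count real codepoints
        System.out.println(s.codePointCount(0, s.length()));  // prints 3

        // substring() cuts on char boundaries, so asking for "the second
        // character" yields half a surrogate pair rather than a character
        String second = s.substring(1, 2);
        System.out.println(Character.isHighSurrogate(second.charAt(0)));  // prints true
    }
}
```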

Plus, and this is from a design point of view, the high/low surrogate codepoints are not characters, so allowing char to be just the high/low surrogate codepoint without a follow-up of real character information is plain wrong. Unicode specifically states that these things only have 'character' meaning when used in combination with a second pair of bytes to indicate which character is encoded:

"They are called surrogates, since they do not represent characters directly, but only as a pair"

(http://www.unicode.org/faq/utf_bom.html#34)

So where does that put us? Well, for those of us who rely on java to do the right thing, we're in the computing equivalent of the stone age, really. As long as the char primitive stays ignorant with respect to the current - beg pardon, almost two and a half years old - unicode standard, java simply does not support unicode, but makes people believe it does. That's a problem. If you support something, proudly proclaim this. If you don't, very explicitly and unmistakably mention this at the very first opportunity for it. And keep saying it.

Java does not support unicode...

and it turns out I'm one of those few who really need it to -_-

(And yes, luckily I'm one of those people who file a bug report when they happen across things like this ;)

edit: yes, I am aware of the String.codePointCount() method. Guess what: that still doesn't make it easy to actually get to any of those codepoints. Brilliantly, String.codePointAt(int n) interprets the passed index as referring to the index in the char array, not the nth codepoint in the "looked at as a series of codepoints" sequence. And no, there is no "getCodePoint(int n)" method; the closest you get is translating the index yourself first with String.offsetByCodePoints(). Which is just gross negligence on the part of Lee Boynton and Arthur van Hoff, authors of the String class.
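For what it's worth, the missing method can be approximated with what String does offer. The sketch below is an assumption about how one might do it, not an official idiom: the class NthCodePoint and the method codePointAtIndex are invented names, and the only library calls are String.offsetByCodePoints() and String.codePointAt().

```java
// Hypothetical helper standing in for the "getCodePoint(int n)" that String lacks.
class NthCodePoint {
    /** Returns the nth codepoint (0-based), counting codepoints rather than chars. */
    static int codePointAtIndex(String s, int n) {
        // translate "nth codepoint" into an index in the underlying char array
        int charIndex = s.offsetByCodePoints(0, n);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
            .appendCodePoint(0x2FF1)
            .appendCodePoint(0x20089)
            .appendCodePoint(0x4E59)
            .toString();

        int count = s.codePointCount(0, s.length());
        for (int n = 0; n < count; n++) {
            System.out.printf("codepoint %d: U+%04X%n", n, codePointAtIndex(s, n));
        }

        // For comparison: the raw index 2 lands on the second half of the
        // surrogate pair, which is exactly the behaviour complained about above.
        System.out.printf("codePointAt(2): U+%04X%n", s.codePointAt(2));
    }
}
```

Note that every lookup walks the string from the start, so this is fine for the odd character but not something to call in a tight loop.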

Getting the codepoint for a unicode character

Monday 20th of October 2008, 04:10:43 am

Ever had to convert UTF-16 to hex? If you're like me, then yes, frequently. If you're not like me, then you probably have no idea this was something someone might even want to do =)

Personally, I frequently need to know the unicode codepoint for a character because I "sort of know" where some character I am looking for will be in the unicode map, and BabelMap doesn't let you find a glyph based on... err, that glyph. You need to know its codepoint.

A tad inconvenient, and no, the internet seems to have NO CONVERTERS AT ALL. I find this bizarre, but then I find more things bizarre (like starbucks selling coffee that has been burnt, not roasted, and no one complaining about this). So, in the good spirit of "I wouldn't be a paid software engineer if I couldn't write a bit of javascript that did this for me", I set out to write one myself.

Turns out I can, with help from the unicode FAQ, and the charCodeAt javascript string function. It was a bit frustrating getting the wrong value back, until I found the mozilla page explaining that charCodeAt has to be called on two positions rather than one for high UTF-16 codes... problem solved!

The result is useful to me. Perhaps it is useful to you.
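The "two positions" trick boils down to a bit of arithmetic from the unicode FAQ. The sketch below shows it in Java rather than javascript (charAt in Java hands back the same 16-bit units that charCodeAt does), and it is not the code of the converter itself: SurrogateMath, combine and codePointFrom are names invented for this illustration.

```java
// Hypothetical sketch of turning UTF-16 units into a codepoint; not the converter's code.
class SurrogateMath {
    /** Combines a high and a low surrogate into the codepoint they encode. */
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    /** Reads the codepoint starting at char index i, looking at two positions if needed. */
    static int codePointFrom(String s, int i) {
        char first = s.charAt(i);
        if (Character.isHighSurrogate(first) && i + 1 < s.length()) {
            char second = s.charAt(i + 1);
            if (Character.isLowSurrogate(second)) {
                return combine(first, second);
            }
        }
        // a plain BMP character (or a stray surrogate) is its own value
        return first;
    }

    public static void main(String[] args) {
        // U+20089 written out as its two UTF-16 units
        String s = "\uD840\uDC89";
        System.out.printf("U+%04X%n", codePointFrom(s, 0));  // prints U+20089
    }
}
```

Java ships the same calculation as Character.toCodePoint(high, low), so the hand-rolled version is only there to show what happens under the hood.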

LOAD DATA INFILE vs. INSERT

Sunday 19th of October 2008, 11:20:33 am

I used to curse at MySQL for the problems that the SOURCE command has. It's slow, it doesn't treat files as files, but as single lines, each being line 0, and most annoyingly, it's slow.

Also, it takes really long to insert data.

So I was so happy to find the LOAD DATA INFILE "command", that I decided to do this quick post to explain why it's so delicious:

A million rows of data inserted in a few seconds = very tasty.

The basic idea is that you format your data as tab delimited non-quoted data, each field corresponding to a column in your table layout, and store this in a file. Say, "data.txt" - you then load it into mysql:

mysql> LOAD DATA INFILE 'data.txt' INTO TABLE mydatabase.mytable;

and job's a good'un. You have to make sure of course that every row has all its columns filled out. Relying on "AUTO_INCREMENT" for a column to fill in missing values won't do by default, because without an explicit column list at the end of the statement the data no longer has any indicator of which field is for which column. However, unlike with SOURCE, an input file to LOAD DATA is treated as a genuine multi-line file, so if you make a mistake, MySQL will actually tell you at which line your mistake was made.

On a transactional storage engine such as InnoDB it's also a transaction command, so if it finds an error, it will stop and undo anything it might have done up to that point. Rather than a chore, it's simply a matter of fixing the error in the text file, and rerunning the command - no need to delete partially inserted tables or something.

Lovely.
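Producing the tab delimited file is the only fiddly bit, so here is a minimal sketch of it in Java. It assumes MySQL's default LOAD DATA settings (tab between fields, newline between rows, backslash as escape character, \N for NULL); the class TabDelimited, its methods and the example rows are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for writing a file that LOAD DATA INFILE can read with default settings.
class TabDelimited {
    /** Escapes one field; null is written as \N, which MySQL reads back as NULL. */
    static String escapeField(String field) {
        if (field == null) {
            return "\\N";
        }
        return field.replace("\\", "\\\\")   // backslash first, or the others get double-escaped
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    /** Joins the escaped fields of one row with tabs, one field per table column. */
    static String toLine(String... fields) {
        StringJoiner line = new StringJoiner("\t");
        for (String field : fields) {
            line.add(escapeField(field));
        }
        return line.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up rows; every row supplies a value for every column.
        List<String> lines = new ArrayList<>();
        lines.add(toLine("1", "first row", "plain text"));
        lines.add(toLine("2", "second row", "text with a\ttab in it"));
        lines.add(toLine("3", "third row", null));

        Files.write(Paths.get("data.txt"), lines, StandardCharsets.UTF_8);
    }
}
```

Escaping matters because an unescaped tab or newline inside a field would silently shift every column after it.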
