YuJisho: a Unicode CJK Character web dictionary

Chinese (or Japanese) characters have fascinated me since I first learned about them in the early 1990s. I immediately started some small programming projects on the topic, among them a Kanji flash card application, one of my first Windows (3.1) programs.

Every now and again, I visited the websites of Jim Breen and Unicode, downloaded fonts, built a vocabulary trainer, and so on. One of my most recent activities was an analysis of the Unicode Han Database.

There are a number of CJK dictionaries on the web. My main objection to most of them is that you not only need to specify what you are looking for, but also need to tell the site where to look (e.g. English, Japanese, Romaji, transcription method, etc.).

I wanted a single input line and nothing else, and a search should always return some kind of result.
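To illustrate the idea of a single input line, here is a minimal Python sketch of how a query could be classified by script before deciding where to look. This is not YuJisho's actual search code; the function name `classify_query`, the category labels, and the selection of code point ranges are assumptions made for the example.

```python
def classify_query(text: str) -> str:
    """Guess which kind of lookup a query string calls for.

    Hypothetical helper for illustration only. It checks a few
    well-known Unicode blocks and is not exhaustive.
    """
    has_han = has_kana = has_latin = False
    for ch in text:
        cp = ord(ch)
        if (0x4E00 <= cp <= 0x9FFF          # CJK Unified Ideographs
                or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
                or 0x20000 <= cp <= 0x2A6DF):  # CJK Extension B
            has_han = True
        elif 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana
            has_kana = True
        elif ch.isascii() and ch.isalpha():  # plain Latin letters
            has_latin = True

    # Han characters take priority, then kana, then Latin text.
    if has_han:
        return "han"
    if has_kana:
        return "kana"
    if has_latin:
        return "latin"
    return "other"
```

With a classifier like this, the user never has to pick a search mode: a query containing a Han character can go straight to a character lookup, while Latin input can be tried against English glosses and romanized readings.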

Of course, I had to deal with performance-tuning the search algorithm, and I think it performs pretty well now.

I came across a couple of problems when dealing with Far East scripts and Latin in the same SQL Server table:

When you look for a CJK character in an NVARCHAR column using the Latin1_General_CI_AS collation, the character may match any other character in that column. Switching to a collation supporting CJK, such as Chinese_PRC_90_CI_AI, solved the problem.

SQL Server 2000 did not handle surrogate pairs well with the available collation Chinese_PRC_CI_AI. According to a blog post by Qingsong Yao, the collation Chinese_PRC_90_CI_AI and the related collations introduced in SQL Server 2005 solve the surrogate pair problem.
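For readers unfamiliar with surrogate pairs: in UTF-16, the encoding behind NVARCHAR, a character outside the Basic Multilingual Plane takes two 16-bit code units instead of one. The following Python sketch shows the difference; the helper name `utf16_code_units` is made up for this example and has nothing to do with SQL Server itself.

```python
from typing import List


def utf16_code_units(ch: str) -> List[int]:
    """Return the UTF-16 code units that encode the given string."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


# U+5B57 lies in the Basic Multilingual Plane: one code unit.
bmp_units = utf16_code_units("\u5b57")

# U+20000 (CJK Extension B) lies outside it: a surrogate pair.
ext_b_units = utf16_code_units("\U00020000")

print([hex(u) for u in bmp_units])    # ['0x5b57']
print([hex(u) for u in ext_b_units])  # ['0xd840', '0xdc00']
```

Software that treats each 16-bit unit as a separate character sees two meaningless halves where there is a single Han character, which is why comparison and sorting need to be aware of surrogate pairs.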

All that said, here is my online character dictionary, YuJisho. The name is a combination of the U in Unicode and the Japanese word for “dictionary”.

Any feedback is welcome 😉
