YuJisho: a Unicode CJK Character web dictionary

Chinese (or Japanese) characters have fascinated me since I first learned about them in the early 1990s, and I immediately started some small programming projects dealing with this topic, among them a Kanji flash card application, one of my first Windows (3.1) programs.

Every now and again, I visited the websites of Jim Breen and Unicode, downloaded fonts, built a vocabulary trainer, and so on. One of my most recent activities was an analysis of the Unicode Han Database.

There are a number of CJK dictionaries on the web, and my main objection to most of these websites is that you not only need to specify what you are looking for, but also need to tell the site where to look (e.g., English, Japanese, Romaji, transcription method, etc.).

I wanted to have a single input line with nothing else, and there should always be some kind of result.
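One way to picture the "single input line" idea is to let the code guess the kind of query from the characters it contains. The following is only a minimal sketch under my own assumptions, not the code YuJisho actually runs: the function name `classify_query` and the categories `han`, `kana`, `latin`, and `other` are hypothetical, and the guess relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata


def classify_query(text: str) -> str:
    """Guess which kind of lookup a free-form query calls for.

    Hypothetical helper for illustration only; the categories are
    assumptions, not part of the original application.
    """
    kinds = set()
    for ch in text:
        if ch.isspace():
            continue
        # Unicode character names start with the script name,
        # e.g. "CJK UNIFIED IDEOGRAPH-8F9E" or "HIRAGANA LETTER SI".
        name = unicodedata.name(ch, "")
        if name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")):
            kinds.add("han")
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            kinds.add("kana")
        elif name.startswith("LATIN"):
            kinds.add("latin")
        else:
            kinds.add("other")
    # Prefer the most specific script found in the query.
    for kind in ("han", "kana", "latin"):
        if kind in kinds:
            return kind
    return "other"
```

With a classification like this, a query containing Han characters could go to a character lookup, while a Latin-only query could be tried against English glosses and transcriptions in turn, so that the user never has to say where to look.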

Of course, I had to deal with performance-tuning the search algorithm, and I think it performs pretty well now.

I came across a couple of problems when dealing with Far East scripts and Latin text in the same SQL Server table:

When you look for a CJK character in an NVARCHAR column using the Latin1_General_CI_AS collation, the character may match any other character in that column. Switching to a collation supporting CJK, such as Chinese_PRC_90_CI_AI, solved the problem.

SQL Server 2000 did not handle surrogate pairs well with the available collation Chinese_PRC_CI_AI. According to a blog post by Qingsong Yao, the collation Chinese_PRC_90_CI_AI and the related collations introduced with SQL Server 2005 solve the surrogate pair problem.
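To illustrate what a surrogate pair is: NVARCHAR data is stored as UTF-16, and CJK characters outside the Basic Multilingual Plane (for example those in CJK Extension B) need two 16-bit code units instead of one. The following small sketch shows this in Python rather than in SQL Server; the helper name `utf16_units` is hypothetical and exists only for this illustration.

```python
def utf16_units(ch: str) -> list:
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-be")  # big-endian, without a byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+4E00 lies in the Basic Multilingual Plane: one code unit.
bmp_units = utf16_units("\u4e00")

# U+20000 (first character of CJK Extension B) lies outside it:
# a high surrogate followed by a low surrogate.
ext_b_units = utf16_units("\U00020000")

print([hex(u) for u in bmp_units])    # ['0x4e00']
print([hex(u) for u in ext_b_units])  # ['0xd840', '0xdc00']
```

Software that treats each 16-bit unit as a separate character will see two "characters" here, which is the kind of situation a surrogate-aware collation has to handle.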

That all said, here is my online character dictionary, YuJisho. The name is a combination of the U in Unicode and the Japanese word for “dictionary”.

Any feedback is welcome 😉

