Analyzing Combining Unicode Characters

Some scripts supported by the Unicode standard define combining characters, which may cause confusion for people not familiar with a specific script:

For one of these questions, I even analyzed the character sequence manually. But this kind of analysis is not much fun if you have to perform it repeatedly.

Recently I stumbled upon the SE user name n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳, so I had the idea to write a small program to output the Unicode code points and character names for a given input, based on Unicode’s UnicodeData.txt file.
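To give a rough idea of what such a tool does, here is a minimal sketch in Python. It is not the program described in this post: instead of parsing UnicodeData.txt itself, it relies on Python's standard unicodedata module, which ships the same character database. The function name spell and the "<unnamed>" placeholder are made up for this illustration.

```python
import unicodedata


def spell(text):
    """Return one (code point, combining class, name) tuple per character.

    Hypothetical helper for illustration; it looks characters up in
    Python's bundled Unicode database rather than in UnicodeData.txt.
    """
    rows = []
    for ch in text:
        # unicodedata.name() raises ValueError for characters without a
        # name (e.g. control characters) unless a default is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        # A combining class of 0 marks a base (non-reordering) character;
        # most combining marks have a nonzero class.
        rows.append((f"U+{ord(ch):04X}", unicodedata.combining(ch), name))
    return rows


if __name__ == "__main__":
    # "e" followed by a combining acute accent, rendered as one glyph
    for code, ccc, name in spell("e\u0301"):
        print(f"{code}  ccc={ccc:<3}  {name}")
```

For the sample string in the sketch, the first row is the base letter LATIN SMALL LETTER E and the second is COMBINING ACUTE ACCENT, which is exactly the kind of breakdown that is tedious to do by hand.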

The output for the text samples in the above links looks like this:

[Screenshots: unispell 1, unispell 2, unispell 3, unispell 4, unispell nhahtdh]

The initial version of this program is available for download here.
