- Why do those Thai characters display on the web page with a long tail?
- What’s up with these Unicode combining characters and how can we filter them?
- What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?
- ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ํํํํํํํํํํํํํํํํํํํํํํํํํํ Why does this character never end?
For one of these questions, I even analyzed the character sequence manually. But this kind of analysis is not much fun if you have to perform it repeatedly.
Recently I stumbled upon the SE user name n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳, so I had the idea to write a small program to output the Unicode code points and character names for a given input, based on Unicode’s UnicodeData.txt file.
The output for the text samples in the above links looks like this:
The initial version of this program is available for download here.
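The basic idea can be sketched in a few lines of Python. This is not the program offered for download, only a minimal illustration under some assumptions: it relies on Python's standard `unicodedata` module (which bundles the data from UnicodeData.txt) instead of parsing the file directly, and the function names `describe` and `print_report` are made up for this example.

```python
import unicodedata


def describe(text):
    """Return one (code point, category, combining class, name) tuple per character.

    Hypothetical helper for illustration; the data comes from the Unicode
    database version bundled with the running Python interpreter.
    """
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",                 # code point in the usual U+XXXX notation
            unicodedata.category(ch),           # general category, e.g. "Lo" or "Mn"
            unicodedata.combining(ch),          # canonical combining class (0 if none)
            unicodedata.name(ch, "<unnamed>"),  # fallback for characters without a name
        ))
    return rows


def print_report(text):
    """Print one line per code point of the input text."""
    for cp, cat, ccc, name in describe(text):
        print(f"{cp}  {cat}  {ccc:>3}  {name}")


if __name__ == "__main__":
    # A shortened version of the Thai sample: one base letter plus three marks.
    print_report("\u0E0F\u0E4E\u0E4E\u0E4D")
```

Run on a string like the Thai sample, the report lists one code point per line, which makes it easy to spot a single base letter followed by a long run of nonspacing marks (general category `Mn`).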