Analysis of the Unicode Han Database

The Unicode Han Database is part of the Unicode standard and contains data on all CJK Unified Ideographs encoded by the standard.

As of version 5.1, Unicode contains 71,234 CJK characters and a total of about 1.1 million character field values.

The Unihan database groups character fields into Field Types. For each field type below, the fields, the (assumed) language, and the number of characters having a value for that field are listed.
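Per-field counts like the ones listed below can be reproduced with a short script. The following is a minimal sketch, not the method used for this post. It assumes the tab-separated layout of the Unihan data files, where each line holds a code point, a field name such as kDefinition, and a value, and where comment lines start with `#`. The function name `count_fields` and the sample lines are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Iterable


def count_fields(lines: Iterable[str]) -> Counter:
    """Count, per field name, how many characters have a value for that field."""
    counts: Counter = Counter()
    seen = set()  # (code point, field) pairs already counted
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed layout
        code_point, field, _value = parts
        if (code_point, field) not in seen:
            seen.add((code_point, field))
            counts[field] += 1
    return counts


# Hypothetical sample lines in the assumed layout (not real database output).
sample = [
    "# sample data",
    "U+4E00\tkDefinition\tone; a, an; alone",
    "U+4E00\tkMandarin\tYI1",
    "U+4E8C\tkDefinition\ttwo; twice",
]
counts = count_fields(sample)
for field, n in counts.most_common():
    print(field, n)
```

To run this against a real data file, an open file handle can be passed instead of the sample list, for example `count_fields(open("Unihan.txt", encoding="utf-8"))`, assuming the file is in the working directory under that name.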

Definition

The English gloss

Definition 20627

Reading

The pronunciation for a given character

Japanese Kun Japanese 11291
Japanese On Japanese 13174
Mandarin pronunciation (Pinyin) Chinese 25478
Tang pronunciation Chinese 3811
Cantonese (Jyutping) Cantonese 20015
Korean pronunciation Korean 9050
Korean pronunciation in Hangul Korean 7745
Vietnamese pronunciation Vietnamese 8300

Numeric

The numerical value of an ideograph

Accounting Numeral 26
Numeric Value 17
Other Numeric 30

Radical-Stroke Counts

Total Strokes 27929
Unicode Radical-Stroke Count 71234
Adobe-Japan1-6 Japanese 13411
Japanese Radical-Stroke Count Japanese 198
KangXi Radical-Stroke Count Japanese 63696
Morohashi Radical-Stroke Count Japanese 157
Korean Radical-Stroke Count Korean 20

Variants

Variant ideographs

Compatibility Variant 997
Semantic Variant 3205
Specialized Semantic Variant 482
Unicode Z-variant 2566
Simplified Variant Chinese 2674
Traditional Variant Chinese 2593

Dictionary Index

Index to standard dictionaries

Grammata Serica Recensa, 1957 7403
Karlgren: Analytic Dictionary of Chinese and Sino-Japanese, 1974 2560
IRG Dai Kanwa Ziten (Morohashi) Japanese 17864
Morohashi index Japanese 21204
Nelson: Modern Reader’s Japanese-English Character Dictionary Japanese 5398
Fenn’s Chinese-English Pocket Dictionary, 1942 Chinese 5937
Hanyu Da Zidian position Chinese 55818
IRG Hanyu Da Zidian Chinese 55813
KangXi Chinese 70206
KangXi position Chinese 20938
Mathews: Chinese-English Dictionary Chinese 8988
Pronunciation and Frequency Chinese 3799
Song Ben Guang Yun Chinese 19583
Cheung, Bauer: The Representation of Cantonese with Chinese Characters, 2002 Cantonese 809
Cowles: A Pocket Dictionary of Cantonese, 1999 Cantonese 4821
Lau: A Practical Cantonese-English Dictionary, 1977 Cantonese 3516
Meyer, Wempe: Student’s Cantonese-English Dictionary Cantonese 7352
Dae Jaweon, 1988 Korean 16026
IRG Dae Jaweon Korean 16024

Dictionary-like Data

Miscellaneous lookup information

Four-Corner Code 16256
IRG IICore 9810
Cangjie Input Code Chinese 29148
Cihai, Zhonghua Bookstore, 1983 Chinese 13884
Fenn: The Five Thousand Dictionary, 1979 Chinese 5075
Frequency Chinese 5089
Hanyu Da Zidian Radical Break Chinese 200
Phonetic Index (from Ten Thousand Characters) Chinese 11463
Xiandai Hanyu Cidian Chinese 10992
Hong Kong proper shape Hong Kong 4825
Hong Kong school grade Hong Kong 2632
Cheung, Bauer: The Representation of Cantonese with Chinese Characters, 2002 Cantonese 809

Mappings

Encodings in other character sets

CCCII 19698
EACC 13244
Xerox code 9747
IBM Japanese Japanese 360
JIS X 0208-1990 Japanese 6356
JIS X 0212-1990 Japanese 5801
JIS X 0213-2000 Japanese 3695
Big5 Chinese 13063
CNS 11643-1986 Chinese 17258
CNS 11643-1992 Chinese 17258
GB 12345-90 Chinese 6866
GB 2312-80 Chinese 6763
GB 7589-87 Chinese 4836
GB 7590-87 Chinese 2842
GB 8565-89 Chinese 827
PRC Telegraph Code Chinese 7085
Pseudo GB 12345-90 Chinese 153
Taiwan Telegraph Chinese 9041
Big5 HK Supplementary Character Set Hong Kong 4512
KPS 10721-2000 Korean 19301
KPS 9566-97 Korean 4653
KS X 1001:1992 Korean 4888
KS X 1002:1991 Korean 2856

IRG Sources

Sources from the Ideographic Rapporteur Group (IRG)

IRG U 41
IRG Japan Japanese 13684
IRG PRC and Singapore Chinese 57628
IRG Taiwan Chinese 54989
IRG Hong Kong SAR Hong Kong 4511
IRG North Korea Korean 24122
IRG South Korea Korean 17661
IRG Vietnam Vietnamese 9298
