GBK is backward compatible with the GB 2312 encoding and upward supports the ISO 10646.1 international standard, serving as the link between the former and the latter.
ISO 10646 is the coding standard published by the International Organization for Standardization (ISO), namely the Universal Multiple-Octet Coded Character Set (UCS); its title is translated into Chinese differently in mainland China and in Taiwan. It is fully compatible with the Unicode encoding maintained by the Unicode Consortium. ISO 10646.1 is the first part of the standard, "Architecture and Basic Multilingual Plane". China adopted it in 1993 as the national standard GB 13000.1 (that is, GB 13000.1 is equivalent to ISO 10646.1).
ISO 10646 is a coding system that covers the written forms and additional symbols of the world's languages. Its Chinese character part is called "CJK Unified Ideographs" (C refers to China, J to Japan, and K to Korea). The China part includes the Chinese characters and symbols of legal standards from mainland China such as GB 2312, GB 12345 and the Modern Chinese General Character List, as well as the Chinese characters and symbols in planes 1 and 2 of the CNS 11643 standard from Taiwan (basically equivalent to Big-5).
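The relationship between GB 2312 and GBK described above can be illustrated with a minimal sketch. It assumes Python's built-in "gb2312" and "gbk" codecs as stand-ins for the two standards; the helper name can_encode and the sample characters are chosen here for illustration and do not come from the GBK specification.

```python
# Sketch only: Python's "gb2312" and "gbk" codecs stand in for the standards.
simplified = "汉字"        # both characters are in GB 2312
traditional = "\u9f8d"     # 龍, a CJK character that GB 2312 does not contain


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text has a code in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Backward compatibility: GB 2312 text keeps the same bytes under GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

# Upward support: a CJK character outside GB 2312 still has a GBK code.
in_gb2312 = can_encode(traditional, "gb2312")
in_gbk = can_encode(traditional, "gbk")

print(same_bytes, in_gb2312, in_gbk)
```

Text that was stored as GB 2312 can therefore be read by a GBK decoder unchanged, while the reverse is not guaranteed.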
I. Character repertoire
The GBK specification includes all CJK Chinese characters and symbols in ISO 10646.1, plus some supplements. Specifically:
1. All Chinese characters and non-Chinese symbols in GB 2312.
2. The other CJK Chinese characters in GB 13000.1. Items 1 and 2 together total 20,902 Chinese characters.
3. 52 Chinese characters from the Complete List of Simplified Characters that are not included in GB 13000.1.
4. 28 radicals and important components from the Kangxi Dictionary and Cihai that are not included in GB 13000.1.
5. 13 Chinese character structure symbols.
6. 139 graphic symbols from Big-5 that are not included in GB 2312 but exist in GB 13000.1.
7. 6 pinyin symbols supplemented by GB 12345.
8. The Chinese character "○".
9. 19 vertical punctuation marks supplemented by GB 12345 (compared with GB 2312, GB 12345 adds 29 vertical punctuation marks, of which 10 are not included in GB 13000.1).
10. 21 Chinese characters selected from the CJK compatibility area of GB 13000.1.
11. 31 IBM OS/2 special symbols included in GB 13000.1.
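The figure of 20,902 Chinese characters in the list above can be cross-checked with a short sketch. It relies on two assumptions that are not stated in this text: that these characters occupy the Unicode range U+4E00 to U+9FA5, and that Python's built-in "gbk" and "gb2312" codecs are reasonable stand-ins for the standards. The helper name encodable is illustrative.

```python
# Assumption: the 20,902 CJK characters of GB 13000.1 are U+4E00..U+9FA5.
CJK_FIRST, CJK_LAST = 0x4E00, 0x9FA5


def encodable(ch: str, codec: str) -> bool:
    """Return True if the single character ch has a code in the codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


cjk_total = CJK_LAST - CJK_FIRST + 1
cjk_range = range(CJK_FIRST, CJK_LAST + 1)

# How many of those characters each codec can represent.
in_gbk = sum(encodable(chr(cp), "gbk") for cp in cjk_range)
in_gb2312 = sum(encodable(chr(cp), "gb2312") for cp in cjk_range)

print(cjk_total, in_gbk, in_gb2312)
```

The GB 2312 count comes out smaller than the GBK count, which reflects item 2 of the list: GBK adds the CJK characters of GB 13000.1 that GB 2312 lacks.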
II. Code allocation and order
GBK also uses a double-byte representation. The overall code range is 8140-FEFE, with the first byte in 81-FE and the second byte in 40-FE, excluding the line xx7F. There are 23,940 code points in total, holding 21,886 Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
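The byte ranges above determine the code point total directly. The following is a minimal sketch of that arithmetic together with a validity check for a two-byte code; the names LEAD_BYTES, TRAIL_BYTES and is_gbk_code are illustrative and not part of the specification.

```python
# First byte: 0x81..0xFE. Second byte: 0x40..0xFE, skipping 0x7F.
LEAD_BYTES = range(0x81, 0xFF)
TRAIL_BYTES = [b for b in range(0x40, 0xFF) if b != 0x7F]


def is_gbk_code(lead: int, trail: int) -> bool:
    """Return True if (lead, trail) falls inside the GBK double-byte range."""
    return lead in LEAD_BYTES and 0x40 <= trail <= 0xFE and trail != 0x7F


# 126 possible first bytes times 190 possible second bytes.
total_code_points = len(LEAD_BYTES) * len(TRAIL_BYTES)

# Chinese characters plus graphic symbols, as counted in the text.
assigned = 21003 + 883

print(total_code_points, assigned)
```

The difference between the two numbers is the space left for the user-defined areas and for unassigned positions inside the symbol areas.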
All codes are divided into three parts:
1. Chinese character area. Including:
A. GB 2312 Chinese character area, that is, GBK/2: B0A1-F7FE. It holds the 6,763 Chinese characters of GB 2312, arranged in their original order.
B. GB 13000.1 extended Chinese character area. Including:
(1) GBK/3: 8140-A0FE. It holds 6,080 CJK Chinese characters from GB 13000.1.
(2) GBK/4: AA40-FEA0. It holds 8,160 CJK Chinese characters and supplementary Chinese characters. The CJK Chinese characters come first, ordered by UCS code value; the supplementary Chinese characters (including radicals and components) come last, ordered by their page and position in the Kangxi Dictionary.
2. Graphic symbol area. Including:
A. GB 2312 non-Chinese symbol area, that is, GBK/1: A1A1-A9FE. In addition to the symbols of GB 2312, it holds 10 lowercase Roman numerals and the symbols supplemented by GB 12345. There are 717 symbols in total.
B. GB 13000.1 extended non-Chinese symbol area, that is, GBK/5: A840-A9A0. The Big-5 non-Chinese symbols, the structure symbols and "○" are placed in this area. There are 166 symbols in total.
3. User-defined area: divided into three sub-areas (1), (2) and (3).
(1) AAA1-AFFE, 564 code points.
(2) F8A1-FEFE, 658 code points.
(3) A140-A7A0, 672 code points.
Although sub-area (3) is open to users, its use is restricted, because the possibility of adding new characters to this area in the future is not ruled out.
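The allocation above can be summarized as a small lookup table. The sketch below maps a two-byte code to its area and checks that the area capacities add up to the 23,940 code points of the whole range. The names REGIONS, capacity and classify are illustrative, and the GBK/3 range 8140-A0FE is assumed here because it is the range that matches the stated count of 6,080.

```python
def trail_count(lo: int, hi: int) -> int:
    """Number of valid second bytes in lo..hi, skipping 0x7F."""
    return sum(1 for b in range(lo, hi + 1) if b != 0x7F)


# (name, first lead byte, last lead byte, first trail byte, last trail byte)
REGIONS = [
    ("GBK/1", 0xA1, 0xA9, 0xA1, 0xFE),
    ("GBK/2", 0xB0, 0xF7, 0xA1, 0xFE),
    ("GBK/3", 0x81, 0xA0, 0x40, 0xFE),  # assumed range, see lead-in
    ("GBK/4", 0xAA, 0xFE, 0x40, 0xA0),
    ("GBK/5", 0xA8, 0xA9, 0x40, 0xA0),
    ("user-defined (1)", 0xAA, 0xAF, 0xA1, 0xFE),
    ("user-defined (2)", 0xF8, 0xFE, 0xA1, 0xFE),
    ("user-defined (3)", 0xA1, 0xA7, 0x40, 0xA0),
]


def capacity(region) -> int:
    """Number of code positions available in one area."""
    _, l0, l1, t0, t1 = region
    return (l1 - l0 + 1) * trail_count(t0, t1)


def classify(code: bytes):
    """Return the area name for a two-byte code, or None if it is invalid."""
    lead, trail = code  # raises ValueError unless code has exactly two bytes
    if trail == 0x7F:
        return None
    for name, l0, l1, t0, t1 in REGIONS:
        if l0 <= lead <= l1 and t0 <= trail <= t1:
            return name
    return None


capacities = {r[0]: capacity(r) for r in REGIONS}
print(capacities, sum(capacities.values()))
print(classify("啊".encode("gbk")))
```

Capacity is the number of positions in an area, not the number of characters assigned: GBK/1 and GBK/5, for example, have more positions than the 717 and 166 symbols they hold.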
III. Glyphs
GBK has made the following provisions for glyphs:
1. In principle, glyphs and stroke forms are consistent with those in the G column of GB 13000.1 (the column for Chinese characters sourced from legal standards in mainland China).
2. Within the overall framework of the CJK unification rules, all GBK-coded Chinese characters are subject to "glyph normalization without duplicate codes" ("GB normalization"); that is, provided no duplicate codes result, the new Chinese glyph forms are used wherever possible.
3. For Chinese characters that fall outside the CJK unification rules, or for which the rules are not clearly defined, the old glyph form is provisionally placed at the GBK code position. As a result, in many cases GBK contains both the old and the new glyph forms of the same Chinese character.
4. For non-Chinese symbols, glyphs already included in GB 2312 are consistent with GB 2312; those beyond GB 2312 are consistent with GB 13000.1.
5. Pinyin letters with tone marks take half-width forms.