The progress and faults of Chinese character informatization
It is said that in a few decades the world will have two major languages: Chinese and English. I say that if we keep to our current way of thinking, this may not come to pass.
For more than ten years, the public and experts have debated Chinese character input, each with their own opinion. Interest then gradually cooled until the topic drew no applause at all. Yet one fundamental factor has gone unnoticed: the components of Chinese characters are inconvenient to express in speech.
The letters or components of any script need to be expressible in speech. As we all know, each Latin letter not only serves as a phoneme within words but also has a spoken name of its own, and saying a single letter is easier than pronouncing it within a word. People of all ages, including children who know no English, can fluently say "VCD", "USA", and "CCTV"; they can also spell out letters over the phone and so convey foreign names and vocabulary clearly.
In contrast, components that are themselves characters, such as wood, fire, earth, person, and mouth, can of course be named independently, while components that are not characters, such as 厶, cannot be expressed in speech. People often say "Gong Chang Zhang, Mu Zi Li" (弓长张, 木子李), but words such as "Juguan Yishu" cannot be described by this "split into parts" method.
The informatization of Chinese characters is inseparable from the expression of components. Without names, components cannot be described in words, and assigning them to keyboard characters is difficult. This is the fundamental reason for the "thousands of codes galloping forward".
A considerable number of middle school students can recite and memorize the "Periodic Table of Elements", which is naturally beneficial to learning chemistry. But I don’t know how many teachers and students in the “Chinese Department” are equally familiar with the “Chinese Character Parts List”. Of course, this does not mean that students in the Chinese Department do not work hard.
What should be pointed out is this: if even Chinese majors cannot master the language standards, can those standards still be offered to the public for use?
If the public cannot grasp our Chinese character component specification, can it still be used effectively in informatization? Chinese characters are assembled from components, and it is fine for Chinese majors to work with a stricter, formal component standard. But is there also a simplified, practical component list, one that suits children and also helps foreigners learn Chinese? Language is used by the public, and language standards must also be usable by the public. In the information age, can language standards be made more accessible?
Focus on the process but ignore the goal
Over the years, we have attached great importance to the digital representation of Chinese character components and focused on the keyboard characters assigned to them for computer input, thinking that this is the goal of Chinese character informatization. In fact, the digital conversion of components is just a process. What really deserves attention is the "human-computer dialogue": the thinking process by which a person turns Chinese characters into input. This is the goal.
The information standardization of Chinese characters must not only adapt to machine processing, but also must pay attention to the "human-machine dialogue" process. The primary consideration for the informatization of language is that people who speak and write should reasonably transform "stroke writing" into "character writing", and of course they must adapt to people's thinking habits.
Around the time of the "May 4th" Movement, classical Chinese, in which the written and spoken languages were separate, gave way to a vernacular that unified the two. The language left the ivory tower and entered the homes of ordinary people. This was a milestone in the history of the Chinese language. Only with the vernacular could conditions be created for popularizing Chinese and for producing "phonetic codes", which made pinyin input widely used.
In 1997, the "GB 13000.1 Character Set for Information Processing—Chinese Character Component Specifications" issued by the National Language Working Committee stipulated the 560 components of the "Chinese Character Basic Components List" and their usage rules.
The basic alphabet of any alphabetic script is very concise, and children and foreigners can quickly memorize it. However, the "component table" of Chinese characters and the "component assignment tables" of the various shape-based input methods are all very large and difficult to learn by heart.
Why does the army need military ranks? An army has many officers, and outsiders cannot remember all their names and hundreds of positions. With ranks, however, you can address any officer. Should we not likewise create a more reasonable naming and classification for Chinese character components?
In social life, titles are constantly changing. Take people as an example. With the development of society, people's mutual relationships are constantly changing, and people's titles will naturally change accordingly. People's titles not only have the characteristics of the times, but also have regional characteristics, which will spread with people's communication and mobility. The specifications and names of components should also change according to the requirements of information technology.
For Chinese characters to correspond to the 26 Latin letters, it is necessary to change their own rules and restore component names. In the process of digitization, Chinese characters must continue to develop their own theory and innovate continuously to maintain their status.
Sorting benchmarks and the one-step conversion of Chinese character components
Many information projects have failed miserably because data was misunderstood, yet latecomers unafraid of sacrifice still press bravely forward. They have only new models and versions in mind. They cannot see how data is generated and changes, and of course they cannot see the people who manage the data. In the view of some bosses, programs are the decisive factor, and data is just a matter for the entry clerk.
The information age is an orderly era, an era of digital earth. People who do not follow objective laws and do not pay attention to the meaning of "orderly" will naturally fall into the quagmire.
In addition to the lack of titles, Chinese character components also lack symbolic expressions and shortcut sorting rules. In fact, with the title or symbolic code, the sorting is easily solved.
Sorting is not a random formation of queues. There are recognized benchmarks and criteria for sorting.
The generally accepted sorting criteria are numerical order and Latin alphabetical order, each available in ascending or descending form.
Computer sorting rests on "adjacent competition", much like a contest in an arena. Each contest is a simple comparison: which is greater, "1" or "0", or "p" or "q"? Once every pair of adjacent items satisfies the sorting criterion, the entire queue is ordered.
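The "adjacent competition" idea can be sketched with the simplest algorithm built on it, bubble sort. This is only an illustration: the function names `is_ordered` and `adjacent_sort` are made up for this sketch, and real systems generally use faster algorithms that still rely on pairwise comparison.

```python
def is_ordered(items):
    """Return True when every adjacent pair is in ascending order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def adjacent_sort(items):
    """Sort by repeatedly comparing and swapping adjacent neighbors (bubble sort)."""
    result = list(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(result) - 1):
            # The "contest": the larger of two neighbors moves to the right.
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
    return result


letters = ["q", "p", "b", "a"]
ordered = adjacent_sort(letters)
print(ordered, is_ordered(ordered))
```

The comparison works here only because letters and digits already have a recognized order, which is the benchmark that unnamed components lack.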
It is true that every thing can choose multiple comparison standards. However, people have formed the habit of referring to recognized standards, which are the ranking benchmarks.
Wherever possible, every item should be converted into its corresponding number or character in a single step, and the process should be as intuitive and simple as possible.
An intuitive conversion means that in the conversion process of the thing, only one basic feature is used to directly convert it into a number or character, and the position of the related thing in the queue is also determined.
An intuitive transformation is not only suitable for machines, but also suitable for people’s daily expressions.
In the existing Chinese character component specification, the strokes of each component must first be counted, component by component. Stroke counts range from 1 to 16, so many components share the same count; for example, there are as many as 99 four-stroke components. To determine a component's position in the queue after counting strokes, a further feature must be chosen to compare components with the same count. The ordering in the component specification is therefore neither a one-step nor an intuitive conversion.
One-step intuitive conversion is an important principle of digitization. The existing Chinese character component specification is sorted by "number of strokes" and "stroke shape", which suits manual dictionary lookup only and cannot meet the needs of digital processing.
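The contrast between the two-step ordering and a one-step conversion can be sketched as follows. Everything here is illustrative: the five components, the simplification of "stroke shape" to the class of the first stroke only, the stroke-class numbering, and above all the `HYPOTHETICAL_CODE` table, which stands in for whatever symbolic codes a naming scheme might assign.

```python
# Illustrative data: (component, stroke count, class of first stroke).
# Assumed stroke classes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 dot, 5 turning.
COMPONENTS = [
    ("火", 4, 4),
    ("木", 4, 1),
    ("口", 3, 2),
    ("土", 3, 1),
    ("人", 2, 3),
]


def two_step_order(components):
    """Order by stroke count first, then break ties by stroke shape."""
    ranked = sorted(components, key=lambda row: (row[1], row[2]))
    return [name for name, _count, _shape in ranked]


# Hypothetical one-step scheme: each component carries one symbolic code,
# so its position follows directly from alphabetical order.
HYPOTHETICAL_CODE = {"人": "A", "口": "B", "土": "C", "木": "D", "火": "E"}


def one_step_order(names):
    """Order components by a single pre-assigned code."""
    return sorted(names, key=lambda name: HYPOTHETICAL_CODE[name])


print(two_step_order(COMPONENTS))
print(one_step_order(["火", "木", "口", "土", "人"]))
```

In the first function the position of a component depends on two features examined in turn; in the second, one lookup settles it, which is the property the text calls a one-step conversion.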
Basic strokes of Chinese characters
For a long time, the study and application of Chinese characters have highlighted the two endpoints of character structure, namely strokes and whole characters. Stroke shape, stroke order, glyph form, and character meaning have become the main criteria for teaching Chinese characters and for examinations.
Since the only writing tools are a knife or a pen, the strokes and brushstrokes of the characters are emphasized and highlighted during the writing process, while the components are weakened. Chinese characters have become more and more "horizontal, flat and vertical" from the initial pictograms. The reasons are firstly the expansion of the scope of application and secondly the changes in writing tools. However, the most fundamental thing is that due to the progress of society and the improvement of productivity, not only faster writing speed is required, but also faster recording speed is required.
Fast writing requires simplifying the structure, reducing strokes, simplifying glyphs, and straightening the strokes.
Accurate writing requires differentiation: distinguishing the senses of polysemous characters and increasing the number of characters to achieve accurate expression.
Tools can influence and determine product characteristics. The carving of tortoise shells with knives in ancient times, the writing brushes of agricultural society, and the writing of pencils, fountain pens, and ballpoint pens in the industrial era determine that the writing process of Chinese characters is based on strokes.
Chinese characters have been refined for thousands of years, and each character has formed a beautiful structure. The widespread use of writing created a need for pens, which were used to their fullest potential by the Chinese. Ink brush and calligraphy are the Chinese people's major contribution to human civilization, organically integrating words, art and the author's emotions.
Restore the original information of the components
Components link strokes to whole characters and played an important role in the creation of Chinese characters, yet they have been weakened in the course of the characters' development. Over the long years, much information about components and their origins has slowly been lost, though this has not affected the use and teaching of Chinese characters.
In the process of teaching Chinese characters, the number and order of strokes have become the main parameters for describing and summarizing Chinese characters. Some parts as radicals can also become symbols of Chinese character classification. Even among these radicals, there are more than 10 important parts with "lost" titles.
We have experienced thousands of years of influence. We are in a Chinese environment, and it takes a long time to learn Chinese characters. It is still difficult to use computers to teach Chinese characters in China, and it is even more difficult to use computers in foreign teaching. There is still no mature way for overseas people to learn Chinese characters on computers.
The names of components are gradually being forgotten, and it is difficult to describe components directly. Fortunately, the Chinese practice of "using characters to explain characters" is convenient and widespread, and can easily explain components. This in turn further accelerates the "forgetting" of component names.
For example, "官" differs from the characters "安" and "子" mentioned above: the lower part of "官" has lost its component name, so there is no way to describe it in speech. Yet over thousands of years of writing "官" (official) face to face, this has caused no trouble in learning or understanding the character.
For the needs of communication and informatization, "官" cannot be described by splitting it into parts, as "Li" and "Zhang" can. However, describing it by association, as "the 官 in 'official'", is equally accurate.
People can use association to describe Chinese characters, but at present they cannot use this method in human-computer dialogue. Computers are still far from reaching the level of human thinking and judgment, let alone adapting to the different habits of different people. We naturally ask: why can't the name of the lower part of "官" be restored? Then such characters could be spelled out just like "Mu Zi Li".
The widespread application of computers has brought about qualitative changes in "writing" tools. The computer can input the parts as a whole, weakening the strokes of Chinese characters. In this way, the informatization of Chinese characters may "return" to the character creation stage and highlight the overall image of Chinese character components.
The reference point for Chinese character informatization cannot be today's standards for modern character use; it must be taken from the period when the characters were created and their components were most active. That is to say, we cannot take the language habits of 2000, or the level of language use in 1980, as the standard of comparison for the informatization of Chinese characters. We must look for it in the era when Chinese characters were created.
Therefore, the informatization of Chinese character output should restore the original information of the Chinese character components and connect the original Chinese character components with the current assignment code.
Components have become a gap in the informatization of Chinese characters
Some people call Chinese characters "Chinese character trees", but in fact, the "tree" is mostly used by computers and software. The computer's Chinese character input method can also be viewed as a tree. First, it is divided into "shaped code" and "phonetic code". "Shape code" can be chosen between two assignment methods: based on parts or strokes.
The former can express Chinese characters through a sequence of component names, which is in line with how our ancestors thought when creating characters. Once the consistency between component names and assigned keyboard characters is abandoned, the input method loses a basic feature of language, the unity of writing and pronunciation. It breaks with language habits, which naturally makes it difficult to learn.
Stroke input was once considered cumbersome and was not favored on computers. Now that an overwhelming number of mobile phone users also want to access the Internet, stroke-code input on the phone's numeric keypad, with candidates prompted on the screen, has become the best choice. On a small keypad it is an advantage to enter only strokes and ignore components. Although the keypad is small, the intelligence in the chip makes up for the lack of area.
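A minimal sketch of how stroke-code input with on-screen prompting can work. The key assignment (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning), the tiny `STROKE_CODES` table, and the `candidates` function are assumptions made for illustration, not a description of any particular phone's method.

```python
# Illustrative table: character -> sequence of stroke keys in writing order.
# Assumed keys: 1 horizontal, 2 vertical, 3 left-falling,
# 4 dot or right-falling, 5 turning.
STROKE_CODES = {
    "十": "12",
    "土": "121",
    "木": "1234",
    "大": "134",
    "人": "34",
    "口": "251",
    "火": "4334",
}


def candidates(typed):
    """Return characters whose stroke code starts with the keys typed so far.

    Shorter codes come first, so characters that are nearly complete
    appear at the front of the prompt list.
    """
    matches = [ch for ch, code in STROKE_CODES.items() if code.startswith(typed)]
    return sorted(matches, key=lambda ch: (len(STROKE_CODES[ch]), STROKE_CODES[ch]))


# Each extra key press narrows the list shown on the screen.
for typed in ["1", "12", "123"]:
    print(typed, candidates(typed))
```

The user never names a component: the chip does the narrowing, which is why this approach fits a keypad with only a handful of keys.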
Regardless of how it ends up being judged, this technology is already widely installed on mobile phones.
It seems a bit redundant to describe the content of the above paragraph that everyone knows. However, in a rapidly developing society, technology and the market it faces are competing with each other and advancing in a spiral. Methods and technologies that are useless today may find a use tomorrow. Although this is a truth, in many cases the value of a shining idea is often measured by the "status" of its owner.
The unification of cultural tradition and computer applications remains our concern. The advancement of tools not only advances thinking but also allows us to search our long history for the form of writing best suited to expression on computers. Can our ancestors' component-based way of creating characters be integrated directly with computers? We need to find the place of components in that long history, and we also need to innovate for component informatization in the process of informatization.
The advancement of technology allows people to extract any required parameters from history, astronomy, archaeology, earthquake records, scientific dating, and so on. For example, 200 experts worked together for five years to complete the "Annual Table of Xia and Shang", pushing back the chronology of Chinese history by 1,229 years. In this way, not only does the chronological record in Chinese history extend forward by 1900, but people also gain a new understanding of the subject. All the history, resources, and culture on earth are useful and must not be destroyed or discarded without authorization.
Computers rejuvenate components
In order to find input methods for the public, we went back 6,000 years. Components are the basis of the whole character, but when the tool is a pen, their features do not stand out. The emergence of computers has given components a chance to be rejuvenated, or at least a chance to be matched systematically with keyboard characters.
During the centuries of industrialization, Western society began to use electromechanical devices to encode and decode characters. When this method was transplanted to China, it could only produce the 4-digit "telegraph code", which relies on rote memorization.
If the typewriter freed Western society from casual cursive handwriting, the teletypewriter (including, of course, the keyboard of a computer terminal) further enabled the public to move beyond Morse code and created an environment for free communication.
Due to the three-layer structure of Chinese characters, even if we introduce these devices, we cannot directly imitate Western text informatization. To use a machine to input Chinese characters, we actually need to change the habit of writing with a pen that has been developed over thousands of years, and change the habitual process of thinking. This point is rarely noticed in the process of informatization or "digitization".
The "fission" and "aggregation" of Chinese characters
After computers entered society widely in the mid-1980s, inputting Chinese characters became a problem. Whatever one says about the various coding schemes, most people still find them "difficult to learn". Why is it so difficult to "marry" our own native language to the computer? Input that requires neither coding nor memorizing codes is the wish of many Internet users, especially the large number of middle-aged and elderly people.
Paging stations also use computers. No one has to train the operators at a Chinese-character paging station in how characters are spelled out aloud; every caller can make themselves understood. For example, we can say "Mu Zi Li"; we can also say "the 官 in 'official'".
Note that when "speaking" on the phone, there are no auxiliary gestures, and it cannot be displayed on the screen or blackboard. It relies entirely on the mouth. This is because Chinese people are able to express according to language habits and use one word to illustrate and explain another word. Not only can there be "fission" to separate words, but also "aggregation" using association. So, can we directly use the "speaking" method to input Chinese characters into the computer? The answer is yes, but the written expression must be standardized.
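The two spoken strategies, "fission" and "aggregation", can be sketched as simple lookups. This is a toy illustration under stated assumptions: the `FISSION` table, the toneless pinyin used as component names, the example word 官员 ("official"), and both function names are hypothetical, and a real system would need the standardized component names the text calls for.

```python
# Hypothetical table: spoken component names (toneless pinyin) -> character.
FISSION = {
    ("mu", "zi"): "李",      # "Mu Zi Li": 木 + 子
    ("gong", "chang"): "张",  # 弓 + 长
}


def by_fission(*component_names):
    """Look up the character spelled out by its component names.

    Returns None when the spoken sequence is not in the table, for
    example when a component has no name that could be spoken.
    """
    return FISSION.get(tuple(component_names))


def by_association(word, position=0):
    """Pick one character out of a familiar word, as in "the 官 in 官员"."""
    return word[position]


print(by_fission("mu", "zi"))
print(by_association("官员"))
```

Fission depends on every component having a speakable name, while association sidesteps naming but requires the listener, or the machine, to know the word being cited.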
The development of language and writing stems from social progress, and language and writing promote social progress. Words are synchronized with language, and they also play a role in standardizing and storing language.
With the rise and fall of nations, especially the development of science and technology, languages also rise and fall. Naturally, we cannot take the future of Chinese characters lightly.
Language rules cannot be lost in coding
With social changes and progress, the meanings of many components themselves are no longer universal, but these components still survive and are even used in many commonly used words. They can be found.
The character "旅" (brigade) contains a component that is not a character in its own right; it should be the "flag" referred to when the character was first created. In ancient times, people held flags and lined up to fight for the survival of their tribes.
Nowadays, "旅" (brigade) is used more often in "旅游" (tourism). It happens that tour guides also raise small flags and lead their guests to revisit those ancient battlefields. The thinking of our ancestors has been revived in modern life, and many Chinese characters show such coincidences.
The disappearance of a batch of component names does not affect learning and writing, but it creates difficulties for the informatization of Chinese characters. In the modern language environment, an encoding that simply assigns keyboard characters component by component is bound to run into difficulties when offered to the public.
“If the name is not correct, the words will not be clear.” If components have no names, it will be difficult to express them. To assign keyboard characters to commonly used radicals and components, it is necessary to find and restore the lost original information to ensure the integrity of Chinese character informatization.
Only "language" is needed by the public, and "encoding" does not belong to the public. This is a law at home and abroad in ancient and modern times. The question is, is it possible for us to create a computer input method based on Chinese rules? The answer is yes.
1. The description of Chinese character informatization should be based on "natural language";
2. The basis of Chinese character informatization lies in component description;
3. Restore component names and recover the missing information.
Reviving a lively, vigorous and self-reliant national culture
Although British and American Internet experts and philologists also say that Chinese will become the most commonly used language on the Internet, the obstacle hindering the development of Chinese web pages is still the complexity of Chinese input.
Mr. Hu Shi pointed out in 1914: "Typewriters are made for writing; writing is not made for typewriters. To want to abolish a script because it cannot be typed is a folly tens of millions of times greater than cutting off one's toes to fit the shoe, not to mention that our country's script is not necessarily unsuited to the typewriter." Mr. Hu's words still apply fully in the "computer age".
However, in the past few years, input methods often paid attention to the correspondence between Chinese character strokes and characters, while ignoring the thinking process of writing, or in other words, they were divorced from the natural habit of daily speaking. Therefore, even if some input methods are acceptable, they are easily forgotten as long as they are interrupted for a period of time.
Faced with the issues of civilization and culture, tradition and progress, can "progress" that abandons traditional cultural heritage be considered our pride? China's long culture is a powerful cohesive force for the Chinese nation. We have the responsibility to maintain and carry forward China's traditional culture in the digital revolution, and to combine modernization with tradition. The use of Chinese characters is the most important part of this.
The component specification is not aimed only at "input methods"
Symbols are not difficult to invent. If China had really had the buds of science, it would certainly have created many simple symbols. China used to have only technology, not science. Westerners did not take all their symbols from their letters; they also invented many others, such as -*/= and so on. Inventing symbols is very simple, and a few letters can be memorized in a day, but getting society to accept them is the most difficult thing.
In order to solve the problem of "input method", it is of course necessary to formulate Chinese character information standards, but don't forget that the problem of Chinese characters is not a problem of the input method. Therefore, the informatization standard of Chinese characters is not just for "amateur coding enthusiasts", but to meet the needs of the whole world for the informatization of Chinese characters.