Information processing refers to screening, judging, sorting, cataloging, organizing, storing and analyzing a large amount of collected original information according to different purposes and requirements, so as to turn it into information with definite use value.
Generally speaking, collected original information is initial, messy and isolated. Only by sorting this zero-order information into regular, orderly and systematic higher-order information can we make use of it; only through description and indexing can zero-order information be transformed into secondary information, which is convenient to store, retrieve and transmit. Information processing is therefore the process of producing, on the basis of the original information, new information that is more valuable and more convenient for users, thus increasing the value of the information.
Information processing can be classified in different ways according to different criteria.
According to response time, processing can be divided into real-time processing and batch processing. Real-time processing means processing and responding to incoming data immediately, which is generally suitable for routine operations; batch processing means accumulating incoming data up to a certain amount or for a certain time and then processing it together, which is generally suitable for statistical analysis.
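The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `process` function, the class names and the `batch_size` threshold are hypothetical and stand in for whatever the business actually requires.

```python
from typing import List


def process(record: str) -> str:
    """Hypothetical stand-in for the actual processing step."""
    return record.strip().upper()


class RealTimeProcessor:
    """Processes each record as soon as it arrives and returns the response."""

    def submit(self, record: str) -> str:
        return process(record)


class BatchProcessor:
    """Stores incoming records and processes them together once
    `batch_size` records have accumulated."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.pending: List[str] = []

    def submit(self, record: str) -> List[str]:
        self.pending.append(record)
        if len(self.pending) < self.batch_size:
            return []  # nothing is processed until the batch is full
        batch, self.pending = self.pending, []
        return [process(r) for r in batch]


realtime = RealTimeProcessor()
batch = BatchProcessor(batch_size=3)

realtime_result = realtime.submit(" order 1 ")
batch_results = [batch.submit(r) for r in ["a", "b", "c"]]
```

The real-time processor answers every call, while the batch processor returns nothing for the first two records and all three results on the third. A time-based trigger, which the text also mentions, could replace the count-based one.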
According to the depth of processing, processing can be divided into preprocessing, business processing and decision processing. Preprocessing is a simple arrangement of information; business processing analyzes and synthesizes information to assist decision-making; decision processing applies statistical inference to information to produce decision-making information.
According to the processing tools used, processing can be divided into manual processing and computer processing. Manual processing uses manual equipment to process information and mainly exists in the early stage of information processing. Computer processing uses computers to process the original data and produce tables, graphics and other results.
3.4.2 Information screening
Information screening is the first step of information processing. Its purpose is to discard what is false and keep what is true, to discard the dross and keep the essence, and so to ensure the accuracy and validity of the information.
3.4.2.1 Information screening procedure
The basic procedure of information screening consists of the following steps:
(1) Information collation. Collation is the prerequisite for screening and discrimination; its purpose is to standardize and organize scattered, disorderly information for further processing and analysis.
(2) Browsing and review. Browsing and review are the central step of screening and discrimination. The purpose is to remove information that is obviously wrong or useless and keep information that is obviously true or useful. Information that cannot be judged for the time being is put on hold for further processing.
(3) Second review. Uncertain information should be analyzed and studied again, through consultation or other scientific methods, to decide whether to keep it and to improve the accuracy of screening and discrimination.
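The three steps above amount to a triage with a "hold" pile that is settled in a second pass. A minimal sketch follows; the two reviewer functions and their keyword rules are invented purely for illustration, since in practice these judgments are made by people.

```python
from typing import Callable, Dict, List, Optional

Verdict = Optional[bool]  # True = keep, False = discard, None = undecided


def collate(raw: List[str]) -> List[str]:
    """Step 1: normalise whitespace and drop empty and duplicate entries."""
    seen = set()
    items = []
    for text in raw:
        cleaned = " ".join(text.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            items.append(cleaned)
    return items


def screen(
    raw: List[str],
    first_pass: Callable[[str], Verdict],
    second_pass: Callable[[str], bool],
) -> Dict[str, List[str]]:
    piles: Dict[str, List[str]] = {"keep": [], "discard": [], "hold": []}
    # Step 2: browse and review; undecided items are put on hold.
    for item in collate(raw):
        verdict = first_pass(item)
        if verdict is None:
            piles["hold"].append(item)
        else:
            piles["keep" if verdict else "discard"].append(item)
    # Step 3: review the held items again and settle each one.
    for item in piles.pop("hold"):
        piles["keep" if second_pass(item) else "discard"].append(item)
    return piles


def first_pass(item: str) -> Verdict:
    # Hypothetical rules standing in for a human reviewer.
    if "verified" in item:
        return True
    if "rumour" in item:
        return False
    return None


def second_pass(item: str) -> bool:
    # Hypothetical rule: keep held items only if they name a source.
    return "source:" in item


raw = [
    "  sales up 5%  (verified)",
    "rumour: plant closing",
    "sales up 5% (verified)",
    "new tariff, source: ministry",
    "",
    "something vague",
]
result = screen(raw, first_pass, second_pass)
```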
3.4.2.2 Key targets of information screening
(1) Fabricated information. This kind of information is completely fictitious, without any factual basis. It stems mainly from the bad motives of information collectors and must be removed.
(2) Embellished information. Although this kind of information has some basis, certain details and content have been added by collectors and transmitters from subjective imagination rather than fact, and it needs to be analyzed and discriminated.
(3) Exaggerated information. This kind of information overstates or understates the facts; it distorts them and seriously damages the authenticity and credibility of the information.
(4) Biased information. This kind of information one-sidedly emphasizes the factors that favor a course of action or those that argue against it. If it is not checked and corrected, it reduces the use value of the information and may even cause great losses to its users.
(5) Incomplete information. When information can no longer be obtained accurately because much time has passed, or when the source itself is inaccessible, information inferred only from individual phenomena or features is incomplete. This kind of information generally requires supplementary collection.
(6) Vague information. This kind of information comes from hearsay and insinuation picked up by information collectors, and often contains words such as "said", "heard", "probably", "possible" and "there are signs". It is not credible and must be collected and verified again.
(7) Patchwork information. In the course of collection, processing and transmission, information from different places, times, conditions and natures is sometimes merged and presented as a single piece of information about one place, one time, one set of conditions and one nature. Taken as a whole, such patchwork information is unfounded.
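Of the seven targets above, vague information is the one that lends itself most directly to an automatic first check, because the text names the telltale words. The sketch below flags records containing them. The phrase list is taken from the text; the function names are hypothetical, and a match only marks a record for human verification rather than proving it unreliable.

```python
import re
from typing import List

# Hedging words named in the text as typical of vague information.
HEDGING_PHRASES = ["said", "heard", "probably", "possible", "there are signs"]

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in HEDGING_PHRASES) + r")\b",
    re.IGNORECASE,
)


def hedging_phrases(text: str) -> List[str]:
    """Return the hedging phrases found in `text`, in order of appearance."""
    return [m.group(1).lower() for m in _PATTERN.finditer(text)]


def needs_verification(text: str) -> bool:
    """True if the record contains at least one hedging phrase."""
    return bool(hedging_phrases(text))
```

For example, `hedging_phrases("It is said the factory will probably close.")` finds "said" and "probably", while a sentence such as "Output rose 4% in May." is not flagged.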
3.4.2.3 Information screening methods
(1) Sensory judgment method. Information processors judge the authenticity and credibility of information intuitively, relying on their own knowledge, skills and experience while browsing and reviewing the original information.
(2) Comparative analysis method. Information processors compare and analyze information on the same matter collected through different channels in order to determine its authenticity and credibility.
(3) Expert judgment method. Experts decide the value of information that cannot be accepted or rejected for the moment.
(4) Collective discussion method. Information on which individuals cannot reach a conclusion is accepted or rejected through collective consultation, drawing on collective wisdom.
(5) On-site verification method. Information collectors or processors are sent to the site to verify the authenticity of the information in question.
(6) Mathematical accounting method. Information processors recalculate when they have doubts about the original information. This method promptly corrects distortions caused by calculation errors during collection, clerical errors or errors in transmission.
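Two of the methods above, comparative analysis and mathematical accounting, reduce to simple arithmetic checks when the information is numerical. The following sketch shows one possible form of each; the function names, the tolerance values and the sample figures are assumptions made for the example.

```python
from typing import Dict, List


def recheck_total(parts: List[float], reported_total: float,
                  tolerance: float = 1e-9) -> bool:
    """Mathematical accounting: recompute the sum and compare it
    with the total stated in the original information."""
    return abs(sum(parts) - reported_total) <= tolerance


def channels_agree(reports: Dict[str, float], tolerance: float) -> bool:
    """Comparative analysis: figures for the same item obtained through
    different channels should differ by no more than `tolerance`."""
    values = list(reports.values())
    return max(values) - min(values) <= tolerance


quarterly_output = [120, 95, 110]

# A transposed digit (352 instead of 325) is a typical clerical error.
total_is_consistent = recheck_total(quarterly_output, 352)

channels_consistent = channels_agree(
    {"field survey": 325, "company report": 327}, tolerance=5
)
```

Here the recomputed sum is 325, so the reported 352 fails the check, while the two channels agree within the chosen tolerance.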
3.4.3 Information classification
Information screening is the rough processing of information, and information classification is its fine processing. Only by classifying information can we better store, retrieve, transmit and use it.
3.4.3.1 Basic procedure of information classification
(1) Determine the classification method. There are many methods of information classification, including regional, content, subject, time and comprehensive classification. The method adopted directly determines how the information materials are sorted, so choosing it is the basis and prerequisite of classification.
(2) Sort the information. This is the second step of classification: assigning information materials to classes in preparation for later work.
(3) Arrange the information. After classification, materials within the same class still need to be put in order. Arranging them turns the information into an orderly system.
3.4.3.2 Specific methods of information classification
(1) Regional classification. Regional classification divides information according to the region it concerns.
(2) Time classification. Time classification divides information in chronological order. It can be subdivided by year, month and day.
(3) Content classification. Content classification divides information according to its content. For example, by industry, information can be subdivided into agricultural information, industrial information, commercial information, service information, tourism information, enterprise information, capital construction information, fiscal information, financial information and so on.
(4) Comprehensive classification. Comprehensive classification divides information by a combination of time, region and content. Depending on which criteria are combined and in what order, it can be divided into time-region, region-time, content-region, content-time, region-time-content, region-content-time, time-region-content and time-content-region classification.
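The combinations listed under comprehensive classification differ only in which criteria are used and in what order. The sketch below makes that concrete by grouping the same records under two different orderings. The record fields and sample values are hypothetical.

```python
from typing import Dict, List, Tuple

records = [
    {"title": "A", "region": "East", "year": 2022, "content": "industry"},
    {"title": "B", "region": "East", "year": 2021, "content": "agriculture"},
    {"title": "C", "region": "West", "year": 2021, "content": "industry"},
    {"title": "D", "region": "East", "year": 2021, "content": "industry"},
]


def classify(records: List[dict], order: List[str]) -> Dict[Tuple, List[str]]:
    """Group record titles by the fields in `order`, most significant first."""
    groups: Dict[Tuple, List[str]] = {}
    for rec in sorted(records, key=lambda r: tuple(r[f] for f in order)):
        label = tuple(rec[f] for f in order)
        groups.setdefault(label, []).append(rec["title"])
    return groups


region_time = classify(records, ["region", "year"])   # region-time classification
time_region = classify(records, ["year", "region"])   # time-region classification
```

Under region-time classification the East records are kept together and subdivided by year; under time-region classification the 2021 records are kept together and subdivided by region. Adding `"content"` to `order` gives the three-criterion variants.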
3.4.4 Information description
Information description, also known as cataloging, refers to the process of analyzing, selecting and recording the external features and some of the content features of information according to certain rules and technical standards. Description produces a record that reflects the content features and external features of the original information; this record is called an entry. A catalog is a tool for reporting and retrieving documents that arranges many entries in a certain order. An entry is the epitome of one document, and a catalog is the epitome of a batch of documents.
3.4.4.1 Standardization of information description
The standardization of document description refers to binding norms, adopted nationally or internationally, on the principles, content and format of document description. To develop and use document resources, a consistent bibliographic language is needed for describing the features of documents and for reporting and retrieving them. In the 1960s, many countries standardized document description at the national level. On this basis, a working group of the International Federation of Library Associations and Institutions (IFLA) began to formulate international bibliographic standards in 1971 and officially issued the International Standard Bibliographic Description (ISBD) in 1974, which was widely accepted around the world. ISBD solved the following problems:
(1) It makes the elements of document description and their order interchangeable, that is, it unifies document description internationally.
(2) It overcomes the language barrier and makes descriptions of foreign documents easy to interpret. Even readers who do not know a given language can identify the descriptive elements through the system of punctuation symbols.
(3) It helps convert conventional bibliographic records into machine-readable catalog form.
In order to establish and improve China's unified document reporting system, carry out international exchange of bibliographic information, and better develop and use document information resources, China officially published the General Rules for Document Description in July 1983 through the joint efforts of the Sixth Sub-committee (Catalogue Description Sub-committee) of the National Document Standardization Technical Committee and the Library Society of China. Since then, various sub-rules have been issued one after another, including description rules for ordinary books, serial publications, maps, archives, ancient books, retrieval journals and references.
3.4.4.2 Machine-readable catalog format
MARC stands for machine-readable catalog: a catalog recorded on computer storage media in coded form and with a specific structure, which can be recognized and read by computers.
In 1965, the Library of Congress began to develop machine-readable catalogs. MARC I tapes were produced in 1966. In 1969 MARC II tapes were officially released, and MARC data files for monographs, serial publications, archives and manuscripts, visual materials, music scores and maps followed one after another. Because the MARC format was developed by the Library of Congress, it is called USMARC (also LC MARC). In 1977, IFLA first published UNIMARC, the universal machine-readable catalog format, which has been revised continuously since then.
CNMARC is China's machine-readable catalog format. It was drawn up by the national bibliographic agency of China on the basis of UNIMARC and issued as the cultural industry standard of the People's Republic of China, WH/T 0503-96. It conforms to ISO 2709. It retains all the fields defined in UNIMARC and adds field definitions specific to Chinese publications. For example, it adds the following fields and subfields: 091 unified book number; 092 order number; 093 patent number; 094 standard number; 690 Chinese Library Classification; 692 Chinese Academy of Sciences book classification; 905 holdings information, etc.
3.4.4.3 Dublin Core standard
Dublin Core is abbreviated as DC. The DC metadata format was drawn up at the first workshop jointly held by OCLC (Online Computer Library Center, Inc.) and NCSA (National Center for Supercomputing Applications). The aim was to find a concise, flexible and easy-to-use format for describing information resources that could be used by people who are not professional librarians, so as to improve the development and use of networked information resources. At first the objects described were limited to electronic text resources on the network. The meeting produced 13 metadata elements, and the set was named after the meeting place, Dublin, Ohio. At the third workshop in September 1996, DC metadata was extended to image resources. To describe image resources fully, two elements, Description and Rights (rights management), were added and the names of some elements were modified, giving 15 elements. At the fifth workshop, held in Helsinki, Finland, in October 1997, it was further clarified that the main function of the DC metadata format is to describe or explain information resources, not to evaluate them. The 15 metadata elements are therefore divided into the following three classes:
(1) Resource content description elements. This class contains the following elements:
Title: the name given to the resource by its creator or publisher.
Creator: the creator of the resource.
Subject: keywords or phrases that reveal the subject content of the resource.
Description: a textual description of the content of the resource, such as an abstract for a document or a content description for a visual work.
Language: the language used by the resource.
Source: information about a second resource from which the current resource is derived. Elements generally contain information only about the current resource, but where it is important for finding the current resource, this element may include the date, creator, format, identifier or other metadata of the second resource.
Relation: the identifier of a second resource and its relationship to the current resource. This element makes it possible to link related resources and their descriptions. Examples of relationships are: is a version of, is based on (for example, a translation), is part of (for example, an extract) and is a format conversion of.
Coverage: the temporal and spatial characteristics of the intellectual content of the resource. Spatial coverage refers to a physical area, expressed for example by latitude and longitude or standardized place names. Temporal coverage refers to the period the content is about, not the point in time when the resource was created; it is written in the same format as the Date element.
(2) Intellectual property description elements. This class contains the following elements:
Creator: the individual or organization primarily responsible for creating the intellectual content of the resource.
Publisher: the entity, such as a publishing house, a university department or a corporate body, responsible for making the resource available in its current form.
Contributor: an individual or organization (such as an editor, transcriber or illustrator) that is not listed in the Creator element but has made an important contribution to the intellectual content of the resource, secondary to that of the creator.
Rights: a rights management statement, an identifier pointing to such a statement, or an identifier pointing to a service that provides rights management information for the resource.
(3) External attribute description elements. This class contains the following elements:
Date: a date associated with the creation or availability of the resource.
Type: the category of resources, such as novels, poems, reports, papers, dictionaries, etc.
Identifier: a string or number that uniquely identifies a resource. For example, URL and URN in the identification of network resources, as well as other universal unique identifiers, such as the International Standard Book Number (ISBN) or other specification names, can be used as identifiers.
Format: the data format of a resource, which indicates what software or hardware is needed to display and execute the resource, such as text, JPG images, applications, etc.
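To make the three classes concrete, the sketch below stores the 15 element names in a small table and builds one record. Two things are assumptions of this example rather than statements from the text: Creator, which the lists above mention under two classes, is counted once under intellectual property so that the total is 15, and the output uses the common convention of embedding DC in HTML `meta` tags with a `DC.` prefix. The helper names `make_record` and `to_meta_tags` are hypothetical.

```python
import html
from typing import Dict, List

# The 15 elements, grouped into the three classes described above.
DC_CLASSES: Dict[str, List[str]] = {
    "content": ["title", "subject", "description", "language",
                "source", "relation", "coverage"],
    "intellectual_property": ["creator", "publisher", "contributor", "rights"],
    "external_attributes": ["date", "type", "identifier", "format"],
}
ALL_ELEMENTS = [e for group in DC_CLASSES.values() for e in group]


def make_record(**fields: str) -> Dict[str, str]:
    """Build a record, rejecting names that are not DC elements.
    Every element is optional, so any subset may be given."""
    unknown = set(fields) - set(ALL_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


def to_meta_tags(record: Dict[str, str]) -> List[str]:
    """Render the record as HTML meta tags, one per element."""
    return [
        f'<meta name="DC.{name}" content="{html.escape(value)}">'
        for name, value in record.items()
    ]


record = make_record(
    title="Annual Report",
    creator="Example Press",
    date="1997-10-01",
    language="en",
)
tags = to_meta_tags(record)
```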
3.4.5 Information indexing
Information indexing, also known as information disclosure, is the process of selecting, summarizing and refining the main content and formal features of information. It includes selecting the formal features of the information, analyzing the features of its content, and converting them into identifiers that reflect the subject of the content.
3.4.5.1 Information indexing procedure
The process of information indexing generally includes three steps.
(1) Subject analysis. This means analyzing the subjects contained in the information. It mainly includes: quantitative analysis, that is, how many subjects the information contains; structural analysis, that is, how many concept factors each subject has; and content analysis, that is, which subjects the information contains and which concept factors each subject has.
(2) Subject indexing. The results of subject analysis are converted into subject identifiers. According to how fully the subject is revealed, there are four indexing strategies: ① Overall indexing: the overall subject of an information entity is indexed in summary with one identifier. ② Comprehensive indexing: all partial subjects or distinct subjects of an information entity, together with their concept factors, are indexed in detail. ③ Supplementary indexing: in addition to summary indexing of the overall subject, some partial subjects and their concept factors are indexed separately. ④ Key indexing: only those subjects of the information entity that relate to the nature, tasks and purpose of the information system are indexed.
(3) Checking and review. The process and results of the subject analysis and subject indexing above are checked and reviewed, and the final indexing result is formally produced.
3.4.5.2 Information indexing methods
According to the different forms and properties of marks given in the indexing process, information indexing can usually be divided into two categories: classified indexing and subject indexing.
(1) Classified indexing. Classified indexing classifies and identifies information by its content or formal features. Through classified indexing, information that shares a common subject attribute can be gathered together, and all the information can be organized into a hierarchical, orderly whole according to the subject relationships between the classes. In terms of how modern classification schemes are compiled, there are mainly hierarchical classification, faceted classification and mixed classification:
1) Hierarchical classification. This type of scheme is a hierarchical system based on the disciplinary nature of document content, proceeding from general to specific and from simple to complex, and divided level by level according to the logical order of the categories of knowledge. Its main features are that it brings documents together by discipline and specialty, reveals the differences and connections between documents from the perspective of the classification of knowledge, and provides a way to retrieve literature by discipline.
2) Faceted classification. This type of scheme is based on the principle of analysis and synthesis. Its basic idea is that any compound subject, however complicated, can be decomposed into basic concepts and, conversely, expressed by a combination of those basic concepts. It is therefore unnecessary to enumerate all subjects in the scheme; it is enough to list the basic concepts by category in the schedules and assign each a notation.
3) Mixed classification. This type of scheme combines the advantages of the two above. It enumerates classes in detailed schedules and also makes wide use of various methods of combination. The Universal Decimal Classification (UDC) is an example.
Hierarchical classification has been used in the classification and retrieval of library documents since ancient times, and it is still widely used in the library and information field. Its advantages are: it emphasizes the systematic organization of knowledge, matches the way people habitually understand things, and lets users retrieve relevant literature by discipline; the tree structure of its classes suits the shelving of documents and the arrangement of retrieval tools; and its notation usually consists of Arabic numerals and Latin letters, which are universal and make it possible to share resources through an internationally unified classification. The hierarchical system also has limitations: its structure gives poor direct specificity; it is not suited to indexing and retrieval from multiple angles; and its classes are fixed and listed in advance, so it cannot reflect new disciplines and new things promptly and is difficult to revise and supplement.
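The analysis-and-synthesis idea behind faceted classification, described above, can be illustrated with a toy scheme. Everything in the sketch is invented for the example: the facets, the notations, the fixed citation order and the colon used as a connector do not correspond to any real classification scheme.

```python
from typing import Dict, List

# A toy schedule: only basic concepts are listed, each with a notation.
FACETS: Dict[str, Dict[str, str]] = {
    "subject": {"computer": "C", "software": "S", "agriculture": "A"},
    "place": {"China": "1", "Europe": "2"},
    "period": {"20th century": "a", "21st century": "b"},
}
FACET_ORDER: List[str] = ["subject", "place", "period"]


def synthesize(**concepts: str) -> str:
    """Build the class number of a compound subject by combining the
    notations of its basic concepts in a fixed facet order."""
    parts = []
    for facet in FACET_ORDER:
        if facet in concepts:
            parts.append(FACETS[facet][concepts[facet]])
    return ":".join(parts)


software_in_china_now = synthesize(
    subject="software", place="China", period="21st century"
)
agriculture_last_century = synthesize(
    subject="agriculture", period="20th century"
)
```

The schedule lists only seven basic concepts, yet subjects that use all three facets already number 3 × 2 × 2 = 12, none of which had to be enumerated in advance. This is the flexibility that a purely enumerative hierarchical scheme lacks.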
(2) Subject indexing. Subject indexing uses controlled or uncontrolled natural language as the subject identifier of information. According to the principles of word selection, the methods of combination, the measures of vocabulary control and the methods of compilation, the subject method can be divided into the subject heading method, the uniterm method, the keyword method and the descriptor method.
1) Subject heading method. The subject heading method uses subject headings (standardized names of things and noun terms) as the identifiers and retrieval marks of subject content. Subject headings are drawn mainly from the name of the indexed object or from set phrases commonly found in titles. A compilation of subject headings is called a subject heading list, and the main feature of the method is that combinations are fixed in advance (pre-coordination): headings are organized in the list in fixed combinations and are searched in those established combinations. The subject heading list manages and controls the selected headings in terms of meaning, word form, relationships between words and usage, ensuring that one thing is expressed by only one heading and that one heading expresses only one thing or meaning, so as to avoid confusion in use. The subject heading method offers good directness, specificity and generality and is suitable for searches on specific topics, but its flexibility is poor.
2) Uniterm method. The uniterm method advocates using the most basic, indivisible lexical units (uniterms) as subject words; they are extracted from the information content and then standardized so that each expresses an independent concept. For example, "computer software" is not a uniterm, while "computer" and "software" are. In English, a uniterm is usually a single word. The outstanding features of the uniterm method are its emphasis on the unitization of vocabulary and on post-coordination, that is, combining terms at search time. Although the uniterm method improves the flexibility of the subject method, it has proved impractical: it overemphasizes unitization, handles vocabulary unreasonably, easily produces false combinations and has a high false-drop rate.
3) Keyword method. The keyword method extracts meaningful units (keywords) that express the subject concept directly from the title, abstract or text of information materials, uses them as subject words, and arranges them in word order for retrieval. Arranged keywords can form a subject index, such as the Permuterm Subject Index of the Science Citation Index, whose keywords are extracted from the titles of documents. The keyword method is not controlled by a thesaurus; it is fast and simple and is well suited to organizing and retrieving information by computer. Its disadvantage is that the words used are not standardized, which lowers recall and precision.
4) Descriptor method. The descriptor method selects descriptors from a thesaurus and describes the subject of information materials by combining concepts, so that indexing and retrieval reach a higher degree of specificity. Its notable feature is that multiple descriptors can be combined in any logical combination to form a variety of search queries. The descriptor method has absorbed the advantages of the methods above; it is intuitive, specific and flexible, indexes accurately and is convenient to search, and it has been widely used in literature retrieval. At present, most retrieval tools and databases at home and abroad use thesauri. Commonly used thesauri include the INSPEC Thesaurus, the Atomic Energy Science and Technology Chinese Thesaurus, the National Defense Science and Technology Thesaurus, the Geological Chinese Thesaurus and the Chinese Thesaurus.
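The combination of terms at search time that the methods above rely on can be shown with a small inverted index. The sketch follows the keyword method: words are taken straight from titles with no thesaurus control, and the stop-word list and sample titles are assumptions of the example. The set operations at the end correspond to the logical combinations (AND, OR, NOT) that the descriptor method also allows.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical stop-word list: words too common to serve as keywords.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def extract_keywords(title: str) -> Set[str]:
    """Take every non-stop word of the title as a keyword."""
    words = (w.strip(".,;:()").lower() for w in title.split())
    return {w for w in words if w and w not in STOP_WORDS}


def build_index(titles: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each keyword to the set of documents whose title contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, title in titles.items():
        for keyword in extract_keywords(title):
            index[keyword].add(doc_id)
    return index


titles = {
    1: "Design of Computer Software",
    2: "Computer Hardware for Libraries",
    3: "Software Testing in Libraries",
}
index = build_index(titles)

both = index["computer"] & index["software"]       # computer AND software
either = index["computer"] | index["software"]     # computer OR software
software_only = index["software"] - index["computer"]  # software NOT computer
```

Because the vocabulary is uncontrolled, a title that says "programs" instead of "software" would be missed, which is the loss of recall the text attributes to the keyword method. Mapping extracted words to thesaurus descriptors before indexing is the remedy the descriptor method offers.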
3.4.6 Information storage
Information is abstract and must be attached to some kind of carrier in order to be expressed. The process of attaching information to a carrier is the process of storing information.
3.4.6.1 Significance and function of information storage
Information storage refers to the process of recording the processed information on the corresponding information carriers according to certain rules, and organizing these carriers into a systematic retrieval system according to certain characteristics and content attributes. The significance and function of information storage are as follows.
(1) It facilitates sharing. Once information is stored, users can share the information base and use it repeatedly, which raises the utilization rate of information.
(2) It makes retrieval convenient. Storing processed information in an information base makes it much easier for users to retrieve what they need.
(3) It supports the centralized management of information, enlarges holdings of information resources and makes it possible to develop higher-level information resources.
In short, when storing information, we must fully consider the convenience and efficiency of retrieval, so as to be orderly, reasonably classified, clear and easy to retrieve.
3.4.6.2 Main technologies of information storage
Traditional information storage technology refers to storage by printing on paper, while modern information storage technology mainly includes microform storage, audio-visual storage, computer storage and optical disk storage. The modern technologies offer large capacity, high density, low cost and convenient access, so they are widely used.
(1) Paper storage technology. Paper storage is the most commonly used storage technology and the one that has been in use longest. However, it has many shortcomings: storage density is low, the volume is large and takes up much space, and paper burns easily and is prone to damp, mildew, insect damage and weathering, so it is hard to preserve.
(2) Microform storage technology. Microform storage means photographing the contents of printed matter onto film in reduced size with a camera and developing it into microfilm for storage. Its main advantages are: ① High storage density, saving 90% of the space needed to store the information on paper. ② A simple, low-cost and economical storage method. ③ Long shelf life, usually up to 50 years under ordinary conditions and hundreds of years under standard conditions. ④ Faithfulness to the original: microforms are not prone to errors, and compared with other storage methods the error rate is 0. ⑤ Original documents of different sizes can be managed in a standardized way. Microform technology can also be combined with computer and communication technology to achieve automatic retrieval. Its disadvantages are that microforms can be read only with a microform reader or reader-printer, that several documents cannot be read side by side for comparison, and that the storage conditions are strict.
(3) Audio and video storage technology. Audio-visual storage technology refers to an information storage technology that records and stores information by recording or video recording, including recording storage technology, video recording storage technology and film storage technology.
(4) Optical disk storage technology. Optical disk storage is a newer technology that uses lasers and computers to digitize all kinds of information, convert it into optical signals and record it on optical disks. It has the following characteristics: ① high storage density and large capacity; ② low cost and ease of copying; ③ durability and long storage life. The disk is well sealed and is not easily affected by dust, harmful gases or electromagnetic fields. Moreover, because the laser reads and writes without contact, the service life exceeds 10 years. Its main disadvantage is a high bit error rate.
(5) Computer storage technology. Computer storage technology uses the internal and external memory of a computer to store information. According to its function in the computer, memory is divided into internal memory and external memory. Internal memory exchanges data directly with the CPU and is characterized by high speed, small capacity and high price. External memory mainly backs up and supplements internal memory and is widely used; it has large capacity and low cost and can store information permanently offline.
3.4.7 Information analysis
Information analysis is an indispensable part of information organization; it is the process of revealing the laws governing the movement of objective things from known information. Its main task is for information researchers, using certain methods and means, to distill the original information into new information content at a deeper, more comprehensive and more applicable level, to meet users' needs in solving specific problems.
3.4.7.1 Functions of information analysis
Information analysis has four basic functions: sorting, evaluation, prediction and feedback.
(1) Sorting function: collect and sort information so that it moves from disorder to order.
(2) Evaluation function: evaluate the value of information, so as to discard the dross and keep the essence, and to discard the false and retain the true.
(3) Prediction function: by analyzing the content of known information, obtain unknown or future information.
(4) Feedback function: review, evaluate, revise and supplement the predicted conclusions according to the actual benefits users obtain from using them.
Generally speaking, these four basic functions are closely related. Sorting and evaluation are the two foundational functions of information analysis; they prepare for the prediction and feedback functions. Prediction and feedback are the two characteristic functions of information analysis; they extend the sorting and evaluation functions.
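The way the four functions build on one another can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Record` type, the function names, the sample values and the naive "average change" forecast are all hypothetical and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One piece of collected information (hypothetical structure)."""
    period: int      # when the observation was made
    value: float     # the observed quantity
    reliable: bool   # result of a prior credibility check


def sort_records(records):
    """Sorting function: bring the records from disorder to order."""
    return sorted(records, key=lambda r: r.period)


def evaluate_records(records):
    """Evaluation function: discard the false and retain the true."""
    return [r for r in records if r.reliable]


def predict_next(records):
    """Prediction function: extend known information one period ahead.

    Assumption: a naive forecast that adds the average change between
    consecutive records to the last known value.
    """
    if len(records) < 2:
        raise ValueError("at least two records are needed for a forecast")
    values = [r.value for r in records]
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return values[-1] + sum(steps) / len(steps)


def feedback_error(forecast, actual):
    """Feedback function: compare the forecast with what actually happened."""
    return actual - forecast


# Hypothetical sample data, deliberately unordered and with one bad record.
raw = [
    Record(3, 16.0, True),
    Record(1, 10.0, True),
    Record(2, 99.0, False),
    Record(2, 13.0, True),
]

usable = evaluate_records(sort_records(raw))
forecast = predict_next(usable)
print("forecast:", forecast)
print("error after observing 20.0:", feedback_error(forecast, 20.0))
```

Sorting and evaluation run first because the forecast is only meaningful on ordered, trustworthy records. The feedback step then measures how far the forecast was from the observed outcome, which is the input for revising the conclusion.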
3.4.7.2 Information analysis methods
Information analysis methods are the tools of information analysis and the means by which it is carried out. Although information analysis tasks differ greatly in content, scale and scope, they share a common goal: to address a specific decision-making problem by studying in depth its history and present situation, revealing the laws of its development, and predicting its prospects and trends. This common goal determines the characteristics and attributes that the various analysis methods share. Information analysis methods fall mainly into qualitative analysis and quantitative analysis.
(1) Qualitative analysis method. Qualitative analysis, also called the logical method, is a research method based on logical reasoning and dialectical analysis. Starting from known information, it applies logical means such as comparison, analysis and synthesis, and inductive reasoning to reveal the laws of development and the causal relationships of things. Its advantages are rigorous reasoning and strong intuitiveness. Its main disadvantage is that its conclusions indicate only a qualitative tendency without quantitative support. They are not specific or detailed enough, so the method cannot fully serve topics that need quantitative research, such as technical-economic studies, engineering projects and market forecasting.
(2) Quantitative analysis method. Quantitative analysis, also called the mathematical method, is the general name for research methods that use basic mathematics, mathematical statistics, applied mathematics and any other mathematical processing and calculation. These methods have two outstanding characteristics: they describe things quantitatively and show the specific degree of their development; and, when studying the relationships between things, researchers work directly with a homomorphic system of the things, such as a formula or model, rather than with the things themselves. However, quantitative analysis also has its conditions of applicability and its limitations. First, the boundary conditions used in mathematical calculation are abstracted or assumed by people from objective things, so whether the abstraction or assumption is reasonable and consistent with objective reality must be examined or verified when the final conclusion is drawn. Second, the parameter data used in mathematical methods come from both objective statistics and subjective evaluation, so the results have only relative significance within the conclusions of information analysis research. Third, objective things are often multi-parameter, dynamic and complex systems, whereas the homomorphic system of any objective thing is essentially an approximate, static and simplified system.
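As a minimal sketch of what working with a model rather than the thing itself can look like, the Python example below fits a straight-line trend by ordinary least squares and extrapolates one period ahead. The linear form is itself an assumption of the kind described above, and the function names and sample figures are hypothetical, chosen only for illustration.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted model at x; valid only while the trend holds."""
    return slope * x + intercept


# Hypothetical observations for five consecutive periods.
periods = [1, 2, 3, 4, 5]
observed = [12.0, 15.0, 19.0, 22.0, 27.0]

slope, intercept = fit_linear_trend(periods, observed)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"forecast for period 6: {predict(slope, intercept, 6):.1f}")
```

The fitted line is an approximate, static and simplified stand-in for the underlying process. Whether a linear trend is a reasonable assumption for the data at hand still has to be checked before the forecast is used in a conclusion.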
It can be seen that logical methods and mathematical methods each have their own advantages, and it is difficult to separate them completely in information analysis research. Generally speaking, qualitative analysis is the basis of quantitative analysis, while quantitative analysis provides arguments for the conclusions of qualitative analysis and confirms its results. In specific information analysis activities, analysts tend to combine the two methods.