When Tang poetry and Song lyrics meet big data

Text/Dai Yue

Interpreting Tang poetry and Song Ci from the perspective of data can lead to conclusions beyond imagination. This is what emerges from two data journalism works jointly launched by the State Key Laboratory of CAD&CG (Computer Aided Design and Computer Graphics) of the School of Computer Science at Zhejiang University and Xinhuanet: "I have tenderness like water, but also heroic passion - Portraits of female poets in the Tang Dynasty" and "Song Ci is lingering, where can we draw the human world".

What is data journalism? Data journalism, also known as data-driven journalism, is a new reporting method based on data capture, mining, statistics, analysis and visual presentation. If unprocessed data is compared to fresh ingredients, then data journalism is a carefully cooked dish presented to readers. "There are a thousand Hamlets in the hearts of a thousand readers", and everyone tastes something different in it.

To understand more deeply how these "dishes" are cooked, I interviewed Professor Chen Wei, who has overall responsibility for the two works, and Mr. Zhang Wei, the project lead, at the Zijingang Campus of Zhejiang University. My conversations with the two teachers gradually made the seemingly mysterious "cooking method" clear.

Scientific and rigorous "pictures of ladies": Portraits of female poets in the Tang Dynasty

The Tang poetry project, "I have tenderness like water, but also heroic passion - Portraits of female poets in the Tang Dynasty", collects about 55,000 Tang poems and uses a variety of charts to present visually the work of female poets in the Tang Dynasty.

The first chart is an overview of how many works each Tang poet left behind, grouped into 1, 2, 3~5, 5~10, 10~50 and more than 50 works. The poets are drawn as a dot matrix in which each dot represents one poet. Hovering the mouse over a dot displays the poet's name and number of works, and gray and vermilion distinguish the poet's gender. In each group, portraits highlight famous representative poets. For example, in the "3 to 5 poems" group, Zhang Ruoxu, who has 3 surviving works, is one of the representatives, while in the "more than 50 poems" group, Bai Juyi, with 3009 poems, leads them all. Faced with a bare number, we may not keenly perceive the meaning behind it. In the dot matrix, however, set against the many poets who left only a single poem, we can feel what an amazing influence the Layman of Xiangshan, with his 3009 surviving works, had in his own time and on later generations. More than 3,000 poems have reached us after more than a thousand years of rough waves. What a rich cultural and historical treasure this is.

The second chart is the "Panorama of Female Poets in the Tang Dynasty", which divides the Tang Dynasty into four stages: the early Tang, the prosperous Tang, the middle Tang and the late Tang. A dot matrix is used here as well, but vermilion flowers take the place of the dots. Flowers of different shapes represent the different identities of the female poets. Some were court poets, such as Shangguan Wan'er; some were wives of scholar-officials; some were ordinary women or singing girls. Those with the most surviving works, who are also the most famous, such as Xue Tao, Li Ye and Yu Xuanji, are marked with blooming lotus flowers. People often use flowers to describe beauty, and these vermilion flowers seem to have inherited the talent and beauty of these women, blooming warmly and splendidly across the scroll.

In our impression, the prosperous Tang Dynasty was the era when poets emerged in large numbers. Great poets such as Li Bai and Du Fu appeared then, so one would assume that female poets were also most numerous at this time. But the panorama gives a different answer: there were only slightly more female poets in the prosperous Tang than in the early and middle Tang. The late Tang, by contrast, had the most female poets, about twice as many as the middle Tang or the prosperous Tang. After discovering a fact so contrary to my understanding, I tried to find an explanation. Based on what I have learned, I think it may be that as the social situation deteriorated in the late Tang, the poetic style became more delicate and feminine, which suited women's sensibilities and led to a sharp increase in the number of female poets. In the prosperous Tang, which we regard as the heyday of poetry, the style was grand and majestic, which probably did not fit those sensibilities as well.

The third chart is the "Word frequency chart of poets' poems". The size and color depth of each word show how often it is used. The most frequent word in the works of female poets, "lovesickness", illustrates their consistent style and is not far from our ordinary understanding: female poets often express feelings of lovesickness and loneliness in their poems. As with male poets, "wind" and "person" (人) are also among their most frequent words. Compared with men, female poets prefer soft images such as "flower", "moon" and "spring", and through these images the distinctive inner experience of women becomes evident.
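The counting behind a word frequency chart like this can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual method: it counts character n-grams instead of running a real word segmenter, the function name ngram_counts is hypothetical, and the two "poems" are made-up toy lines, not real Tang verses.

```python
from collections import Counter

# Characters treated as segment boundaries (assumed list, not exhaustive).
PUNCTUATION = set("，。？！、；： \n")

def ngram_counts(poems, n=1):
    """Count character n-grams across poems, never crossing punctuation."""
    counts = Counter()
    for poem in poems:
        segment = []
        for ch in poem + "。":  # trailing sentinel flushes the last segment
            if ch in PUNCTUATION:
                counts.update(
                    "".join(segment[i:i + n])
                    for i in range(len(segment) - n + 1)
                )
                segment = []
            else:
                segment.append(ch)
    return counts

# Toy input: invented lines for illustration only.
toy_poems = ["花开花落，春去春来。", "明月照花，相思不尽。"]
char_counts = ngram_counts(toy_poems, n=1)
bigram_counts = ngram_counts(toy_poems, n=2)
print(char_counts.most_common(2))
```

Running the same count separately over poems by women and by men, and scaling each word's font size by its count, would give the two halves of a chart like the one described.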

Finally, there is the "Social Diagram of Female Poets in the Tang Dynasty", which selects the most representative female poets and expresses their social relationships as circles and lines. The thickness of a line represents the depth of the relationship. From this diagram we find that two famous female poets, Xue Tao and Li Ye, both exchanged poems with Liu Yuxi. We may never know whether these two talented women, both counted among the four great female poets of the Tang Dynasty, ever crossed paths, but here they are connected across distant time and space.

These are followed by separate social graphs for Xue Tao and Li Ye.

In Li Ye's social graph, a triangle is formed between the poet, Lu Yu, and Jiao Ran. This is evidently a "small circle" among poets: the three shared a strong interest in tea, Buddhism and similar subjects, and they kept up exchanges with one another. The social graph of Xue Tao, the "female collator", has more such circles, and larger ones. The largest includes Yuan Zhen, Bai Juyi, Liu Yuxi, Yan Shou and others, most of whom knew each other or were well acquainted, like an ancient version of a "circle of friends" full of "mutual friends". The intricate network of relationships hidden in the classics is made concrete in a simple, clear social graph, and the features of the ancients, long obscured by time, seem to come into focus in an instant.
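A social graph of this kind can be modeled as a weighted undirected graph, and "small circles" such as the Li Ye, Lu Yu and Jiao Ran triangle are simply three people who are all linked to one another. The sketch below assumes that representation. The links come from the relationships mentioned in this article, but the weights are made-up placeholders standing in for line thickness, and build_graph and triangles are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations

# (person_a, person_b, weight). Weights are invented placeholders.
edges = [
    ("Li Ye", "Lu Yu", 2),
    ("Li Ye", "Jiao Ran", 1),
    ("Lu Yu", "Jiao Ran", 3),
    ("Li Ye", "Liu Yuxi", 1),
    ("Xue Tao", "Liu Yuxi", 1),
]

def build_graph(edge_list):
    """Store an undirected weighted graph as a dict of neighbor dicts."""
    graph = defaultdict(dict)
    for a, b, weight in edge_list:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph

def triangles(graph):
    """Return every group of three people who are all linked to each other."""
    found = set()
    for person, neighbors in graph.items():
        for u, v in combinations(sorted(neighbors), 2):
            if v in graph.get(u, {}):
                found.add(tuple(sorted((person, u, v))))
    return sorted(found)

graph = build_graph(edges)
circles = triangles(graph)
print(circles)
```

In this toy graph, Xue Tao and Li Ye are not linked directly but share Liu Yuxi as a neighbor, which is the kind of indirect connection the diagram makes visible.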

The background of the news page imitates a yellowed ancient scroll. Together with the quaint patterns and typefaces, it forms a "picture of ladies" woven through with rationality and rigor. Although no portrait appears, across the bridge of time built by data we seem to glimpse the shy silhouettes of these beauties through a thousand years of dust.

Regarding the visual design of the data journalism works, Mr. Zhang Wei, the project lead, said that they prepared more than two sets of designs for each chart, and only after continuous screening and deliberation arrived at the version we see now. Web design is much like the work of the ancient painters: only through repeated discussion and polishing can a work stir the viewer's spirit in a beautiful way.

The emotional expressions of poets in the Song Dynasty

The Song Ci project, "Song Ci is lingering, where can we draw the human world", takes the "Complete Song Ci" as its sample and draws a rich set of charts from the huge body of data covering nearly 21,000 lyrics and 1,330 poets. Unlike the exquisite, classical atmosphere of the Tang poetry work, the Song Ci work has a hazy, freehand look, and its charts use ink-wash elements in many places, lending precise data a poetic beauty.

The whole work is divided into three sections: "Traveling Through Thousands of Rivers and Mountains", "Every plant and tree has feelings, words are life", and "Spring Breeze Turns into Rain, Remains New Through Time". In the first section, "Traveling Through Thousands of Rivers and Mountains", the first thing that catches the eye is a map of the territory of the Song Dynasty. Gray dots mark the places the poets visited, and the larger the dot, the more poets went there. Gray dots densely cover most of the territory. Apart from the rarely visited Qinghai-Tibet Plateau, there are footprints of poets even north and south of the Tianshan Mountains. Hovering the mouse over a dot displays the poet's travel route. The longest stretches from the northernmost end of the territory to the southernmost coast. Confucius's famous travels among the states actually ran only from Henan to Shandong, yet he spent more than ten years on a route that can be covered today in a few hours by high-speed train. A route that ran through the territory of the Song Dynasty from south to north probably took a poet a lifetime.

Then there is a panorama of the poets of the Song Dynasty in the form of a line chart. The horizontal axis shows the historical stages from the Northern Song to the Southern Song, and the vertical axis shows the number of the poets' works. Each line represents one poet. A horizontal segment represents a period spent as a commoner, while an upward segment represents an official career. Gray and brown lines distinguish the graceful school from the bold school. Among the many poets, Lin Bu, who lived his whole life as a commoner, and the female poet Li Qingzhao each have a flat horizontal line. The lines of the other poets rise and fall, and the joys and sorrows of a whole life are wrapped up in one simple line. The shapes they trace make one sigh.

In the second section, "Every plant and tree has feelings, words are life", the first chart gives the word frequencies of the "Complete Song Ci". The most frequent words are "east wind" (dongfeng), "where" and "human world". The accumulated poverty and weakness of the Song Dynasty and the Jingkang Incident deepened the sense of wandering in the poets' hearts. They seem to have been searching all along: whether it is "where shall I sober up tonight" or the "where to return" written by Lu Fangweng (Lu You), these are all questions put to the soul.

The second chart gives statistics on common images and the emotions that famous Song poets expressed through them. Five emotions, namely joy, anger, sorrow, delight and longing, are represented by different colors, and each image has a pie chart showing how often it carries each emotion. Hovering the mouse over a poet's name displays the proportions in which that poet used images to express emotions. Wang Guowei once said, "When I look at things from my own standpoint, all things take on my color." Xin Qiji, a representative of the bold school, often used images such as "wine" and "moon", which call to mind the cold moon over the frontier and mournful songs sung over warmed wine. Yan Jidao, the son of Yan Shu, wrote in a graceful style. His family declined when he was young and he lived in reduced circumstances all his life. In his lyrics he often appears as a dispirited young nobleman who lingers in the "small tower". His famous line, "Dancing until the moon sinks low over the willow-shaded tower, singing until the breeze beneath the peach-blossom fan is spent", captures the mood of his work.

I was curious about how the emotions carried by images are calculated. Professor Chen Wei told me that the calculation is based on existing algorithms and models. "This problem has been around for ten years, and there is already a standard method." "For us, this is something in the textbook." It turns out that the combination of literature and computing did not emerge only this year, but has already made progress beyond our imagination.

In the last section, "Spring Breeze Turns into Rain, Remains New Through Time", the level and oblique tones of the lyrics for each tune pattern are marked with line segments of different lengths and accompanied by a recited reading. The lyrics are restored to their original musical function, and the rhythm once hidden behind the words is revealed directly. Perhaps, a thousand years ago, guests listening to the musicians enjoyed these same lyrics and savored the same complex, lingering emotions as we do.

The collision of digitization and Ci studies has introduced a "quantitative" way of thinking

One of the major effects of the combination of Tang poetry and Song Ci with big data is the improvement of efficiency. Each beautifully crafted chart lays out the key information in front of you, so you can pick it up as needed. I couldn't help but sigh, if the information I needed for my previous assignments could be presented in this way, I would be able to save a lot of time.

Professor Chen Wei explained that before big data became widespread, humanities scholars had to obtain information by consulting physical books and reading them from beginning to end. As technology advanced, many classics were scanned into electronic versions, but scholars still had to search by hand and read the whole text on a computer. Big data has changed this. "Suppose I can extract the core, key features and information, model them on a computer, and then present them on the screen: here is the key information about these people, who they were related to, what works they wrote, and what their living environment was like. That greatly improves efficiency."

As a humanities major, I often need to understand the social environment in which an ancient poet lived during a certain period, and end up doing a "naked search" through photocopied editions of historical records and poets' chronologies. Staring for long stretches at small traditional characters set vertically leaves one's eyes swimming.

I thought of an assignment I once did on how the imagery of named objects changed across Tang and Song lyrics, for which I chose the image of the "hairpin". For examples of lyrics containing this image, there is already a ready-made database holding a considerable number of Tang and Song lyrics. I only needed to enter keywords such as "Chai" (hairpin), "Yin Chai" (silver hairpin) and "Feng Chai" (phoenix hairpin) to retrieve the relevant works quickly and easily. When I investigated changes in the material and shape of the hairpin itself, however, some of the illustrated catalogs of ornaments that I found did not even have a table of contents or page numbers. I could only read through the vertical traditional text page by page, looking for anything that might be useful, and mark it with the annotation function built into the PDF reader. A search took a long time, and the information gained was far from proportional to the time spent. Sometimes, after going through a book of several hundred pages, I came away with only a few useful sentences. Seen this way, the spread of big data really does meet an urgent need. It has brought good news to researchers in the humanities and social sciences by removing a great deal of tedious, inefficient desk work.

In addition to greatly improving the efficiency of scientific research, big data also provides a "quantitative analysis" thinking perspective for research.

The data-based study of Tang and Song poetry is a research trend that began to emerge in the 1990s, in step with the development of data technology in that decade. The collision of digitization and ci studies has introduced a "quantitative" way of thinking. Take the question of how popular a given ci was in the Song Dynasty. This was hard to measure in earlier studies. Even where it could be judged qualitatively, the judgment was talk without basis, with no corresponding evidence. Big data can address this problem. By counting how many times the ci was included in Song Dynasty anthologies, we can obtain a rough quantitative measure of its popularity. Statistics make the result more precise and more convincing.
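The inclusion count described here is simple to express in code. The following Python sketch assumes that each anthology is available as a list of ci identifiers. The anthology names and identifiers are hypothetical placeholders, not real titles, and inclusion_counts is an illustrative helper, not a method taken from any published study.

```python
from collections import Counter

def inclusion_counts(anthologies):
    """Count, for each ci, the number of anthologies that include it."""
    counts = Counter()
    for contents in anthologies.values():
        # set() ensures an anthology counts once even if it lists a ci twice
        counts.update(set(contents))
    return counts

# Hypothetical placeholder data, for illustration only.
anthologies = {
    "anthology_A": ["ci_1", "ci_2"],
    "anthology_B": ["ci_1"],
    "anthology_C": ["ci_1", "ci_3", "ci_3"],
}

popularity = inclusion_counts(anthologies)
print(popularity.most_common())
```

A count like this is only a rough proxy: it says nothing about how widely each anthology itself circulated, which a fuller study would need to weigh.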

Although big data brings many benefits, the meeting of big data and literary research also raises issues that need attention. In one of my major courses, the teacher gave an example from big data research. In the "Quan Jin Yuan Ci", the two most frequently used tune patterns are "Black Lacquer Crossbow", which ranks first, and "Magnolia Slow", which ranks second. "Magnolia Slow" is familiar to us, but "Black Lacquer Crossbow" was unheard of for a non-specialist like me. Almost no works to "Black Lacquer Crossbow" survive from the Song Dynasty, so why is it the most frequently used tune pattern? It turns out that "Black Lacquer Crossbow" was absorbed into Yuan drama and became a qu tune in the Yuan Dynasty. In other words, it is a ci tune that was converted into a qu tune, and it can properly be called qu. This points to a problem: when big data is used to study ci, the sample needs careful attention. For example, in a study of the most frequently used tune patterns in the "Quan Jin Yuan Ci", a tune like "Black Lacquer Crossbow" should not be included in the sample. The sampling problem has become a "stumbling block" for data-driven ci studies.
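The effect of this sampling decision is easy to see in a small Python sketch. The tune names follow the teacher's example, but the counts are invented for illustration and "Other Tune" is a placeholder. Which tunes belong in the exclusion set is a scholarly judgment that the code simply takes as input.

```python
from collections import Counter

def tune_frequencies(tune_labels, exclude=frozenset()):
    """Count tune patterns, skipping any listed in `exclude`."""
    return Counter(t for t in tune_labels if t not in exclude)

# Invented counts: one label per work in a hypothetical sample.
tune_labels = (
    ["Black Lacquer Crossbow"] * 5
    + ["Magnolia Slow"] * 4
    + ["Other Tune"] * 2
)

raw = tune_frequencies(tune_labels)
# Tunes judged to be qu, not ci, are removed before ranking.
cleaned = tune_frequencies(tune_labels, exclude={"Black Lacquer Crossbow"})
print(raw.most_common(1), cleaned.most_common(1))
```

The ranking changes as soon as the exclusion set changes, which is why the definition of the sample has to be settled before any frequency is reported.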

Beyond the problems with known samples, the constant change in the sample also troubles scholars of ci. Lost Tang and Song works are still being discovered, so the sample keeps growing. And compared with the limited number of surviving Tang and Song ci, Ming and Qing ci are as numerous as the sands of the Ganges, almost inexhaustible. A sample that large is a headache in itself.

The cooperation between the humanities and big data has made gratifying developments, but there is still a long way to go.

As a humanities major, I also look forward to the day when the beautiful picture becomes a reality.

The article is selected from "College Students"