1. Virtual Medical Research Cases Based on Big Data Mining
By today's standards, data mining has entered the era of "big" data mining. A few related cases illustrate this.
1.1 Virtual clinical trial: big data collection
Consider the following case first. In June 2011, Pfizer announced a "virtual" clinical study, a pilot project approved by the US Food and Drug Administration and known by the acronym "REMOTE". The REMOTE project was the first clinical study in the United States in which patients only needed a mobile phone and the Internet instead of repeated hospital visits. Its goal was to determine whether such a "virtual" clinical study can produce the same results as a traditional one. Traditional clinical research, by contrast, requires patients to live near the hospital and to attend the hospital or clinic regularly for an initial examination and multiple follow-up examinations. If the approach proves effective, patients all over the United States may be able to take part in many medical studies in the future. Groups that were under-represented in earlier research projects could then participate, data collection could be much faster, costs would probably fall substantially, and fewer participants would be likely to drop out.
This example shows that the Internet makes it possible to collect clinical data from far more patients than a traditional clinical study can sample, and some of these data may come from convenient wearable health-monitoring devices. Provided that the study design is rigorous, quality standards are enforced and the various sources of error are controlled, research of this kind can significantly improve both the efficiency of research and the credibility of its results. As Freda Lewis-Hall, chief medical officer of Pfizer, said: "Let more different people participate in the research, which may promote medical progress and bring better curative effect to more patients."
1.2 Virtual drug clinical trial: big data mining
Consider another case. In 1992, the antidepressant paroxetine was approved for marketing; in 1996, the cholesterol-lowering drug pravastatin was officially launched. Studies by the two pharmaceutical companies showed that each drug is effective and safe when taken alone. However, no one knew whether it is safe for patients to take both drugs at the same time, and few people had even considered the question. Researchers at Stanford University used data mining to analyze the electronic medical records of tens of thousands of patients and soon found an unexpected answer: patients taking both drugs at the same time had higher blood glucose levels. This matters greatly for diabetic patients, for whom high blood glucose is a serious health threat. The researchers looked for such hidden patterns by analyzing blood glucose test results together with drug prescriptions.
An individual doctor sees very few patients who take both drugs at the same time. Even if a few diabetic patients show an unexplained rise in blood glucose, it is hard for the doctor to recognize that the cause is the combination of paroxetine and pravastatin. The pattern is hidden in big data, and unless someone specifically studies the combined safety of paroxetine and pravastatin, a single doctor is unlikely to uncover it. Yet there are thousands of drugs in clinical use. How can the safety and efficacy of every combination of two or three drugs be studied? Data mining is likely to become an effective, fast and proactive way to explore the combined use of multiple drugs.
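The kind of comparison described in this case can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, the drug names "drug_a" and "drug_b", the glucose values and the function names are all hypothetical, and a real study would also have to control for confounding factors before drawing any conclusion.

```python
from statistics import mean

# Hypothetical toy records: (patient id, set of drugs taken, blood glucose in mmol/L).
# All values are invented for illustration only.
records = [
    ("p1", {"drug_a"}, 5.4),
    ("p2", {"drug_b"}, 5.6),
    ("p3", {"drug_a", "drug_b"}, 7.1),
    ("p4", {"drug_a", "drug_b"}, 6.9),
    ("p5", set(), 5.2),
    ("p6", {"drug_b"}, 5.5),
]

def exposure_group(drugs, a="drug_a", b="drug_b"):
    """Label a patient by which of the two drugs they take."""
    has_a, has_b = a in drugs, b in drugs
    if has_a and has_b:
        return "both"
    if has_a:
        return "a_only"
    if has_b:
        return "b_only"
    return "neither"

def mean_glucose_by_group(rows):
    """Average the glucose value within each exposure group."""
    groups = {}
    for _pid, drugs, glucose in rows:
        groups.setdefault(exposure_group(drugs), []).append(glucose)
    return {name: mean(values) for name, values in groups.items()}

summary = mean_glucose_by_group(records)
for name in sorted(summary):
    print(name, round(summary[name], 2))
```

A noticeably higher mean in the "both" group than in the single-drug groups would be the signal worth a closer look, not a proof of an interaction.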
Researchers do not have to recruit patients for a clinical trial, which would be very costly. The spread of electronic medical records and their computerized use gives medical data mining new opportunities. Scientists are no longer limited to traditional research with recruited volunteers; they can draw on real-world data, such as large numbers of routine clinical cases, and carry out virtual research. These data do not come from a planned project but are already stored in the medical records of many hospitals.
As in this case, data mining enables researchers to discover problems that were not foreseen when a drug was approved for marketing, such as how the drug affects a specific group of people. In addition, mining medical records not only benefits research but can also improve the efficiency of the medical service system.
1.3 Virtual drug target discovery: knowledge discovery
Consider a third kind of research. New drug development is usually slow, costly and risky. According to statistics, developing a new drug takes 15 years on average and costs more than 800 million dollars on average. Because of poor efficacy or side effects, many candidate drugs fail at the clinical stage, causing huge economic losses. Drug target discovery and identification is the starting point of drug development and strongly influences its success rate. With the continuing development of bioinformatics, proteomics and chemical genetics data are growing day by day. Combining data mining with traditional biological experiments can provide new technical means for discovering drug targets and new methods for identifying and predicting them. This kind of research, which builds a drug target database and applies intelligent computing and data mining to existing drug target data in order to discover new targets, is also called drug target knowledge discovery.
Traditionally, drug targets are discovered through large numbers of repeated biochemical experiments. This is costly, inefficient and has a low success rate, and, like the blind men feeling the elephant, it makes the overall direction hard to grasp. Data mining is an automatic, proactive and efficient exploratory technique that can be used to discover candidate ("virtual") drug targets. It can greatly speed up target discovery, reduce the number and cost of biochemical experiments, and improve the success rate of the experiments that are still carried out.
2. The application of data mining in virtual medical research
In the era of big data, pharmaceutical R&D faces more challenges and more opportunities. To save R&D costs, raise the success rate of new drug research and develop more competitive new drugs, data mining can be used to carry out virtual medical research and drug research. Its applications in virtual medical research can be summarized as follows.
2.1 Predictive modeling helps pharmaceutical companies reduce R&D costs and improve R&D efficiency. The model is built on data sets from the preclinical stage and the early clinical stage and predicts clinical outcomes as early as possible. The factors evaluated include product safety, effectiveness, potential side effects and overall trial results. Once the clinical outcome of a drug has been predicted through data modeling and analysis, research on suboptimal drugs can be suspended, or expensive clinical trials of them stopped, which lowers R&D costs.
2.2 Mining patient data makes it possible to evaluate whether recruited patients meet the trial conditions, which speeds up the clinical trial process and supports more effective trial design. For example, clustering the patient population reveals characteristics such as age, gender, condition and laboratory indicators, from which one can judge whether the trial conditions are met. These characteristics also help in setting up a better control group.
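As a rough illustration of clustering a patient population, the sketch below runs a plain k-means on two features. The source does not say which clustering method is used, so k-means is an assumption, as are the patient values, the choice of features and the starting centroids.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical patients as (age in years, fasting glucose in mmol/L).
patients = [(34, 5.1), (38, 5.3), (41, 5.0), (66, 7.9), (71, 8.4), (69, 8.1)]
labels, centers = kmeans(patients, centroids=[patients[0], patients[3]])
print(labels)
print(centers)
```

The features are left unscaled here to keep the sketch short; in practice they would normally be standardized first so that age does not dominate the distance.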
2.3 Analyzing clinical trial data and patients' medical records can identify additional indications for a drug and reveal side effects. After such analysis, a drug can be repositioned or marketed for other indications. Mining trial data with association analysis and other methods may produce unexpected findings, which greatly improves the utilization of the data.
2.4 Collecting adverse reaction reports in real time or near real time supports pharmacovigilance. Pharmacovigilance is the safety system for marketed drugs; it monitors, evaluates and prevents adverse drug reactions. Big data mining methods such as clustering and association analysis help reveal how adverse reactions relate to the way a drug is used and to the disease being treated, and whether they are related to particular chemical components. Examples include cluster analysis of adverse reaction symptoms and association analysis between chemical components and adverse reaction symptoms. In addition, clinical trials sometimes hint at an effect without providing enough data to demonstrate it statistically; analysis based on big data from clinical trials can now supply the evidence.
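One simple way to quantify an association between a drug and an adverse reaction in a set of reports is the proportional reporting ratio (PRR), a disproportionality measure commonly used in pharmacovigilance. The source does not name a specific measure, so the PRR is a swapped-in example; the report data, drug names and function names below are hypothetical.

```python
def count_table(reports, drug, reaction):
    """Build the 2x2 table (a, b, c, d) from (drug, reaction) report pairs."""
    a = b = c = d = 0
    for rep_drug, rep_reaction in reports:
        if rep_drug == drug:
            if rep_reaction == reaction:
                a += 1  # drug of interest, reaction of interest
            else:
                b += 1  # drug of interest, any other reaction
        elif rep_reaction == reaction:
            c += 1      # other drugs, reaction of interest
        else:
            d += 1      # other drugs, any other reaction
    return a, b, c, d

def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)]."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))

# Hypothetical reports, invented for illustration only.
reports = (
    [("drug_x", "rash")] * 4
    + [("drug_x", "other")] * 16
    + [("drug_y", "rash")] * 10
    + [("drug_y", "other")] * 190
)
a, b, c, d = count_table(reports, "drug_x", "rash")
prr = proportional_reporting_ratio(a, b, c, d)
print((a, b, c, d), prr)
```

A PRR well above 1 means the reaction is reported proportionally more often with the drug than with other drugs; it flags a signal for review and does not by itself establish causation.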
2.5 Targeted drug research and development: personalized drugs are developed by analyzing large data sets (such as genome data). This application investigates the relationships among genetic variation, susceptibility to specific diseases and response to specific drugs, and then takes individual genetic variation into account during drug development and medication. In many cases, patients on the same drug regimen respond differently, partly because of genetic variation. Different drugs, or different regimens of the same drug, can therefore be developed for different patients with the same disease.
2.6 Exploring how the chemical composition of drugs relates to their pharmacological action can inspire R&D personnel. In traditional Chinese medicine research, for example, data mining is used to analyze prescriptions and symptoms, explore the relationship between them, and analyze classification characteristics in terms of efficacy, meridian tropism, medicinal properties and flavor.
3. Virtual drug clinical trial analysis system
Nowadays, more and more clinical research and drug clinical trials extract their data, through strict conditional screening, from the big data generated in daily clinical work. As in the cases in sections 1.1 and 1.2, a so-called virtual drug clinical trial collects a broad range of clinical data from a large number of hospital electronic medical records, screens them against strict conditions set in advance by the study design, and conducts the trial on the screened data. Although the method is virtual rather than traditional, such trials have broader samples, lower costs, higher efficiency and more research output. Virtual research can completely replace some traditional drug clinical studies and can serve as a preliminary or exploratory study for others, making real drug clinical research more productive, faster, better and cheaper. The following sections describe how a virtual drug clinical trial analysis system works.
3.1 Basic ideas of virtual drug research
1. Build a drug clinical trial data warehouse that fully integrates and accumulates clinical data and drug application data. 2. Design and select the samples for the observation group and the control group of the drug clinical trial. 3. Use data mining to explore the effects and side effects of drugs in disease treatment. 4. Use statistical techniques to infer and evaluate the effect observed in the drug clinical trial.
3.2 The establishment of drug clinical data warehouse
There are two ways to build a drug clinical trial data warehouse. The first is to collect purpose-built data through a classic drug clinical trial design. Traditionally the data are recorded mainly on paper documents, although dedicated data entry software also exists. Data collected this way are designed in advance and directly form a dedicated drug clinical trial data set, but the sample is usually not very large. The second is to extract, transform and load a large volume of historical clinical drug use data from the hospital, and then integrate other accumulated clinical data and drug application data, to form a data source that supports the generation of drug clinical trial data. Such samples can be very large, and the method demonstrated later uses these data to screen and analyze "virtual" samples.
3.3 Sample design of drug clinical trials
Depending on the needs of the drug study, many sample designs are available for drug clinical trials, such as single-factor single-level design, single-factor two-level design, single-factor multi-level design, paired design, block design and repeated-measures design. The two-factor block design below is used as an example of sample screening. The example only demonstrates the method and makes no claim to strict medical significance.
The disease in this study is atherosclerotic heart disease, and the treatment factor is drug use. There are three drugs in total: betaloc, Novolin and isosorbide dinitrate. The blocking factor is age, divided into three age groups. The observation index is blood sodium. Data screening in our research design follows the "three elements and four principles". The "three elements" are the study population, the treatment factors and the observation objects. The "four principles" are randomness, comparability, repeatability and balance. With the input conditions shown in Figure 1, the data set can be filtered out and then analyzed with statistical analysis tools.
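The screening step for this two-factor block design can be sketched as a filter followed by grouping into (drug, age group) cells. This is a minimal sketch, not the system's actual implementation: the record fields, the age cut-offs and the sample values are hypothetical, since the source does not specify them.

```python
from collections import defaultdict

# The three drugs named in the study design.
DRUGS = {"betaloc", "novolin", "isosorbide dinitrate"}
# Hypothetical cut-offs; the source only says there are three age groups.
AGE_BLOCKS = [(0, 49, "under 50"), (50, 64, "50-64"), (65, 200, "65 and over")]

def age_block(age):
    """Return the label of the age block containing this age, or None."""
    for low, high, label in AGE_BLOCKS:
        if low <= age <= high:
            return label
    return None

def screen(records, diagnosis="atherosclerotic heart disease"):
    """Keep eligible records and group blood sodium values by (drug, age block)."""
    cells = defaultdict(list)
    for rec in records:
        block = age_block(rec["age"])
        if (rec["diagnosis"] == diagnosis
                and rec["drug"] in DRUGS
                and rec.get("sodium") is not None
                and block is not None):
            cells[(rec["drug"], block)].append(rec["sodium"])
    return dict(cells)

# Hypothetical records; sodium in mmol/L.
records = [
    {"diagnosis": "atherosclerotic heart disease", "drug": "betaloc", "age": 47, "sodium": 139.0},
    {"diagnosis": "atherosclerotic heart disease", "drug": "betaloc", "age": 58, "sodium": 141.0},
    {"diagnosis": "atherosclerotic heart disease", "drug": "novolin", "age": 70, "sodium": 137.5},
    {"diagnosis": "hypertension", "drug": "betaloc", "age": 61, "sodium": 140.0},
    {"diagnosis": "atherosclerotic heart disease", "drug": "aspirin", "age": 66, "sodium": 138.0},
    {"diagnosis": "atherosclerotic heart disease", "drug": "isosorbide dinitrate", "age": 52, "sodium": None},
]
cells = screen(records)
for key in sorted(cells):
    print(key, cells[key])
```

Each cell of the resulting dictionary corresponds to one treatment and block combination, which is the shape a block-design analysis of variance expects as input.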
3.4 Drug clinical data mining
Data mining not only improves the utilization of drug clinical data but can also uncover new positive and negative effects of drugs in clinical use. Analyzing clinical trial data and patients' electronic records with various data mining methods can identify additional indications and reveal unknown side effects. After such mining and analysis, a drug can be repositioned or promoted for other indications. Mining drug trial data may also yield unexpected findings, which greatly increases the value obtained from the data.
For example, data mining can be used to study the influence of drugs on laboratory indicators. The positive and negative effects of a drug in clinical use can be explored by observing many of the patients' medical characteristics and physiological indexes before and after they take it. Observing objective laboratory indexes is a necessary part of the design of many drug studies. The following is an application study of betaloc in the treatment of coronary heart disease. We used data mining to analyze how changes in the blood concentration of betaloc influence the patients' various laboratory indexes. Figure 2 shows the results for some laboratory indexes.
These results need to be discussed with clinical staff and drug researchers. After human factors and objective factors arising from the business systems have been excluded, previously unknown effects of betaloc on patients' physiological indexes can be identified, some of which may be medically positive and some negative.
3.5 Statistical analysis design
The statistical analysis module of the virtual drug clinical trial analysis system includes the statistical methods commonly used in drug research and development, such as the t-test, analysis of variance, correlation analysis, regression analysis and nonparametric tests. Its design follows statistical reasoning: the data are checked first, and the statistical method is selected according to the results of the check. A repeated-measures design serves as the example here.
The disease in this study is atherosclerotic heart disease, the treatment factor is medication with betaloc, and the observation index is blood potassium, which our data mining identified as an affected index. The module described in 3.3 can be used to extract and analyze the filtered samples, or the required data can be selected directly within this module. There are two methods for repeated-measures analysis: Hotelling's T² test and analysis of variance. The system provides both.
Figure 3 below shows some sample data:
Here, we only observe the output of the analysis of variance method, as shown in Figure 4 below.
As the figure shows, judging by the P values, the treatment factor "betaloc" has an effect on blood potassium, the measurement time has an effect on blood potassium, and the treatment factor and the measurement time interact. The finding obtained from data mining is therefore confirmed.
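To make the analysis of variance for repeated measurements more concrete, the sketch below computes the F statistic for the effect of measurement time in a one-way repeated-measures design. It is deliberately simpler than the analysis described above, which also includes the treatment factor and its interaction with time, and the blood potassium values are hypothetical.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA for the effect of time.

    data[i][j] is the measurement of subject i at time point j.
    Returns (F statistic, df for time, df for error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    time_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Partition the total sum of squares into subjects, time and error.
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_error = ss_total - ss_subjects - ss_time

    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error

# Hypothetical blood potassium (mmol/L) for 4 patients at 3 time points.
potassium = [
    [4.0, 4.2, 4.5],
    [3.8, 4.1, 4.3],
    [4.2, 4.3, 4.7],
    [4.0, 4.2, 4.5],
]
f_stat, df_time, df_error = repeated_measures_anova(potassium)
print(f_stat, df_time, df_error)
```

The P value is obtained by comparing the F statistic with the F distribution for these degrees of freedom, which would normally be done with a statistics package and is not part of this sketch.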
4. Application of data mining in research and development of traditional Chinese medicine
The discussion above used western medicine as the example to illustrate virtual drug research based on data mining. In fact, data mining and virtual drug research are also well suited to research on traditional Chinese medicine, which is itself a system of medicine. Through thousands of years of continuous exploration, accumulation and verification it has built up a huge body of knowledge and a complete theoretical system, but modern knowledge is still needed to understand, mine, improve and apply it, so that it can be better combined with modern science. Data mining is a powerful tool for exploring and explaining traditional Chinese medicine.
Many institutions in China have made partial attempts at mining traditional Chinese medicine (TCM) data. These attempts can be summarized as follows: 1. text mining of TCM prescriptions; 2. mining of "effective components", that is, monomers or chemical components that play a key pharmacological role; 3. data mining of TCM prescriptions and their compatibility rules; 4. data mining of the relationship between the material basis of prescription compatibility and efficacy, such as syndromes and symptoms; 5. exploring the relationship between the compatibility doses in a prescription and its level of efficacy (dose-effect relationship and model); 6. exploring the relationship between the property theory of TCM and its effective components; 7. mining the associations between drugs within prescriptions; 8. mining the implicit similarity of similar diseases; 9. mining the similarities and differences among different prescriptions for the same disease; 10. using data mining to classify and study imprecisely defined diseases.
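Mining the associations between drugs within prescriptions (item 7) can be illustrated with a basic support and confidence calculation over pairs of ingredients. This is a minimal sketch of association-rule style counting, not a method attributed to any of the attempts listed; the prescriptions use placeholder names, and the function name and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(prescriptions, min_support=0.5):
    """Return {(a, b): (support, confidence a->b, confidence b->a)} for frequent pairs."""
    n = len(prescriptions)
    item_counts = Counter()
    pair_counts = Counter()
    for herbs in prescriptions:
        unique = sorted(set(herbs))  # sort so each pair has one canonical order
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of prescriptions containing both ingredients
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a], count / item_counts[b])
    return rules

# Placeholder prescriptions, invented for illustration only.
prescriptions = [
    ["herb_a", "herb_b", "herb_c"],
    ["herb_a", "herb_b"],
    ["herb_a", "herb_c"],
    ["herb_b", "herb_d"],
]
rules = pair_rules(prescriptions)
for pair in sorted(rules):
    print(pair, rules[pair])
```

Pairs with high support and high confidence in both directions are candidates for ingredients that tend to be prescribed together, which can then be examined against compatibility theory.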