ISBN (digital): 9781728162515
ISBN (print): 9781728162522
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19), which contains over 51,000 scholarly articles, including over 40,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. Medical professionals, including physicians, frequently seek answers to specific questions to improve guidelines and decisions. This large body of medical literature is an important source of new insights that can help the medical community provide relevant knowledge and fight the infectious disease. There are ongoing attempts to develop intelligent systems that automatically extract relevant knowledge from large numbers of unstructured documents. In this paper, we propose an efficient question answering framework that automatically analyzes thousands of articles to generate long text answers (sections/paragraphs) in response to questions posed by the medical community. In developing the framework, we explored natural language processing techniques such as query expansion, data preprocessing, and vector space models. We show initial results for an example query about the incubation period.
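The combination of query expansion and a vector space model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the synonym table, the smoothed TF-IDF weighting, and all function names are assumptions made here for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table for query expansion (not taken from the paper).
SYNONYMS = {"incubation": ["latency"], "period": ["time", "duration"]}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(tokens):
    """Append the listed synonyms of every query token."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


def build_index(paragraphs):
    """Return smoothed IDF weights and one TF-IDF vector per paragraph."""
    counts = [Counter(tokenize(p)) for p in paragraphs]
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, paragraphs):
    """Return (score, paragraph index) pairs, best match first."""
    idf, vectors = build_index(paragraphs)
    query_counts = Counter(expand_query(tokenize(query)))
    # Query terms that never occur in the collection carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in query_counts.items() if t in idf}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)
```

Calling `rank("incubation period", paragraphs)` on a list of paragraph strings returns the paragraphs ordered by similarity to the expanded query; a real system would add the preprocessing steps the abstract mentions, which are not specified there.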
ISBN (digital): 9781728196381
ISBN (print): 9781728196398
Recently, deep learning has achieved rich results on entity relationship extraction tasks. Existing methods mainly focus on the character features of the input sequence without considering its structural features, and so may learn meaningless subsequences. In this paper, we propose an end-to-end joint extraction model based on a syntactic tree structure, which learns character features and sentence structure features at the same time. In the decoding process, we use a tree structure to learn sentence structure features based on the parts of speech of the characters. We test our model on two public datasets, where it significantly outperforms the baseline method.
It is widely accepted that driver pathways offer important information for precision and personalized medicine in cancer treatment, so the identification of driver pathways has become a research hotspot in bioinformatics. In this paper, an improved collaborative mutation driver pathways model, ICMDP, is proposed by integrating somatic mutation, copy number variation, and gene expression data. The model has two characteristics: (1) each individual pathway has moderate mutual exclusion and high coverage; (2) collaborative driver pathways exhibit significant common mutations in cancer samples, and the genes in collaborative driver pathways are related. A parthenogenetic algorithm, PA-ICMDP, is proposed for solving the ICMDP model. Experiments compare the algorithms PA-ICMDP, CoMDP, and GAMTOC on real biological data sets, namely glioblastoma and ovarian cancer samples. The experimental results indicate that the PA-ICMDP algorithm can not only identify important collaborative driver pathways with higher co-occurrence mutation rates, but also detect more important driver genes such as MET, MDM2, GAB2, TERT, and TBX3. In addition, the EICMDP model and the PA-EICMDP algorithm are put forward by extending the ICMDP model and the PA-ICMDP algorithm, respectively. They can effectively identify other important pathways that collaborate with known driver pathways. The experimental results indicate that the methods presented in this paper may become suitable tools for mining driver genes and driver pathways related to cancer development.
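The quantities named in the two model characteristics (coverage, mutual exclusion, and co-occurrence between pathways) can be made concrete with a small sketch. The definitions below are common conventions assumed for illustration, not the ICMDP objective itself, which the abstract does not state; the sample and gene labels are toy placeholders.

```python
def coverage(mutations, genes):
    """Samples that carry a mutation in at least one gene of the set."""
    gene_set = set(genes)
    return {sample for sample, mutated in mutations.items() if mutated & gene_set}


def coverage_overlap(mutations, genes):
    """Sum of per-gene coverage minus joint coverage.

    Zero means the genes are perfectly mutually exclusive across samples;
    larger values mean more samples carry several mutated genes of the set.
    """
    per_gene = sum(len(coverage(mutations, [gene])) for gene in genes)
    return per_gene - len(coverage(mutations, genes))


def co_occurrence(mutations, pathway_a, pathway_b):
    """Samples mutated in both pathways, a simple proxy for collaboration."""
    return coverage(mutations, pathway_a) & coverage(mutations, pathway_b)


# Toy mutation data: sample -> set of mutated genes (placeholder labels).
toy_mutations = {
    "s1": {"g1", "g3"},
    "s2": {"g2", "g3"},
    "s3": {"g1", "g2", "g4"},
    "s4": {"g4"},
    "s5": set(),
}
```

On the toy data, the set `["g3", "g4"]` covers four of five samples with zero overlap, while `["g1", "g2"]` covers three samples with an overlap of one, and the two sets co-occur in samples s1, s2, and s3.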
ISBN (digital): 9781728189543
ISBN (print): 9781728189550
The development of IoT technology promotes the application of medical big data. Among these applications, early warning of chronic disease is of great significance to disease management. Hypertension is a widespread chronic disease, and preventing it can effectively improve health and reduce early mortality. In this paper, we attempt to predict the onset age of hypertension using the CatBoost algorithm to support health management decision making based on medical big data. First, the features that are highly associated with the onset age of hypertension are identified using the maximal information coefficient; the chosen features are then used as inputs to the CatBoost algorithm to construct the prediction model. Data from 2,363 samples collected from a hospital in Beijing were used to verify the effectiveness of this approach. On the test set, the RMSE was 5.38 and the MAPE was 9.42%, which outperforms the linear regression, SVM, and artificial neural network models. The experimental results show that the prediction model can predict an individual's onset age of hypertension from current health indicators and provides a novel idea for early warning of hypertension.
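For readers unfamiliar with the two reported metrics, the sketch below shows how RMSE and MAPE are conventionally computed. The ages in the usage note are invented toy values, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (years here)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)
```

For toy onset ages `[50, 60, 40]` and predictions `[55, 57, 44]`, `rmse` gives about 4.08 years and `mape` about 8.33%.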
ISBN (digital): 9781728129037
ISBN (print): 9781728129044
Distributed machine learning (ML) has triggered tremendous research interest in recent years. Stochastic gradient descent (SGD) is one of the most popular algorithms for training ML models and has been implemented in almost all distributed ML systems, such as Spark MLlib, Petuum, MXNet, and TensorFlow. However, current implementations often incur huge communication and memory overheads for large models. One important reason for this inefficiency is the row-oriented scheme (RowSGD) that existing systems use to partition the training data, which forces them to adopt a centralized model management strategy that leads to a vast amount of data exchange over the network. We propose a novel, column-oriented scheme (ColumnSGD) that partitions training data by columns rather than by rows. As a result, the ML model can be partitioned by columns as well, leading to a distributed configuration where individual data and model partitions are collocated on the same machine. Following this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs). We implement ColumnSGD on top of Apache Spark and study its performance both analytically and experimentally. Experimental results on both public and real-world datasets show that ColumnSGD is up to 930× faster than MLlib, 63× faster than Petuum, and 14× faster than MXNet.
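The locality argument can be illustrated with a single-process sketch for logistic regression. It assumes that the only values exchanged between workers are per-example partial dot products, which is one reading of the abstract rather than a description of the actual Spark implementation; the function names and the round-robin column assignment are hypothetical.

```python
import math


def split_columns(n_features, n_workers):
    """Assign feature (column) indices to workers round-robin."""
    return [list(range(w, n_features, n_workers)) for w in range(n_workers)]


def partial_dots(x_batch, weights, cols):
    """Worker side: dot products restricted to the worker's own columns."""
    return [sum(row[j] * weights[j] for j in cols) for row in x_batch]


def column_sgd_step(x_batch, y_batch, weights, partitions, lr):
    """One mini-batch step of logistic regression with column partitions.

    `weights` is a single list only for brevity; worker k reads and writes
    just the entries listed in partitions[k].
    """
    # 1. Every worker computes partial dot products from its local columns.
    partials = [partial_dots(x_batch, weights, cols) for cols in partitions]
    # 2. Only one scalar per example is aggregated, never the model itself.
    dots = [sum(p[i] for p in partials) for i in range(len(x_batch))]
    # 3. Per-example error of the logistic model: sigmoid(dot) - label.
    errors = [1.0 / (1.0 + math.exp(-d)) - y for d, y in zip(dots, y_batch)]
    # 4. Every worker updates only the weights of its own columns.
    for cols in partitions:
        for j in cols:
            grad = sum(e * row[j] for e, row in zip(errors, x_batch)) / len(x_batch)
            weights[j] -= lr * grad
    return weights
```

With any number of workers the step produces the same weights as an unpartitioned update, up to floating-point rounding; the communicated volume in this sketch grows with the batch size rather than with the number of model parameters.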
ISBN (digital): 9781728181417
ISBN (print): 9781728181424
Electricity is one of the basic needs of human production and daily life. Predicting user power consumption can help power supply enterprises analyze users' electricity consumption behavior, provide personalized services, and formulate effective peak load shifting schemes, which is very important for decision-making and demand response on the power management side. Because users' daily electricity consumption is a nonlinear and nonstationary time series that is also susceptible to climate change, social activities, and other random factors, forecasting electricity consumption is very challenging. At present, many deep learning models, such as the recurrent neural network (RNN) and long short-term memory (LSTM), have been applied to electricity consumption forecasting and have achieved good results. However, using these models directly cannot fully take into account the nonstationary characteristics of electricity data, and there is still room to improve prediction accuracy. In this paper, a hybrid model of empirical mode decomposition (EMD) and the gated recurrent unit (GRU) is proposed to predict user electricity consumption. First, the original nonstationary electricity consumption time series is decomposed into multiple stationary component sequences through EMD; then each component is predicted by a multi-layer GRU network; finally, the predictions of all components are combined to obtain the final forecast. Experimental results show that, compared with the direct use of LSTM, the proposed model effectively reduces the error, achieves a better fit, and improves training efficiency to a certain extent.
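The decompose, predict, and recombine structure of the hybrid model can be sketched as follows. Both building blocks are deliberate stand-ins chosen to keep the example self-contained: a trailing moving average plus residual replaces EMD, and a two-point linear extrapolation replaces the multi-layer GRU. Only the pipeline shape is meant to match the description above.

```python
def moving_average(series, window):
    """Trailing moving average; stand-in for a slowly varying component."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series, window=3):
    """Additive split into trend and residual (NOT real EMD)."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return [trend, residual]


def forecast_component(component):
    """Stand-in for the per-component GRU: extend the last observed step."""
    return component[-1] + (component[-1] - component[-2])


def hybrid_forecast(series, window=3):
    """Forecast every component separately, then sum the forecasts."""
    return sum(forecast_component(c) for c in decompose(series, window))
```

For the toy series `[10, 12, 14, 16, 18]`, the two components add back to the original values and `hybrid_forecast` returns 20.0; with real EMD and trained GRU networks, each component would get its own learned model instead of the extrapolation used here.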
ISBN (digital): 9781728169323
ISBN (print): 9781728169330
With the development of Internet technology, information management has grown rapidly. In real-life applications, information usually includes spatial, temporal, or spatiotemporal features. Spatiotemporal data has temporal and spatial attributes, and these attributes are often fuzzy. Because fuzzy spatiotemporal data management is of great significance, how to query fuzzy spatiotemporal data efficiently and effectively has become an important research issue. To that end, this paper proposes a new method for query processing of fuzzy spatiotemporal data. In view of the advantages of the state-of-the-art mapping method R2RML in data transformation, three algorithms are proposed to transform fuzzy spatiotemporal data from a relational database into fuzzy RDF data based on R2RML. On this basis, according to the characteristics of fuzzy RDF data, we design three kinds of fuzzy quantifiers (extreme fuzzy quantifier, range fuzzy quantifier, and degree fuzzy quantifier) to represent fuzzy spatiotemporal RDF data. Since SPARQL plays an important role in querying RDF data, it is used to query the fuzzy spatiotemporal RDF data. Experimental results in terms of recall and precision show the superiority of the method.
ISBN (digital): 9781728192901
ISBN (print): 9781728192918
The Internet of Things (IoT) has gained considerable attention due to its potential applications in multiple domains. However, some deployment environments may be hostile, which can affect the quality of data (QoD) and reduce its accuracy. To ensure a high level of reliability, an IoT system should be able to clean its own sensed data by discarding instances that are erroneous or incoherent. To achieve this improvement in data quality, this paper suggests a new approach based on an Artificial Neural Network (ANN). The proposed scheme can detect outliers early and efficiently, before they are forwarded to a central processing unit. The performance of the proposed solution is validated through simulations on a real dataset and compared with other well-known models. Our findings demonstrate that the proposed approach outperforms the compared models in terms of accuracy, F-score, recall, and precision.
ISBN (digital): 9781728155111
ISBN (print): 9781728155128
The degree of polymerization (DP) is the most direct parameter for characterizing the aging condition of oil-paper insulation. Recently, near-infrared spectroscopy (NIRS) has been used to evaluate the DP of oil-paper insulation because it is rapid and non-destructive compared with the traditional viscometric method. However, real applications of NIRS are constrained by the poor generalization ability of current prediction algorithms. This paper studies intelligent algorithms for precisely predicting the DP of oil-paper insulation. The radial basis function (RBF) neural network is adopted for its fast learning speed and strong nonlinear mapping capability. The basic principles of the RBF neural network are introduced and the diagnostic model is trained. In particular, because the portable near-infrared instrument produces noise at both ends of the spectrum, the spectral data in these regions are removed by the algorithm. A numerical example demonstrates the capability of the proposed method, and the results indicate that the aging state of oil-paper insulation can be diagnosed with high precision by a trained RBF neural network. The proposed model is applied to an oil-impregnated paper bushing, and the application results show that the method can accurately assess the aging condition of the oil-paper insulation.
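Two of the steps described above, trimming the noisy ends of a spectrum and evaluating an RBF network, can be sketched in a few lines. This is a generic Gaussian RBF forward pass under assumed parameter names; the number of dropped points, the centers, the widths, and the training procedure used in the paper are not given in the abstract.

```python
import math


def trim_ends(spectrum, n_drop):
    """Drop n_drop points from each end of a spectrum (the noisy band edges)."""
    if n_drop == 0:
        return list(spectrum)
    return list(spectrum[n_drop:-n_drop])


def rbf_forward(x, centers, widths, weights, bias):
    """Output of a Gaussian RBF network for one input vector x.

    Each hidden unit responds with exp(-||x - c||^2 / (2 * s^2)), and the
    output is a weighted sum of these responses plus a bias.
    """
    activations = []
    for center, width in zip(centers, widths):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        activations.append(math.exp(-dist_sq / (2.0 * width * width)))
    return bias + sum(w * a for w, a in zip(weights, activations))
```

An input that coincides with a center fully activates that hidden unit, so with one center at the origin, weight 2.0, and bias 0.5, the output at the origin is 2.5 and decays toward 0.5 far away from it.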
ISBN (digital): 9781728159027
ISBN (print): 9781728159034
With the rapid growth of online banking and shopping, many companies have deployed online transaction systems, which raises the problem of fraudulent online credit card transactions. In recent years, several studies have developed data-mining-based methods to address this problem. However, most of these studies used only a small amount of data and few feature engineering methods. Moreover, the learning models they used are too weak to fit large-scale data. This paper extends the fraud detection strategy and proposes a detection algorithm based on LightGBM. The dataset is the IEEE-CIS Fraud Detection dataset provided by Vesta Corporation. The experiments indicate that our method outperforms classical methods such as Support Vector Machine, Random Forest, and XGBoost. The experiments also report the importance of our engineered features, which is valuable for feature selection and performance tuning.