Objective: Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with a Cox proportional hazards regression model (FSCOX) to improve the prediction of cancer survival time. Methods: FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers: first, the existing ML methods support vector machine (SVM) and random forest (RF); second, a new median-based classifier using FSCOX (FSCOX_median); and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The last data set included features selected either from the combined miRNA/mRNA profile or independently from the miRNA and mRNA profiles (IFS). Results: In the ovarian data set, with a balanced set of 22 short-term and 22 long-term survivors, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly in the glioblastoma multiforme data, the accuracy of m
Biochemical and structural leaf properties such as chlorophyll content (Chl), nitrogen content (N), leaf water content (LWC), and specific leaf area (SLA) can be estimated through nondestructive spectral measurements. Current practices, however, mainly focus on a limited number of wavelength bands, while more information could be extracted from other wavelengths in the full-range (400-2500 nm) spectrum. In this research, leaf characteristics were estimated from a field-based multi-species dataset covering a wide range of leaf structures and Chl concentrations. The dataset contains leaves with extremely high Chl concentrations (>100 μg cm⁻²), which are seldom estimated. Parameter retrieval was conducted with the machine learning regression algorithm Gaussian Processes (GP), which can perform adaptive, nonlinear data fitting for complex datasets. Moreover, GP provides insight into the relevant bands during the development of a regression model, so the physical meaning of the model can be explored. The best estimates of SLA, LWC, and Chl had normalized root mean square errors of 6.0%, 7.7%, and 9.1%, respectively. Several distinct wavebands were chosen across the whole spectrum. A band in the red edge (710 nm) appeared to be most important for the estimation of Chl. Interestingly, spectral features related to biochemicals with a structural or carbon storage function (e.g. 1090, 1550, 1670, 1730 nm) were found to be important not only for estimation of SLA, but also for LWC, Chl, or N estimation. Similarly, Chl estimation was also helped by some wavebands related to water content (950, 1430 nm) because of the correlation between the parameters. It is shown that leaf parameter retrieval by GP regression is successful and able to cope with large structural differences between leaves. (C) 2014 Elsevier B.V. All rights reserved.
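The abstract above reports normalized root mean square error (NRMSE) without stating which normalization was used. The following minimal sketch assumes normalization by the range of the observed values, which is only one common convention; the function name and the sample values are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def nrmse_percent(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the range of the observed values, in percent.

    Assumption: range normalization. Other conventions divide by the mean.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * math.sqrt(mse) / value_range


# Hypothetical chlorophyll values (ug per cm^2), for illustration only.
observed_chl = [20.0, 40.0, 60.0, 80.0, 100.0]
predicted_chl = [22.0, 38.0, 63.0, 77.0, 100.0]

nrmse = nrmse_percent(observed_chl, predicted_chl)
print(f"NRMSE: {nrmse:.2f}%")
```

If the study normalized by the mean instead, only the denominator would change.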
The solubility of recombinant protein expressed in Escherichia coli often represents the production yield. However, to date, instances of successful high-yield production of soluble recombinant proteins in the E. coli expression system remain scarce. This is mainly due to the difficulty of improving the overall production capacity, as most of the well-established strategies involve a series of trial-and-error steps with no guarantee of success. One way to concurrently improve the production yield and minimize the production cost would be to use bioinformatics tools to conduct in silico studies, which forecast the outcome before actual experimental work. In this article, we review and compare seven available tools that predict the solubility of protein expressed in E. coli, using the following criteria: prediction performance, usability, utility, and prediction tool development and validation methodologies. This comprehensive review will be a valuable resource for researchers with limited prior experience in bioinformatics tools. As such, it will facilitate their choice of appropriate tools for studies related to enhancing intracellular recombinant protein production in E. coli.
ISBN: (print) 9783319085968; 9783319085951
This STS on Emotions for Accessibility describes affective computing approaches, e.g. emotional state detection and emotional state elicitation, and their potential use for users who are disabled or older. These user groups may benefit greatly from affective computing systems, e.g. in support of communication (especially nonverbal), management of emotional states or stress, or improvement of therapy procedures.
Objective: Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text. Materials and methods: We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entity (clinical problems, procedures, laboratory tests, and medications) were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of feature, including bag-of-characters, word segmentation, part-of-speech, and section information, and of different machine learning algorithms, including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM), on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported. Results: Our evaluation on the independent test set showed that most types of feature were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of feature complement each other. When the same types of optimized feature were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively.
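The abstract above reports micro-averaged precision, recall, and F-measure across the four entity types. As a minimal sketch of what micro-averaging means: counts are pooled over all entity types before the ratios are taken. The function name and all counts below are hypothetical and do not come from the study.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure.

    counts maps an entity type to (true positives, false positives, false negatives).
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical per-type counts, for illustration only.
example_counts = {
    "problem": (90, 10, 10),
    "procedure": (40, 5, 10),
    "lab_test": (50, 5, 5),
    "medication": (20, 0, 5),
}

p, r, f = micro_prf(example_counts)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

Because the counts are pooled, frequent entity types weigh more heavily in the result than rare ones, unlike macro-averaging, which averages the per-type scores.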
ISBN: (print) 9781479923977
Sentiment classification determines the polarity of product or user reviews. Supervised machine learning algorithms such as Naive Bayes, k-nearest neighbor (KNN), and support vector machines are used for opinion mining. KNN is a simple but less efficient classification algorithm. In this paper we propose an improved KNN algorithm: a hybrid genetic algorithm that incorporates information gain for feature selection and is combined with KNN to improve its classification performance. Specifically, we compared it with other supervised machine learning approaches, such as Naive Bayes and traditional KNN, for sentiment classification of movie reviews and book reviews. The experimental results using the genetic algorithm with improved KNN indicate high performance levels, with an F-measure of over 87% on the movie reviews.
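The abstract above names information gain as the feature selection criterion. The sketch below shows only that scoring step for binary term presence. It is not the authors' hybrid genetic algorithm, and the toy reviews, labels, and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs: List[Set[str]], labels: List[str], term: str) -> float:
    """Reduction in label entropy from knowing whether `term` occurs in a document."""
    if not labels:
        return 0.0
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Hypothetical toy reviews as sets of terms, for illustration only.
toy_docs = [{"great", "plot"}, {"great", "acting"}, {"dull", "plot"}, {"dull", "acting"}]
toy_labels = ["pos", "pos", "neg", "neg"]

vocabulary = sorted(set().union(*toy_docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(toy_docs, toy_labels, t), reverse=True)
print(ranked)
```

Terms that split the classes cleanly score highest, so a selector of this kind would keep "great" and "dull" and drop "plot" and "acting" before the reviews reach a KNN classifier.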
ISBN: (print) 9783319143316; 9783319143309
Honorifics in this paper refer to names of official positions and titles of nobility or honor. They can be found in various written records from different periods and have great historical significance. This paper introduces a machine learning system to recognize honorifics in diachronic corpora. A tagged corpus of four classic novels written in the Ming and Qing dynasties is used to train the system. The system is then used to automatically recognize and extract the honorifics in pre-Qin classics, Tang-dynasty poems, and modern Chinese news. Experimental results show that the system can achieve relatively good results in recognizing the honorifics in the pre-Qin classics and Tang-dynasty poems. This work is an attempt to improve the performance of automatic recognition of honorifics in diachronic corpora. The system can be a helpful tool in studies on the evolution of honorifics throughout Chinese history.
ISBN: (print) 9781479930807
In this paper, we propose multi-objective differential evolution (DE) based feature selection and ensemble learning techniques for biomedical entity extraction. The algorithm operates in two layers, the first of which addresses the problem of automatic feature selection for a machine learning algorithm, namely Conditional Random Field (CRF). The solutions of the final best population provide different feature combinations. The classifiers generated with these feature representations are combined using a multi-objective DE based ensemble technique. We evaluate the proposed algorithm for named entity (NE) extraction in biomedical text. Experiments on the benchmark setup yield recall, precision and F-measure values of 73.50%, 77.02% and 75.22%, respectively.
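The three figures reported in the abstract above are mutually consistent if the F-measure is the balanced harmonic mean of precision and recall (F1). That interpretation is an assumption, since the abstract does not state the weighting. A minimal check, with a hypothetical function name:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_recall = 0.7350
reported_precision = 0.7702

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.2%}")  # prints "F-measure: 75.22%"
```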
ISBN: (print) 9781479964925
Physically Unclonable Functions (PUFs) have emerged as a possible candidate to replace traditional cryptography. However, the majority of strong PUFs are vulnerable to modeling attacks. In this work, we take a closer look at the possible attacks on one of the strong PUF architectures, known as Current-based PUFs, which exploit irregularities in transistor currents to generate unique signatures. We demonstrate that fault-injection attacks, when coupled with a machine learning (ML) algorithm, can considerably push the limits of prediction accuracy. Based on simulations, we observed that stand-alone ML algorithms suffer from error-prone CRPs, especially for higher-length PUFs. In such scenarios, hybrid attacks exploiting the unreliable responses pushed the prediction accuracies up to 99% for higher-length Current-based PUF circuits.
ISBN: (print) 9783319085968; 9783319085951
In this paper, we present an experimental approach to designing systems sensitive to emotion. We describe a system for the detection of emotional states based on physiological signals and an application use case utilizing the detected emotional state. The application is an emotion management system intended to support the improvement of living conditions of users suffering from cerebral palsy (CP). The system presented here effectively combines biofeedback sensors and a set of software algorithms to detect the current emotional state of the user and to react to it appropriately.