BACKGROUND: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Being maternally inherited, present in high copy number, and robustly persistent in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.
RESULTS: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming the nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome.
CONCLUSIONS: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, using only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to be useful in other DNA sequence classification applications as well.
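The missing-data strategy described in the conclusions (keep only the positions covered by every training sequence and by the test sequence) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `N` placeholder for unsequenced positions, the toy aligned sequences, the group labels, and the one-nearest-neighbour stand-in classifier are all assumptions chosen to keep the sketch dependency-free. The study itself reports that an SVM trained on the restricted positions performs best.

```python
MISSING = "N"  # assumed placeholder for an unsequenced position


def common_positions(training, test):
    """Indices observed in every training sequence and in the test sequence.

    Assumes all sequences are aligned and of equal length.
    """
    sequences = [seq for seq, _ in training] + [test]
    return [i for i in range(len(test))
            if all(s[i] != MISSING for s in sequences)]


def restrict(sequence, positions):
    """Project a sequence onto the selected positions."""
    return "".join(sequence[i] for i in positions)


def predict(training, test):
    """Label of the training sequence with the fewest mismatches
    on the common positions (a stand-in for the SVM in the study)."""
    positions = common_positions(training, test)
    query = restrict(test, positions)

    def mismatches(item):
        return sum(a != b for a, b in zip(restrict(item[0], positions), query))

    return min(training, key=mismatches)[1]


# Hypothetical aligned fragments with made-up group labels.
training = [
    ("ACGTACGT", "group_1"),
    ("ACGAACGN", "group_1"),
    ("TCGTTTGT", "group_2"),
    ("TCGTTTGA", "group_2"),
]
test = "NCGTTTGT"

print(common_positions(training, test))
print(predict(training, test))
```

Restricting the positions before training means the classifier never has to impute a base, at the cost of rebuilding the feature set for each test sequence.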
A hyperspectral imaging sensor system was developed for the detection of bruises on pears, because these bruises are difficult to detect with traditional computer vision techniques. Hyperspectral imaging is susceptible to uneven illumination caused by the roughly spherical shape of the pear. A hyperspectral image is a three-dimensional data cube containing a very large amount of information, so a suitable algorithm is required to extract the useful information from it. In this work, Principal Component Analysis (PCA) was first used to extract useful information, and several classification algorithms were then compared on the three-dimensional data cube: Maximum Likelihood classification (MLC), Euclidean Distance classification (EDC), Mahalanobis Distance classification (MDC) and Spectral Angle Mapper (SAM). Results show that MDC and SAM perform well, with detection accuracies of 93.8% and 95.0% respectively. Compared with the other classification algorithms, MDC and SAM can overcome the effects of uneven illumination when detecting pear bruises by hyperspectral imaging. This work demonstrates that it is feasible to detect the bruised region on the surface of a pear by hyperspectral imaging combined with MDC and SAM.
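A short sketch may help show why Spectral Angle Mapper tolerates uneven illumination: it compares only the direction of a pixel's spectrum with that of a reference spectrum, so scaling the whole spectrum by a brightness factor leaves the angle unchanged. The four-band reference spectra and pixel values below are invented for illustration and are not taken from the study, and the sketch assumes no spectrum is all zeros.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def classify_pixel(pixel, references):
    """Label of the reference spectrum with the smallest angle to the pixel."""
    return min(references,
               key=lambda label: spectral_angle(pixel, references[label]))


# Hypothetical 4-band reference spectra (illustrative values only).
references = {
    "sound": [0.60, 0.65, 0.70, 0.72],
    "bruised": [0.60, 0.50, 0.40, 0.35],
}

pixel = [0.58, 0.49, 0.41, 0.33]
dimmed = [0.5 * v for v in pixel]  # same material under half the illumination

print(classify_pixel(pixel, references))
print(classify_pixel(dimmed, references))
```

Both the original and the dimmed pixel receive the same label, because halving every band changes the vector's length but not its direction.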
The decision tree is an important learning method in machine learning and data mining. This paper discusses the method of choosing the best attribute based on information *** analyzes the process and characteristics of classification and knowledge discovery based on decision trees, with regard to the application of decision trees to data *** an instance, the paper shows the procedure of selecting the decision attribute in detail. Finally, it points out development trends for decision trees.
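Attribute selection of the kind this abstract describes can be illustrated with a small sketch. The abstract is truncated where it names the criterion, so the use of information gain (as in ID3) is an assumption here, and the toy records and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical records: "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

print(best_attribute(rows, ["outlook", "windy"], "play"))
```

A tree builder would apply `best_attribute` at the root, partition the records by the chosen attribute's values, and repeat on each partition until the labels are pure or no attributes remain.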
ISBN: (print) 9781921770029
Decision tree based classification algorithms like C4.5 and Explore build a single tree from a data set. The two main purposes of building a decision tree are to extract the various patterns/logic rules existing in a data set, and to predict the class attribute value of an unlabeled record. Sometimes a set of decision trees, rather than just a single tree, is generated from a data set. A set of multiple trees, when used wisely, typically has better prediction accuracy on unlabeled records. Existing multiple tree techniques cater for high dimensional data sets and are therefore unable to build many trees from low dimensional data sets. In this paper we present a novel technique called SysFor that can build many trees even from a low dimensional data set. Another strength of the technique is that, instead of building multiple trees using any attribute (good or bad), it uses only those attributes that have high classification capabilities. We also present two novel voting techniques for predicting the class value of an unlabeled record through the collective use of multiple trees. Experimental results demonstrate that SysFor is suitable for multiple pattern extraction and knowledge discovery from both low dimensional and high dimensional data sets by building a number of good quality decision trees. Moreover, its prediction accuracy is higher than that of several existing techniques previously shown to have high performance.
ISBN: (print) 9781424483075
In microarray data analysis, dimension reduction is an important consideration in the construction of a successful classification algorithm. As an alternative to feature selection, one can use a well-known matrix factorisation method such as the popular singular-value decomposition (SVD) or nonnegative matrix factorisation. In this paper, we consider a novel algorithm for gradient-based matrix factorisation (GMF). We compare GMF and SVD in their application to five gene expression datasets. The experimental results show that our method is faster, more stable, and more sensitive.
ISBN: (print) 9780819481290
We have developed a small, relatively lightweight and efficient short-range (<100 m) LIDAR instrument for remotely detecting harmful bioagents. The system is based on a pulsed, eye-safe, 355 nm laser that excites aerosols, which then fluoresce with a typical spectrum. The system makes use of a novel technology for continuously monitoring for the presence of unusual concentrations of bioaerosols at a precise remote location within the monitored area, with response within seconds. Fluorescence is spectrally resolved over 32 channels capable of photon counting. Results show a sensitivity level of 40 ACPLA of Bacillus globigii, an anthrax simulant, at a distance of 100 m (assumed worst case where 1 ppl = 1 ACPLA), considering particle sizes between 0.5 and 10 μm with a geometric mean at 1 μm. The apparatus has been tested in the field during three test and evaluation campaigns with multiple bioagents and public security products. Preliminary results show that the system is able to distinguish between harmful bioagents and naturally occurring ones. A classification algorithm was successfully tested with a single type of bioagent; experiments for daytime measurements are discussed.
ISBN: (print) 9781612841519
Packet classification involves matching information from a packet's header to a set of rules in a database in order to determine the manner in which the packet should be processed by network processors. The PCIU algorithm [1] is a novel classification algorithm which improves upon previously published techniques in the literature. The main features of the PCIU algorithm are the low pre-processing time and capability of incremental rule update. Using the network processor to implement packet classification would cause saturation even when the best performing packet classification algorithm is used. In this work, we propose a hardware implementation of the PCIU algorithm. Results obtained indicate that the hardware/software co-design approach achieves 4.3x speedup in terms of preprocessing over a pure software implementation and 5.3x speedup for classification.
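To make the matching problem concrete, the sketch below shows the simplest possible baseline: a linear, first-match search through an ordered rule list. It is not the PCIU algorithm, whose data structures are not described in this abstract. The rule fields, addresses, port ranges, and action names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str          # source prefix in CIDR notation
    dst: str          # destination prefix in CIDR notation
    dst_ports: range  # allowed destination ports
    action: str       # how a matching packet should be processed


def classify(rules, src_ip, dst_ip, dst_port):
    """Return the action of the first rule matching the header fields."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:  # rules are assumed to be in priority order
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and dst_port in rule.dst_ports):
            return rule.action
    return "default"


# Hypothetical rule set: one specific rule, then a catch-all.
rules = [
    Rule("10.0.0.0/8", "192.168.1.0/24", range(80, 81), "forward"),
    Rule("0.0.0.0/0", "0.0.0.0/0", range(0, 65536), "drop"),
]

print(classify(rules, "10.1.2.3", "192.168.1.7", 80))
print(classify(rules, "172.16.0.1", "192.168.1.7", 80))
```

The cost of this baseline grows linearly with the number of rules for every packet, which is the kind of per-packet overhead that dedicated classification algorithms and hardware implementations aim to reduce.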
ISBN: (print) 9781424443338
Detection of marine mammals within the influence zone of an episodic anthropogenic noise source is critical to ensure the safety of the animals. Marine mammal clicks are closely modeled by AM/FM signals. The Teager-Kaiser energy operator followed by a threshold detector provides an effective means of detecting AM/FM signals. Classification of the species generating the click is done by finding the maximum cross-covariance between the power spectral density (PSD) of the received click and the PSDs of clicks which have already been identified. The classification algorithm creates a new library entry when the cross-covariance is below a predefined threshold. Five species of marine mammals were classified using data from the 3rd International Workshop on Detection, Classification and Localization of Marine Mammals (DCL). This paper presents a computationally inexpensive marine mammal detection and classification algorithm with high probabilities of detection and correct classification.
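The detection stage can be sketched with the standard discrete Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], followed by a fixed threshold. The synthetic signal (silence, a short sinusoidal burst, silence) and the threshold value are assumptions for illustration; the abstract does not give the paper's threshold or any pre-filtering.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy for samples 1 .. len(x) - 2."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def detect(x, threshold):
    """Sample indices whose Teager-Kaiser energy exceeds the threshold."""
    energy = teager_kaiser(x)
    # energy[k] corresponds to sample k + 1 of the input signal.
    return [k + 1 for k, e in enumerate(energy) if e > threshold]


# Synthetic test signal: 50 silent samples, a 20-sample burst, 50 silent samples.
omega = 0.8  # burst frequency in radians per sample (illustrative)
burst = [math.sin(omega * k) for k in range(20)]
signal = [0.0] * 50 + burst + [0.0] * 50

hits = detect(signal, threshold=0.3)
print(hits[0], hits[-1])
```

For a pure tone of amplitude A and frequency omega the operator returns the constant A^2 * sin^2(omega), so it responds to both amplitude and frequency, which is why it suits short AM/FM clicks. The operator needs only two multiplications and one subtraction per sample, in line with the low computational cost the abstract emphasizes.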
An experimental bifurcation diagram of a circuit implementing an approximation of the Hindmarsh-Rose (HR) neuron model is presented. Measured asymptotic time series of circuit voltages are automatically classified through an ad hoc algorithm. The resulting two-dimensional experimental bifurcation diagram evidences a good match with respect to the numerical results available for both the approximated and original HR model. Moreover, the experimentally obtained current-frequency curve is very similar to that of the original model. The obtained results are both a proof of concept of a quite general method developed in the last few years for the approximation and implementation of nonlinear dynamical systems and a first step towards the realisation in silica of HR neuron networks with tunable parameters. (C) 2010 Elsevier B.V. All rights reserved.