ISBN (print): 9781479956708
Visualization of large-scale data is the first step toward gaining preliminary insight into complex biological data. In recent years, many statistical visualization methods have been designed to support data visualization. Stochastic Neighbor Embedding (SNE) is one such efficient approach; it uses probabilistic distances to model differences among data points in the data space. SNE and its variants (e.g., t-SNE) have demonstrated superiority over other methods in exploring complex data. With these methods, however, similar data points tend to clump together, which prevents the identification of subtle differences. A good visualization method should not only present a clear data structure but also distinguish subtle differences. In this paper, we propose a novel extension of SNE with three innovations: (1) we replace the Gaussian distribution in SNE with a Laplace distribution in both the high-dimensional and the low-dimensional space. The Laplace distribution has heavier tails than the Gaussian distribution, so it can be used to overcome the crowding problem noted in SNE and its variants. (2) We use a symmetric modification of the Kullback-Leibler divergence as the objective function, which gives the model more flexibility. (3) We add a graph Laplacian regularization term to the objective function, which helps preserve the manifold structure among data points. Experiments on simulated data and human microbiome data indicate that the proposed method achieves better visualization performance than other methods in distinguishing crowded data points.
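As a rough illustration of the three ingredients described above, the Python sketch below computes Laplace-kernel affinities, the symmetric divergence KL(P||Q) + KL(Q||P), and a graph Laplacian penalty tr(Y^T L Y). It is a minimal sketch, not the authors' implementation: the L1 distance, the single global scale, the choice of W = P as the graph weights, and the weight 0.1 on the penalty are all assumptions, and the gradient-based optimization of the embedding is not shown.

```python
import math

def laplace_affinities(points, scale=1.0):
    """Joint affinities p_ij proportional to exp(-||x_i - x_j||_1 / scale).

    Assumption: L1 distance and one global scale; the paper may instead
    calibrate a per-point bandwidth as standard SNE does.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = sum(abs(a - b) for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d / scale)
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def symmetric_kl(p, q, eps=1e-12):
    """KL(P||Q) + KL(Q||P) = sum_ij (p_ij - q_ij) * log(p_ij / q_ij)."""
    n = len(p)
    s = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                pij, qij = p[i][j] + eps, q[i][j] + eps
                s += (pij - qij) * math.log(pij / qij)
    return s

def laplacian_penalty(y, weights):
    """tr(Y^T L Y) with L = D - W, i.e. 0.5 * sum_ij w_ij * ||y_i - y_j||^2."""
    n = len(y)
    return 0.5 * sum(
        weights[i][j] * sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
        for i in range(n)
        for j in range(n)
    )

# Usage: P from the data, Q from a candidate 1-D embedding, W taken to be P.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
Y = [[0.0], [0.2], [3.0]]
P, Q = laplace_affinities(X), laplace_affinities(Y)
loss = symmetric_kl(P, Q) + 0.1 * laplacian_penalty(Y, P)
```

Because the same Laplace kernel is used for P and Q, the heavier tails act in both spaces, which is the design the abstract describes; t-SNE, by contrast, keeps a Gaussian in the input space and uses a Student-t only in the embedding.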
Classification is one of the most significant methods in predictive analysis for categorical labeled ***, an accurate classification model is difficult to train in some real cases because of imbalanced samples, records with large fluctuations, and overlapping class *** solving the above problems, in this work we introduce a Two-Stage with Enhanced Samples (TSES) prediction framework that balances the samples using a two-stage classification method and increases the number of samples so that there are enough to obtain an accurate *** proposed TSES achieves outstanding classification performance on a real case of rainfall *** proving the effectiveness of TSES, we compare it with some traditional classification *** results show that it can be a promising method for prediction problems with imbalanced data and overlapping labels.
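The abstract does not specify how TSES enlarges or balances the training set. Purely to illustrate the general idea, the sketch below shows plain random oversampling of minority classes and a generic two-stage predictor in which a first model assigns a coarse group and a group-specific second model assigns the final label. Both functions (random_oversample, two_stage_predict) and the rainfall-style labels are hypothetical and are not the TSES method itself.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def two_stage_predict(x, stage1, stage2_by_group):
    """Stage 1 maps x to a coarse group; that group's stage-2 model gives
    the final label. Both stages are hypothetical callables."""
    group = stage1(x)
    return stage2_by_group[group](x)

# Usage: four "dry" records and one "rain" record become 4 + 4.
X = [[0], [1], [2], [3], [4]]
y = ["dry", "dry", "dry", "dry", "rain"]
Xb, yb = random_oversample(X, y)
```

Random duplication is the simplest balancing scheme; synthetic-sample methods such as SMOTE are a common alternative when exact duplicates would encourage overfitting.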
The digitization and integration of biodiversity data are essential for supporting environmental conservation and the sustainable use of natural resources. An increasing amount of data is now made available by regional, national, and global initiatives, but using these data efficiently is still a challenge. New techniques are needed to enable the efficient management and use of these various types of biotic and abiotic data to generate useful knowledge for decision-making processes. We present work-in-progress research that proposes a computational framework to manage biodiversity data and to enable an efficient information retrieval process.
A multithreaded operating system (OS) is essential for many wireless sensor network (WSN) applications. However, the RAM consumption of a multithreaded OS is high, so it is infeasible to run one on many severely RAM-constrained wireless sensor nodes (WNs). To address this challenge, this paper investigates several memory optimization strategies. On the one hand, subfunction-granularity thread switching and a hybrid OS scheduling model are proposed to decrease the RAM consumed by thread stacks. On the other hand, different dynamic memory allocation mechanisms are presented to reduce the heap memory size. With these optimization techniques, the total RAM consumption of multithreaded WSN OSs can be reduced greatly. As a result, multithreaded WSN OSs become suitable even for severely RAM-constrained WNs.
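One widely used way to shrink and bound heap usage on RAM-constrained nodes is a fixed-size block pool, sketched below in Python for readability. This is an assumed example of a dynamic memory allocation mechanism, not necessarily one of those proposed in the paper. The free list is threaded through the free blocks themselves (a 2-byte index stored in each), so the only extra bookkeeping is the head index, and allocation and release are both O(1) with no external fragmentation. The class name FixedBlockPool and its methods are hypothetical.

```python
class FixedBlockPool:
    """Fixed-size block allocator over one preallocated buffer (sketch)."""

    NIL = 0xFFFF  # end-of-list marker stored in a 2-byte link field

    def __init__(self, block_size, block_count):
        assert block_size >= 2 and block_count < self.NIL
        self.block_size = block_size
        self.buf = bytearray(block_size * block_count)
        # Thread every block onto the free list: block i links to block i + 1.
        for i in range(block_count):
            self._set_next(i, i + 1 if i + 1 < block_count else self.NIL)
        self.free_head = 0 if block_count else self.NIL

    def _set_next(self, i, nxt):
        off = i * self.block_size
        self.buf[off:off + 2] = nxt.to_bytes(2, "little")

    def _get_next(self, i):
        off = i * self.block_size
        return int.from_bytes(self.buf[off:off + 2], "little")

    def offset(self, i):
        """Byte offset of block i inside the buffer."""
        return i * self.block_size

    def alloc(self):
        """Pop a free block index in O(1); None when the pool is exhausted."""
        if self.free_head == self.NIL:
            return None
        i = self.free_head
        self.free_head = self._get_next(i)
        return i

    def free(self, i):
        """Push block i back onto the free list in O(1)."""
        self._set_next(i, self.free_head)
        self.free_head = i

# Usage: a 64-byte pool of four 16-byte blocks.
pool = FixedBlockPool(block_size=16, block_count=4)
```

The trade-off is internal fragmentation: every request consumes a whole block, so pools are usually sized per object type (for example, one pool for message buffers and one for timers).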
In blended learning, learners' interest is closely related to whether they successfully finish their studies. To study learners' interest, we adopt situated cognition theory as the theoretical basis and carry out data mining on learners' online behaviors, together with analysis of data from brain cognition experiments. On this basis, we propose the brain mechanism underlying contextual learning interest and online operation behaviors, and we establish a neural-network-based model of learners' interest. Experimental results show that this method can effectively represent learners' interest in blended learning.
ISBN (electronic): 9798350363203
ISBN (print): 9798350363210
An information retrieval (IR) system is a software system that provides access to books, journals, and other documents. It provides this access according to a specific user query, retrieving the intended book, document, etc. A better IR system is one that produces better results on a standard test collection containing documents, queries, and relevance judgment lists. Relevance judgment lists map each query to its relevant documents, and the output of an IR system is compared against these lists to evaluate its performance. Judgment lists are usually built by specialists in the documents' subject domain. In this paper, we use a machine learning algorithm to create a judgment list with minimal human involvement, saving considerable time and effort. The algorithm is k-nearest neighbors (KNN) with inverse-distance weighting. The method requires only a few human-matched documents forming a small relevance judgment list, which is where the savings in time and effort come from. The experiments were conducted on the Medline test collection, which comprises 1033 documents and 30 queries. This experiment is a preliminary study on a small test collection, providing a foundation for further refinement and validation on larger collections in future work.
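The labeling step can be sketched as k-nearest-neighbor voting with inverse-distance weights: each of the k closest human-labeled examples votes for its label with weight 1/d. The sketch assumes that every (query, document) pair has already been turned into a numeric feature vector; the paper's actual features, distance measure, and value of k are not given in the abstract, so the Euclidean distance, k = 3, the toy vectors, and the small eps guarding against division by zero are all assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_inverse_distance(vec, labeled, k=3, eps=1e-9):
    """labeled: list of (feature_vector, label) pairs judged by a human.

    Returns the label with the largest total weight 1 / (d + eps) among
    the k labeled vectors nearest to `vec`.
    """
    nearest = sorted(labeled, key=lambda item: euclidean(vec, item[0]))[:k]
    scores = {}
    for other, label in nearest:
        d = euclidean(vec, other)
        scores[label] = scores.get(label, 0.0) + 1.0 / (d + eps)
    return max(scores, key=scores.get)

# Usage: a tiny hypothetical seed judgment list.
labeled = [([1.0, 0.0], "relevant"), ([0.9, 0.1], "relevant"),
           ([0.0, 1.0], "non-relevant"), ([0.1, 0.9], "non-relevant")]
label = knn_inverse_distance([0.8, 0.2], labeled, k=3)
```

Inverse-distance weighting lets one very close neighbor outvote several distant ones, which plain majority voting would not; with only a small seed list, that makes the prediction less sensitive to the choice of k.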
The Embarrassingly Parallel (EP) algorithm, which is typical of many Monte Carlo applications, provides an estimate of the upper achievable limit of double-precision performance on parallel supercomputers. Intel recently released the Many Integrated Core (MIC) architecture as a many-core coprocessor. A MIC coprocessor often offers more than 50 cores, each of which can run four hardware threads and execute 512-bit vector instructions. In this paper, we describe how the EP algorithm is effectively accelerated on platforms containing MIC using the offload execution model. The results show that the efficient implementation of EP on MIC can take full advantage of MIC's computational resources and achieves a speedup of 3.06 over an Intel Xeon E5-2670 CPU. Building on EP for MIC and an effective task distribution model, the implementation of EP on a CPU-MIC heterogeneous platform achieves up to 2134.86 Mop/s, a 4.04 times speedup over the Intel Xeon E5-2670 CPU.
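The EP kernel is easy to sketch because its work splits into fully independent chunks. The Python version below follows the NAS Parallel Benchmarks EP scheme as we understand it (a 46-bit linear congruential generator with multiplier 5^13, the Marsaglia polar method to produce Gaussian pairs, and counts of pairs in square annuli); treat the generator constants as assumptions rather than a verified copy of the benchmark. The jump-ahead seed for each chunk is what makes the algorithm embarrassingly parallel: any worker, whether a CPU core or a MIC thread, can compute its own starting state with a single modular exponentiation and never communicates until the final reduction. The paper's offload model and CPU/MIC task-distribution ratio are not modeled.

```python
import math

A = 5 ** 13       # LCG multiplier (assumed NAS EP constant)
MOD = 2 ** 46     # LCG modulus
SEED = 271828183  # initial state (assumed NAS EP constant)

def lcg_uniforms(x, count):
    """Yield `count` uniforms in (0, 1) from x_{k+1} = A * x_k mod 2^46."""
    for _ in range(count):
        x = (A * x) % MOD
        yield x / MOD

def ep_chunk(chunk, pairs):
    """Process one independent chunk of `pairs` uniform pairs.

    Jump-ahead: chunk c starts at SEED * A^(2 * pairs * c) mod 2^46, which is
    exactly where a single sequential stream would be at that point.
    """
    start = SEED * pow(A, 2 * pairs * chunk, MOD) % MOD
    u = lcg_uniforms(start, 2 * pairs)
    counts, sx, sy = [0] * 10, 0.0, 0.0
    for _ in range(pairs):
        x, y = 2.0 * next(u) - 1.0, 2.0 * next(u) - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:  # polar method: keep points inside the unit disk
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = x * f, y * f  # two independent N(0, 1) deviates
            ring = int(max(abs(gx), abs(gy)))
            if ring < 10:
                counts[ring] += 1
            sx += gx
            sy += gy
    return counts, sx, sy

def ep(chunks, pairs):
    """Chunks share nothing, so each could run on its own core or
    coprocessor; here they run in turn and are reduced at the end."""
    counts, sx, sy = [0] * 10, 0.0, 0.0
    for c in range(chunks):
        q, a, b = ep_chunk(c, pairs)
        counts = [m + n for m, n in zip(counts, q)]
        sx, sy = sx + a, sy + b
    return counts, sx, sy
```

Splitting 1024 pairs into four chunks of 256 yields the same annulus counts as one sequential chunk of 1024, because the jump-ahead places every chunk on the same global random stream; only the floating-point summation order of the Gaussian sums differs.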
In the real world, uncertain data are widespread: in many application areas, it is impossible to express data with a certainty of one hundred *** uncertainty is inherent in these systems because of measurement and sampling errors and resource *** flexibility of the XML data model allows a more natural representation of uncertain data than the relational *** matching of a twig pattern against probabilistic XML data is an essential problem in the query processing of probabilistic *** typical twig join algorithms for certain (deterministic) XML cannot be used directly or adapted to process probabilistic XML because of the new characteristics of uncertain XML data, such as distribution nodes and probabilistic *** this paper, we propose an algorithm for twig joins against probabilistic XML based on a new prefix encoding *** have been conducted to study the performance of the proposed algorithm.
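A prefix (Dewey-style) encoding lets structural relationships be decided from labels alone: a is an ancestor of d exactly when a's code is a proper prefix of d's code. The sketch below combines that test with a simple probability computation, assuming every distribution node is of the independent ("ind") kind, so the probability that a node exists is the product of edge probabilities along its root path. It is a naive nested-loop join for a single ancestor-descendant edge, not the paper's twig-join algorithm or its new encoding scheme; mutually exclusive ("mux") distribution nodes and full twig patterns would need more work. PNode, ad_join, and the toy document are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class PNode:
    code: Tuple[int, ...]  # prefix (Dewey-style) code, e.g. (1, 1, 2)
    tag: str
    prob: float            # probability of the edge from its parent; 1.0 if certain

def is_ancestor(a: PNode, d: PNode) -> bool:
    """a is a proper ancestor of d iff a.code is a proper prefix of d.code."""
    return len(a.code) < len(d.code) and d.code[: len(a.code)] == a.code

def is_parent(a: PNode, d: PNode) -> bool:
    return len(d.code) == len(a.code) + 1 and is_ancestor(a, d)

def existence_prob(index: Dict[Tuple[int, ...], PNode], d: PNode) -> float:
    """P(d exists) under independent distribution nodes: the product of the
    edge probabilities on the root-to-d path, read off d's own code."""
    p = 1.0
    for k in range(1, len(d.code) + 1):
        p *= index[d.code[:k]].prob
    return p

def ad_join(ancestors: List[PNode], descendants: List[PNode],
            index: Dict[Tuple[int, ...], PNode]) -> List[Tuple[str, str, float]]:
    """Naive ancestor//descendant join. If d exists, every ancestor of d
    exists too, so the match probability is simply P(d exists)."""
    return [(a.tag, d.tag, existence_prob(index, d))
            for a in ancestors for d in descendants if is_ancestor(a, d)]

# Usage: book//name in a toy probabilistic document.
nodes = [PNode((1,), "book", 1.0), PNode((1, 1), "author", 0.8),
         PNode((1, 1, 1), "name", 0.5), PNode((1, 2), "title", 1.0)]
index = {n.code: n for n in nodes}
matches = ad_join([nodes[0]], [nodes[2]], index)
```

Real twig-join algorithms avoid the nested loop by merging label streams sorted in document order; the prefix test above is the primitive such a merge relies on.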
Run-time malware detection strategies are efficient and robust and have attracted increasing attention. In this paper, we use I/O Request Packet (IRP) sequences for malware detection, applying n-gram analysis to IRP sequences for feature extraction. By integrating the Negative Selection Algorithm (NSA) and the Positive Selection Algorithm (PSA) to select n-gram sequences that occur only in malware IRP sequences, we achieve a true positive rate above 96% and a false positive rate of 0%.
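The detector-selection idea can be sketched with set operations over n-grams: positive selection keeps n-grams observed in malware IRP traces, and negative selection discards any n-gram that also appears in benign ("self") traces, leaving only malware-specific detectors. A trace is flagged if it contains at least one detector. This is a simplified, assumed rendering of the NSA/PSA combination; the paper's value of n, its matching rule, and its training data are not given, and the IRP major-function names in the example traces are only illustrative.

```python
def ngrams(seq, n):
    """All contiguous length-n windows of a trace, as a set of tuples."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def build_detectors(malware_traces, benign_traces, n=3):
    """Positive selection keeps n-grams seen in malware; negative selection
    then removes any that also occur in benign (self) traces."""
    malware = set().union(*(ngrams(t, n) for t in malware_traces))
    benign = set().union(*(ngrams(t, n) for t in benign_traces))
    return malware - benign

def is_malicious(trace, detectors, n=3):
    """Flag a trace that contains at least one detector n-gram."""
    return not ngrams(trace, n).isdisjoint(detectors)

# Usage with hypothetical IRP traces and bigrams (n = 2).
benign = [["IRP_MJ_CREATE", "IRP_MJ_READ", "IRP_MJ_CLOSE"]]
malware = [["IRP_MJ_CREATE", "IRP_MJ_WRITE",
            "IRP_MJ_SET_INFORMATION", "IRP_MJ_CLOSE"]]
detectors = build_detectors(malware, benign, n=2)
```

Because every detector is, by construction, absent from the benign training traces, no benign training trace can be flagged; this is the mechanism behind the very low false positive rate, although unseen benign traces can still match.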
Traditional symbolic execution for software testing focuses on exploring program paths. For stateful network protocols, however, this approach has difficulty exploring all protocol states. This paper proposes a novel method based on model-guided symbolic execution, which associates program paths with protocol states and uses the protocol model to guide testing toward interesting deep protocol states. This paper also presents a prototype system, S2EProtocol-MG, built on the Selective Symbolic Execution (S2E) platform, for testing network protocol binary software. To demonstrate its effectiveness, we use S2EProtocol-MG to detect vulnerabilities in several real-world network protocol implementations.
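A simple way to picture model-guided exploration is to compute, from a protocol state model, the shortest message sequence that reaches each state and then to visit the deepest states first; a symbolic executor could replay such a prefix concretely and make only the subsequent messages symbolic. The sketch below shows only that model-side planning step. The FTP-like state machine, the state and message names, and the deepest-first ordering are assumptions for illustration; none of it is the S2EProtocol-MG or S2E API.

```python
from collections import deque

def shortest_prefixes(fsm, start):
    """BFS over a protocol state model {state: {message: next_state}}:
    the shortest message sequence that drives the protocol into each state."""
    prefixes = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for msg, nxt in fsm.get(state, {}).items():
            if nxt not in prefixes:
                prefixes[nxt] = prefixes[state] + [msg]
                queue.append(nxt)
    return prefixes

def exploration_order(fsm, start):
    """Deepest states first (ties broken by name): these are the states
    that plain path-by-path exploration tends to reach last."""
    prefixes = shortest_prefixes(fsm, start)
    return sorted(prefixes.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Usage: a hypothetical FTP-like session model.
fsm = {
    "INIT": {"USER": "USER_OK"},
    "USER_OK": {"PASS": "AUTHED"},
    "AUTHED": {"PASV": "DATA_READY", "QUIT": "CLOSED"},
    "DATA_READY": {"RETR": "TRANSFER"},
}
order = exploration_order(fsm, "INIT")
```

Replaying the prefix concretely (USER, PASS, PASV, RETR for the deepest state here) avoids spending the symbolic budget on the authentication handshake, so path exploration starts where the interesting protocol logic is.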