Estimating taxonomic content constitutes a key problem in metagenomic sequencing data ***, extracting such content from high-throughput next-generation sequencing data is very time-consuming with the currently available ***, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data *** show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin *** comparison with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN, running on a fat *** can be run on one multiprocessor node or a *** is expected to be part of MEGAN to accelerate read analysis; it generates the same output as MEGAN, which can be imported directly into MEGAN to finish the subsequent ***, CloudLCA is a universal solution for finding the lowest common ancestor, and it can be applied in other fields requiring an LCA algorithm.
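As a rough illustration of the lowest-common-ancestor step that the abstract refers to, the sketch below assigns a read to the deepest taxon shared by all of its hits. It is a minimal sequential version and not CloudLCA's parallel implementation. The names `TOY_PARENT`, `path_to_root` and `lowest_common_ancestor`, and the tiny taxonomy itself, are assumptions made for the example; the taxonomy is assumed to be a single-rooted tree stored as a child-to-parent map.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy (child -> parent); "root" has no parent entry.
TOY_PARENT: Dict[str, str] = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def path_to_root(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the lineage of `taxon`, ordered from the root down to the taxon."""
    lineage = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        lineage.append(taxon)
    return lineage[::-1]


def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Return the deepest node shared by the lineages of all `taxa`.

    Assumes every taxon descends from the same root.
    """
    lineages = [path_to_root(t, parent) for t in taxa]
    if not lineages:
        raise ValueError("at least one taxon is required")
    lca = lineages[0][0]  # the shared root
    # Walk all lineages in lockstep and stop at the first disagreement.
    for level in zip(*lineages):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca
```

For a read whose hits are `Escherichia` and `Salmonella`, the function returns `Proteobacteria`; adding a `Bacillus` hit pushes the assignment up to `Bacteria`.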
ISBN: 9781622765034 (print)
In this paper we first describe the technology of automatic annotation transformation, which is based on the annotation adaptation algorithm (Jiang et al., 2009). It can automatically transform a human-annotated corpus from one annotation guideline to another. We then propose two optimization strategies, iterative training and predict-self reestimation, to further improve the accuracy of annotation guideline transformation. Experiments on Chinese word segmentation show that the iterative training strategy, together with predict-self reestimation, brings significant improvement over the simple annotation transformation baseline and leads to classifiers with significantly higher accuracy and several times faster processing than annotation adaptation. On the Penn Chinese Treebank 5.0, it achieves an F-measure of 98.43%, significantly outperforming previous work despite using a single classifier with only local features.
To explore the association relations among disease, pathogenesis, physician, symptoms and drug, we adapt a variational Apriori algorithm for discovering association rules on a dataset of the Qing Court Medical Records...
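To make the association-rule idea concrete, here is a minimal sketch of the classic Apriori procedure (level-wise frequent itemset mining plus rule confidence). It is the textbook algorithm, not the adapted variant the abstract mentions, and the transactions in the usage note are invented placeholders rather than entries from the Qing Court Medical Records. The function names `frequent_itemsets` and `confidence` are assumptions for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    frequent: Dict[FrozenSet[str], float],
) -> float:
    """Confidence of the rule antecedent -> consequent."""
    return frequent[antecedent | consequent] / frequent[antecedent]
```

With the placeholder transactions `{"cough", "fever", "herb_a"}`, `{"cough", "herb_a"}`, `{"fever", "herb_b"}` and `{"cough", "fever", "herb_a"}` and a minimum support of 0.5, the itemset `{"cough", "herb_a"}` has support 0.75 and the rule `cough -> herb_a` has confidence 1.0.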
MicroRNAs (miRNAs) are a class of small non-coding RNAs that play important roles in post-transcriptional regulation of gene expression [1]. A large number of miRNAs have been found to be involved in a broad spectrum of biological functions such as regulation of innate and adaptive immunity, cell differentiation and development as well as
Most previous work on web video topic detection (e.g., graph-based co-clustering methods) struggles with real-time topic detection because of its high computational complexity. Therefore, fast topic detection is needed to meet the requirements of users and administrators in real-world scenarios. Along this line, we propose a fast and effective topic detection framework in which video streams are first partitioned into buckets using a time-window function, then an incremental hierarchical clustering algorithm is applied, and finally a video-based fusion strategy integrates information from multiple modalities. Furthermore, a series of novel similarity metrics is defined in the framework. Experimental results on three months of YouTube videos demonstrate the effectiveness and efficiency of the proposed method.
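The first stage of the framework, partitioning a video stream into time-window buckets, can be sketched as follows. The abstract does not specify the window function, so this example assumes fixed-width, non-overlapping windows keyed by upload timestamp; `partition_into_buckets` and its parameters are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_into_buckets(
    videos: Iterable[Tuple[str, float]], window_seconds: float
) -> Dict[int, List[str]]:
    """Group (video_id, upload_timestamp) pairs into fixed-width time windows.

    Bucket k holds videos whose timestamp falls in
    [k * window_seconds, (k + 1) * window_seconds).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets: Dict[int, List[str]] = defaultdict(list)
    for video_id, timestamp in videos:
        buckets[int(timestamp // window_seconds)].append(video_id)
    return dict(buckets)
```

Each bucket could then be passed to the clustering stage on its own, which is what would keep the per-step cost bounded as new videos arrive.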
Local learning approaches are especially easy to parallelize, so they are very important for cloud computing. In 1997, Lotfi A. Zadeh proposed the concept of Granular Computing (GrC). Zadeh proposed that three basic concepts underlie human cognition: granulation, organization and causation, with a granule being a clump of points (objects) drawn together by indistinguishability, similarity, proximity or functionality. In this paper, we present a novel local learning approach based on the concept of granular computing, named "nested local learning" (NGLL). Experiments show that the NGLL approach performs better than probabilistic latent semantic analysis (PLSA).
When applying Switched Ethernet in real-time communications, the switch and the end-nodes schedule the real-time messages using the Earliest Deadline First (EDF) algorithm. The problem we are facing is how to divide deadl...
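For readers unfamiliar with EDF, the sketch below shows only the core rule: among pending messages, always transmit the one with the earliest absolute deadline. It does not model the switch, the end-nodes or the deadline question raised in the abstract; `edf_order` and the message tuples are assumptions for the example, and ties are broken by arrival order.

```python
import heapq
from typing import List, Tuple


def edf_order(messages: List[Tuple[str, float]]) -> List[str]:
    """Return message names in Earliest Deadline First transmission order.

    `messages` holds (name, absolute_deadline) pairs in arrival order.
    """
    # The arrival index breaks ties so equal deadlines keep arrival order.
    heap = [(deadline, index, name) for index, (name, deadline) in enumerate(messages)]
    heapq.heapify(heap)
    order: List[str] = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

Given pending messages `m1` (deadline 10), `m2` (deadline 4), `m3` (deadline 7) and `m4` (deadline 4), the transmission order is `m2`, `m4`, `m3`, `m1`.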
Supervised learning requires large amounts of labeled training data to extract expert metadata. To address this problem, a semi-supervised expert metadata extraction method based on co-training is proposed. First, according to the characteristics of expert metadata, we select expert metadata features and label a certain number of metadata samples, then train two classifiers, one with maximum entropy and one with conditional random fields. Second, the two classifiers are used to label metadata items in the unlabeled expert home pages. When the classification results for one type of metadata in an expert page satisfy the confidence requirement, the differences between the labels assigned by the two classifiers for each metadata type are analyzed; for metadata satisfying the difference requirement, the classifier that performs better on that metadata type is selected to label it, and the resulting labeled expert home page is taken as a labeled sample. Finally, these labeled expert home pages are used to extend the training samples and retrain two new classifiers, and the process iterates until the two classifiers converge. In the experiment, we collected 2000 expert home pages; the results indicate that the semi-supervised expert metadata extraction method based on co-training outperforms a number of supervised methods while effectively reducing the amount of manual labeling work.
Multi-task learning has proven useful for boosting the learning of multiple related but different tasks. Meanwhile, latent semantic models such as LSA and LDA are popular and effective methods for extracting discriminative semantic features from high-dimensional dyadic data. In this paper, we present a method that combines these two techniques by introducing a new matrix tri-factorization based formulation for semi-supervised latent semantic learning, which can incorporate labeled information into traditional unsupervised learning of latent semantics. Our inspiration for multi-task semantic feature learning comes from two facts: 1) multiple tasks generally share a set of common latent semantics, and 2) a semantic usually has a stable indication of categories no matter which task it is from. Thus, to make multiple tasks learn from each other, we share the associations between categories and those common semantics among tasks. Along this line, we propose a novel joint nonnegative matrix tri-factorization framework in which these associations are shared among tasks in the form of a semantic-category relation matrix. Our new formulation for multi-task learning can simultaneously learn (1) discriminative semantic features of each task, (2) the predictive structure and categories of unlabeled data in each task, and (3) common semantics shared among tasks and specific semantics exclusive to each task. We give an alternating iterative algorithm to optimize our objective and theoretically show its convergence. Finally, extensive experiments on text data, with comparisons against various baselines and three state-of-the-art multi-task learning algorithms, demonstrate the effectiveness of our method.
With the success of the Internet, more and more companies have recently started to run web-based businesses. While running e-business sites, many companies have encountered unexpected degeneration of their web server applications ...