We demonstrate a new algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Our approach is based on the free energy minimization criterion, and utilizes a sophisticated energy model that is mo...
In this paper, we aim to tackle flexible cost requirements for long-tail datasets, where we need to construct a learning framework that is (1) cost-sensitive and (2) robust to the class distribution. The misclassification cost and the area under the ROC curve (AUC) are popular metrics for (1) and (2), respectively. However, because of how they are formulated, models trained with AUC are not well suited to cost-sensitive decision problems, and models trained with fixed costs are sensitive to class-distribution shift. To address this issue, we present a new setting in which costs are treated like a dataset, so that arbitrary unknown cost distributions can be handled. Moreover, we propose a novel weighted version of AUC in which the cost distribution is integrated into the calculation through decision thresholds. To formulate this setting, we propose a novel bilevel paradigm that bridges weighted AUC (WAUC) and cost. The inner-level problem approximates the optimal threshold from sampled costs, and the outer-level problem minimizes the WAUC loss over the optimal-threshold distribution. To optimize this bilevel paradigm, we employ a stochastic optimization algorithm (SACCL) that enjoys the same convergence rate, O(ε⁻⁴), as SGD. Finally, experimental results show that our algorithm outperforms existing cost-sensitive learning methods and the two-stage AUC decision approach.
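To make the two ingredients of the abstract more concrete, here is a minimal sketch, not the authors' WAUC or SACCL: the plain empirical AUC (fraction of positive/negative pairs ranked correctly) and an inner-level-style step that, for each sampled cost, picks the threshold minimizing empirical cost-weighted error. The cost convention (a cost c in [0, 1) weights false positives and 1 - c weights false negatives), the uniform cost sampling, and all function names are hypothetical assumptions for illustration; the outer-level WAUC minimization is not shown.

```python
import random
from typing import List, Sequence


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def cost_optimal_threshold(scores: Sequence[float], labels: Sequence[int],
                           c: float) -> float:
    """Threshold t (predict positive iff score >= t) minimizing
    c * FP + (1 - c) * FN over candidate thresholds taken from the scores.
    Hypothetical convention: c weights false positives, 1 - c false negatives."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf = predict all negative
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = c * fp + (1 - c) * fn
        if cost < best_cost:  # strict: earliest (lowest) threshold wins ties
            best_t, best_cost = t, cost
    return best_t


def sample_thresholds(scores: Sequence[float], labels: Sequence[int],
                      n_costs: int, seed: int = 0) -> List[float]:
    """Inner-level sketch: one cost-optimal threshold per cost drawn from U[0, 1)."""
    rng = random.Random(seed)
    return [cost_optimal_threshold(scores, labels, rng.random())
            for _ in range(n_costs)]
```

On toy scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.1]` with labels `[1, 1, 0, 1, 0, 0]`, the AUC is 8/9, and raising c (making false positives costlier) moves the optimal threshold up, from 0.4 at c = 0.5 to 0.8 at c = 0.9. The resulting spread of thresholds is the kind of distribution the abstract's outer level would average over.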
Recommender systems play an important role in handling large amounts of information and support users by recommending content considered particularly interesting to them. In this paper, Context-aware Movie Rec...
ISBN (print): 9781932432459
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks, such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems a great waste of human effort, and it would be useful to adapt one annotation standard to another automatically. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with the desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech (POS) tagging, where no segmentation or POS tagging standard is widely accepted owing to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank yields significant improvements in both segmentation and tagging accuracy (error reductions of 30.2% and 14%, respectively), which in turn improves Chinese parsing accuracy.
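Error reductions such as the 30.2% and 14% above are, on the usual reading in this field, relative: the share of the baseline's errors that the adapted system removes, not a gain in accuracy points. Assuming that reading, the arithmetic is sketched below; the function name and the accuracies in the example are hypothetical and not taken from the paper.

```python
def relative_error_reduction(baseline_acc: float, adapted_acc: float) -> float:
    """Share of the baseline's errors removed: (e_base - e_new) / e_base,
    where error = 1 - accuracy and accuracies are fractions in [0, 1]."""
    base_err = 1.0 - baseline_acc
    new_err = 1.0 - adapted_acc
    if base_err <= 0.0:
        raise ValueError("baseline makes no errors; reduction is undefined")
    return (base_err - new_err) / base_err


# Hypothetical accuracies: going from 95% to 97% removes 2 of every 5 errors.
print(round(relative_error_reduction(0.95, 0.97), 3))
```

So a two-point accuracy gain over a 95% baseline already counts as a 40% error reduction, which is why relative figures can look large even when absolute accuracies are high.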
At present, pornographic text on the Internet takes varied and constantly changing forms, although it has always been prohibited. It severely harms people's mental and physical health and social stability. There are IP-base...
One fundamental problem in services computing is how to bridge the gap between business requirements and various heterogeneous IT services. This involves eliciting business requirements and building a solution accordingly by reusing available services. While business requirements are commonly elicited through use cases and scenarios, it is not straightforward to transform the use case model into a service model, and the existing manual approach is cumbersome and error-prone. In this paper, the environment ontology, which is used to model the problem space, is utilized to facilitate the model transformation process. The environment ontology provides a common understanding between business analysts and software engineers. Both the required software functionalities and the capabilities of the available services are described using this ontology. The required capability of each use case is then semi-automatically matched to the capabilities provided by available services, so that each use case is realized by the set of matching services. At the end of this paper, a fictitious case study is used to illustrate how this approach works.
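The matching step can be pictured as covering a use case's required capabilities with services whose advertised capabilities overlap them. The sketch below is a hypothetical illustration, not the paper's method: exact matching of capability terms stands in for ontology-based matching, a greedy set-cover heuristic stands in for the semi-automatic procedure, and the service and capability names are invented.

```python
from typing import Dict, List, Optional, Set


def realize_use_case(required: Set[str],
                     services: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Greedy set cover over capability terms. Returns the names of the chosen
    services, or None if some required capability is offered by no service."""
    remaining = set(required)
    chosen: List[str] = []
    while remaining:
        if not services:
            return None
        # Pick the service covering the most still-unmet capabilities;
        # ties go to the alphabetically first name, so results are deterministic.
        name, caps = min(services.items(),
                         key=lambda item: (-len(item[1] & remaining), item[0]))
        gained = caps & remaining
        if not gained:
            return None  # no service offers any of the missing capabilities
        chosen.append(name)
        remaining -= gained
    return chosen


# Hypothetical service registry described by capability terms.
services = {
    "PaymentService": {"charge_card", "refund"},
    "ShippingService": {"create_shipment", "track_parcel"},
    "NotifyService": {"send_email"},
}
print(realize_use_case({"charge_card", "create_shipment", "send_email"}, services))
```

Greedy set cover is simple and deterministic here, but it is only an approximation: it does not guarantee the smallest set of services, and a real ontology-based matcher would also accept semantically related (not just identical) capability terms.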
ISBN (print): 9781605586083
This paper presents a framework with two automatic tasks targeting large-scale, low-quality sports video archives collected from online video streams. The framework is based on the bag-of-visual-words model using speeded-up robust features (SURF). The first task is sports genre categorization based on a hierarchical structure. In the second task, which builds on the automatically obtained genre, views are classified using support vector machines (SVMs). The view classification results can then be used in video parsing and highlight extraction. Compared with state-of-the-art methods, our approach is fully automatic and free of domain knowledge, and thus offers better extensibility. Furthermore, our dataset covers 14 sports genres totaling 6,850 minutes. Both sports genre categorization and view type classification achieve accuracy rates above 80%, which validates the framework's robustness and potential in web-based applications. Copyright 2009 ACM.
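The bag-of-visual-words step turns a variable number of local descriptors into one fixed-length vector that an SVM can classify. The sketch below shows only that quantization step under stated assumptions: SURF extraction and codebook learning (commonly k-means) are not shown, the 2-D toy vectors stand in for real SURF descriptors, L1 normalization is one common choice rather than necessarily the paper's, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bovw_histogram(descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized histogram of visual-word occurrences for one frame or clip."""
    counts = [0.0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy 2-D "descriptors" and a 3-word codebook (real SURF descriptors are 64-D).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bovw_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

Each frame's histogram has the same length as the codebook regardless of how many features were detected, which is what lets a standard SVM consume it for genre or view classification.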
Semi-G2 basis functions of degree greater than three are introduced. These basis functions are expressed explicitly via matrix decomposition. Based on them, the equations for constructing G2 splines can be formulated independently of the values of the geometric shape parameters, which makes the equations easier to solve. Analysis shows that this method may be extended to the construction of Gn splines.
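For reference, the standard geometric continuity conditions at the joint of two curve segments are given below. The abstract does not define its shape parameters; the assumption here is that they play the role of the usual β1, β2 in these conditions. The semi-G2 basis functions themselves are not reconstructed.

```latex
% Segments P(t), Q(t), t in [0, 1], joined at P(1) = Q(0)
\begin{aligned}
\text{G1:}\quad & \mathbf{Q}'(0) = \beta_1\,\mathbf{P}'(1), \qquad \beta_1 > 0,\\
\text{G2:}\quad & \mathbf{Q}'(0) = \beta_1\,\mathbf{P}'(1) \quad\text{and}\quad
  \mathbf{Q}''(0) = \beta_1^{2}\,\mathbf{P}''(1) + \beta_2\,\mathbf{P}'(1).
\end{aligned}
```

Both conditions follow from applying the chain rule to a reparametrization of the joined curve, with β1 and β2 the first and second derivatives of that reparametrization at the joint. Setting β1 = 1 and β2 = 0 recovers ordinary parametric C2 continuity, so solving for G2 splines generally involves these extra free parameters.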
Recently, the overwhelming majority of object detection methods have focused on reducing the number of region proposals while keeping object recall high, without considering category information. It may lead t...
The Support Vector Machine (SVM) is a machine learning classification technique based on statistical learning theory. The algorithm requires solving a quadratic optimization problem, and with the increase of the s...
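The quadratic optimization problem in question is, in the usual soft-margin formulation, the dual problem below. The symbols are assumptions not present in the abstract: n training pairs (x_i, y_i) with y_i in {-1, +1}, a kernel K, and a penalty parameter C.

```latex
\begin{aligned}
\max_{\boldsymbol{\alpha}} \quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j) \\
\text{s.t.} \quad & 0 \le \alpha_i \le C, \quad i = 1, \dots, n, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.
\end{aligned}
```

The matrix with entries y_i y_j K(x_i, x_j) has n² entries, so the memory and time needed to solve this QP directly grow at least quadratically with the number of training samples. Decomposition methods such as SMO were developed to avoid handling the full matrix at once.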