Web users rely on search engines to find useful information on the Internet. However, current web search engines return answers to a query independently of the specific user's information need. Since web users with similar web behav...
ISBN: 9781604235531 (print)
Antimicrobial peptides are small peptides encoded by genes. Research on antimicrobial peptides has attracted intense attention in recent years because of "their potential use in the cure of infectious diseases caused by pathogens that have become counteractive to traditional antibiotics" (Boman 1994). A huge number of research articles on antimicrobial peptides already exists, and this number is continuously increasing. Although some biomedical databases, such as PubMed, are well established, they provide only query-based information retrieval, and end users must manually pick out the relevant information from thousands of retrieved articles. The objective of this paper is to apply one text mining technique, document clustering, which groups similar documents into clusters, to text documents collected from PubMed using the keyword "antimicrobial peptides". The results of our work can help researchers discover meaningful groups of antimicrobial peptide articles efficiently.
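The idea of grouping similar documents can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or document representation was used, so everything here is an assumption for illustration: plain term-frequency vectors, cosine similarity, a simple single-pass "leader" clustering, a hypothetical similarity threshold of 0.3, and four made-up toy documents.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Lower-case bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs, threshold=0.3):
    """Put each document in the first cluster whose leader is similar enough,
    otherwise start a new cluster. Returns lists of document indices."""
    vectors = [tf_vector(d) for d in docs]
    clusters = []  # the first index in each cluster acts as its leader
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy documents (invented for this sketch, not taken from PubMed).
docs = [
    "antimicrobial peptides kill resistant bacteria",
    "resistant bacteria and antimicrobial peptides",
    "peptide synthesis methods in the laboratory",
    "laboratory methods for peptide synthesis",
]
clusters = leader_cluster(docs)
```

A real pipeline would likely add stop-word removal, stemming, and TF-IDF weighting before clustering; the sketch omits these to stay short.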
ISBN: 9781604235531 (print)
Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, because of limits on computing power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As the size of today's databases is continuously increasing, it is highly important that data mining algorithms can perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new approach that combines stratified random sampling with a majority-vote ensemble, and (2) to compare this approach with the plain MCLP approach (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that the new approach not only has the potential to handle datasets of arbitrary size, but also outperforms the plain MCLP approach and achieves classification accuracy comparable to See5.
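The combination of stratified random sampling and majority voting can be sketched as follows. This is only an illustration under stated assumptions: the base learner is a nearest-centroid classifier standing in for MCLP (which is not implemented here), and the sampling fraction, number of ensemble members, class names, and synthetic data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, labels, fraction, rng):
    """Draw the same fraction from every class so class proportions are kept."""
    by_class = defaultdict(list)
    for x, y in zip(records, labels):
        by_class[y].append(x)
    sample = []
    for y, items in by_class.items():
        k = max(1, round(fraction * len(items)))
        sample.extend((x, y) for x in rng.sample(items, k))
    return sample


def train_centroid(sample):
    """Stand-in base learner (NOT MCLP): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in sample:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def ensemble_predict(models, x):
    """Majority vote over the base classifiers."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic, imbalanced two-class data (invented for this sketch).
rng = random.Random(0)
records = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + \
          [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)]
labels = ["normal"] * 200 + ["attack"] * 50

# Each ensemble member is trained on its own small stratified sample.
models = [train_centroid(stratified_sample(records, labels, 0.2, rng)) for _ in range(5)]
prediction = ensemble_predict(models, (4.8, 5.1))
```

Because each member sees only a fraction of the data, the memory needed per training run stays bounded even as the full dataset grows, which is the motivation the abstract gives for the approach.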
Projective clustering is a clustering technique for high-dimensional data, in which the data points are inherently sparse. To overcome the unreliability of similarity measures among data points in high dimensions, all data points are projected onto a lower-dimensional subspace. Principal component analysis (PCA) is an efficient method for dimensionality reduction: it projects all points onto a lower-dimensional subspace so that information loss is minimized. However, PCA does not handle well the situation in which different clusters are formed in different subspaces. We propose a method of multiple principal component analysis for iteratively computing projective clusters. The objective function is designed to determine the subspace associated with each cluster. Experiments have been carried out to show the effectiveness of the proposed method.
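The projection step that PCA performs can be sketched in a few lines. This shows only ordinary, single-subspace PCA on toy data, computed with power iteration; it is not the multiple-PCA method the abstract proposes, whose objective function is not given here. The function names and sample points are assumptions for illustration.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (mean, unit direction) of the leading principal component,
    found by power iteration on the sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, direction):
    """One-dimensional coordinates of the points along the given direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for p in points]


# Toy points lying close to the line y = 2x, so the leading direction
# should be roughly proportional to (1, 2).
points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.1), (3.0, 5.9), (4.0, 8.1)]
mean, direction = first_principal_component(points)
coords = project(points, mean, direction)
```

If two clusters lay along different lines, a single direction like this one could not represent both well, which is the limitation the abstract addresses by fitting a separate subspace per cluster.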
The paper presents a general architecture for a P2P data sharing facility within a multi-agent framework, where peers, as autonomous high-level nodal agents, cooperate with each other to solve global tasks. A node may have several lower-level local agents, including local databases and partial global ontologies. In addition, minder agents coordinate the activities of peers that offer the same type of service, thus providing fault tolerance. The architecture's capability for data and task sharing is demonstrated through query processing and directory update strategies.
Clustering is the task of grouping data based on similarity. The popular k-means algorithm groups data by first assigning all data points to the closest clusters, then recomputing the cluster means. The algorithm repeats these two steps until it converges. We propose a variation called weighted k-means to improve clustering scalability. To speed up the clustering process, we develop reservoir-biased sampling as an efficient data reduction technique, since it requires only a single scan over the data set. Our algorithm has been designed to group data drawn from mixture models. We present an experimental evaluation of the proposed method.
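The two building blocks named above, single-scan sampling and the two-step k-means loop, can be sketched as follows. The abstract does not define its reservoir-biased sampling or its weighting scheme, so this sketch substitutes the standard uniform reservoir sampling (Algorithm R) and plain, unweighted k-means on one-dimensional data; the sample size, seed, and synthetic mixture are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Uniform sample of k items from an iterable in a single pass (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def kmeans(points, centers, iterations=20):
    """Plain k-means on 1-D data: assign to the nearest center, then recompute means."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
            groups[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Synthetic two-component mixture (invented for this sketch).
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(5000)] + [rng.gauss(10, 1) for _ in range(5000)]

# One scan over the data reduces 10,000 points to 200 before clustering.
sample = reservoir_sample(iter(data), 200, rng)
centers = sorted(kmeans(sample, centers=[min(sample), max(sample)]))
```

Clustering the reduced sample instead of the full data set is what makes the approach scale; a biased or weighted variant would change how items enter the reservoir or how much each sampled point counts when the means are recomputed.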
This paper describes the derivation and design of an array of self-organizing networks, trained by inductive learning, for one-step-ahead prediction of the outputs of the pre-precipitation stage of a wastewater treatment plant, with a view to model predictive control of that stage.
The trend towards outsourcing increases the number of documents stored at external service providers. This storage model, however, raises privacy and security concerns because the service providers cannot be trusted to maintain the privacy of the documents. The research project SemCrypt explores techniques for processing queries and updates over encrypted XML documents stored at untrusted servers. By performing encryption and decryption only on the client and not on the server, SemCrypt guarantees that neither the document structure nor the document content is disclosed on the server. Filtering query results and processing as much as possible of the query/update statement on the server does not depend on special encryption techniques. Instead, the chosen approach exploits the structural semantics of XML documents and uses standard, well-proven encryption techniques. SemCrypt thus enables querying and updating encrypted XML documents on untrusted servers while ensuring data privacy.
Core to ubiquitous computing environments are adaptive software systems that adapt their behavior to the context in which the user is attempting the task the system aims to support. This context is strongly linked with the physical environment in which the task is being performed. The efficacy of such adaptive systems is thus highly dependent on the human perception of the provided system behavior within the context represented by that particular physical environment and social situation. However, effective evaluation of human interaction with adaptive ubiquitous computing technologies has been hindered by the cost and logistics of accurately controlling such environmental context. This paper describes TATUS, a ubiquitous computing simulator aimed at overcoming these cost and logistical issues. Based on a 3D games engine, the simulator has been designed to maximize usability and flexibility in experimentation with adaptive ubiquitous computing systems. We also describe how this simulator is interfaced with a testbed for wireless communication domain simulation.
The evaluation of learner and tutor feedback is essential in the production of high quality personalized eLearning services. There are few evaluations available in the Adaptive Hypermedia domain relative to the amount...