ISBN: (print) 9783030179359; 9783030179342
Advances in molecular dynamics (MD) techniques have made the method common in studies of the physicochemical and conformational properties of proteins. However, the analysis can be difficult because MD generates a large number of high-dimensional conformations. Among the methods used to address this problem, machine learning has been used to find a lower-dimensional manifold, called the "intrinsic dimensionality space", which is embedded in a high-dimensional space and represents the essential motions of proteins. To identify this manifold, the Euclidean distances between intramolecular Cα atoms were used for each conformation. The approach combined dimensionality reduction (AutoEncoder, Isomap, t-SNE, MDS, Spectral and PCA methods) with the Ward algorithm to group similar conformations and find representative structures. The findings indicate that the Spectral and Isomap methods generated low-dimensional spaces that gave good insight into the class separation of conformations. Because they are nonlinear methods, the low-dimensional spaces they generate represent protein motions better than the PCA embedding, so they could be considered alternatives to full MD analyses.
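The distance-based representation described in this abstract can be illustrated with a minimal sketch. It assumes each conformation is given as a list of Cα coordinates; the function name `distance_features` and the toy coordinates are hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import dist


def distance_features(conformation):
    """Flatten one conformation into its pairwise C-alpha distances.

    `conformation` is a list of (x, y, z) coordinates, one per C-alpha atom.
    Returns the upper triangle of the distance matrix as a flat list.
    """
    return [dist(a, b) for a, b in combinations(conformation, 2)]


# Two toy "conformations" of a three-residue chain (coordinates are made up).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
features = [distance_features(frame) for frame in frames]
```

Each row of `features` would then be passed to a dimensionality reduction method and to Ward clustering, for example scikit-learn's `Isomap`, `SpectralEmbedding` and `AgglomerativeClustering(linkage="ward")`. Whether the authors used those implementations is not stated in the abstract.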
With the rapid growth of cell phone networks during the last decades, call detail records (CDR) have been used as approximate indicators for large-scale studies of human and urban mobility. Although coarse and limited, CDR are a real marker of human presence. In this paper, we use more than 800 million CDR to identify weekly patterns of human mobility from mobile phone data. Our methodology is based on the classification of individuals into six distinct presence profiles, where we focus on the inherent temporal and geographical characteristics of each profile within a territory. Then, we use an event-based algorithm to cluster individuals and we identify 12 weekly patterns. We leverage these results to analyze population estimate adjustment processes and, as a result, we propose new indicators to characterize the dynamics of a territory. Our model has been applied to real data from more than 1.6 million individuals, which demonstrates its relevance. The product of our work can be used by local authorities for human mobility analysis and urban planning.
To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing tools for OTU clustering trade off accuracy against computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the time needed to compute pairwise distances between amplicon sequences. We comprehensively compare the proposed method with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine different experimental datasets and many evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation reveals that the proposed method achieves accuracy comparable to that of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
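The two ideas named in this abstract, homopolymer compaction and k-mer profiling, can be sketched as follows. This is an illustration only: the function names, the choice of k = 3 and the overlap-based distance are assumptions and are not the published hc-OTU procedure.

```python
from collections import Counter
from itertools import groupby


def compact_homopolymers(seq):
    """Collapse each run of identical bases to one base: 'AAACCG' -> 'ACG'."""
    return "".join(base for base, _ in groupby(seq))


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(p, q):
    """1 minus the shared k-mer count relative to the smaller profile."""
    shared = sum((p & q).values())  # Counter & keeps the minimum counts
    smaller = min(sum(p.values()), sum(q.values()))
    return 1.0 - shared / smaller if smaller else 1.0


# Two reads that differ only in homopolymer run lengths (made-up sequences).
a = compact_homopolymers("AAACCGGT")
b = compact_homopolymers("ACCCGTT")
d = profile_distance(kmer_profile(a), kmer_profile(b))
```

Comparing k-mer counts is much cheaper than a full pairwise alignment, which is the kind of saving the abstract attributes to the method. Here both reads compact to the same string, so their profile distance is zero.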
Clustering of data points has long been a major research avenue in machine learning. In this paper, a learning automata clustering algorithm is proposed, built on learning automata, which are autonomous decision-making entities. In learning automata clustering, each data point is associated with a learning automaton that determines the cluster membership of that data point. Cluster assignments are corrected through a reinforcement signal for each learning automaton, which is derived from the Euclidean distance between the data point and the mean of its designated cluster. Finally, learning automata clustering is compared with four centroid-based clustering algorithms, K-means, K-means++, K-medians, and K-medoids, and the results demonstrate the high clustering accuracy and comparable Silhouette coefficient of the proposed method. (C) 2017 Elsevier B.V. All rights reserved.
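A single update of one data point's automaton might look like the sketch below. The abstract does not specify the update scheme or the exact reinforcement signal, so two assumptions are made here: a standard linear reward-inaction update, and a reward given whenever the chosen cluster's mean is the nearest one. All names are hypothetical.

```python
from math import dist


def reward_update(probs, chosen, rate=0.1):
    """Linear reward-inaction step: shift probability mass toward `chosen`."""
    return [
        p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
        for i, p in enumerate(probs)
    ]


def nearest_mean(point, means):
    """Index of the cluster mean closest to `point` (Euclidean distance)."""
    return min(range(len(means)), key=lambda j: dist(point, means[j]))


def automaton_step(point, probs, chosen, means, rate=0.1):
    """Reward the chosen cluster if its mean is the nearest (assumed signal).

    On a penalty the probabilities are left unchanged (the "inaction" part).
    """
    if chosen == nearest_mean(point, means):
        return reward_update(probs, chosen, rate)
    return list(probs)


means = [(0.1, 0.0), (5.0, 5.0)]  # made-up cluster means
rewarded = automaton_step((0.0, 0.0), [0.5, 0.5], chosen=0, means=means)
penalised = automaton_step((0.0, 0.0), [0.5, 0.5], chosen=1, means=means)
```

The update keeps the probabilities summing to one, so each automaton's vector can be sampled again in the next iteration to pick a cluster for its data point.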
Poor understanding and low clustering efficiency for massive data are problems in the context of big data. To solve them, a Canopy + K-means clustering algorithm is proposed, and the MapReduce programming model is used to make full use of the computing and storage capacity of a Hadoop cluster. A large number of buyers on Taobao is taken as the application context for a case study using Mahout, the data mining library of the Hadoop platform. A general procedure for mining with Mahout is also given. The clustering algorithm based on MapReduce shows good clustering quality and operation speed. The Canopy + K-means algorithm is compared with the K-means algorithm in terms of runtime, speed-up ratio and scalability. Both clustering algorithms are tested on clusters with different numbers of nodes and on datasets of various scales. The experimental results show that the Canopy + K-means algorithm runs faster than the K-means algorithm. Both show a good speed-up ratio in the Hadoop environment, and the Canopy + K-means algorithm performs considerably better than the K-means algorithm.
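The idea of seeding K-means with canopy centers can be sketched on a single machine as follows. This is a simplified illustration, not the MapReduce or Mahout implementation used in the paper; the thresholds `t1` and `t2`, the function names and the sample points are all assumptions.

```python
from math import dist


def canopies(points, t1, t2):
    """Build canopies with a loose threshold t1 and a tight threshold t2.

    Every point within t1 of a center joins that canopy. Points within t2
    of a center are dropped from the candidate pool, so they can never
    become centers themselves. Requires t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    result = []
    while remaining:
        center = remaining.pop(0)
        members = [p for p in points if dist(p, center) <= t1]
        result.append((center, members))
        remaining = [p for p in remaining if dist(p, center) > t2]
    return result


def kmeans(points, centers, iterations=10):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = [center for center, _ in canopies(points, t1=5.0, t2=3.0)]
final_centers = kmeans(points, seeds)
```

The canopy pass fixes both the number of clusters and their starting positions in one cheap scan, so K-means needs fewer iterations than it would from random seeds.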
ISBN: (print) 9781728139364
Port container handling is an important part of terminal operations. When loading multiple containers, operators can rely only on their eyes and on PLC information to determine the location of a container. To avoid the serious consequences of miscalculation, a combination algorithm is proposed in this paper. We choose the coastline, the container ridgeline and the container number as the features to analyze and locate. The coastline is located using color space and improved OTSU segmentation, the container ridgeline using Sobel and a clustering algorithm, and the container number using the Maximally Stable Extremal Regions (MSER) algorithm. In tests on 71 groups of operation videos from 4 bridge cranes in Ningbo port, the combination algorithm achieves better positioning accuracy. The average processing time of a single frame is less than 0.6 s. The algorithm proposed in this paper provides an effective reference and solution for improving the efficiency of terminal operations.
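For readers unfamiliar with OTSU segmentation, the sketch below shows the standard Otsu threshold on a flat list of grey levels. The paper uses an improved variant whose details are not given in the abstract, so this is the textbook method only; the function name and sample pixels are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# A made-up bimodal "image": dark pixels at 10 and bright pixels at 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [value > t for value in pixels]
```

The returned threshold falls between the two intensity modes, so `mask` separates the bright pixels from the dark ones.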
ISBN: (print) 9781450372459
Multi-document summarization is more challenging than single-document summarization because it has to solve the problem of overlapping information among sentences from different documents. In addition, because multi-document summarization datasets are rare, methods based on deep learning are difficult to apply. In this paper, we propose an approach to multi-document summarization based on the k-means clustering algorithm, combined with a centroid-based method, maximal marginal relevance and sentence positions. This approach is efficient at finding salient sentences and preventing overlap between sentences. Experiments on the DUC 2007 dataset show that our system is more efficient than other work in this field.
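Of the components listed in this abstract, maximal marginal relevance (MMR) is the one that prevents overlap, and it can be sketched in a few lines. The sketch assumes bag-of-words vectors, cosine similarity and relevance measured against the centroid of all sentences. It leaves out the k-means and sentence-position components, and the names and sample sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, n, lam=0.5):
    """Greedily pick n sentences that are central yet mutually dissimilar."""
    bags = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(bags, Counter())
    selected = []

    def score(i):
        # Relevance to the centroid minus similarity to anything already chosen.
        redundancy = max((cosine(bags[i], bags[j]) for j in selected), default=0.0)
        return lam * cosine(bags[i], centroid) - (1 - lam) * redundancy

    while len(selected) < min(n, len(sentences)):
        candidates = [i for i in range(len(sentences)) if i not in selected]
        selected.append(max(candidates, key=score))
    return [sentences[i] for i in selected]


docs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "dogs bark loudly at night",
]
summary = mmr_select(docs, n=2)
```

The parameter `lam` trades relevance against redundancy. At the value used here the repeated sentence is skipped in favour of the less central but novel one, while a value much closer to 1 would let the duplicate through.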
ISBN: (print) 9781450363143
Modern medical science depends strongly on imaging technologies for accurate diagnosis and treatment planning. Raw medical images generally require post-processing, such as edge and contrast enhancement and noise removal, for visualization. In this paper, a clustering-based contrast enhancement technique is presented for computed tomography (CT) images.
ISBN: (print) 9789881563972
Real-time monitoring of surface water quality is an intractable problem. In this paper, a soft-sensor method based on a fuzzy neural network (FNN) is proposed to solve it. Firstly, the river data were analyzed by principal component analysis (PCA) to obtain related variables such as dissolved oxygen (DO) and ammonia nitrogen (NH3-N). Secondly, a multi-input soft-sensor method based on the FNN is designed. The training data are preprocessed by a combination of hierarchical clustering and the K-means algorithm (H-K algorithm), which improves the accuracy of the soft-sensor method. Finally, the soft-sensor method is packaged and applied to the Tonghui River in Beijing. The results indicate that the FNN-based soft sensor can predict surface water quality simultaneously with suitable prediction accuracy.
ISBN: (print) 9781728103501
For the development of the Underwater Internet of Things, reliable transmission in the underwater wireless sensor networks that monitor the marine environment is important. However, in ocean monitoring, the reliability of data transmission is difficult to guarantee because of node mobility. In addition, energy consumption must be reduced during data transmission because node energy is limited. To address these problems, this paper proposes a self-organising routing algorithm based on a joint clustering and routing strategy for ocean monitoring (JCR-OM) to make data transmission in underwater wireless sensor networks more reliable. Firstly, the reliable communication distance of a node is calculated in a multilayer current model by using a force analysis of the anchor node. Then, in cluster head selection, the reliable transmission distance and a backoff strategy are introduced to reduce the impact of node mobility on data transmission. In intercluster routing selection, a greedy strategy is used to construct a routing strategy with minimum communication cost. The simulation results verify that JCR-OM can improve data transmission and prolong network lifetime.