Computational fluid dynamics (CFD) modelling is a scientific tool that provides fluid dynamics and chemical simulation, facilitating understanding of the complex combustion phenomena in engine studies. With the advance of machine learning (ML) technology, the big data from CFD results can be intelligently recognized and classified, thus easing data post-processing. This study proposes an integrated analysis that uses CFD simulation results of scalar distributions and the k-means clustering algorithm to optimally partition the engine combustion chamber into different zones. The space of the combustion chamber was automatically divided into light-soot zones and heavy-soot zones based on the clustering results for local equivalence ratio (ER) and temperature. Soot mitigation by the Reactivity Controlled Compression Ignition (RCCI) engine combustion mode, and the corresponding sooting tendency, were then surveyed by CFD numerical study. The localized soot depositions in each zone under varied combustion boundaries were compared, supporting the development of control strategies with numerical modelling and machine learning techniques.
This paper first introduces the types of clustering algorithms and describes the classical k-means algorithm and the Canopy algorithm in detail. Then, combining the MapReduce computing model with the Spark cloud computing framework, it introduces the parallel Canopy-k-means algorithm, in which the Canopy algorithm is used to optimize the initial values of the k-means algorithm. However, the Canopy algorithm introduces a new distance threshold parameter, T2, that must be set from human experience, and choosing it by hand is difficult for large data. This paper therefore proposes a parallel adaptive Canopy-k-means algorithm that can be used in a cloud computing framework to determine the distance threshold parameter T2 adaptively, based on a statistical method. Using the parallelism of the MapReduce computing model, the parallel Canopy-k-means algorithm is optimized by adaptive parameter estimation, which removes the dependence on manually selected parameters in the Canopy process. After the relevant theory and derivation of the algorithm are introduced, a cloud computing experiment platform is built on the Spark framework, and comparison experiments are performed using the Stanford Large Network Dataset Collection (SNAP) dataset and a self-built Dimension Networks dataset. The experimental results show that the proposed method is effective.
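To make the role of the distance thresholds concrete, the following is a minimal single-machine sketch of the classical Canopy pass that precedes k-means. It is not the parallel adaptive method of the abstract: the thresholds `t1` (loose) and `t2` (tight) are set by hand here, and the function name `canopy_centers` is a hypothetical choice for illustration. The resulting canopy centers could serve as initial k-means centroids.

```python
import math


def canopy_centers(points, t1, t2):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2.

    Returns a list of (center, members) pairs. Assumes t1 > t2 and that
    points are equal-length tuples of numbers.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)      # next unclaimed point becomes a center
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)      # loosely belongs to this canopy
            if d >= t2:
                # Not tightly bound to this center, so it may still
                # seed or join another canopy.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies
```

The number of canopies depends directly on `t2`: a smaller value leaves more points available as future centers, which is why hand-tuning it is awkward for large data.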
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of cancer deaths worldwide. To date, effective treatments for this disease remain limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry, and identifying effective candidate lung cancer drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist in selecting potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. Chemical-chemical interactions and chemical-protein interactions were used to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and the k-means clustering algorithm were employed to exclude candidate drugs with a low likelihood of treating lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and that most of them are structurally dissimilar to approved drugs for lung cancer.
Wellbore flow analysis of optical fiber vibration signals depends on distributed optical fiber logging. Distributed optical fiber logging technology identifies the fluid in the well through a distributed optical fiber acoustic sensor (DAS) and a distributed optical fiber temperature sensor (DTS). Distributed optical fiber sensors have the advantages of little underground interference, high efficiency and low cost. In this paper, the wellhead data extracted by the distributed optical fiber acoustic sensor are used to calculate the upper bound of the fluid sound frequency band in the pipe by nonlinear least squares fitting. The k-means clustering algorithm is used to cluster the optical fiber vibration signals in the low frequency band. From the clustering results, the ratio of the optical fiber signal eigenvalues of each production layer is obtained, and the trend of this ratio is judged to be close to the trend of the water absorption intensity. Compared with traditional acoustic logging, wellbore flow analysis using a distributed optical fiber acoustic sensor can quickly determine the production contribution of each layer and the change of fluid phase state over the production cycle. Combined with traditional production logging technology, distributed optical fiber logging shows its reliability and accuracy in data collection, logging interpretation and production application. Starting from the principle of distributed optical fiber acoustic sensing technology, this paper briefly explains the properties of the distributed optical fiber acoustic sensor and the principle of injection profile logging, systematically introduces the processing of distributed optical fiber acoustic data, and focuses on the accuracy of the k-means clustering algorithm for analyzing distributed optical fiber acoustic signals and for qualitative judgment of production layers, which provides a new idea for judging the accura
This study evaluates the feasibility of the k-means clustering algorithm for improving the effectiveness of the results recommended by the RICEST Journal Finder System. More than 15,000 papers published in engineering journals during 2013-2017 were collected from the journals' websites. Their titles, abstracts and keywords were extracted, normalized and processed to form the test body. Based on the number of papers collected and using Cochran's formula, 400 papers completely relevant to the subject of each journal were randomly and proportionally selected and entered into the system as queries, and the journals recommended by the system before and after applying the k-means clustering algorithm were recorded. The effectiveness of the system results was then determined at each stage by the leave-one-out cross-validation method, based on precision at the k top-ranked results. In addition, the opinions of subject reviewers on the relevance of the target journal were collected through a questionnaire. Results showed that before data clustering, only 40% of target journals were recommended within the first 3 ranks, whereas after applying the k-means clustering algorithm the target journal was retrieved within the first 3 ranks in more than 80% of searches. According to 210 subject reviewers, more than 80% of the journals recommended after applying the k-means clustering algorithm were completely relevant to the given paper. According to the study results, data clustering can significantly increase the effectiveness of the results recommended by journal recommender systems.
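The "retrieved within the first 3 ranks" figure above can be read as a hit rate over queries. The sketch below shows one way such a number could be computed; the function name `hit_rate_at_k` and the input layout (one ranked list of journal names per query, plus the known target journal) are assumptions for illustration, not the evaluation code of the study.

```python
def hit_rate_at_k(ranked_lists, targets, k=3):
    """Fraction of queries whose target appears in the top-k recommendations.

    ranked_lists: one list of recommended items per query, best first.
    targets: the known correct item for each query, in the same order.
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not targets:
        return 0.0
    hits = sum(
        1
        for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]        # only the first k ranks count
    )
    return hits / len(targets)
```

Running it once on the rankings produced before clustering and once on those produced after gives the two percentages that the comparison rests on.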
The process of partitioning a large set of patterns into disjoint and homogeneous clusters is fundamental in knowledge acquisition. It is called clustering in the literature and is applied in various fields, including data mining, statistical data analysis, compression and vector quantization. The k-means algorithm is very popular and one of the best for implementing the clustering process. Its time complexity is dominated by the product of the number of patterns, the number of clusters, and the number of iterations. It also often converges to a local minimum. In this paper, we present an improvement of the k-means clustering algorithm, aiming at better time complexity and partitioning accuracy. Our approach uses a windowing technique to reduce the number of patterns that need to be examined for similarity in each iteration. The technique is based on a well-known spatial data structure, the range tree, which allows fast range searches. (C) 2002 Elsevier Science (USA).
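For reference, here is a minimal sketch of the standard (unimproved) k-means iteration, showing where the cost described above comes from: every iteration compares each of the n patterns with each of the k centroids. It uses random initial centroids and does not include the range-tree windowing of the abstract; the function name and parameters are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples.

    Returns (centroids, assignments, iterations_run).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    assignments = [None] * len(points)
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Assignment step: n * k distance evaluations per iteration.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:     # nothing moved: converged
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                        # keep the old centroid if empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments, iterations
```

Because the result depends on the initial centroids, different seeds can end in different local minima on harder data, which is the second weakness the abstract mentions.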
In various application domains such as websites, education, crime prevention, commerce, and biomedicine, the volume of digital data is increasing rapidly. A problem arises when retrieving data from storage media, because some existing methods compare the query image with all images in the database; as a result, both the search space and the computational complexity increase. Content-based image retrieval (CBIR) methods aim to accurately retrieve, from large image databases, images similar to the query image based on the similarity between image features. In this study, a new hybrid method is proposed for image clustering that combines particle swarm optimization (PSO) with the k-means clustering algorithm. It is presented as a CBIR method that uses the color and texture of images as visual features to represent them. The proposed method uses four extracted features to measure similarity: color histogram, color moment, co-occurrence matrices, and wavelet moment. The experimental results indicate that the proposed system performs better than the other system in terms of accuracy.
Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformations and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved k-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used in our research is the most up-to-date among similar studies of sequence motifs. A new greedy initialization method for the k-means algorithm is proposed to improve traditional k-means clustering techniques. The new initialization method tries to choose suitable initial points that are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved k-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved k-means clustering algorithm may discover some relatively weak and subtle sequence motifs that are undetectable by traditional k-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved k-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result of the experiment suggests that this new k-means algorithm may be applied to other areas of bioinformatics resea
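The abstract does not spell out its greedy initialization, so the sketch below uses a different, well-known stand-in, farthest-first (maximin) seeding, purely to illustrate what "well-separated initial points" can mean in practice. The function name `farthest_first_seeds` and the `first_index` parameter are assumptions for illustration.

```python
import math


def farthest_first_seeds(points, k, first_index=0):
    """Pick k well-separated seeds by farthest-first traversal.

    Starts from points[first_index], then repeatedly adds the point whose
    distance to its nearest already-chosen seed is largest.
    """
    seeds = [points[first_index]]
    # Distance from every point to its nearest chosen seed so far.
    nearest = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [
            min(d, math.dist(p, points[idx]))
            for p, d in zip(points, nearest)
        ]
    return seeds
```

Seeds chosen this way are spread across the data, but the rule is sensitive to outliers; a method that also checks whether a seed can form a high-quality cluster, as the abstract describes, would need an additional quality criterion.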
Spectrum noise logging technology allows an oil field to be monitored dynamically. Because the logging instrument is simple and convenient to operate, the position of channeling outside the casing can be qualitatively judged from abnormal noise in the measurement record, and the downhole production status of a water injection well can be accurately diagnosed. This makes it possible to fully grasp the problems of oil casing leakage, outer channeling and packer leakage in water injection wells, and to enrich downhole operations. In this paper, the downhole noise signal data are standardized, and the k-means clustering algorithm is used to classify the downhole noise signal according to the correlation coefficient of different frequencies to obtain the low-frequency noise signal. The low-frequency noise signal is then clustered a second time to obtain the channeling frequency band and the reservoir fluid frequency band. The accurate channeling frequency range is determined and is consistent with domestic and foreign research data. The channeling frequency band is processed by wavelet thresholding, which eliminates the useless noise in that band. The processed channeling noise signal curve is then analyzed, and the main output layers show obvious amplitude back channeling. Analyzing the channeling frequency band with the k-means clustering algorithm and processing the channeling noise by wavelet thresholding constitutes a new noise signal curve processing method, which provides a new idea for using spectrum noise logging technology to identify channeling outside the pipe in water injection wells.
A method of recognizing 16QAM signals based on the k-means clustering algorithm is proposed to mitigate the impact of finite transmitter extinction ratio (ER). Pilot symbols with 0.39% overhead are assigned as the initial centroids of the k-means clustering algorithm. Simulation results in a 10 GBaud 16QAM system show that the proposed method achieves higher identification precision than the traditional decision method under finite ER and IQ mismatch. Specifically, the proposed method improves the required OSNR at the FEC limit by 5.5 dB, 4.5 dB, 4 dB and 3 dB for ER = 12 dB, 16 dB, 20 dB and 24 dB, respectively, and the acceptable ranges of bias error and IQ mismatch are widened by 767% and 360%, respectively, for ER = 16 dB. (C) 2017 Elsevier B.V. All rights reserved.
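The idea of seeding k-means with pilot symbols can be sketched as follows, treating each received symbol as a complex number. This is a simplified illustration under stated assumptions: one initial centroid per constellation point is taken to be available from the pilots, and no ER, OSNR or IQ-mismatch model is included. The function name `kmeans_decide` is hypothetical.

```python
def kmeans_decide(received, initial_centroids, max_iters=20):
    """Cluster complex received symbols, starting from pilot-derived centroids.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that received[i] is decided as.
    """
    centroids = list(initial_centroids)

    def nearest(symbol):
        # Index of the centroid closest to this symbol in the I/Q plane.
        return min(range(len(centroids)), key=lambda c: abs(symbol - centroids[c]))

    for _ in range(max_iters):
        labels = [nearest(r) for r in received]
        updated = []
        for c in range(len(centroids)):
            members = [r for r, lab in zip(received, labels) if lab == c]
            # Keep the pilot-derived centroid if no symbol was assigned to it.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:               # centroids stopped moving
            break
        centroids = updated
    # Final decision against the final centroids.
    return centroids, [nearest(r) for r in received]
```

Because the centroids follow the received clusters instead of the ideal grid, the decision regions adapt to a shifted or compressed constellation, which is the effect a fixed-threshold decision cannot capture.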