Fault diagnosis of a marine turbocharger system is crucial for realizing intelligent operation and maintenance in a big-data analysis context. To improve the fault diagnosis rate in engineering applications, this paper proposes a new unsupervised machine learning algorithm for fault diagnosis, called OAGFD, which is based on a one-class support vector machine (OSVM), affinity propagation (AP), and a Gaussian mixture model (GMM). The OSVM is first used to divide samples from the marine turbocharger system into normal and fault samples, and only the fault samples are used in the following steps to identify specific fault types. AP is then adopted to automatically provide initial values for expectation maximization, which iteratively estimates the model parameters by maximizing the likelihood. The GMM is used to classify the faults of the marine turbocharger system and output the fault diagnosis results. Finally, OAGFD is validated on actual data. The experimental results show that OAGFD can be trained quickly and accurately. The OAGFD method achieves higher identification accuracy for multiple faults of the marine turbocharger system, and offers faster operation and stronger generalization ability than traditional methods. It is an efficient, unsupervised fault diagnosis technique with both theoretical and practical value. This research provides a new method for automatic fault diagnosis of the marine turbocharger system.
Actigraphy is widely used in sleep studies but lacks a universal unsupervised algorithm for sleep/wake identification. An unsupervised algorithm is useful in large-scale population studies and in cases where polysomnography (PSG) is unavailable, because it does not require sleep outcome labels to train the model; instead, it uses only the information contained in actigraphy to learn sleep and wake characteristics and separate the two states. In this study, we propose an unsupervised machine learning algorithm based on the Hidden Markov Model (HMM) for sleep/wake identification. The proposed algorithm is also an individualized approach: it takes individual variability into account and analyzes each individual's actigraphy profile separately to infer sleep and wake states. We used Actiwatch and PSG data from 43 individuals in the Multi-Ethnic Study of Atherosclerosis to evaluate the method's performance. Epoch-by-epoch comparisons and sleep variable comparisons were made between our algorithm, the unsupervised algorithm embedded in the Actiwatch software (AS), and the pre-trained supervised UCSD algorithm. Using PSG as the reference, the accuracy was 85.7% for HMM, 84.7% for AS, and 85.0% for UCSD. The sensitivity was 99.3%, 99.7%, and 98.9% for HMM, AS, and UCSD, respectively, and the specificity was 36.4%, 30.0%, and 31.7%, respectively. The Kappa statistic was 0.446 for HMM, 0.399 for AS, and 0.311 for UCSD, suggesting fair to moderate agreement between PSG and actigraphy. The Bland-Altman plots further show that the total sleep time, sleep latency, and sleep efficiency estimates by HMM were closer to PSG, with narrower 95% limits of agreement, than those of AS and UCSD. All three methods tend to overestimate sleep and underestimate wake compared to PSG. Our HMM approach is also able to differentiate relatively active and sedentary individuals by quantifying variabilities in activity counts: individuals with higher estimated activity variabilities tend to show more frequent sedentar
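The epoch-by-epoch measures reported above (accuracy, sensitivity, specificity, and the Kappa statistic) can be illustrated with a minimal sketch. It assumes each epoch is coded 1 for sleep and 0 for wake, with sleep treated as the positive class, which matches the usual actigraphy convention but is not stated in the abstract. The function name `sleep_wake_agreement` and the toy data are hypothetical and are not taken from the study.

```python
import math


def sleep_wake_agreement(reference, predicted):
    """Compare two epoch sequences coded 1 = sleep, 0 = wake.

    `reference` plays the role of PSG and `predicted` the role of an
    actigraphy algorithm. Sleep is treated as the positive class (assumption).
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # sleep scored as sleep
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # wake scored as wake
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # wake scored as sleep
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # sleep scored as wake
    n = len(pairs)

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance_sleep = ((tp + fn) / n) * ((tp + fp) / n)
    chance_wake = ((tn + fp) / n) * ((tn + fn) / n)
    expected = chance_sleep + chance_wake
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else math.nan

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy example: ten epochs, one wake epoch misread as sleep.
psg = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
actigraphy = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
scores = sleep_wake_agreement(psg, actigraphy)
```

With this coding, a method that overestimates sleep keeps sensitivity high while specificity drops, which is the pattern the reported figures show for all three methods.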
ISBN: 9781450349512 (print)
The automatic detection of relevant reviews plays a major role in tasks such as opinion summarization, opinion-based recommendation, and opinion retrieval. Supervised approaches for ranking reviews by relevance rely on the existence of a significant, domain-dependent training data set. In this work, we propose MRR (Most Relevant Reviews), a new unsupervised algorithm that identifies relevant reviews based on the concept of graph centrality. The intuition behind MRR is that central reviews highlight aspects of a product that many other reviews frequently mention, with similar opinions, as expressed in terms of ratings. MRR constructs a graph in which nodes represent reviews and an edge connects a pair of reviews when their similarity reaches a minimum value, and then employs PageRank to compute centrality. The minimum similarity is graph-specific and takes into account how reviews are written in specific domains. The similarity function does not require extensive pre-processing, which reduces the computational cost. Using reviews of books and electronics products, our approach outperformed the two unsupervised baselines and showed performance comparable to two supervised regression models in a specific setting. MRR also achieved significantly better run-time performance than the unsupervised baselines.
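The graph construction and PageRank step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give MRR's similarity function or how its graph-specific threshold is chosen, so a plain Jaccard word overlap (`jaccard`) and a fixed `min_similarity` value stand in for them here. All function names are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for MRR's unspecified similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def build_graph(reviews, min_similarity):
    """Connect two reviews when their similarity reaches the minimum value."""
    neighbours = [set() for _ in reviews]
    for i in range(len(reviews)):
        for j in range(i + 1, len(reviews)):
            if jaccard(reviews[i], reviews[j]) >= min_similarity:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def pagerank(neighbours, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as adjacency sets."""
    n = len(neighbours)
    if n == 0:
        return []
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[i] for i in range(n) if not neighbours[i])
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            for j in neighbours[i]:
                new_rank[j] += damping * rank[i] / len(neighbours[i])
        rank = new_rank
    return rank


def rank_reviews(reviews, min_similarity=0.2):
    """Return review indices ordered from most to least central."""
    scores = pagerank(build_graph(reviews, min_similarity))
    return sorted(range(len(reviews)), key=lambda i: scores[i], reverse=True)


reviews = [
    "battery life is great",
    "great battery life and screen",
    "battery life is poor but screen great",
    "arrived late",
]
order = rank_reviews(reviews)
```

In this toy input the last review shares no words with the others, so it has no edges and ends up at the bottom of the ranking, while the three reviews that discuss the same aspects reinforce one another.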
Determining molecular activity from large, highly complex drug molecules is a challenging task. Many methods have been tried to solve this difficult problem. However, there is no comprehensive evaluation that tests the performance of various molecular descriptors with both supervised and unsupervised methods. In practical applications, unlabeled data are easier to obtain than labeled data; however, this should not be a factor that greatly influences the test results. Support Vector Machine (SVM) and Artificial Neural Network (ANN) are two common traditional supervised algorithms, while Sparse AutoEncoder (SAE) and Deep Belief Nets (DBN) are two typical unsupervised algorithms. In this paper, both types of methods were adopted, and extensive classification and evaluation strategies were applied to test and verify IL-1B anti-inflammatory activity. The results reveal that, regardless of the descriptors used, the unsupervised algorithms give more precise predictions than the supervised algorithms, even though the target output is the same. This research contributes to finding the best-matching algorithm for determining the reliability of drug activity discovery. It not only reduces the cost of research and improves the efficiency of anti-inflammatory drug discovery, but is also of great significance for promoting anti-inflammatory drug design.
Flood mapping plays a crucial role in effective disaster management, risk assessment, and mitigation planning, given the widespread and destructive nature of floods. However, current synthetic aperture radar (SAR)-based methods face challenges related to the need for extensive labeled training data, compromised classification accuracy, and limited applicability across different satellite systems and resolutions. In response to these challenges, our research introduces an unsupervised SAR-based flood mapping algorithm inspired by artificial general intelligence principles. Notably, the method is flexible: it performs effectively across various SAR satellites with differing resolutions and sensors, eliminating the need for satellite-specific algorithms. Our algorithm improves processing speed and scalability by eliminating labor-intensive labeling of training data and manual intervention. To validate its performance, we conducted tests in three distinct regions using meter-level imagery from the HISEA-1, Gaofen-3, and Sentinel-1 satellites. The algorithm consistently outperformed prevalent unsupervised techniques such as K-means and Otsu, and even a supervised convolutional neural network segmentation algorithm by AI-Earth, with F1-scores exceeding 0.91. This performance demonstrates its accuracy irrespective of the satellite system or region used. Furthermore, the seamless integration of our algorithm with high-performance cloud computing platforms such as Google Earth Engine enhances its adaptability and scalability, enabling continuous monitoring of global floods. This is crucial for understanding flood trends, assessing their impacts, and formulating effective disaster mitigation strategies.
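For context on the unsupervised baselines named above, the following is a minimal sketch of Otsu thresholding, one of the techniques the proposed algorithm is compared against. It is not the proposed algorithm, whose details the abstract does not give. The sketch assumes SAR backscatter has already been quantized to integers in 0..255 and that open water appears darker than land, so pixels at or below the threshold are marked as water. The names `otsu_threshold` and `water_mask` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold that maximizes between-class variance.

    `pixels` is a flat list of integer intensities in [0, levels).
    Pixels <= threshold form the dark class, the rest the bright class.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(levels):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        if weight_dark == 0:
            continue  # no dark-class pixels yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def water_mask(pixels, threshold):
    """Mark pixels at or below the threshold as water (assumed dark in SAR)."""
    return [value <= threshold for value in pixels]


# Toy scene: half dark (water-like) pixels, half bright (land-like) pixels.
scene = [10] * 50 + [200] * 50
threshold = otsu_threshold(scene)
mask = water_mask(scene, threshold)
```

A single global threshold like this needs no labels, but it depends on a clearly bimodal histogram, which helps explain why such baselines can struggle across sensors and resolutions.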
Predictive algorithms, also known as mathematical models, use historical data to predict future outcomes. These algorithms identify patterns and relationships within the data in order to produce precise predictions. The growing importance of predictive algorithms in domains such as finance, healthcare, marketing, weather forecasting, and e-commerce has led to an increasing need for robust and accurate models. Machine learning (ML) and deep learning (DL) algorithms, including supervised, unsupervised, and reinforcement learning, play a crucial role in prediction. Supervised algorithms include classification and regression, while unsupervised algorithms primarily focus on clustering. In this study, a detailed comparative analysis of eight classification algorithms, six regression algorithms, and five clustering algorithms is performed using diverse datasets and performance metrics. RoBERTa, ResNet, Random Forest Regression, and K-means clustering outperformed traditional algorithms in textual classification, image classification, regression, and clustering, respectively. This study enables data scientists and practitioners to make informed decisions when selecting appropriate models for their specific applications.
Artificial intelligence technology has proven promising and effective in traffic identification for network management and security. However, its identification accuracy is easily degraded by the large volume of unknown traffic. Because real networks contain numerous unknown applications, and more appear over time, a promising traffic identification system should be able to discover unknown applications and recognize their traffic. In this paper, an innovative and comprehensive traffic identification system, called STI, is proposed, which achieves high precision in identifying both known and unknown traffic. More importantly, STI can self-evolve to identify incoming unknown applications and the corresponding training samples, based on a novel clustering process with minimal manual involvement. In addition, an improved random forest and a novel similarity calculation method are proposed and applied in STI to enhance classification performance. Experiments on real network traffic demonstrate the core advantages of the proposed system.
Ship behaviors refer to operational processes such as sailing and entering or leaving port, which are indicated by a ship's position, speed, and so on. The collected big data have normally been processed with unsupervised machine learning methods. However, this process is time-consuming and does not consider time continuity. Recognizing and reproducing ship behaviors from unknown data is still a complex problem. Hence, this study proposes a universal Meta-trajectory Variable Sliding Window (Meta-VSW) method to provide an efficient and high-fidelity solution. In this method, the ship data are connected into the smallest units by meta-trajectory coding and combined with variable sliding windows to achieve fast, continuous, and accurate recognition of ship behaviors. Taking an inland-water ship and a marine transport ship as examples, the validity of the method was verified and compared with two commonly used algorithms, Affinity Propagation (AP) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The method has the fastest computational speed and can effectively classify the behaviors in massive unknown data from different ships. It also performs well in capturing behavior boundaries, with a recognition accuracy of up to 0.9. The method was then applied to analyze operational efficiency and fuel consumption.
With the growing use of electronic cash cards, the number of transactions made with these cards has increased rapidly, and financial organizations have paid growing attention to fraud detection models. Electronic cash card fraud detection models often rely on a single algorithm, optimizing classification or clustering to find fraudulent patterns, and are therefore either unsupervised or supervised. The proposed model instead uses both unsupervised and supervised methods to detect fraud, so that it benefits from the advantages of both. In the proposed method, the most important features of users' behavioral patterns, such as transaction time and value, are selected and used for behavioral modeling, which includes extracting different user profiles and determining threshold values for each profile. The proposed model works in real time by combining two filters to detect electronic card fraud. The first is a fast filter made up of a number of unsupervised algorithms; the second is an explicit filter made up of a number of supervised algorithms. The proposed model creates a profile of the cardholder and measures the degree of deviation of the cardholder's behavior pattern in new transactions through the Map/Reduce approach for parallel execution alongside the human observer. After the transaction has been completed and the maximum difference between two consecutive orders of observations, the fraud or non-fraud label is applied to the transaction, which is added to the relevant database for future use in detecting transaction deviations. According to the simulation results of the proposed model, the accuracy with all variables, PCA dimension reduction, and LASSO dimension reduction is 0.985, 0.987, and 0.980, respectively. The F1-score with all variables, PCA, and LASSO dimension reduction is 0.681, 0.676, and 0.669, respectively. The simulation results
Rapid and sensitive bioburden detection is of paramount importance in applications including public health and food and water safety. To overcome the traditional limitations of bacterial detection, namely lengthy culture times and complicated procedures, a low-cost, portable multichannel fluorometer coupled with machine learning (ML) has been implemented in this study. Five different bacterial strains were tested along with a negative control for time-series fluorescence data collection and analysis. We preprocessed the data, extracted features, and then applied different conventional unsupervised and supervised machine learning techniques. Initially, machine learning algorithms were applied for the qualitative detection of bacteria by binary classification, followed by regression analysis to predict the level of contamination for E. coli. Multiclass classification was used to distinguish gram-positive from gram-negative bacterial strains and to differentiate all the bacterial strains tested. Our results show that around 97.9% accuracy can be achieved for bacterial contamination detection at concentrations as low as 1 CFU/mL, while 92.1% accuracy can be achieved for differentiating the gram-positive and gram-negative strains. Additionally, high accuracy in detecting bioburden is obtained with 1 minute of data, demonstrating the multichannel fluorometer's rapid detection capability. The multichannel fluorometer integrated with ML analytics can automate data analysis and deliver accurate and rapid on-site bacterial detection, including prediction of bioburden levels and differentiation of bacterial strains, and the protocol can be applied to biosensors with a similar data type.