Six quantitative indicators of human gait asymmetry, whose values can be estimated by processing data acquired using depth sensors, are considered. Procedures for estimating these indicators have been developed and tested using measurement data representative of typical gait and two types of asymmetric pathological gait. The experimental results indicate that, among the considered indicators, the one based on the quasi-correlation between the speeds of the feet best differentiates the three considered types of gait.
The rise of the sports industry, which over time has increased in popularity along with machine learning and the possibilities for improving upon previously known and used methods, can serve many future predictions an...
ISBN (digital): 9789819947492
ISBN (print): 9789819947485; 9789819947492
Human leukocyte antigen (HLA) is a molecule that exists on the surface of most human cells and is capable of recognizing and binding to foreign peptides, triggering an immune response. Predicting the binding of peptides to HLA (pHLA) is crucial for screening effective immune therapy antigen targets. However, little attention has been paid to the relationship and comparative information between positive and negative samples. In this paper, we propose an attention-based contrastive learning model, ACLPHLA, for inferring pHLA binding specificity. We use a Transformer encoder to convert peptides into latent representations, and then mask a portion of the amino acids based on attention weights to generate their contrastive views. Compared to a fully supervised baseline model, we demonstrate that large-scale peptide sequence pre-training based on contrastive learning significantly improves the sequence representation and downstream task prediction performance. We explore different masking strategies, among which masking a certain percentage of the amino acids with lower attention weights performs best. Comparative experiments on two independent datasets show that our method outperforms other existing algorithms. In addition, our statistical analysis of attention weights reveals important amino acids and their position preferences in pHLA binding, demonstrating the potential interpretability of our proposed model.
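The masking strategy described above (mask a percentage of the amino acids with the lowest attention weights) can be sketched as follows. This is only an illustration under assumptions: the mask symbol `X`, the function name `mask_low_attention`, the 25% ratio, and the example weights are hypothetical and are not taken from ACLPHLA.

```python
import math

MASK = "X"  # hypothetical mask symbol; the abstract does not name one


def mask_low_attention(peptide, attention, ratio=0.25):
    """Return a copy of `peptide` with the lowest-attention residues masked."""
    if len(peptide) != len(attention):
        raise ValueError("need one attention weight per residue")
    n_mask = math.floor(len(peptide) * ratio)
    # Positions sorted by ascending attention weight; ties broken by position.
    order = sorted(range(len(peptide)), key=lambda i: (attention[i], i))
    masked = set(order[:n_mask])
    return "".join(MASK if i in masked else aa for i, aa in enumerate(peptide))


# Illustrative peptide with made-up attention weights (one per residue).
peptide = "SIINFEKL"
weights = [0.05, 0.20, 0.10, 0.02, 0.25, 0.08, 0.12, 0.18]
view = mask_low_attention(peptide, weights, ratio=0.25)
print(view)
```

In a contrastive setup, the original peptide and its masked view would form a positive pair; how the pairs are scored is outside this sketch.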
Currently, smart systems are widely used in various scenarios such as smart cities, smart healthcare and smart logistics. Meanwhile, centralized cloud computing has become the mainstream platform for smart systems due to the strong demand for huge data storage and processing capacities. Edge computing can effectively reduce network latency and improve service response time for smart systems. However, edge servers usually have limited data storage and processing capacities. Therefore, how to utilize both cloud and edge servers effectively is an open issue. In this paper, we focus on the problem of data placement in an edge-cloud collaborative smart system. Specifically, we propose a Differential Evolution Particle Swarm Optimization based data placement strategy (called DE-PSO) to optimize the data storage cost under deadline constraints. DE-PSO considers the different features, requirements and environments of edge and cloud data storage services to generate an optimized data placement scheme. Comprehensive experimental results show that the proposed strategy outperforms several representative data placement strategies.
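To make the optimization target concrete, the sketch below evaluates storage cost under deadline constraints for a tiny instance. It is not DE-PSO: an exhaustive search stands in for the optimizer, and every number (per-GB prices, transfer speeds, edge capacity, dataset sizes and deadlines) is a hypothetical value chosen for illustration.

```python
from itertools import product

# Hypothetical per-tier parameters (not from the abstract).
TIERS = {
    "edge": {"cost_per_gb": 0.10, "seconds_per_gb": 0.5, "capacity_gb": 10},
    "cloud": {"cost_per_gb": 0.02, "seconds_per_gb": 4.0, "capacity_gb": float("inf")},
}


def evaluate(placement, datasets):
    """Return (total storage cost, feasible?) for one placement."""
    cost = 0.0
    used = {tier: 0.0 for tier in TIERS}
    feasible = True
    for tier, (size_gb, deadline_s) in zip(placement, datasets):
        params = TIERS[tier]
        cost += size_gb * params["cost_per_gb"]
        used[tier] += size_gb
        # Assumed deadline model: access time grows linearly with size.
        if size_gb * params["seconds_per_gb"] > deadline_s:
            feasible = False
    if any(used[t] > TIERS[t]["capacity_gb"] for t in TIERS):
        feasible = False
    return cost, feasible


def best_placement(datasets):
    """Exhaustive search over all placements; a stand-in for DE-PSO."""
    best = None
    for placement in product(TIERS, repeat=len(datasets)):
        cost, ok = evaluate(placement, datasets)
        if ok and (best is None or cost < best[0]):
            best = (cost, placement)
    return best


datasets = [(4, 5), (6, 30), (8, 40)]  # (size in GB, deadline in seconds)
result = best_placement(datasets)
print(result)
```

Exhaustive search grows as 2^n in the number of datasets, which is why a metaheuristic is attractive for realistic problem sizes.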
Any alteration in the air has an impact on health. Pollutants such as PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3, benzene, toluene, and xylene are present in the air and have a negative impact on human health. Early pollution detection methods help to identify the air pollutant factors and to determine which gas combinations are unsafe for individuals. To determine the correlation factor, this study computes correlations over various combinations of air pollution attributes. The significantly correlated variables are then used in prediction methods such as multiple regression, logistic regression, and SVM algorithms.
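The correlation step could look like the sketch below, which computes Pearson correlation coefficients for every pair of pollutant attributes and keeps the strongly correlated pairs. The abstract does not say which correlation measure is used, so Pearson is an assumption, as are the 0.8 threshold and the sample readings.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def strongly_correlated(data, threshold=0.8):
    """Return {(a, b): r} for attribute pairs with |r| >= threshold."""
    pairs = {}
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            pairs[(a, b)] = r
    return pairs


# Hypothetical readings, for illustration only.
readings = {
    "PM2.5": [35, 50, 42, 80, 65],
    "PM10": [70, 100, 84, 160, 130],
    "O3": [40, 30, 36, 12, 20],
}
pairs = strongly_correlated(readings)
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

Attributes selected this way would then serve as inputs to the regression or SVM models named in the abstract.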
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names or political opinions. The General Data Protection Regulation (GDPR) suggests pseudonymization as a solution for securing open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for the manipulation of research data. This paper outlines a research agenda within pseudonymization: the need for studies into the effects of pseudonymization on unstructured data in relation to, e.g., readability and language assessment, and into the effectiveness of pseudonymization as a way of protecting writer identity, while also exploring different ways of developing context-sensitive algorithms for the detection, labelling and replacement of personal information in unstructured data. The recently granted pseudonymization project ‘Grandma Karl is 27 years old’ (https://***/en/projects/mormor-karl) addresses exactly those challenges.
Water treatment in waterworks involves many processes, among which coagulation and precipitation play an important role that is directly related to guaranteeing water quality. Traditional coagulant dosing technology relies too heavily on manual labor and suffers from poor stability and high chemical consumption. The rapid development of artificial intelligence and machine learning provides a new way to solve the coagulant prediction problem. In this paper, a coagulant prediction method based on a multi-model fusion Stacking ensemble learning approach is proposed. Considering the differences in data observation and training principles among algorithms, the advantages of each model are fully utilized to build a Stacking ensemble learning model for coagulant prediction that embeds multiple machine learning algorithms; its base learners include XGBoost and SVR. Data processing comprises missing value filling, outlier detection, data conversion, and, in combination with business scenarios, feature construction and feature selection. Finally, the validity of the algorithm is verified using the production data of a water plant. The prediction results show that the coagulant prediction method based on the multi-model fusion Stacking ensemble learning approach achieves higher prediction accuracy than traditional single-model prediction.
Despite many generous grants and discounts provided by major suppliers of cloud resources for academia and/or research teams, the cost incurred by the deployment and operationalization of cloud storage and processing is far from trivial. Hybrid cloud open-source solutions such as OpenStack can prove extremely important for institutions lacking generous financial resources. This is the case with the RaaS-IS platform installed at Alexandru Ioan Cuza University of Iasi, which hosts a large variety of research projects on Big Data, Machine Learning, Text Mining, Molecular Biology, etc., developed by research teams from various institutions. This paper introduces a technical solution built on OpenStack, designed to assess the performance of data processing (specifically SQL queries) across a broad spectrum of Big Data tools, such as Apache Hadoop and Spark. The aim of the solution is to evaluate SQL query performance as data volume scales from hundreds of gigabytes to terabytes. The evaluation considers various query parameters, including the number of joins specified in the FROM clause, predicate filters in the WHERE clause, attributes in GROUP BY, conditions in the HAVING clause and many more.
Clustering is a basic strategy in data mining, allowing the detection of latent patterns within complex datasets. However, traditional clustering algorithms frequently struggle with datasets composed of a variety of properties. Moreover, in the hard clustering paradigm the data are separated into distinct clusters, with each data point assigned to exactly one cluster. Rough Clustering, on the other hand, overcomes these constraints by allowing data points to belong to several clusters at the same time and to form connections with components of other clusters. Incorporating decision theory offers an entirely new trajectory for the creation of clusters. This advancement is intended to give the clustering process an inherently adaptive nature, allowing the algorithms to respond to data patterns dynamically and to optimize cluster allocations. This research study provides a coherent and systematic methodology for cluster evolution from the perspective of decision theory, thereby contributing to the advancement of clustering techniques.
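The difference between hard and rough membership can be illustrated with a single assignment step. The sketch below follows the common rough k-means convention of lower approximations (points that certainly belong to a cluster) and upper approximations (points that possibly belong to it); the distance-ratio threshold is an assumed rule and is not the decision-theoretic method the abstract proposes.

```python
from math import dist


def rough_assign(points, centroids, ratio=1.3):
    """Assign points to lower/upper approximations of each cluster.

    lower[k]: points that certainly belong to cluster k.
    upper[k]: points that possibly belong to cluster k.
    """
    lower = [[] for _ in centroids]
    upper = [[] for _ in centroids]
    for p in points:
        d = [dist(p, c) for c in centroids]
        nearest = min(range(len(centroids)), key=d.__getitem__)
        # Other clusters whose centroid is almost as close as the nearest one.
        close = [
            k for k in range(len(centroids))
            if k != nearest and d[k] <= ratio * d[nearest]
        ]
        upper[nearest].append(p)
        if close:
            # Ambiguous point: it stays in the boundary of several clusters.
            for k in close:
                upper[k].append(p)
        else:
            lower[nearest].append(p)
    return lower, upper


centroids = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 1.0), (5.0, 0.5)]
lower, upper = rough_assign(points, centroids)
print("lower:", lower)
print("upper:", upper)
```

Here the point midway between the two centroids lands in both upper approximations and in neither lower approximation, which is the behavior a hard clustering cannot express.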
ISBN (print): 9798350317152
With the popularization of mobile applications and the timely acquisition of fresh data, real-time clustering and its evolution analysis have become the primary operations for data processing and knowledge discovery. Such continuous queries on massive objects are computation-intensive tasks in dynamic scenarios. However, existing clustering techniques fail to achieve decent performance when computation-intensive operations occur frequently in streaming scenarios, owing to two challenges: (i) uncertainty of the clustering frequency; (ii) unpredictable distribution evolution. Hence, it is critical to find a lightweight model that can cluster high-speed dynamic instances while exploiting the evolution among different clustering results. This paper focuses on the problem of real-time clustering on streaming data in computation-intensive and highly dynamic tasks, through a framework, Ocean, consisting of an online clustering algorithm and evolution analysis. In particular, the framework uses a flexible composite window to augment knowledge mining, achieving a proper real-time response in various scenarios. The evolution analysis supports full life-cycle detection, improving adaptability to dynamic concept drifts and multiple patterns. Inspired by the grid partition strategy, the framework adopts grid feature vectors to capture significant changes in streaming data. Furthermore, we propose an optimization that removes sparse grids in a timely manner and performs the online clustering adaptively for space and time efficiency. It is proven to be effective both theoretically and experimentally. This strategy enables real-time clustering for dynamic streaming data without degrading the clustering quality or increasing the computation cost. Experiments on real datasets and synthetic datasets verify the accuracy and effectiveness of Ocean compared to the state-of-the-art approaches, as well as the superior ability to perform clustering in a real-time
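The grid-based summary and sparse-grid removal mentioned in the abstract can be sketched as follows. This is not the Ocean implementation: the class name `GridSummary`, the exponential decay model, and all parameter values are assumptions chosen to show the general idea of keeping a per-cell density and discarding cells that have gone stale.

```python
from math import floor


class GridSummary:
    """Per-cell decayed densities for a point stream (illustrative only)."""

    def __init__(self, cell_size=1.0, decay=0.5, min_density=0.5):
        self.cell_size = cell_size
        self.decay = decay              # density multiplier per elapsed tick
        self.min_density = min_density  # cells below this are pruned
        self.cells = {}                 # cell index -> (density, last tick)

    def _aged(self, key, tick):
        """Density of a cell after decaying it up to `tick`."""
        density, last = self.cells[key]
        return density * self.decay ** (tick - last)

    def insert(self, point, tick):
        """Map the point to its grid cell and bump that cell's density."""
        key = tuple(floor(x / self.cell_size) for x in point)
        density = self._aged(key, tick) if key in self.cells else 0.0
        self.cells[key] = (density + 1.0, tick)
        return key

    def prune(self, tick):
        """Remove and return cells whose decayed density is too low."""
        sparse = [k for k in self.cells if self._aged(k, tick) < self.min_density]
        for k in sparse:
            del self.cells[k]
        return sparse


grid = GridSummary(cell_size=1.0, decay=0.5, min_density=0.5)
for point in [(0.2, 0.3), (0.7, 0.1), (0.5, 0.9), (5.5, 5.5)]:
    grid.insert(point, tick=0)
removed = grid.prune(tick=2)
print("removed:", removed)
print("kept:", list(grid.cells))
```

Because only cell summaries are stored, memory depends on the number of occupied cells rather than on the number of points seen, which is the usual motivation for grid-based stream clustering.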