ISBN: 9780998133171 (print)
The insufficient amount of training data is a persistent bottleneck of machine learning systems. A large portion of the world's data is scattered and locked in data silos. Breaking up these data silos could alleviate this problem. Federated machine learning (FedML) is a novel model-to-data approach that enables the training of machine learning models on decentralized, potentially siloed data. Despite its promising potential, most FedML projects never leave the prototype stage. This can be attributed to exaggerated expectations and a poor fit between the technology and the use case. Current literature does not offer guidance for assessing the fit between federated machine learning and a given use case. Against this backdrop, we design a decision-support tool to aid decision-makers in assessing the suitability and complexity of FedML projects. We thereby aim to facilitate the technology selection process, avoid exaggerated expectations, and consequently facilitate the success of federated machine learning projects.
Cold rolling chatter is one of the bottlenecks in improving the production quality and efficiency of high-strength thin strip, so it is very important to predict and identify chatter states. The accumulation of industrial data from the rolling process and the development of machine learning technology have opened up a path to solving this problem. However, because actual process data have low density and an uneven distribution, knowledge learning and state identification for cold rolling chatter are limited. Therefore, a novel identification method based on the combination of actual production data and simulation data is proposed and applied to identify cold rolling chatter states. First, actual vibration signals are collected, and simulation data generated from a chatter model are used to supplement the data in chatter states. The sample space is constructed using semi-supervised transfer component analysis (SSTCA) to fuse the actual production data and the simulation data. Then, different cold rolling states are identified by a particle swarm optimization-support vector machine (PSO-SVM) and a back propagation neural network (BPNN), respectively. Finally, the identification results of PSO-SVM and BPNN are combined based on Dempster-Shafer (D-S) theory. It can be concluded that SSTCA can effectively address the low density and uneven distribution of industrial data by fusing multi-source data, and that D-S theory can combine the results of different machine learning methods. Furthermore, the presented method identifies different chatter states in the rolling process more accurately.
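The final fusion step can be illustrated with Dempster's rule of combination, the standard combination rule in D-S theory. The sketch below is only an illustration under stated assumptions: the state names, the mass values, and the way classifier outputs are turned into mass functions are all hypothetical, since the abstract does not specify them.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint hypotheses contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    normalizer = 1.0 - conflict
    return {hyp: mass / normalizer for hyp, mass in combined.items()}


# Hypothetical frame of discernment with two rolling states.
STABLE = frozenset({"stable"})
CHATTER = frozenset({"chatter"})
EITHER = STABLE | CHATTER  # mass assigned to "unknown"

# Hypothetical masses standing in for the PSO-SVM and BPNN outputs.
svm_masses = {STABLE: 0.6, CHATTER: 0.3, EITHER: 0.1}
bpnn_masses = {STABLE: 0.7, CHATTER: 0.2, EITHER: 0.1}

fused = dempster_combine(svm_masses, bpnn_masses)
decision = max(fused, key=fused.get)  # hypothesis with the largest fused mass
```

When both sources lean toward the same state, the fused mass for that state is larger than either input mass, which is the behaviour that makes the rule attractive for combining classifiers.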
Federated learning is a cutting-edge machine learning framework that enables multiple organizations to model data usage and conduct learning tasks while ensuring user privacy protection, data security, and compliance with government regulations. In this study, we propose a novel federated learning algorithm, FedMPO, to address the critical challenge of data heterogeneity among clients, which can lead to inconsistent optimized local models. FedMPO is a versatile multi-dimensional loss function that leverages the 3-dimensional proximal operator to fit a stationary and rapidly convergent loss function using a Taylor expansion. As a general loss function, FedMPO can be applied to popular federated learning algorithms, such as FedAvg, FedProx, SCAFFOLD, FedDyn, and FedDC, to enhance the accuracy and stability of secure aggregation. Extensive experiments show that FedMPO can improve accuracy scores (by approximately 0.02-0.33 and 0.02-0.45 percent under full and partial client participation, respectively) on some common evaluation data sets with various settings, and that it is robust to partial participation, non-IID data, and heterogeneous clients at the same time.
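For context on the baselines named above, the following minimal sketch shows FedAvg's sample-weighted parameter averaging and the FedProx-style proximal term that penalizes a local model for drifting from the global one. It does not implement FedMPO, whose loss is not specified in the abstract; the function names, parameter vectors, client sizes, and `mu` value are hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def proximal_penalty(local_w, global_w, mu):
    """FedProx-style term (mu / 2) * ||w - w_global||^2 added to a client's local loss."""
    return 0.5 * mu * sum((l - g) ** 2 for l, g in zip(local_w, global_w))


# Hypothetical two-parameter models from two clients with 1 and 3 samples.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]

global_w = fedavg(client_weights, client_sizes)
# Penalty the first client would pay for its distance from the global model.
penalty = proximal_penalty(client_weights[0], global_w, mu=0.1)
```

The larger client pulls the average toward its parameters, and the proximal penalty grows quadratically with the distance between a local model and the global one, which is how such terms limit client drift under heterogeneous data.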
Liver diseases are a global health concern, and early diagnosis is crucial for effective treatment. While traditional liver-function laboratory tests provide valuable information, they may reveal little about emerging or underlying illnesses. In this study, we explore the efficacy of machine learning algorithms in predicting the risk of liver disease using the Indian Liver Patient dataset. This could help affected patients opt for timely and effective treatment.
Engineering interleaves of composite laminates with carbon nanotubes (CNTs) improves interlaminar fracture toughness and also creates conductivity, which can be employed for damage identification. The paper explores a machine learning (ML) solution of the inverse problem of defect identification for interleaves with anisotropic conductivity (aligned CNTs). The electrical and geometrical properties of the interleave are assigned based on synchrotron X-ray computed tomography of glass fibre/epoxy laminates with nanostitch. Several ML models are applied: XGBoost, fully connected neural networks (FCNN), and convolutional neural networks (CNN). The XGBoost and FCNN algorithms performed poorly, failing to detect smaller defects and giving significant errors for larger ones. The CNN algorithm detects defects well: it predicts the geometric characteristics of the defect with an error below 16%.
Data-driven approaches (e.g., machine learning) are increasingly used to replace or assist laboratory studies of emerging contaminants (ECs). In the past ten years, an increasing number of models or approaches have been applied to ECs, and the datasets used are continuously enriched. However, there are large knowledge gaps between what we have found and the natural eco-environmental meaning. Most published reviews are organized by the types of ECs, but the common issues of data science, regardless of the type of pollutant, are not sufficiently addressed. To close or narrow the knowledge gaps, we highlight the following issues ignored in the field of data-driven EC research. Complicated biological and ecological data and ensemble models revealing mechanisms and spatiotemporal trends with strong causal relationships and without data leakage deserve more attention in the future. In addition, matrix influence, trace concentrations, and complex scenarios have often been ignored in previous works. Therefore, an integrated research framework related to natural fields, ecological systems, and large-scale environmental problems, rather than one relying solely on laboratory data-related analysis, is urgently needed. Beyond the current prediction purposes, data science can inspire the discovery of scientific questions, and mutual inspiration among data science, process and mechanism models, and laboratory and field research is a critical direction. By focusing on the above urgent and common issues related to data, frameworks, and purposes, regardless of the type of pollutant, data science is expected to achieve great advancements in addressing the eco-environmental risks of ECs.
ISBN: 9798350384468; 9798350384451 (print)
This study involves a data science and machine learning course partnered with an engineering communication course, referred to here as an authentically integrated communication model, and offers insights into such a model for engineering educators. In these partnered courses, student teams apply data science and machine learning tools to conduct data analysis and write two data science reports. Through qualitative coding and corpus analysis methods, rhetorical moves that students make in the data science report genre were identified. Twelve out of 57 total final reports were randomly chosen and coded, then six corpora of excerpts related to two codes and subcodes were created to generate keyword lists. These codes were "results" and "discussion," with "ineffective" and "effective" subcodes for each main code. The total codes were then compared to one another according to students' enrollment in both courses. Overall, students' reports in the engineering communication class more often contained effective results and effective discussion excerpts. Keywords along with example sentences are provided to demonstrate greater context for the use of language in the data science report genre.
The digitization of college student management is a crucial approach for training institutions to decrease management costs while enhancing the quality of students' development. In this study, we focused on students majoring in computer science at a certain university and conducted an exploration using their scores in multiple undergraduate courses. Initially, we selected the students' basic and core academic courses based on the training program and identified four groups of course combinations with strong positive correlations through correlation and cluster analysis. This finding helped the university optimize the arrangement and structure of the computer science major's course system. Next, we organized the students' overall course performance data in a sequential format based on semester order. Multiple machine learning models were used to perform regression prediction of student performance and classification prediction of the student's performance level. Finally, we integrated multiple machine learning models to create a practical framework for predicting student academic performance, which can be applied in digital student management. The framework can also provide effective decision support for academic early warning and guide students' development.
ISBN: 9798350364941; 9798350364958 (print)
Misinformation detection is a rapidly moving target, as new topics emerge and evolve in high volume on social media platforms. Annotated and fact-checked datasets are necessary for detection model training, but are laborious to curate. Thus, many misinformation detection models are trained in low-resource environments and rely on machine learning techniques to improve performance with small ground-truth datasets. Generative data augmentation methods enable topic-specific examples that increase a model's training dataset without incurring the cost and time investment associated with manual annotation. In this work, we assess the value of using generative augmentation for different classes of learning models: a classic neural model, a fine-tuned deep learning model, a reinforcement learning model, and an active learning model. We find that generated training data is not effective for all learning paradigms for the misinformation detection task, highlighting the need to use different quality measures to assess its value for low-resource machine learning tasks.
ISBN: 9798350351194; 9798350351187 (print)
Data difficulty level measurement is a critical aspect of machine learning performance evaluation. Several measures have been used to assess the difficulty level of classifying data points in binary classification. However, these measures typically involve building a machine learning model first, which is then used to assess the data difficulty level. In this paper, we propose a novel model-agnostic measure named polarized K-entropy to evaluate the difficulty of classifying a data instance. Our measure leverages the computation of entropy based on the nearest neighbors of a data point. We conducted experiments to evaluate the effectiveness of our proposed method by analyzing how the accuracy of machine learning models changes with respect to data difficulty. We used Spearman's rank correlation coefficient to analyze this relationship for a neural network, a support vector machine, and a random forest. Our results show that our measure outperformed the non-conformity measure in all the experiments conducted for six datasets using the selected machine learning models.
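The general idea of a neighbor-based entropy score can be sketched as follows. This is not the authors' polarized K-entropy, whose exact definition (including the "polarized" part) is not given in the abstract; it is a plain binary Shannon entropy over the labels of the k nearest neighbors, and the function name, toy points, and labels are hypothetical.

```python
import math


def knn_label_entropy(points, labels, index, k):
    """Binary entropy (in bits) of the labels among the k nearest neighbours of points[index]."""
    target = points[index]
    # Euclidean distance from the target to every other point, paired with that point's label.
    others = [
        (math.dist(target, p), labels[j])
        for j, p in enumerate(points)
        if j != index
    ]
    others.sort(key=lambda pair: pair[0])
    neighbour_labels = [label for _, label in others[:k]]
    p1 = sum(neighbour_labels) / len(neighbour_labels)  # fraction of class-1 neighbours
    if p1 in (0.0, 1.0):
        return 0.0  # a pure neighbourhood carries no label uncertainty
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))


# Hypothetical 1-D toy data: two well-separated clusters plus one point between them.
points = [(0,), (1,), (2,), (10,), (11,), (12,), (6,)]
labels = [0, 0, 0, 1, 1, 1, 0]

easy_score = knn_label_entropy(points, labels, index=0, k=2)  # deep inside class 0
hard_score = knn_label_entropy(points, labels, index=6, k=4)  # between the clusters
```

A point surrounded by same-class neighbors scores 0, while a point whose neighborhood is evenly split scores 1 bit, so higher values flag instances that are harder to classify without training any model.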