ISBN: (print) 9798350365627; 9798350365610
This study explores the application of supervised and unsupervised machine learning algorithms for predicting the sex of sheep using measurements of the talus bone in archaeozoological research. Leveraging data from well-documented sheep populations, we trained and tested various machine learning algorithms, such as kNN, SVMs, Decision Trees, Neural Networks, k-Means, DBSCAN, and GMM, demonstrating high accuracy in sex classification across multiple datasets from various time periods. We furthermore evaluate a variety of clustering results on unlabeled data and highlight their respective strengths and drawbacks. Our results suggest that machine learning offers a promising direction for enhancing the analysis of ancient and recent animal remains, providing valuable insights into past animal husbandry practices and their implications for understanding human history.
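As a rough illustration of the supervised side of this abstract, the sketch below classifies a specimen with a plain k-nearest-neighbours vote. Everything specific in it is an assumption made for illustration: the two features (a hypothetical length and breadth in millimetres), the training values (synthetic, not taken from the study), and the choice of k = 3. It is not the authors' pipeline.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training items closest to query.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of floats. Distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Synthetic, made-up measurements (hypothetical length, breadth in mm).
# These values are NOT from the study; they only make the sketch runnable.
train = [
    ((27.0, 17.5), "F"),
    ((27.4, 17.9), "F"),
    ((26.8, 17.2), "F"),
    ((29.6, 19.4), "M"),
    ((30.1, 19.8), "M"),
    ((29.9, 19.1), "M"),
]

if __name__ == "__main__":
    print(knn_predict(train, (29.5, 19.0)))
```

In practice the features would likely need scaling and the value of k would be chosen by cross-validation; both are omitted here to keep the sketch short.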
In the field of nuclear science, obtaining and utilizing nuclear data, including nuclear reaction data, nuclear structure information, and radioactive decay data, is crucial. Neutron-induced nuclear reactions, particularly nuclear cross section data, are essential for various applications, including reactor design. The EXFOR database is the only international repository for storing nuclear reaction experimental measurement information and data. However, experimental measurement data are often scarce and subject to discrepancies or even errors, requiring human evaluation. This process can be prone to biases and significant uncertainties. To address these challenges, this study proposes a novel framework, Feature Engineering for Nuclear Reaction Cross Section Generation using Machine Learning (FECSG-ML), which employs machine learning methods to generate nuclear cross section data, serving as a substitute for evaluating nuclear databases. Given the limited size of the EXFOR database, training a model solely on EXFOR data could lead to underfitting. Therefore, the proposed approach utilizes transfer learning, initially pre-training the model using the ENDF/B-VIII.0 dataset and subsequently fine-tuning it with the EXFOR database. This approach ensures high accuracy where real data are available and enables the learning of characteristics of the evaluation dataset where real data are lacking. Moreover, machine learning techniques are employed to transform discrete nuclear cross section data into a continuous format, accommodating various isotopes and predicting multiple sets of cross section data. The framework integrates various machine learning methods and utilizes ensemble learning for result optimization. Experimental results demonstrate that the regression curves generated by the FECSG-ML model align well with EXFOR data points, outperforming the ENDF/B-VIII.0 evaluation database. Furthermore, the nuclear cross section data generated by the FECSG-ML model
Electrohydrodynamic (EHD) printing has been used in various applications (e.g., sensors, batteries, photonic crystals). Currently, studying the relationships between EHD jetting behaviors, material properties, and processing conditions is still challenging due to the large number of parameters, the cost and time involved, and the complex nature of the experiments. In this research, we investigated EHD printing behavior using a machine learning (ML)-guided approach to overcome limitations in the experiments. Specifically, we investigated two jetting modes and the size of printed material with a broader range of material properties and processing parameters. We used samples from both the literature and our own experimental results with different types of materials. Different ML models have been developed and applied to the data. Our results have shown that ML can navigate a vast parameter search space to predict printing behavior with an accuracy higher than 95% during EHD printing. Moreover, the results showed that ML models can be used to predict the printing behavior and feature size for new materials. The ML models can guide the investigation of EHD printing and help us understand the printing behavior in a systematic manner with reduced time, cost, and required experiments.
ISBN: (print) 9798350378443; 9798350378450
As audio machine learning outcomes are deployed in societally impactful applications, it is important to have a sense of the quality and origins of the data used. Noticing that being explicit about this sense is not trivially rewarded in academic publishing in applied machine learning domains, nor included in typical applied machine learning curricula, we present a study into dataset usage connected to the top-5 cited papers at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). In this, we conduct thorough depth-first analyses of the origins of the datasets used, often leading to searches that had to go beyond what was reported in official papers, and ending in unclear or entangled origins. Especially in the current pull towards larger, and possibly generative, AI models, awareness of the need for accountability on data provenance is increasing. With this, we call on the community to not only focus on engineering larger models, but also create more room and reward for making explicit the foundations on which such models should be built.
ISBN: (print) 9798350351439; 9798350351422
Multimedia applications for machine learning models are characterized by the fusion of multiple modalities of data. In this work, we highlight the trust and robustness challenges of machine learning that arise from data fusion. To do so, we present three case studies demonstrating how multimedia applications exacerbate existing challenges of trustworthy and robust machine learning. For the first case study, we investigate the impact of fusion depth on the robustness of multi-modal machine learning models, observing that model architecture could impact robustness. For the second case study, we investigate the impact of fusion modality on the robustness of multi-modal machine learning models, observing that fusion models are only as robust as their most susceptible modality. For the third case study, we explore the impact of weight quantization techniques on the robustness of multi-modal models, observing the need for modality-based quantization schemes. Through these case studies, we hope to shed light on the unique trust and security challenges that arise in machine learning models when applied in multimedia applications and offer insights to fortify such systems in real-world scenarios.
We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on k-means clustering and sensitivity sampling. Assuming access to an embedding representation of the data with respect to which the model loss is Hölder continuous, our approach provably allows selecting a set of "typical" k + 1/ε² elements whose average loss corresponds to the average loss of the whole dataset, up to a multiplicative (1 ± ε) factor and an additive ελΦ_k, where Φ_k represents the k-means cost for the input embeddings and λ is the Hölder constant. We furthermore demonstrate the performance and scalability of our approach on fine-tuning foundation models and show that it outperforms state-of-the-art methods. We also show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling empirically, while being conceptually simpler and more scalable.
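The general idea of combining k-means with sensitivity-style sampling can be sketched as follows. This is a generic textbook-style variant, not the algorithm from the paper: the 50/50 mix of a cost-proportional term and a uniform term, the function names, and the toy embeddings are all assumptions made for illustration. Each point is drawn with a probability that grows with its squared distance to the nearest cluster center, and is given an inverse-probability weight so that weighted sums over the sample estimate sums over the full dataset.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k centers (lists of floats)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers


def sensitivity_sample(points, k, m, seed=0):
    """Draw m indices (with replacement) and importance weights.

    Assumed sampling rule (illustrative only): half of the probability mass
    is proportional to each point's k-means cost, half is uniform.
    """
    rng = random.Random(seed)
    n = len(points)
    centers = kmeans(points, k, seed=seed)
    costs = [min(sq_dist(p, c) for c in centers) for p in points]
    total = sum(costs)
    probs = [0.5 * (c / total if total > 0 else 1.0 / n) + 0.5 / n for c in costs]
    idx = rng.choices(range(n), weights=probs, k=m)
    weights = [1.0 / (m * probs[i]) for i in idx]
    return idx, weights


if __name__ == "__main__":
    # Toy 2-D "embeddings" forming two loose groups (made-up values).
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    chosen, w = sensitivity_sample(data, k=2, m=4, seed=1)
    print(chosen, w)
```

The uniform term keeps every sampling probability at least 1/(2n), which bounds each weight by 2n/m and avoids very large weights for points that sit close to a center.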
Improving the current level of skill in seasonal climate prediction is urgent for achieving sustainable socioeconomic development, and this is especially true in China, where meteorological disasters are experienced frequently. In this study, based upon big climate data and traditional statistical prediction experiences, a merged machine learning model (Y-model) was developed to address this, as well as to further explore unknown potential predictors. In Y-model, empirical orthogonal function analysis was first applied to reduce the data dimensionality of the target predictand (temperature and precipitation in the four seasons over China). Image recognition techniques were used to automatically identify possible predictors from the big climate data. These predictors, associated with significant circulation anomalies, were recombined into a large ensemble according to different threshold settings for five factors determining the statistical forecast skill. Facebook Prophet was chosen to conduct the independent hindcasts for each season's climate at a lead time of two months. During 2011–2022, the seasonal climate in China was skillfully predicted by Y-model, with an averaged pattern correlation coefficient skill of 0.60 for temperature and 0.24 for precipitation, outperforming CFSv2. Potential predictor analysis for recent extreme events suggested that prior signals from the Indian Ocean and the stratosphere were important for determining the super Mei-yu in 2020, while the prior sea surface temperature over the western Pacific and the soil temperature over West Asia may have contributed to the extreme high temperatures in 2022. Our study provides new insights for seasonal climate prediction in China.
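The skill metric quoted above, the pattern correlation coefficient, can be illustrated with a short function. This sketch assumes the common centered, unweighted form (a Pearson correlation across grid points between a forecast anomaly field and an observed one); the abstract does not say whether the study applies area weighting or a different variant, so treat that as an assumption.

```python
import math


def pattern_correlation(forecast, observed):
    """Centered, unweighted pattern correlation between two flattened fields.

    Both arguments are equal-length sequences of values at the same grid
    points. Raises ValueError for mismatched, empty, or constant fields.
    """
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_f == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(var_f * var_o)
```

A value of 1 means the forecast reproduces the spatial pattern of the observations up to a positive scaling and offset; values near 0 mean no linear spatial agreement.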
ISBN: (print) 9783031656262; 9783031656279
Symbolic Machine Learning Prover (SMLP) is a tool and a library for system exploration based on data samples obtained by simulating or executing the system on a number of input vectors. SMLP aims at exploring the system based on this data by taking a grey-box approach: SMLP uses symbolic reasoning for ML model exploration and optimization under verification and stability constraints, based on SMT, constraint, and neural network solvers. In addition, the model exploration is guided by probabilistic and statistical methods in a closed feedback loop with the system's response. SMLP has been applied in an industrial setting at Intel for analyzing and optimizing hardware designs at the analog level. SMLP is a general-purpose tool and can be applied to any system that can be sampled and modeled by machine learning models.
ISBN: (print) 9798400704222
The DEEM'24 workshop (Data Management for End-to-End Machine Learning) is held on Sunday June 9th, in conjunction with SIGMOD/PODS 2024. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues arising in ML application scenarios. The workshop solicits regular research papers (8 pages) describing preliminary and ongoing research results, including industrial experience reports of end-to-end ML deployments, related to DEEM topics. In addition, DEEM 2023 has a category for short papers (4 pages) as a forum for sharing interesting use cases, problems, datasets, benchmarks, visionary ideas, system designs, preliminary results, and descriptions of system components and tools related to end-to-end ML pipelines. This year, the workshop received 16 high-quality submissions on diverse topics relevant to DEEM, of which 8 are regular papers and 8 are short papers.
In the dynamic era of online education, the pursuit of a personalized and effective learning experience is paramount. A transformative approach in online education by integrating Multimodal Data Mining and Data Synthe...