Search Results - Inner Mongolia University Library

Sampling Algorithms in Statistical Physics: A Guide for Statistics and Machine Learning

STATISTICAL SCIENCE, 2024, Vol. 39, No. 1, pp. 137-164

Authors: Faulkner, Michael F.; Livingstone, Samuel (Univ Bristol, HH Wills Phys Lab, Bristol, England; UCL, Dept Stat Sci, London, England)

We discuss several algorithms for sampling from unnormalized probability distributions in statistical physics, but using the language of statistics and machine learning. We provide a self-contained introduction to some key ideas and concepts of the field, before discussing three well-known problems: phase transitions in the Ising model, the melting transition on a two-dimensional plane and simulation of an all-atom model for liquid water. We review the classical Metropolis, Glauber and molecular dynamics sampling algorithms before discussing several more recent approaches, including cluster algorithms, novel variations of hybrid Monte Carlo and Langevin dynamics and piecewise deterministic processes such as event chain Monte Carlo. We highlight cross-over with statistics and machine learning throughout and present some results on event chain Monte Carlo and sampling from the Ising model using tools from the statistics literature. We provide a simulation study on the Ising and XY models, with reproducible code freely available online, and following this we discuss several open areas for interaction between the disciplines that have not yet been explored and suggest avenues for doing so.

Keywords: Statistical physics; sampling algorithms; Markov chain Monte Carlo; Ising model; Potts model; XY model; hard-disk model; molecular simulation; Metropolis; Glauber dynamics; molecular dynamics; hybrid Monte Carlo; Langevin dynamics; event chain Monte Carlo
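
The Metropolis and Glauber algorithms reviewed in this paper both update one spin at a time. The following is a minimal sketch for the two-dimensional Ising model, assuming a square L x L lattice with periodic boundaries, coupling J = 1 and no external field; these modelling choices and all function names are illustrative assumptions, not the paper's published code. The Glauber update is written in heat-bath form, which for Ising spins gives the same flip probability 1/(1 + exp(beta * delta_e)) as Glauber's rule.

```python
import math
import random


def local_field(spins, i, j):
    """Sum of the four nearest-neighbour spins of site (i, j) on a periodic L x L lattice."""
    L = len(spins)
    return (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
            + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])


def metropolis_sweep(spins, beta, rng):
    """L*L single-spin Metropolis updates for H = -sum_<ij> s_i s_j (J = 1, no field)."""
    L = len(spins)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        delta_e = 2.0 * spins[i][j] * local_field(spins, i, j)  # energy change if (i, j) flips
        # Accept the flip with probability min(1, exp(-beta * delta_e)).
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]


def glauber_sweep(spins, beta, rng):
    """L*L heat-bath (Glauber) updates: resample spin (i, j) from its conditional law."""
    L = len(spins)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        h = local_field(spins, i, j)
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))  # P(s_ij = +1 | neighbours)
        spins[i][j] = 1 if rng.random() < p_up else -1


def magnetisation(spins):
    """Average spin, between -1 and +1."""
    L = len(spins)
    return sum(sum(row) for row in spins) / (L * L)


rng = random.Random(0)
lattice = [[1] * 16 for _ in range(16)]  # start from the all-up configuration
for _ in range(100):
    metropolis_sweep(lattice, 0.7, rng)
print(magnetisation(lattice))
```

For beta above the critical value beta_c = ln(1 + sqrt(2))/2 (about 0.4407), a lattice started with all spins up should stay strongly magnetised over a short run, while at beta = 0 the Glauber update reduces to independent fair coin flips.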

Stroke Risk Prediction: Comparing Different Sampling Algorithms

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, Vol. 14, No. 6, pp. 1074-1081

Authors: Yin, Qiuyang; Ye, Xiaoyan; Huang, Binhua; Qin, Lei; Ye, Xiaoying; Wang, Jian (Software Engn Inst Guangzhou, Dept Network Technol, Guangzhou 310401, Peoples R China; Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China; Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia; Neusoft Inst Guangdong, Sch Comp Sci, Foshan 528225, Peoples R China)

Stroke is a serious disease that has a significant impact on the quality of life and safety of patients. Accurately predicting stroke risk is of great significance for preventing and treating stroke. In the past few years, machine learning methods have shown potential in predicting stroke risk. However, due to the imbalance of stroke data and the challenges of feature selection and model selection, stroke risk prediction still faces some *** article aims to compare the performance of different sampling algorithms and machine learning methods in stroke risk prediction. This study used over-sampling algorithms (Random Over Sampling and SMOTE), under-sampling algorithms (Random Under Sampling and ENN), and a hybrid sampling algorithm (SMOTE-ENN), and combined them with common machine learning methods such as K-Nearest Neighbors, Logistic Regression (LR), Decision Tree and Support Vector Machine to build the prediction *** the analysis of experimental results, and found that SMOTE combined with the LR model showed good performance in stroke risk prediction, with a high F1 score. In addition, this study found that the overall performance of the under-sampling algorithms is better than that of the over-sampling and hybrid sampling *** research results provide useful references for predicting stroke risk and a foundation for further research and application. Future research can continue to explore more sampling algorithms, machine learning methods, and feature engineering techniques to further improve the accuracy and interpretability of stroke risk prediction and promote its application in clinical practice.

Keywords: Stroke prediction; data mining; machine learning; unbalanced data; sampling algorithms; classification algorithms
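
SMOTE, one of the over-sampling algorithms compared in this study, creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch, assuming numeric feature vectors and Euclidean distance; the function name is hypothetical and the default k = 5 mirrors the common default rather than the study's (unreported) settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Return n_synthetic new points interpolated between minority-class samples.

    minority: list of numeric feature vectors (lists of floats), all of one class.
    """
    rng = rng if rng is not None else random.Random()
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of `base`, excluding base itself (Euclidean distance)
        others = [p for idx, p in enumerate(minority) if idx != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> partner
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic
```

Random Over Sampling, by contrast, simply duplicates existing minority samples; interpolation avoids exact copies but can place synthetic points inside majority-class regions, which is one motivation for pairing SMOTE with a cleaning step such as ENN in SMOTE-ENN.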

Multi-Objective Optimization of Sampling Algorithms Pipeline for Unbalanced Problems

IEEE Congress on Evolutionary Computation (CEC)

Authors: Miranda, Pericles B. C.; Mello, Rafael Ferreira; Nascimento, Andre C. A.; Si, Tapas (Univ Fed Rural Pernambuco, Dept Comp, Recife, PE, Brazil; CESAR Sch, Recife, PE, Brazil)

ISBN (electronic): 9781665467087

ISBN (print): 9781665467087

The sequencing of sampling algorithms has been shown to be a promising approach for generating balanced versions of unbalanced data. Sequencing allows different under-sampling and/or over-sampling algorithms to be applied in sequence, producing a balanced database. However, defining the most appropriate sequence of sampling algorithms is challenging. This article treats the sequencing problem as a combinatorial optimization task and proposes a multi-objective optimization method to seek promising solutions that maximize classifier performance in both accuracy and F1-score. The results showed that the proposed method was capable of finding optimized sequences that improved the performance of the classifiers, obtaining statistically better results, mainly in F1-score, than competing methods in most of the selected unbalanced problems.

Keywords: sampling algorithms; Multi-objective Optimization; Evolutionary algorithms; Unbalanced Problems

On adaptive sampling algorithms for IoT devices

IEEE International Conference on Communications (ICC)

Authors: Ben-Aboud, Yassine; Licea, Daniel Bonilla; Ghogho, Mounir; Kobbane, Abdellatif (Int Univ Rabat, Coll Engn & Architecture, TICLab, Rabat, Morocco; Mohammed V Univ Rabat, ENSIAS, Rabat, Morocco; Univ Leeds, Fac Engn, Leeds, W Yorkshire, England; Czech Tech Univ, Dept Cybernet, Prague, Czech Republic)

ISBN (print): 9781728171227

Sampling is a core process in IoT systems. It determines the volume of data circulating within the network as well as the energy consumption of the IoT devices. Adaptive sampling aims to control the volume of generated data to reduce energy and bandwidth consumption without undermining data quality. Within this context, we propose two new adaptive sampling techniques: a lightweight adaptive sampling algorithm and an optimized uniform sampling method. We tested our methods using various real datasets and compared their performance against state-of-the-art adaptive sampling algorithms in terms of data quality and data volume. The results show that the proposed methods are consistently among the best, with a noticeable reduction in computational load.

Keywords: Adaptive sampling; IoT; sampling algorithms

Identification of Suitable Technologies for Drinking Water Quality Prediction: A Comparative Study of Traditional, Ensemble, Cost-Sensitive, Outlier Detection Learning Models and Sampling Algorithms

ACS ES&T WATER, 2021, Vol. 1, No. 8, pp. 1676-1685

Authors: Chen, Xingguo; Liu, Houtao; Xu, Xiuying; Zhang, Luoyuan; Lin, Tianchi; Zuo, Min; Huang, Yichao; Shen, Ruqin; Chen, Da; Deng, Yongfeng (Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Jiangsu, Peoples R China; Beijing Technol & Business Univ, Natl Engn Lab Agriprod Qual Traceabil, Beijing 100048, Peoples R China; Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China; Anhui Med Univ, Dept Toxicol, Sch Publ Hlth, Hefei 230032, Anhui, Peoples R China; Jinan Univ, Sch Environm, Guangzhou Key Lab Environm Exposure & Hlth, Guangzhou 510632, Guangdong, Peoples R China; Jinan Univ, Guangdong Key Lab Environm Pollut & Hlth, Guangzhou 510632, Guangdong, Peoples R China; Nanjing Univ, Sch Environm, State Key Lab Pollut Control & Resource Reuse, Nanjing 210023, Jiangsu, Peoples R China)

Drinking water quality data sets used in learning models have been highly imbalanced, which has weakened the prediction ability of models for drinking water quality. Although some efforts have been made to address the issue of imbalance, little is known about the suitable technologies for drinking water quality prediction. Here, a total of 16 common learning models were applied individually to compare the drinking water quality prediction performance based on a large-scale highly imbalanced drinking water quality data set. Our results showed that ensemble, cost-sensitive learning models with higher F1-scores were more suitable for predicting drinking water quality, compared to other models tested in this study. In addition, the learning model performance could be enhanced by the introduction of two mainstream sampling algorithms [synthetic minority oversampling technique (SMOTE) combined with the Tomek links technique (TLTE) or the edited nearest neighbor technique (ENNTE), SMOTE + TLTE or SMOTE + ENNTE, respectively]. In particular, the F1-scores of deep cascade forest (DCF) with SMOTE + TLTE or SMOTE + ENNTE reached 94.54 ± 2.51% and 94.68 ± 2.72%, respectively. As a consequence, DCF with these two sampling algorithms has greater potential to be applied in drinking water quality monitoring and prediction, as well as other fields that have suffered from issues of imbalanced data.

Keywords: drinking water quality prediction; imbalance issue; learning model; sampling algorithms; deep cascade forest
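
In the SMOTE + TLTE combination, over-sampling is followed by a cleaning step that removes Tomek links: pairs of samples from different classes that are each other's nearest neighbour. The following is a minimal sketch of that cleaning step only, assuming Euclidean distance and removal of both members of each link (implementations differ on whether one or both members are dropped); the helper names are hypothetical and not taken from the study.

```python
import math


def nearest_neighbour(X, idx):
    """Index of the sample closest to X[idx], excluding idx itself (Euclidean distance)."""
    return min((j for j in range(len(X)) if j != idx),
               key=lambda j: math.dist(X[idx], X[j]))


def tomek_links(X, y):
    """Pairs (i, j) with i < j that belong to different classes and are mutual nearest neighbours."""
    nn = [nearest_neighbour(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link and return the cleaned (X, y)."""
    drop = {k for pair in tomek_links(X, y) for k in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Removing such pairs clears ambiguous points from the class boundary, which is why this step is usually applied after SMOTE has added synthetic minority samples.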

Probabilistic techniques for analyzing sampling algorithms and the dynamics of lattice models

Author: Fahrbach, Matthew (Georgia Institute of Technology)

Degree: Ph.D.

Statistical mechanics bridges the fields of physics and probability theory, providing critical insights into both disciplines. Statistical physics models capture key features of macroscopic phenomena and consist of a set of configurations satisfying various constraints. Markov chain Monte Carlo algorithms are often used to sample from distributions over the exponentially large state space of these models to gain insight about the system and estimate its thermodynamic properties. Similar problems arise throughout machine learning, optimization, and counting complexity. In this dissertation, we present several new techniques based on random walks for analyzing sampling algorithms and the dynamics of various lattice models from statistical physics. We start by investigating the mixing time of Glauber dynamics for the six-vertex model in its ordered phases. We show that for every Boltzmann weight in the ferroelectric phase, there exist boundary conditions such that local Markov chains require exponential time to converge to equilibrium. This is the first rigorous result about the mixing time of Glauber dynamics for the six-vertex model in the ferroelectric phase. We also analyze the Glauber dynamics with free boundary conditions in the antiferroelectric phase and significantly extend the region for which local Markov chains are known to be slow mixing. In separate lines of work, we use techniques from the theory of random walks and electrical networks to give nearly tight bounds for the transience class of the Abelian sandpile model, closing an open problem of Babai and Gorodezky. The Abelian sandpile model is the canonical dynamical system used to study the phenomenon of self-organized criticality, and the transience class measures the time needed for the process to reach steady-state behavior. We also explore a new approach for approximately sampling elements with fixed rank from graded posets that relies solely on the mixing time of biased Markov chains. This allows

Keywords: Lattice models; Markov chain Monte Carlo; sampling algorithms; Statistical physics

A Meta-Learning Method to Select Under-Sampling Algorithms for Imbalanced Data Sets

5th Brazilian Conference on Intelligent Systems (BRACIS)

Authors: de Morais, Romero F. A. B.; Miranda, Pericles B. C.; Silva, Ricardo M. A. (Univ Fed Pernambuco, Recife, PE, Brazil)

ISBN (print): 9781509035663

Imbalanced data sets originating from real-world problems, such as medical diagnosis, are pervasive. Learning from imbalanced data sets poses its own challenges, as common classifiers assume a balanced class distribution in the data. Sampling techniques overcome the imbalance by modifying the class distribution of the examples. Unfortunately, selecting a sampling technique together with its parameters is still an open problem. Current solutions include the brute-force approach (try as many techniques as possible) and the random search approach (choose the most appropriate from a random subset of techniques). In this work, we propose a new method to select sampling techniques for imbalanced data sets. It uses meta-learning and works by recommending a technique for an imbalanced data set based on solutions to previous problems. Our experiments compared the proposed method against the brute-force approach, all techniques with their default parameters, and the random search approach. The results show that the proposed method is comparable to the brute-force approach, outperforms the techniques with their default parameters most of the time, and always surpasses the random search approach.

Keywords: Meta-learning; Algorithm selection; sampling algorithms

Fast and Perfect Sampling of Subgraphs and Polymer Systems

ACM TRANSACTIONS ON ALGORITHMS, 2024, Vol. 20, No. 1, pp. 1-30

Authors: Blanca, Antonio; Cannon, Sarah; Perkins, Will (Penn State Univ, Comp Sci & Engn Dept, Westgate Bldg, University Pk, PA 16801, USA; Claremont Mckenna Coll, Math Sci Dept, 888 Columbia Ave, Claremont, CA 91711, USA; Georgia Inst Technol, Sch Comp Sci, 266 Ferst Dr, Atlanta, GA 30332, USA)

We give an efficient perfect sampling algorithm for weighted, connected induced subgraphs (or graphlets) of rooted, bounded degree graphs. Our algorithm utilizes a vertex-percolation process with a carefully chosen rejection filter and works under a percolation subcriticality condition. We show that this condition is optimal in the sense that the task of (approximately) sampling weighted rooted graphlets becomes impossible in finite expected time for infinite graphs and intractable for finite graphs when the condition does not hold. We apply our sampling algorithm as a subroutine to give near linear-time perfect sampling algorithms for polymer models and weighted non-rooted graphlets in finite graphs, two widely studied yet very different problems. This new perfect sampling algorithm for polymer models gives improved sampling algorithms for spin systems at low temperatures on expander graphs and unbalanced bipartite graphs, among other applications.

Keywords: sampling algorithms; subgraphs; polymer models; spin systems; approximate counting

GAASP: Genetic Algorithm-Based Atomistic Sampling Protocol for High-Entropy Materials

MATERIALS AND MANUFACTURING PROCESSES, 2023, Vol. 38, No. 16, pp. 2044-2050

Author: Anand, G. (Indian Inst Engn Sci & Technol, Dept Met & Mat Engn, Howrah, India)

High-entropy materials are composed of multiple elements on comparatively simple lattices. Owing to the multi-component nature of such materials, atomic-scale sampling is computationally expensive because of the combinatorial complexity. This study proposes a genetic algorithm-based methodology for sampling such complex, chemically disordered materials. Variants of the Genetic Algorithm-based Atomistic Sampling Protocol (GAASP) can generate low-energy as well as high-energy structures. The low-energy GAASP variant, in conjunction with the Metropolis criterion, avoids premature convergence and ensures the detailed balance condition. GAASP can be employed to generate low-energy structures for thermodynamic predictions, and diverse structures can be generated for machine-learning applications.

Keywords: high-entropy alloys; genetic algorithm; thermodynamics; sampling algorithms; machine learning

Optimal mixing via tensorization for random independent sets on arbitrary trees

COMBINATORICS PROBABILITY AND COMPUTING, 2024, Vol. 34, No. 2, pp. 259-275

Authors: Efthymiou, Charilaos; Hayes, Thomas P.; Stefankovic, Daniel; Vigoda, Eric (Univ Warwick, Dept Comp Sci, Coventry, England; Univ Buffalo, Dept Comp Sci & Engn, New York, NY, USA; Univ Rochester, Dept Comp Sci, Rochester, NY, USA; Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA, USA)

We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model, where independent sets are weighted by a parameter λ > 0; the special case λ = 1 corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete Δ-regular tree for all λ. However, Restrepo, Stefankovic, Vera, Vigoda, and Yang (2014) showed that for sufficiently large λ there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width. We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of O(n) for the Glauber dynamics for unweighted independent sets on arbitrary trees. We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree Δ. Interestingly, our results extend (far) beyond the uniqueness threshold, which is of order λ = O(1/Δ). Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees, as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply to bounded-degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance via a non-trivial inductive proof.

Keywords: Markov chain Monte Carlo; mixing time; independent sets; hard-core model; approximate counting algorithms; sampling algorithms
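
The single-site Glauber dynamics studied in this paper can be stated in a few lines: pick a vertex uniformly at random and resample its occupancy from the hard-core conditional distribution given its neighbours. The following is a minimal sketch, assuming the graph (here a tree) is stored as an adjacency dictionary; the function names and the random-tree helper are illustrative assumptions, not code from the paper.

```python
import random


def glauber_step(adj, vertices, occupied, lam, rng):
    """One heat-bath update of the hard-core model with fugacity lam > 0.

    adj: dict mapping each vertex to a list of its neighbours.
    occupied: set of vertices currently in the independent set (modified in place).
    """
    v = rng.choice(vertices)          # vertex chosen uniformly at random
    occupied.discard(v)               # forget v's current state before resampling it
    if any(u in occupied for u in adj[v]):
        return                        # an occupied neighbour forces v to stay empty
    if rng.random() < lam / (1.0 + lam):
        occupied.add(v)               # a free vertex is occupied with probability lam/(1+lam)


def random_tree(n, rng):
    """Illustrative helper: a random recursive tree on vertices 0..n-1."""
    adj = {0: []}
    for v in range(1, n):
        parent = rng.randrange(v)
        adj[v] = [parent]
        adj[parent].append(v)
    return adj


def run_glauber(adj, lam, steps, rng):
    """Run the chain from the empty independent set and return the final state."""
    vertices = list(adj)
    occupied = set()
    for _ in range(steps):
        glauber_step(adj, vertices, occupied, lam, rng)
    return occupied
```

With λ = 1 every independent set is equally likely, matching the special case named in the abstract; on a single edge, for example, the three independent sets are equally likely, so each endpoint is occupied with probability 1/3.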
