In biomedicine, binary classification problems arise in diagnosis but also, for instance, in personalized medicine. The objective is to use the available information to allocate subjects to groups correctly. Frequently, this information involves high-dimensional data. An adequate classification rule is a trade-off between sensitivity and specificity. The ROC curve helps to understand, evaluate and compare the accuracy of classification processes. We propose a procedure for estimating the optimal classification rules based on a penalized estimator of the underlying probability distribution functions, and we study its asymptotic properties. Through Monte Carlo simulations, we compare our proposal with a support vector machine-based ROC curve. We illustrate its practical use in a real-world problem. Results suggest that, although some techniques promise to improve on traditional methods, in the binary classification problem the limit is set by the actual relationship among the density functions.
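As a point of reference for the objects discussed in this abstract, the sketch below computes an empirical ROC curve and its Youden-optimal cut-off for a simulated scalar marker. It uses plain empirical survival functions, not the authors' penalized distribution-function estimator; the Gaussian samples and all names are illustrative only.

```python
# Minimal sketch: empirical ROC curve and Youden-optimal cut-off for the
# rule "classify as positive if marker > c" (not the paper's estimator).
import numpy as np

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 500)   # marker in the negative population
pos = rng.normal(1.2, 1.0, 500)   # marker in the positive population

# Sweep the rule over all observed cut-offs (plus -inf to close the curve).
thresholds = np.concatenate([[-np.inf], np.sort(np.concatenate([neg, pos]))])
tpr = np.array([(pos > c).mean() for c in thresholds])  # sensitivity
fpr = np.array([(neg > c).mean() for c in thresholds])  # 1 - specificity

# Trapezoidal AUC; fpr decreases along the threshold sweep.
auc = np.sum((fpr[:-1] - fpr[1:]) * (tpr[:-1] + tpr[1:]) / 2)
j = tpr - fpr                                           # Youden index per cut-off
print(f"AUC={auc:.3f}, max J={j.max():.3f}, cut-off={thresholds[np.argmax(j)]:.3f}")
```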
This paper proposes a cellular automata-based solution to a binary classification problem. The proposed method is based on a two-dimensional, three-state cellular automaton (CA) with the von Neumann neighborhood. Since the number of possible CA rules (potential CA-based classifiers) is huge, the search for efficient rules is conducted with a genetic algorithm (GA). Experiments show an excellent performance of the discovered rules in solving the classification problem. The best rules found perform better than a heuristic CA rule designed by a human, and also better than one of the most widely used statistical methods: the k-nearest neighbors algorithm (k-NN). Experiments also show that CA rules can be successfully reused in the process of searching for new rules.
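A minimal sketch of the automaton itself, assuming a totalistic rule (a lookup table indexed by a cell's state and the sum of its four von Neumann neighbors) and periodic boundaries; the random table merely stands in for a GA-evolved rule, and the paper's actual rule encoding may differ.

```python
# Two-dimensional, three-state CA with the von Neumann neighborhood.
import numpy as np

N_STATES = 3
rng = np.random.default_rng(1)
# Totalistic rule table: new state = rule[own state, sum of 4 neighbors].
# A random table stands in for a GA-evolved classifier rule.
rule = rng.integers(0, N_STATES, size=(N_STATES, 4 * (N_STATES - 1) + 1))

def step(grid: np.ndarray) -> np.ndarray:
    """One synchronous update with periodic boundary conditions."""
    nbr_sum = (np.roll(grid, 1, 0) + np.roll(grid, -1, 0)
               + np.roll(grid, 1, 1) + np.roll(grid, -1, 1))
    return rule[grid, nbr_sum]

grid = rng.integers(0, N_STATES, size=(16, 16))
for _ in range(10):          # iterate the CA a few steps
    grid = step(grid)
print(grid)
```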
The generalized receiver-operating characteristic, gROC, curve considers the classification ability of diagnostic tests when both larger and lower values of the marker are associated with higher probabilities of being positive. Its empirical estimation implies selecting the best classification subsets among those satisfying a particular condition. Both strong and weak consistency have already been proved. However, using the same data both to select the classification subsets and to calculate the gROC curve leads to an over-optimistic estimate of the real performance of the diagnostic criteria on future samples. In this work, the bias of the empirical gROC curve estimator is explored through Monte Carlo simulations. In addition, two cross-validation-based algorithms are proposed for reducing the overfitting. The practical application of the proposed algorithms is illustrated through the analysis of a real-world dataset. Simulation results suggest that the empirical gROC curve estimator returns optimistic approximations, especially in situations in which the diagnostic capacity of the marker is poor and the sample size is small. The newly proposed algorithms improve the estimation of the actual diagnostic test accuracy, and yield almost unbiased gAUCs in most of the considered scenarios. However, the cross-validation-based algorithms reported larger L1-errors than the standard empirical estimators, and increase the computational cost of the procedures. As online supplementary material, this manuscript includes an R function which wraps up the implemented routines.
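To make the cross-validation idea concrete, the sketch below selects two-sided classification intervals ("positive if the marker falls below a or above b") on the training folds and evaluates them on the held-out fold. It is a simplified stand-in for the authors' gROC algorithms; the synthetic bimodal data, the quantile grid, and the in-sample Youden criterion are all assumptions.

```python
# K-fold sketch: select the classification interval on training data,
# score it on held-out data, so selection bias does not inflate the estimate.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
neg = rng.normal(0.0, 1.0, 200)                      # negative population
pos = np.concatenate([rng.normal(-2.5, 0.7, 60),     # positives in both tails
                      rng.normal(2.5, 0.7, 60)])

def best_interval(train_neg, train_pos, grid):
    """(a, b) maximizing the in-sample Youden index of 'positive outside [a, b]'."""
    best, best_j = None, -np.inf
    for a, b in combinations(grid, 2):
        sens = ((train_pos < a) | (train_pos > b)).mean()
        spec = ((train_neg >= a) & (train_neg <= b)).mean()
        if sens + spec - 1 > best_j:
            best, best_j = (a, b), sens + spec - 1
    return best

K, j_cv = 5, []
folds_n = np.array_split(rng.permutation(neg.size), K)
folds_p = np.array_split(rng.permutation(pos.size), K)
for k in range(K):
    tr_n, te_n = np.delete(neg, folds_n[k]), neg[folds_n[k]]
    tr_p, te_p = np.delete(pos, folds_p[k]), pos[folds_p[k]]
    a, b = best_interval(tr_n, tr_p, np.quantile(tr_n, np.linspace(0.01, 0.99, 25)))
    j_cv.append(((te_p < a) | (te_p > b)).mean()           # test sensitivity
                + ((te_n >= a) & (te_n <= b)).mean() - 1)  # + test specificity - 1
print(f"cross-validated Youden index: {np.mean(j_cv):.3f}")
```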
The overlap coefficient (OVL) measures the common area between two or more density functions. It has been used for measuring the similarity between distributions in different research fields including astronomy, economics and sociology, among others. Recently, different authors have studied the use of the OVL coefficient in the binary classification problem. They argue that, in particular settings, it could provide a better accuracy measure than other established indices. We prove here that the OVL coefficient does not provide information beyond the Youden index, and that the potential advantages previously reported rest on the assumption that the classification rules underlying any classification process always assign a higher probability of being positive to the larger values of the marker. In particular, we prove that, for a fixed continuous marker, the OVL coefficient is equivalent to the Youden index associated with the optimal classification rules based on this marker. We illustrate the problem by studying the capacity of the white blood cell count to identify the type of disease in patients having either acute viral meningitis or acute bacterial meningitis.
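The claimed equivalence can be checked numerically: for two continuous densities the optimal rule classifies as positive wherever f_pos > f_neg, its Youden index J is the integral of the positive part of (f_pos − f_neg), and hence OVL = 1 − J. The Gaussian densities below are illustrative choices, not the paper's data.

```python
# Numerical check of OVL = 1 - J for the optimal rule on a continuous marker.
import numpy as np
from scipy.stats import norm

x = np.linspace(-8.0, 10.0, 20001)
dx = x[1] - x[0]
f_neg = norm.pdf(x, 0.0, 1.0)   # density in the negative population
f_pos = norm.pdf(x, 1.5, 1.2)   # density in the positive population

ovl = np.sum(np.minimum(f_neg, f_pos)) * dx            # common area
# Optimal rule: positive where f_pos > f_neg; its Youden index is the
# integral of the positive part of (f_pos - f_neg).
j_opt = np.sum(np.clip(f_pos - f_neg, 0.0, None)) * dx
print(f"OVL = {ovl:.4f},  1 - J = {1.0 - j_opt:.4f}")  # the two coincide
```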
A good diagnostic test should show different behavior in the positive and the negative populations. However, this is not enough to have a good classification system. The binary classification problem is a complex task which requires defining decision criteria. Knowing the level of dissimilarity between the two involved distributions is not enough; we also have to know how to define those decision criteria. The length of the receiver-operating characteristic curve has been proposed as an index of the optimal discriminatory capacity of a biomarker: it relates not to the actual but to the optimal classification capacity of the considered diagnostic test. One particularity of this index is that its estimation must be based on parametric or smoothed models. We explore here the behavior of a kernel density estimator-based approximation for estimating the length of the receiver-operating characteristic curve. We derive the asymptotic distribution of the resulting statistic, propose a parametric bootstrap algorithm for constructing confidence intervals, discuss the role that the bandwidth parameter plays in the quality of the provided estimations and, via Monte Carlo simulations, study its finite-sample behavior under four different criteria for bandwidth selection. The practical use of the length of the receiver-operating characteristic curve is illustrated through two real-world examples.
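A minimal sketch of a kernel-based length estimate, assuming Gaussian KDEs with scipy's default bandwidth (Scott's rule) rather than the four selection criteria studied in the paper: the smoothed survival functions give (FPR, TPR) along a grid of cut-offs, and the length is the arc length of that polyline.

```python
# KDE-smoothed ROC curve and its arc length (a sketch, not the paper's method).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
neg = rng.normal(0.0, 1.0, 300)
pos = rng.normal(1.0, 1.5, 300)

kde_n, kde_p = gaussian_kde(neg), gaussian_kde(pos)  # default: Scott's rule
lo = min(neg.min(), pos.min()) - 3.0
hi = max(neg.max(), pos.max()) + 3.0
grid = np.linspace(lo, hi, 2000)
dx = grid[1] - grid[0]

# Smoothed survival functions give (FPR, TPR) along the cut-off grid.
fpr = 1.0 - np.cumsum(kde_n(grid)) * dx
tpr = 1.0 - np.cumsum(kde_p(grid)) * dx
length = np.sum(np.hypot(np.diff(fpr), np.diff(tpr)))
print(f"estimated ROC curve length: {length:.4f}")  # sqrt(2) = useless, 2 = perfect
```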
Crack detection in concrete bridges is an essential component of the safety assessment of bridge structures. In damage assessment, checking the whole structure matters more than raw accuracy. However, traditional deep learning methods cannot completely detect the crack structure, which challenges image-based crack detection. For this reason, we propose deep bridge crack classification (DBCC)-Net, a classification-based deep learning network. By pruning Yolox, the regression problem of target detection is converted into a binary classification problem, avoiding the performance degradation caused by the translation invariance of the convolutional neural network (CNN). In addition, a network post-processing step and a two-stage crack detection strategy are proposed to enable the network to quickly detect cracks and extract crack morphology from high-resolution images. In the first stage, DBCC-Net performs a coarse extraction of crack positions based on image-slice classification. In the second stage, the complete crack morphology is extracted at the suggested locations by a semantic segmentation network. Experimental results show that the proposed two-stage method achieves 19 frames per second (FPS) and a 0.79 mIoU (mean intersection over union) on real bridge images of 2560x2560 pixels. Although the FPS is lower, the mIoU is 7.8% higher than that of other methods, demonstrating the practical value of the proposed approach.
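A schematic of the two-stage strategy, with hypothetical patch_classifier and segmenter callables standing in for DBCC-Net and the semantic segmentation network; this is not the authors' code, and the patch size and decision threshold are placeholders.

```python
# Two-stage sketch: slice-level classification first, segmentation only
# where a crack is suspected (keeps high-resolution inference cheap).
import numpy as np

PATCH = 256  # slice size for the classification stage (placeholder value)

def detect_cracks(image: np.ndarray, patch_classifier, segmenter) -> np.ndarray:
    """Stage 1: classify slices as crack / no-crack (coarse crack positions).
    Stage 2: run the segmentation model only where stage 1 fired."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patch = image[y:y + PATCH, x:x + PATCH]
            if patch_classifier(patch) > 0.5:                      # stage 1
                mask[y:y + PATCH, x:x + PATCH] = segmenter(patch)  # stage 2
    return mask
```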
Flood maps based on Earth Observation (EO) data inform critical decision-making in almost every stage of the disaster management cycle, directly impacting the ability of affected individuals and governments to receive aid as well as informing policies on future adaptation. However, flood map validation also presents a challenge in the form of class imbalance between the flood and non-flood classes, which has rarely been investigated. There are currently no established best practices for addressing this issue, and the accuracy of these maps is often treated as a mere formality, which leads to a lack of user trust in flood map products and limits their operational use and uptake. This paper provides the first comprehensive assessment of the impact of current EO-based flood map validation practices. Using flood inundation maps derived from Sentinel-1 synthetic aperture radar data with synthetically generated controlled errors, and Copernicus Emergency Management Service flood maps as the ground truth, binary metrics were statistically evaluated to quantify flood detection accuracy for events under varying flood conditions. In particular, class-specific metrics were found to be sensitive to the class imbalance, i.e. larger flood magnitudes result in higher metric scores, so these metrics are naturally biased towards overpredicting classifiers. Metric stability across error percentiles and flood magnitudes was assessed through the standard deviation calculated by bootstrapping, to quantify the impact of sample selection subjectivity; stratified sampling schemes consistently exhibited the lowest standard deviation. Thoughtful sample and response design were critical, with probability-based random sampling and proportional or equal class allocation vital to producing robust accuracy estimates comparable across study sites, error classes, and flood magnitudes. Results suggest that popular evaluation metrics such as the F1-Score are in fact unsuitable for accurate characterization...
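For illustration, a stratified bootstrap of the F1-Score on synthetic, imbalanced reference and predicted maps; the standard deviation across replicates is the stability measure described above. The data, error rate, and replicate count are all assumptions.

```python
# Stratified bootstrap of a binary flood-map metric on synthetic pixels.
import numpy as np

rng = np.random.default_rng(4)
truth = (rng.random(10_000) < 0.1).astype(int)                # 10% flood pixels
pred = np.where(rng.random(10_000) < 0.9, truth, 1 - truth)   # noisy classifier

def f1(y, p):
    tp = np.sum((y == 1) & (p == 1))
    fp = np.sum((y == 0) & (p == 1))
    fn = np.sum((y == 1) & (p == 0))
    return 2 * tp / (2 * tp + fp + fn)

idx_flood, idx_dry = np.flatnonzero(truth == 1), np.flatnonzero(truth == 0)
scores = []
for _ in range(500):
    # Resample pixels within each reference class (stratified scheme).
    s = np.concatenate([rng.choice(idx_flood, idx_flood.size),
                        rng.choice(idx_dry, idx_dry.size)])
    scores.append(f1(truth[s], pred[s]))
print(f"F1 = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```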
The distribution of humanitarian aid is a vital issue for humanity's future. In recent years, the management of humanitarian crises has become more crucial than it was a decade ago. Due to the volatility and urgency that characterize such situations, one of the most important challenges globally is the optimization of decisions regarding the timely distribution of aid during humanitarian operations. Our main goal is to develop an innovative decision-making tool, essential for non-profit organizations and governments, aimed at the prompt selection of the location of a humanitarian aid distribution center in cases of natural or human-made disasters. The proposed tool is based on network science principles and can be used for selecting a suitable node for the installation of a distribution center at the beginning of a humanitarian crisis, considering that such networks have a volatile nature and require quick decisions. For the configuration of the proposed tool we use a combination of a classical heuristic algorithm and predictive models based on a binary classification problem, supported by a supervised deep neural network. The tool is developed in the R programming language with the "Shiny" package (a web application framework for R) along with other packages for network analysis, data manipulation and visualization.
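A sketch of the classical-heuristic half of such a tool, assuming closeness centrality as the selection criterion on a synthetic graph; Python's networkx is used here for brevity, whereas the paper's tool is written in R and additionally employs a deep neural network.

```python
# Heuristic node selection: the node closest (on average) to all others
# is suggested as the distribution-centre location.
import networkx as nx

# Synthetic stand-in for the transport/relief network.
G = nx.erdos_renyi_graph(50, 0.08, seed=5)
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()

closeness = nx.closeness_centrality(G)
centre = max(closeness, key=closeness.get)
print(f"suggested distribution-centre node: {centre} "
      f"(closeness {closeness[centre]:.3f})")
```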
As is well known, the decision tree is a data-driven classification model whose primary core is the split criterion. Although a great many split criteria have been proposed so far, almost all of them focus on the global class distribution of the training data. They ignore the local class imbalance problem that commonly appears during decision tree induction over balanced or roughly balanced binary-class data sets. In the present study, this problem is investigated in detail and an adaptive approach based on multiple existing split criteria is proposed. In the proposed scheme, the local class imbalance ratio is used as a weight factor to balance the importance of these split criteria, so as to determine the optimal splitting point at each internal node. To evaluate the effectiveness of the proposed method, it is applied to twenty roughly balanced real-world binary-class data sets. Experimental results show that the proposed method not only outperforms all the other tested methods, but also improves the prediction accuracy of each class.
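One way such an adaptive blend can look, as a sketch: the local class-imbalance ratio at a node weighs two standard impurity measures (Gini and entropy). The exact weighting scheme of the paper may differ; this only illustrates the mechanism.

```python
# Adaptive node impurity: local imbalance ratio blends Gini and entropy.
import numpy as np

def gini(p: float) -> float:
    return 2.0 * p * (1.0 - p)

def entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def adaptive_impurity(y: np.ndarray) -> float:
    """Node impurity: the local imbalance ratio weighs Gini vs. entropy."""
    p = y.mean()
    # ratio is 1 for a perfectly balanced node, 0 for a pure one
    ir = min(p, 1.0 - p) / max(p, 1.0 - p) if 0.0 < p < 1.0 else 0.0
    return ir * gini(p) + (1.0 - ir) * entropy(p)

y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])  # labels reaching a node
print(f"adaptive impurity: {adaptive_impurity(y):.4f}")
```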
Facial Kinship Verification involves determining whether two face images belong to related individuals, a task that is particularly challenging due to the subtle differences in facial features and the large intra-class variations. In recent years, deep learning models have shown great promise in addressing this problem. In this work, we propose a Vision Transformer (ViT) model for facial Kinship Verification, leveraging the proven effectiveness of Transformer architectures in Natural Language Processing. The Vision Transformer is trained end-to-end on two benchmark datasets: the large-scale Families in the Wild (FIW) dataset, consisting of thousands of face images with corresponding kinship labels, and the smaller KinFaceW-II dataset. Our model employs multiple attention mechanisms to capture complex relationships between facial features and produce a final kinship prediction. Experimental results demonstrate that our approach outperforms state-of-the-art methods, achieving an average accuracy of 92% on the FIW dataset and an F1 score of 0.85. The Euclidean distance metric further enhances the classification of kin and non-kin pairs. These findings confirm the effectiveness of Vision Transformer models for facial Kinship Verification and underscore their potential for future research in this domain.
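A minimal sketch of the distance-based decision step mentioned above, assuming the ViT backbone yields fixed-length embeddings; the embedding dimension, the random stand-in vectors, and the threshold are placeholders, not values from the paper.

```python
# Kin / non-kin decision by thresholding the Euclidean distance between
# two face embeddings (the embeddings here are random stand-ins).
import numpy as np

def is_kin(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 1.0) -> bool:
    """Predict 'kin' when the two face embeddings are close in Euclidean distance."""
    return float(np.linalg.norm(emb_a - emb_b)) < threshold

rng = np.random.default_rng(6)
emb_a = rng.normal(size=768)   # stand-ins for ViT face embeddings
emb_b = rng.normal(size=768)
print(is_kin(emb_a, emb_b))
```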