This article assesses the use of high-resolution Unmanned Aerial Vehicle (UAV) data from commercial field sensors for classifying small-scale agricultural patterns in four crop types (Winter Wheat, Spring Barley, Rapeseed, and Corn) acquired at ground sample distances (GSDs) of 0.027 m, 0.053 m and 0.064 m. Image harmonization challenges arising from spectral and textural variations across GSDs and sensors are addressed. The study investigates the data and sample complexity required to develop an effective machine/deep learning (ML/DL) model, using the Jeffries-Matusita distance to assess class separability, feature importance ranking for feature and layer selection, and semivariogram analysis to determine minimum sample patch sizes. The results demonstrate that spectral information can reliably differentiate sub-classes such as weed infestation, bare soil, disturbed canopy areas, and undisturbed canopy areas. However, there are limitations in detecting refined sub-classes of undisturbed canopy areas assigned to phenological groups, highlighting the need for class reduction and tailored feature and layer selection; a final set of sub-classes is proposed. The study also proposes a customized set of input layers for each crop type and identifies minimum patch sizes that improve the efficiency of detecting specific agricultural patterns. It is confirmed that, to exploit texture information for classification at smaller sample patch sizes (< 120 pixels), GSDs between 0.027 m and 0.064 m (for the RGB and CIR sensors of commercial drones, respectively) are suitable for capturing detailed patterns of Corn and Spring Barley, whereas the CIR sensor at GSDs of 0.053 m and 0.064 m performs better for Winter Wheat and Rapeseed.
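As a concrete illustration of one of the separability tools mentioned above, the following Python sketch computes the Jeffries-Matusita distance between two spectral classes under a Gaussian assumption (the standard formula JM = 2(1 - exp(-B)), with B the Bhattacharyya distance). The class names and reflectance values are invented for the example and are not taken from the study.

```python
import numpy as np

def jeffries_matusita(x1: np.ndarray, x2: np.ndarray) -> float:
    """JM distance between two classes of feature vectors (rows = samples),
    assuming class-conditional Gaussian distributions. Ranges in [0, 2];
    values close to 2 indicate nearly complete separability."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.cov(x1, rowvar=False)
    c2 = np.cov(x2, rowvar=False)
    c = (c1 + c2) / 2.0
    d = m1 - m2
    # Bhattacharyya distance between the two Gaussians
    b = d @ np.linalg.inv(c) @ d / 8.0 + 0.5 * np.log(
        np.linalg.det(c) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return float(2.0 * (1.0 - np.exp(-b)))

# Toy usage: two sub-classes drawn from different spectral distributions.
rng = np.random.default_rng(0)
weeds = rng.normal([0.30, 0.55, 0.20], 0.03, size=(200, 3))
soil = rng.normal([0.45, 0.40, 0.35], 0.03, size=(200, 3))
print(f"JM(weeds, soil) = {jeffries_matusita(weeds, soil):.3f}")
```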
The security of Federated Learning (FL)/Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy the usability of the model by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Because existing data sanitization methods are hard to apply to real-time applications due to their tedious process and heavy computations, we propose a new supervised batch detection method for poison, which can quickly sanitize the training dataset before local model training. We design a training dataset generation method that helps to enhance accuracy and uses data complexity features to train a detection model, which is then used in an efficient batch hierarchical detection process. The model stockpiles knowledge about poison, which can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as other online or offline scenarios.
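The article's pipeline is not reproduced here; the following Python sketch only illustrates the general pattern the abstract describes: derive a simple per-sample data-complexity feature, train a supervised poison detector on data whose clean/poisoned status is known, and use it to sanitize incoming batches before local training. The k-disagreeing-neighbours feature, the label-flipping attack, and the RandomForest detector are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def kdn_feature(X, y, k=5):
    """k-Disagreeing Neighbours: fraction of a sample's k nearest neighbours
    whose labels differ from its own -- one simple data-complexity feature."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

# 1) Build a labelled example: flip 30% of labels to simulate a poisoning attack.
rng = np.random.default_rng(1)
X = rng.normal(0, 1, (1000, 10))
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)
poisoned = rng.random(1000) < 0.30
y_dirty = np.where(poisoned, 1 - y, y)

# 2) Train a detector on the per-sample complexity feature of the dirty set.
feat = kdn_feature(X, y_dirty).reshape(-1, 1)
detector = RandomForestClassifier(random_state=0).fit(feat, poisoned)

# 3) Before local training, sanitize an incoming batch with the trained detector.
def sanitize(X_batch, y_batch):
    f = kdn_feature(X_batch, y_batch).reshape(-1, 1)
    keep = ~detector.predict(f).astype(bool)
    return X_batch[keep], y_batch[keep]
```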
The logical foundations of the standard web ontology languages are provided by expressive Description Logics (DLs), such as SHIQ and SHOIQ. In the Semantic Web and other domains, ontologies are increasingly also seen as a mechanism to access and query data repositories. This novel context poses an original combination of challenges that has not been addressed before: (i) sufficient expressive power of the DL to capture common data modelling constructs; (ii) well-established and flexible query mechanisms such as those inspired by database technology; (iii) optimisation of inference techniques with respect to data size, which typically dominates the size of ontologies. This calls for investigating the data complexity of query answering in expressive DLs. While the complexity of DLs has been studied extensively, few tight characterisations of data complexity were available, and the problem was still open for most DLs of the SH family and for standard query languages like conjunctive queries and their extensions. We tackle this issue and prove a tight coNP upper bound for positive existential queries without transitive roles in SHOQ, SHIQ, and SHOI. We thus establish that, for a whole range of sublogics of SHOIQ that contain AL, answering such queries has coNP-complete data complexity. We obtain our result by a novel tableaux-based algorithm for checking query entailment, which uses a modified blocking condition in the style of CARIN. The algorithm is sound for SHOIQ and shown to be complete for all considered proper sublogics in the SH family.
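For readers unfamiliar with the terminology, the following LaTeX snippet sketches the kind of query covered by the result (a positive existential query without transitive roles) and the data-complexity view of query answering; the vocabulary is invented and not taken from the paper.

```latex
% A positive existential query (conjunction, disjunction, existentially
% quantified variables, no transitive roles) over an invented vocabulary:
q(x) \;=\; \exists y\,\bigl(\mathsf{Professor}(x) \wedge \mathsf{supervises}(x,y)
      \wedge (\mathsf{PhDStudent}(y) \vee \mathsf{Postdoc}(y))\bigr)

% Data complexity: for a fixed TBox \mathcal{T} and fixed query q, decide
%   \mathcal{T} \cup \mathcal{A} \models q(a)
% as a function of the size of the ABox \mathcal{A} (the data) only.
% The paper proves this problem coNP-complete for the listed sublogics of SHOIQ.
```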
Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the difficulties of these tasks from the perspective of data complexity. In this article, we conduct an empirical study to estimate the hardness of over 33,000 instances, employing a set of measures to characterize the inherent difficulty of instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes displays a right-skewed distribution, with the defective class exhibiting a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature, structural, and instance-level overlap; (3) no universal preprocessing technique is applicable to all datasets, and preprocessing may not consistently reduce data complexity; fortunately, dataset complexity measures can help identify suitable techniques for specific datasets; (4) integrating data complexity information into the learning process can enhance an algorithm's learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks and provides a novel perspective for advancing research in defect prediction techniques.
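As a minimal illustration of estimating instance hardness, the sketch below uses one common proxy, 1 minus the cross-validated probability assigned to an instance's true class, on a synthetic imbalanced dataset; this is not necessarily the exact measure set used in the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Toy stand-in for a defect dataset: imbalanced, two classes, some label noise.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15],
                           flip_y=0.05, random_state=0)

# Instance hardness proxy: 1 minus the cross-validated probability
# that a classifier assigns to an instance's true class.
proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=5, method="predict_proba")
hardness = 1.0 - proba[np.arange(len(y)), y]

for c in (0, 1):
    h = hardness[y == c]
    print(f"class {c}: mean hardness = {h.mean():.2f}, std = {h.std():.2f}, "
          f"share with hardness > 0.5 = {(h > 0.5).mean():.2f}")
```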
How can we measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalize the notion of principal manifolds for datasets with non-trivial topologies. The type of a principal cubic complex is determined by its dimension and a grammar of elementary graph transformations; the simplest grammar produces principal trees. We introduce three natural types of data complexity: (1) geometric (deviation of the data's approximator from some "idealized" configuration, such as deviation from harmonicity); (2) structural (how many elements of a principal graph are needed to approximate the data); and (3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in "accuracy-complexity" plots, helping to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well. (C) 2012 Elsevier Ltd. All rights reserved.
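The principal cubic complexes themselves are not reimplemented here; the toy Python sketch below only illustrates the accuracy-complexity trade-off idea using the simplest possible approximator, a set of k nodes fitted by k-means, where the number of nodes stands in for structural complexity and the approximation error for (lack of) accuracy.

```python
import numpy as np
from sklearn.cluster import KMeans

# A noisy spiral: a dataset with non-trivial shape.
rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 500)
X = np.c_[t * np.cos(t), t * np.sin(t)] + rng.normal(0, 0.3, (500, 2))

# Structural complexity = number of nodes; accuracy = mean squared
# distance from the data to its nearest node.
for k in (2, 5, 10, 20, 40):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    mse = km.inertia_ / len(X)
    print(f"nodes = {k:2d}   approximation MSE = {mse:6.3f}")
```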
Regularized linear classifiers have been successfully applied in undersampled (i.e., small sample size/high dimensionality) biomedical classification problems. In addition, a design of data complexity measures has been proposed to assess the competence of a classifier in a particular context. Our work was motivated by the analysis of ill-posed regression problems by Elden and the interpretation of linear discriminant analysis as a mean square error classifier. Using Singular Value Decomposition (SVD) analysis, we define a discriminatory power spectrum and show that it provides a useful means of data complexity assessment for undersampled classification problems. In five real-life biomedical data sets of increasing difficulty, we demonstrate how the data complexity of a classification problem can be related to the performance of regularized linear classifiers. We show that the concentration of discriminatory power manifested in the discriminatory power spectrum is a deciding factor for the success of regularized linear classifiers in undersampled classification problems. As a practical outcome of our work, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem. (c) 2006 Elsevier B.V. All rights reserved.
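The paper's exact definition of the discriminatory power spectrum is not given in the abstract; the sketch below shows one plausible SVD-based reading, assuming the spectrum is the squared projection of the normalized, centred class-indicator vector onto the left singular vectors of the centred data matrix. The toy undersampled dataset and all names are assumptions, not the authors' implementation.

```python
import numpy as np

def discriminatory_power_spectrum(X, y):
    """Assumed form: squared projection of the centred, unit-norm class-indicator
    vector onto each left singular vector of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).astype(float)
    yc /= np.linalg.norm(yc)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U.T @ yc) ** 2          # one value per singular direction, sums to <= 1

# Undersampled toy problem: 40 samples, 2000 features.
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (40, 2000))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 1.5                 # discriminative signal in a few features
spec = discriminatory_power_spectrum(X, y)
print("power captured by the first 3 singular directions:", spec[:3].round(3))
```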
Purpose: This work proposes the hypothesis that data oversampling may lead to dataset simplification according to selected data difficulty metrics, and that such simplification positively affects the quality of selected classifier learning methods. Methods: A set of computer experiments was performed on 47 benchmark datasets to verify the hypothesis. The experiments considered five oversampling methods, five classifiers, and 22 metrics for data difficulty assessment. They aim to establish: (a) whether there is a relationship between resampling and change in the difficulty of the training data, and (b) whether there is a relationship between changes in the values of training set difficulty metrics and classification quality. Results: Based on the obtained results, the research hypothesis was confirmed. It was indicated which measures correlate with which classifiers, and the experiments showed a relationship between the change in the assessed difficulty measures after oversampling and the classification quality of the selected models. Conclusion: The obtained results allow using the selected measures to predict whether a given oversampling method leads to favorable modifications of the learning set for a given type of classifier. The demonstrated relationship between difficulty measures and classification quality also allows using these measures as a learning criterion: for example, guided oversampling can treat the modification of the learning set as an optimization task in which no classification quality metrics need to be estimated during oversampling, only the difficulty of the training set. This may lead to computationally efficient methods.
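A minimal sketch of the kind of experiment described: oversample an imbalanced dataset with SMOTE (from imbalanced-learn, one of many possible oversampling methods) and compare a single difficulty measure, the maximum per-feature Fisher discriminant ratio, before and after. The study itself uses 22 metrics and 47 benchmark datasets; this toy only shows the mechanics.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

def fisher_ratio(X, y):
    """Maximum per-feature Fisher discriminant ratio (higher = easier problem)."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
    return np.max((m0 - m1) ** 2 / (v0 + v1 + 1e-12))

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1],
                           flip_y=0.03, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

print(f"Fisher ratio before oversampling: {fisher_ratio(X, y):.3f}")
print(f"Fisher ratio after  oversampling: {fisher_ratio(X_res, y_res):.3f}")
```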
ISBN (print): 9781728119854
Feature selection (FS) is a pre-processing step often mandatory in data analysis by Machine Learning techniques. Its objective is to reduce data dimensionality by identifying and retaining only the relevant features of a dataset. In this work we evaluate the use of complexity measures of classification problems in FS. These descriptors allow estimating the intrinsic difficulty of a classification problem based on characteristics of the dataset available for learning. We propose a combined univariate-multivariate FS technique which employs two complexity measures: Fisher's maximum discriminant ratio and the sum of intra-extra class distances. The results reveal that the complexity measures are indeed suitable for estimating feature importance in classification datasets. Large reductions in the number of features were obtained while preserving, in general, the predictive accuracy of two strong classification techniques: Support Vector Machines and Random Forests.
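A hedged sketch of a combined univariate-multivariate selection in the spirit described above: rank features by a per-feature Fisher discriminant ratio, then score candidate subsets with an intra/extra class nearest-neighbour distance ratio. The exact formulas and thresholds used by the authors may differ; everything below is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def fisher_ratio_per_feature(X, y):
    """Univariate step: Fisher's discriminant ratio for every feature."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
    return (m0 - m1) ** 2 / (v0 + v1 + 1e-12)

def intra_extra_ratio(X, y):
    """Multivariate step: sum of nearest same-class distances over sum of
    nearest other-class distances (lower = better separated subset)."""
    d_intra, d_extra = [], []
    for c in np.unique(y):
        same, other = X[y == c], X[y != c]
        d_intra.append(NearestNeighbors(n_neighbors=2).fit(same)
                       .kneighbors(same)[0][:, 1])      # skip self-distance
        d_extra.append(NearestNeighbors(n_neighbors=1).fit(other)
                       .kneighbors(same)[0][:, 0])
    return np.concatenate(d_intra).sum() / np.concatenate(d_extra).sum()

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)
rank = np.argsort(fisher_ratio_per_feature(X, y))[::-1]    # best features first
for k in (5, 10, 50):
    subset = rank[:k]
    print(f"top {k:2d} features: intra/extra ratio = "
          f"{intra_extra_ratio(X[:, subset], y):.3f}")
```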
The analysis of data complexity is a proper framework to characterize the classification problem being tackled and to identify the domains of competence of classifiers. As a practical outcome of this framework, the proposed data complexity measures may facilitate the choice of a classifier for a given problem. The aim of this paper is to study the behaviour of a fuzzy rule based classification system and its relationship to data complexity. We use as a case study the fuzzy hybrid genetic based machine learning method presented in [H. Ishibuchi, T. Yamamoto, T. Nakashima, Hybridization of fuzzy GBML approaches for pattern classification problems, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 35 (2) (2005) 359-365]. We examine several metrics of data complexity over a wide range of data sets built from real data and try to extract behaviour patterns from the results. We obtain rules which describe both good and bad behaviours of the fuzzy rule based classification system. These rules use values of the data complexity metrics in their antecedents, so we can try to predict the behaviour of the method from the data set complexity metrics prior to its application. Therefore, we can establish the domains of competence of this fuzzy rule based classification system. (C) 2009 Elsevier B.V. All rights reserved.
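The rules extracted by the study have data complexity metric values in their antecedents; the tiny Python sketch below shows only the form such a rule takes. The metric names follow common complexity-measure notation and the thresholds are invented, not the ones derived in the paper.

```python
# Illustrative only: the article derives such rules empirically; the
# thresholds below are invented for the example.
def predict_behaviour(metrics: dict) -> str:
    """Rule with complexity-metric values in the antecedent, e.g.:
    IF F1 (max Fisher ratio) is high AND N1 (boundary fraction) is low
    THEN the fuzzy rule based classifier is expected to behave well."""
    if metrics["F1"] >= 0.6 and metrics["N1"] <= 0.25:
        return "good behaviour expected"
    if metrics["F1"] < 0.2 or metrics["N1"] > 0.5:
        return "bad behaviour expected"
    return "no prediction"

print(predict_behaviour({"F1": 0.8, "N1": 0.1}))   # -> good behaviour expected
```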
Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes a lot of effort to determine appropriate parameter values, such as the training data size and the number of experiment runs, to implement a validated evaluation. This study develops an efficient cross-validation method called Complexity-Based Efficient (CBE) cross-validation for binary classification problems. CBE cross-validation establishes a complexity index, called the CBE index, by exploring the geometric structure and noise of the data. The CBE index is used to calculate the optimal training data size and the number of experiment runs, so as to reduce model evaluation time when dealing with computationally expensive classification data sets. A simulated data set and three real data sets are employed to validate the performance of the proposed method, and the validation methods compared are repeated random sub-sampling validation and K-fold cross-validation. The results show that CBE cross-validation, repeated random sub-sampling validation and K-fold cross-validation have similar validation performance, except that the training time required for CBE cross-validation is indeed lower than that of the other two methods. (C) 2010 Elsevier B.V. All rights reserved.
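The CBE index formula is not given in the abstract, so the Python sketch below substitutes a simple stand-in (the fraction of samples whose nearest neighbour has a different label) and an invented mapping from that index to training fraction and number of runs, purely to illustrate the workflow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def complexity_index(X, y):
    """Stand-in for the CBE index: fraction of samples whose nearest
    neighbour carries a different label (geometric structure + noise)."""
    _, idx = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    return float((y[idx[:, 1]] != y).mean())

X, y = make_classification(n_samples=3000, n_features=15, flip_y=0.05,
                           random_state=0)
c = complexity_index(X, y)

# Map complexity to evaluation effort: harder data -> larger training fraction
# and more repeated runs (this mapping is illustrative, not the paper's).
train_fraction = min(0.5 + c, 0.9)
n_runs = int(np.ceil(10 + 40 * c))
print(f"complexity index = {c:.2f}  ->  train fraction = {train_fraction:.2f}, "
      f"runs = {n_runs}")
```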