The web today hosts millions of datasets, and their number continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Se...
To distinguish subtle differences among fine-grained categories, a large number of well-labeled images is typically required. However, acquiring manual annotations for fine-grained categories is an extremely difficult task, as it usually demands professional knowledge. To this end, directly leveraging web images for learning fine-grained models becomes a natural choice. Nevertheless, due to the existence of label noise, this learning paradigm tends to perform poorly. In this work, we propose an end-to-end approach that combines dynamic loss correction and global sample selection to alleviate the problem of label noise. Specifically, we leverage the network to predict all samples, record the predictions over the most recent epochs, and calculate the uncertainty-based dynamic loss for global sample selection. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed approach. The source code of our approach has been released at: https://***/NUST-Machine-Intelligence-Laboratory/dlc.
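To make the recording-and-selection step concrete, below is a minimal, hypothetical PyTorch sketch of one way it could work: keep a ring buffer of each sample's softmax predictions over the last few epochs, score each sample by the variance of those predictions, and compute the loss only over the low-uncertainty fraction of each batch. The names (PredictionHistory, selected_loss), the variance-based uncertainty score, and the keep_ratio parameter are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

class PredictionHistory:
    """Ring buffer of softmax predictions for the last `window` epochs."""
    def __init__(self, num_samples: int, num_classes: int, window: int = 5):
        self.window = window
        self.probs = torch.zeros(window, num_samples, num_classes)
        self.epoch = 0

    def record(self, sample_idx: torch.Tensor, logits: torch.Tensor) -> None:
        # Store the current epoch's softmax outputs for these samples.
        self.probs[self.epoch % self.window, sample_idx] = \
            F.softmax(logits, dim=1).detach().cpu()

    def next_epoch(self) -> None:
        self.epoch += 1

    def uncertainty(self, sample_idx: torch.Tensor) -> torch.Tensor:
        # One plausible uncertainty proxy (an assumption here): total variance
        # of the class probabilities across the recorded epochs; unstable
        # predictions suggest a potentially mislabeled sample.
        hist = self.probs[:, sample_idx]               # (window, batch, classes)
        return hist.var(dim=0, unbiased=False).sum(dim=1)

def selected_loss(logits, targets, sample_idx, history, keep_ratio=0.7):
    """Cross-entropy over the keep_ratio fraction of the batch with the
    lowest uncertainty; a stand-in for the paper's global selection."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    unc = history.uncertainty(sample_idx).to(per_sample.device)
    k = max(1, int(keep_ratio * len(per_sample)))
    keep = torch.topk(-unc, k).indices                 # smallest uncertainty
    return per_sample[keep].mean()

In a training loop, one would call history.record(...) on every batch, history.next_epoch() at each epoch boundary, and back-propagate selected_loss once enough epochs have been recorded.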
ISBN (Print): 9781728110516
AdaBoost is perhaps one of the most well-known ensemble learning algorithms. In simple terms, the idea in AdaBoost is to train a number of weak learners in an incremental fashion, where each new learner tries to focus more on the samples that were misclassified by the preceding classifiers. Consequently, in the presence of noisy data samples, the new learners will to some extent memorize the data, which in turn leads to an overfitted model. The main objective of this paper is to provide a generalized version of the AdaBoost algorithm that avoids overfitting and performs better when the data samples are corrupted with noise. To this end, we make use of another ensemble learning algorithm called ValidBoost [15], and introduce a mechanism to dynamically determine the thresholds for both the error rate of each classifier and the error rate in each iteration. These thresholds enable us to control the error rate of the algorithm. Experiments were conducted on several benchmark datasets, including web datasets such as the "Website Phishing Data Set" and the "Page Blocks Classification Data Set", to evaluate the performance of our proposed algorithm.
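As an illustration of the thresholding idea, below is a hypothetical NumPy/scikit-learn sketch of a standard AdaBoost loop (decision stumps as weak learners) that stops adding learners once a round's weighted error rate exceeds a dynamic threshold. The threshold schedule and the slack parameter are assumptions made for illustration; the paper's exact rule, derived from ValidBoost [15], may differ.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_with_threshold(X, y, n_rounds=50, slack=0.45):
    """y must be in {-1, +1}. Returns (learners, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # uniform initial sample weights
    learners, alphas = [], []
    for t in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])           # weighted error rate this round
        # Illustrative dynamic threshold: starts near 0.5 and tightens
        # with t, so later (more noise-prone) learners must be more accurate.
        threshold = slack * (1.0 - t / (2.0 * n_rounds))
        if err <= 0 or err >= threshold:
            break                            # reject this learner and stop
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)       # upweight misclassified samples
        w /= w.sum()                         # renormalize the distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(X, learners, alphas):
    # Weighted majority vote of the accepted weak learners.
    score = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(score)

Without the threshold check this reduces to plain discrete AdaBoost; the check is what caps how much influence noisy rounds can accumulate.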