Large language models (LLMs) have emerged as valuable tools for enhancing textual features in various text-related tasks. Despite their superiority in capturing the lexical semantics between tokens for text analysis, ...
The digital era has made seamless sharing and keeping of media such as images on cloud platforms an integral part of our lives. However, user privacy and data security in these repositories remain a serious concern. W...
This book is written to offer a humble, but unified, treatment of e-values in hypothesis testing. The book is organized into three parts: Fundamental Concepts, Core Ideas, and Advanced Topics. The first part includes th...
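To make the central object concrete, here is a minimal simulation (not from the book) of the textbook likelihood-ratio e-value: a nonnegative statistic whose expectation is at most 1 under the null, so that by Markov's inequality, rejecting when it exceeds 1/α controls the type-I error at level α. The choices of mu, alpha, and the number of replications are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha, reps = 0.5, 0.05, 20_000

# Under H0, X ~ N(0, 1).  The likelihood ratio of N(mu, 1) against
# N(0, 1) is e(X) = exp(mu*X - mu^2/2): nonnegative, with E[e(X)] = 1
# under H0.  By Markov's inequality, rejecting when e(X) >= 1/alpha
# yields a level-alpha test without any asymptotics.
x = rng.normal(size=reps)           # samples drawn under the null
e = np.exp(mu * x - mu**2 / 2)      # one e-value per sample

null_mean = e.mean()                # close to 1 by the e-value property
type_I = (e >= 1 / alpha).mean()    # empirical rejection rate, at most alpha
```

Unlike p-values, e-values from independent studies can simply be multiplied to give another e-value, which is one reason the book treats them as a unifying tool.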
In nonparametric independence testing, we observe i.i.d. data {(X_i, Y_i)}_{i=1}^n, where X_i ∈ 𝒳, Y_i ∈ 𝒴 lie in any general spaces, and we wish to test the null that X is independent of Y. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Hence, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. In this paper, we provide a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the permutation tests, our variants have the same power within a constant factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
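For context, the quadratic-time permutation baseline that the cross-statistics are designed to avoid can be sketched as follows. This is a generic permutation test for distance covariance on univariate data; the V-statistic form and the choice of n_perms are illustrative, and this is not the paper's xdCov.

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form, 1-D inputs)."""
    def dcenter(v):
        d = np.abs(v[:, None] - v[None, :])  # pairwise distance matrix
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    return (dcenter(x) * dcenter(y)).mean()

def dcov_perm_test(x, y, n_perms=200, seed=0):
    """Permutation p-value: recomputes the O(n^2) statistic n_perms times."""
    rng = np.random.default_rng(seed)
    obs = dcov2(x, y)
    count = sum(dcov2(x, rng.permutation(y)) >= obs for _ in range(n_perms))
    return (1 + count) / (n_perms + 1)  # +1 correction for exact validity
```

Each permutation re-pays the quadratic cost of the statistic; the paper's xHSIC/xdCov instead compare a single studentized statistic to a Gaussian quantile, removing that multiplicative factor.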
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
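A minimal sketch of the object under study: subsample bagging of the closed-form ridge predictor, averaging fits over subsamples drawn without replacement (simple random sampling). The subsample size k, number of bags M, and penalty lam here are illustrative defaults, and this sketch is not the paper's risk analysis.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def bagged_ridge_predict(X, y, X_new, lam=1.0, k=None, M=50, seed=0):
    """Average ridge predictors fit on M subsamples of size k,
    drawn without replacement (simple random sampling)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = k if k is not None else n // 2
    pred = np.zeros(X_new.shape[0])
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        pred += X_new @ ridge_fit(X[idx], y[idx], lam)
    return pred / M
```

The paper's cross-validation procedure would tune the subsample size k on held-out data; sweeping k and picking the risk minimizer is the simplest version of that idea, and is what flattens the double-descent behavior in the sample size.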
Existing concentration bounds for bounded vector-valued random variables include extensions of the scalar Hoeffding and Bernstein inequalities. While the latter is typically tighter, it requires knowing a bound on the...
We study the Personalized PageRank (PPR) algorithm, a local spectral method for clustering, which extracts clusters using locally-biased random walks around a given seed node. In contrast to previous work, we adopt a ...
Authors:
Zade, Nikita; Langote, Meher; Verma, Prateek
Faculty of Engineering & Technology, Department of Artificial Intelligence & Data Science, Sawangi, Maharashtra 442001, India
Faculty of Engineering & Technology, Department of Artificial Intelligence & Machine Learning, Sawangi, Maharashtra 442001, India
XAI is now transforming the use of AI in diagnosing diseases by overcoming some of the problems inherent in most black-box approaches. In time-sensitive speciality areas like computer-aided diagnosis, image analysis, ...
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
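A toy illustration of the setup: simulated covariate shift with a grid search over regularization levels, including negative ones. The shift, grid, and dimensions below are arbitrary illustrative choices; note that with n > p, a mildly negative lam keeps X'X/n + lam*I positive definite, so the sweep is well defined.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
beta = np.ones(p) / np.sqrt(p)

X = rng.normal(size=(n, p))              # isotropic training features
y = X @ beta + rng.normal(size=n)

sds = np.sqrt(np.linspace(0.1, 5.0, p))  # shifted (diagonal) test covariance
X_new = rng.normal(size=(2000, p)) * sds
y_new = X_new @ beta + rng.normal(size=2000)

def ridge_risk(lam):
    """Out-of-distribution squared-error risk of ridge at level lam."""
    bhat = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
    return np.mean((y_new - X_new @ bhat) ** 2)

lams = np.linspace(-0.2, 2.0, 111)       # sweep includes negative levels
risks = np.array([ridge_risk(l) for l in lams])
lam_star = lams[risks.argmin()]
```

Whether lam_star comes out negative in any given instance depends on the covariance and signal alignment conditions the paper characterizes; the sketch only shows that optimizing over negative levels is a well-posed search in the underparameterized regime.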
Large Multimodal Models (LMMs) have achieved strong performance across a range of vision and language tasks. However, their spatial reasoning capabilities are under-investigated. In this paper, we construct a novel VQ...