检索结果-内蒙古大学图书馆

2022 IEEE International Conference on Multimedia and Expo, ICME 2022

作者： Yang, Haoxin Wang, Yi Li, Bin Shenzhen University Guangdong Key Lab. of Intelligent Info. Processing and Shenzhen Key Laboratory of Media Security Shenzhen China Dongguan University of Technology Dongguan China Shenzhen Institute of Articial Intelligence and Robotics for Society Shenzhen China

ISBN: (数字)9781665485630

ISBN: (纸本)9781665485630

Collab.rative learning is used in multi-media applications to distribute computing tasks and data storage over multiple sites. Recent studies found that private data info.mation can be derived from model updates between the server and clients. Yet, previous methods are limited by their capabilities of privacy inference in more general and practical situations. In this paper, we propose a novel property inference method in the deep feature space to overcome those limitations. In particular, our method can make inference decisions on the level of individual examples instead of a batch of examples. We can simultaneously perform multiple property inference attacks without the need of image reconstruction. The proposed method is evaluated on several image benchmark datasets, which demonstrates significant improvement of inference accuracy even in the presence of privacy protection schemes. © 2022 IEEE.

关键词： Image reconstruction

来源：评论

学校读者我要写书评

暂无评论

AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation 38

AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmenta...

引用

38th Conference on Neural info.mation processing Systems, NeurIPS 2024

作者： Han, Boyu Xu, Qianqian Yang, Zhiyong Bao, Shilong Wen, Peisong Jiang, Yangbangyan Huang, Qingming Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS China School of Computer Science and Tech. University of Chinese Academy of Sciences China Peng Cheng Laboratory China Key Laboratory of Big Data Mining and Knowledge Management CAS China

The Area Under the ROC Curve (AUC) is a well-known metric for evaluating instance-level long-tail learning problems. In the past two decades, many AUC optimization methods have been proposed to improve model performance under long-tail distributions. In this paper, we explore AUC optimization methods in the context of pixel-level long-tail semantic segmentation, a much more complicated scenario. This task introduces two major challenges for AUC optimization techniques. On one hand, AUC optimization in a pixel-level task involves complex coupling across loss terms, with structured inner-image and pairwise inter-image dependencies, complicating theoretical analysis. On the other hand, we find that mini-batch estimation of AUC loss in this case requires a larger batch size, resulting in an unaffordable space complexity. To address these issues, we develop a pixel-level AUC loss function and conduct a dependency-graph-based theoretical analysis of the algorithm's generalization ability. Additionally, we design a Tail-Classes Memory Bank (T-Memory Bank) to manage the significant memory demand. Finally, comprehensive experiments across various benchmarks confirm the effectiveness of our proposed AUCSeg method. The code is availab.e at https://***/boyuh/AUCSeg. © 2024 Neural info.mation processing systems foundation. All rights reserved.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Real time Monitoring Method of Welding Defects Based on LSTM Sequence Model 3

Real time Monitoring Method of Welding Defects Based on LSTM...

引用

3rd IEEE International Conference on info.mation technology, Big Data and Artificial Intelligence, ICIBA 2023

作者： Zhou, Xundao Chen, Mingmin Shang, Weipeng Shen, Hongping Xu, Huawei Fifth Institute of Electronics Ministry of Industry and Information Technology China Key Lab. of Min. of Technol. Indust. and Info. Technol. of Intelligent Prod. Qual. Eval. and Reliability Assur. China

ISBN: (纸本)9781665490788

During laser welding, abnormal conditions can result in workpiece defects and product failure. Therefore, a real-time monitoring system is required to detect laser welding processes and ensure product quality. In this study, we developed a real-time laser welding data acquisition system to collect plasma density, laser intensity, and molten pool temperature data during the welding process. Additionally, we established a neural network based on a combination of LSTM and CNN models to rapidly detect laser welding defects. The experimental results demonstrate that this method can effectively identify welding defects with an average accuracy rate of 96%. Furthermore, the LSTM layer can significantly improve the prediction performance of the CNN feature extraction model, particularly for forecasting and non-local features such as sequence offsets and delays. © 2023 IEEE.

关键词： Data acquisition

来源：评论

学校读者我要写书评

暂无评论

ETS: An Error Tolerable System for Coreference Resolution 15

ETS: An Error Tolerable System for Coreference Resolution

引用

15th Conference on Computational Natural Language Learning, CoNLL 2011

作者： Xiong, Hao Song, Linfeng Meng, Fandong Liu, Yang Liu, Qun Lü, Yajuan Key Lab. of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences P.O. Box 2704 Beijing100190 China

ISBN: (纸本)9781937284084

This paper presents our error tolerable system for coreference resolution in CoNLL-2011(Pradhan et al., 2011) shared task (closed track). Different from most previous reported work, we detect mention candidates based on packed forest instead of single parse tree, and we use beam search algorithm based on the Bell Tree to create entities. Experimental results show that our methods achieve promising results on the development set. © 2011 Association for Computational Linguistics

关键词： Forestry

来源：评论

学校读者我要写书评

暂无评论

AUCSeg: AUC-oriented pixel-level long-tail semantic segmentation 24

AUCSeg: AUC-oriented pixel-level long-tail semantic segmenta...

引用

Proceedings of the 38th International Conference on Neural info.mation processing Systems

作者： Boyu Han Qianqian Xu Zhiyong Yang Shilong Bao Peisong Wen Yangbangyan Jiang Qingming Huang Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and School of Computer Science and Tech. University of Chinese Academy of Sciences Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Peng Cheng Laboratory School of Computer Science and Tech. University of Chinese Academy of Sciences School of Computer Science and Tech. University of Chinese Academy of Sciences and Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Key Laboratory of Big Data Mining and Knowledge Management CAS

ISBN: (纸本)9798331314385

关键词：

来源：评论

学校读者我要写书评

暂无评论

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

arXiv

引用

arXiv 2024年

作者： Li, Yian Tian, Wentao Jiao, Yang Chen, Jingjing Zhao, Na Jiang, Yu-Gang Shanghai Key Lab of Intell. Info. Processing School of CS Fudan University China Shanghai Collaborative Innovation Center on Intelligent Visual Computing China Singapore University of Technology and Design Singapore

Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open problem. To unveil their reasoning behaviors, we first curate a Multimodal Assumptive Reasoning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question, whereas such presuppositions appear naive to human reasoning. Besides, we also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, a MLLM demonstrates significant improvements in assumptive reasoning abilities without compromising its general-purpose question-answering performance. We also provide extensive evaluations of both open-source and private MLLMs on MARS-Bench, along with experimental analyses of the AD method. Copyright © 2024, The Authors. All rights reserved.

关键词： Adversarial machine learning

来源：评论

学校读者我要写书评

暂无评论

Not all diffusion model activations have been evaluated as discriminative features 24

Not all diffusion model activations have been evaluated as d...

引用

Proceedings of the 38th International Conference on Neural info.mation processing Systems

作者： Benyuan Meng Qianqian Xu Zitai Wang Xiaochun Cao Qingming Huang Institute of Information Engineering CAS and School of Cyber Security University of Chinese Academy of Sciences Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Peng Cheng Laboratory Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS School of Cyber Science and Tech. Shenzhen Campus of Sun Yat-sen University School of Computer Science and Tech. University of Chinese Academy of Sciences and Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Key Laboratory of Big Data Mining and Knowledge Management CAS

ISBN: (纸本)9798331314385

Diffusion models are initially designed for image generation. Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative tasks such as semantic segmentation. Given numerous activations, selecting a small yet effective subset poses a fundamental problem. To this end, the early study of this field performs a large-scale quantitative comparison of the discriminative ability of the activations. However, we find that many potential activations have not been evaluated, such as the queries and keys used to compute attention scores. Moreover, recent advancements in diffusion architectures bring many new activations, such as those within embedded ViT modules. Both combined, activation selection remains unresolved but overlooked. To tackle this issue, this paper takes a further step with a much broader range of activations evaluated. Considering the significant increase in activations, a full-scale quantitative comparison is no longer operational. Instead, we seek to understand the properties of these activations, such that the activations that are clearly inferior can be filtered out in advance via simple qualitative evaluation. After careful analysis, we discover three properties universal among diffusion models, enabling this study to go beyond specific models. On top of this, we present effective feature selection solutions for several popular diffusion models. Finally, the experiments across multiple discriminative tasks validate the superiority of our method over the SOTA competitors. Our code is availab.e at https://***/Darkbblue/generic-diffusion-feature.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

arXiv

引用

arXiv 2024年

作者： Jiao, Pengkun Zhao, Na Chen, Jingjing Jiang, Yu-Gang Shanghai Key Lab of Intell. Info. Processing School of Computer Science Fudan University China Shanghai Collaborative Innovation Center on Intelligent Visual Computing China Singapore University of Technology and Design Singapore

Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene. While language and vision foundation models have achieved success in handling various open-vocabulary tasks with abundant training data, OV-3DDet faces a significant challenge due to the limited availab.lity of training data. Although some pioneering efforts have integrated vision-language models (VLM) knowledge into OV-3DDet learning, the full potential of these foundational models has yet to be fully exploited. In this paper, we unlock the textual and visual wisdom to tackle the open-vocabulary 3D detection task by leveraging the language and vision foundation models. We leverage a vision foundation model to provide image-wise guidance for discovering novel classes in 3D scenes. Specifically, we utilize a object detection vision foundation model to enable the zero-shot discovery of objects in images, which serves as the initial seeds and filtering guidance to identify novel 3D objects. Additionally, to align the 3D space with the powerful vision-language space, we introduce a hierarchical alignment approach, where the 3D feature space is aligned with the vision-language feature space using a pre-trained VLM at the instance, category, and scene levels. Through extensive experimentation, we demonstrate significant improvements in accuracy and generalization, highlighting the potential of foundation models in advancing open-vocabulary 3D object detection in real-world scenarios. © 2024, CC BY.

关键词： Object detection

来源：评论

学校读者我要写书评

暂无评论

Suppress content shift: better diffusion features via off-the-shelf generation techniques 24

Suppress content shift: better diffusion features via off-th...

引用

Proceedings of the 38th International Conference on Neural info.mation processing Systems

作者： Benyuan Meng Qianqian Xu Zitai Wang Zhiyong Yang Xiaochun Cao Qingming Huang Institute of Information Engineering CAS and School of Cyber Security University of Chinese Academy of Sciences Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Peng Cheng Laboratory Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS School of Computer Science and Tech. University of Chinese Academy of Sciences School of Cyber Science and Tech. Shenzhen Campus of Sun Yat-sen University School of Computer Science and Tech. University of Chinese Academy of Sciences and Key Lab. of Intelligent Information Processing Institute of Computing Technology CAS and Key Laboratory of Big Data Mining and Knowledge Management CAS

ISBN: (纸本)9798331314385

Diffusion models are powerful generative models, and this capability can also be applied to discrimination. The inner activations of a pre-trained diffusion model can serve as features for discriminative tasks, namely, diffusion feature. We discover that diffusion feature has been hindered by a hidden yet universal phenomenon that we call content shift. To be specific, there are content differences between features and the input image, such as the exact shape of a certain object. We locate the cause of content shift as one inherent characteristic of diffusion models, which suggests the broad existence of this phenomenon in diffusion feature. Further empirical study also indicates that its negative impact is not negligible even when content shift is not visually perceivable. Hence, we propose to suppress content shift to enhance the overall quality of diffusion features. Specifically, content shift is related to the info.mation drift during the process of recovering an image from the noisy input, pointing out the possibility of turning off-the-shelf generation techniques into tools for content shift suppression. We further propose a practical guideline named GATE to efficiently evaluate the potential benefit of a technique and provide an implementation of our methodology. Despite the simplicity, the proposed approach has achieved superior results on various tasks and datasets, validating its potential as a generic booster for diffusion features. Our code is availab.e at https://***/Darkbblue/diffusion-content-shift.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Prototypical Residual Networks for Anomaly Detection and Localization

Prototypical Residual Networks for Anomaly Detection and Loc...

引用

Conference on Computer Vision and Pattern Recognition (CVPR)

作者： Hui Zhang Zuxuan Wu Zheng Wang Zhineng Chen Yu-Gang Jiang Shanghai Key Lab of Intell. Info. Processing School of CS Fudan University Shanghai Collaborative Innovation Center of Intelligent Visual Computing School of Computer Science Zhejiang University of Technology

Anomaly detection and localization are widely used in industrial manufacturing for its efficiency and effectiveness. Anomalies are rare and hard to collect and supervised models easily over-fit to these seen anomalies with a handful of abnormal samples, producing unsatisfactory performance. On the other hand, anomalies are typically subtle, hard to discern, and of various appearance, making it difficult to detect anomalies and let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. PRN mainly consists of two parts: multi-scale prototypes that explicitly represent the residual features of anomalies to normal patterns; a multisize self-attention mechanism that enables variable-sized anomalous feature learning. Besides, we present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies. Extensive experiments on the challenging and widely used MVTec AD benchmark show that PRN outperforms current state-of-the-art unsupervised and supervised methods. We further report SOTA results on three additional datasets to demonstrate the effectiveness and generalizability of PRN.

关键词：

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：