Search Results - Inner Mongolia University Library

arXiv, 2025

Authors: Jensen, Simon B.; Oehmcke, Stefan; Møgelmose, Andreas; Madadi, Meysam; Igel, Christian; Escalera, Sergio; Moeslund, Thomas B. Affiliations: Visual Analysis and Perception Laboratory, Aalborg University, Denmark; Pioneer Centre for Artificial Intelligence, Denmark; Department of Computer Science, Copenhagen University, Denmark; Institute for Visual & Analytic Computing, Rostock University, Germany; University of Barcelona and Computer Vision Center, Spain

Assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This study investigates whether deep learning-based fusion of close-range sensing data from 2D orthophotos and 3D airborne laser scanning (ALS) point clouds can reliably assess the biodiversity potential of forests. We introduce the BioVista dataset, comprising 44 378 paired samples of orthophotos and ALS point clouds from temperate forests in Denmark, designed to explore multimodal fusion approaches. Using deep neural networks (ResNet for orthophotos and PointVector for ALS point clouds), we investigate each data modality’s ability to assess forest biodiversity potential, achieving overall accuracies of 76.7% and 75.8%, respectively. We explore various 2D and 3D fusion approaches: confidence-based ensembling, feature-level concatenation, and end-to-end training, achieving overall accuracies of 80.5%, 81.4%, and 80.4%, respectively. Our results demonstrate that spectral information from orthophotos and structural information from ALS point clouds effectively complement each other in forest biodiversity assessment. © 2025, CC BY.
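The two simplest fusion strategies named in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes that "confidence" means the maximum softmax probability of each modality's classifier, which the abstract does not specify, and the function names and sample values are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def confidence_ensemble(logits_2d: np.ndarray, logits_3d: np.ndarray) -> np.ndarray:
    """Per sample, keep the class predicted by the more confident modality.

    Assumption: confidence is the maximum softmax probability.
    """
    probs_2d = softmax(logits_2d)
    probs_3d = softmax(logits_3d)
    use_2d = probs_2d.max(axis=1) >= probs_3d.max(axis=1)
    return np.where(use_2d, probs_2d.argmax(axis=1), probs_3d.argmax(axis=1))


def concat_features(feat_2d: np.ndarray, feat_3d: np.ndarray) -> np.ndarray:
    """Feature-level fusion: join per-sample feature vectors before a classifier."""
    return np.concatenate([feat_2d, feat_3d], axis=1)


# Hypothetical logits for three samples and two classes.
image_logits = np.array([[2.0, 0.0], [0.1, 0.0], [0.0, 3.0]])
cloud_logits = np.array([[0.0, 0.5], [0.0, 4.0], [1.0, 0.0]])
predictions = confidence_ensemble(image_logits, cloud_logits)
print(predictions)
```

In this toy input the point-cloud branch overrides the image branch only for the second sample, where the image classifier is close to undecided.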

Keywords: Laser applications

Trace-based Multi-Dimensional Root Cause Localization of Performance Issues in Microservice Systems

International Conference on Software Engineering (ICSE)

Authors: Chenxi Zhang; Zhen Dong; Xin Peng; Bicheng Zhang; Miao Chen. Affiliations: Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; School of Computer Science and Shanghai Key Laboratory of Data Science, Fudan University, China

ISBN (digital): 9798400702174

ISBN (print): 9798350382143

Modern microservice systems have become increasingly complicated due to their dynamic and complex interactions and runtime environments. This makes such systems vulnerable to performance issues with a variety of causes, such as the runtime environment, communication, coordination, or implementation of services. Traces record the detailed execution process of a request through the system and have been widely used in performance issue diagnosis in microservice systems. By identifying the execution processes and attribute value combinations that are common in anomalous traces but rare in normal traces, engineers may narrow the root cause of a performance issue down to a smaller scope. However, due to the complex structure of traces and the large number of attribute combinations, it is challenging to find the root cause in the huge search space. In this paper, we propose TraceContrast, a trace-based multidimensional root cause localization approach. TraceContrast uses a sequence representation to describe the complex structure of a trace with the attributes of each span. Based on this representation, it combines contrast sequential pattern mining and spectrum analysis to localize multidimensional root causes efficiently. Experimental studies on a widely used microservice benchmark show that TraceContrast outperforms existing approaches in both multidimensional and instance-dimensional root cause localization with significant accuracy advantages. Moreover, TraceContrast is efficient, and its efficiency can be further improved by parallel execution.
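The idea of ranking attribute values that are common in anomalous traces but rare in normal ones can be illustrated with a spectrum-style score. The sketch below uses the Ochiai coefficient from spectrum-based fault localization as a stand-in. The abstract does not say which formula TraceContrast uses, and modelling each trace as a flat set of (attribute, value) pairs is a simplifying assumption that drops the span ordering the paper's sequence representation keeps.

```python
import math
from collections import Counter


def ochiai_scores(anomalous, normal):
    """Score (attribute, value) pairs by how specific they are to anomalous traces.

    ef = anomalous traces containing the pair, ep = normal traces containing it.
    Ochiai = ef / sqrt(total_anomalous * (ef + ep)).
    """
    in_anomalous = Counter(pair for trace in anomalous for pair in set(trace))
    in_normal = Counter(pair for trace in normal for pair in set(trace))
    total_anomalous = len(anomalous)
    scores = {}
    for pair, ef in in_anomalous.items():
        ep = in_normal.get(pair, 0)
        scores[pair] = ef / math.sqrt(total_anomalous * (ef + ep))
    return scores


# Hypothetical traces, each reduced to a set of attribute-value pairs.
anomalous_traces = [
    {("service", "cart"), ("host", "node-3")},
    {("service", "cart"), ("host", "node-3")},
    {("service", "order"), ("host", "node-3")},
]
normal_traces = [
    {("service", "cart"), ("host", "node-1")},
    {("service", "order"), ("host", "node-2")},
]

scores = ochiai_scores(anomalous_traces, normal_traces)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], round(scores[ranking[0]], 3))
```

Here the host value that appears in every anomalous trace and in no normal trace receives the top score, while service names shared by both groups rank lower.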

Keywords: Location awareness; Runtime environment; Accuracy; Microservice architectures; Benchmark testing; Data mining; Spectral analysis

Data-Free Network Debiasing for Long-Tailed Visual Recognition

2022 IEEE International Conference on Multimedia and Expo, ICME 2022

Authors: Cai, Jinmian; Wang, Zheng; Fu, Huazhu; Chen, Jingjing; Jiang, Yu-Gang. Affiliations: School of Computer Science, Fudan University, Shanghai Key Lab of Intelligent Information Processing, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; Biren Technology; IHPC, A-STAR

ISBN (digital): 9781665485630

ISBN (print): 9781665485630

Real-world data is often unbalanced and exhibits a long-tailed distribution over classes. Vanilla classification models trained on imbalanced datasets inherently exhibit bias towards dominant classes. Existing debiasing methods mostly balance the data or the loss during training. Nevertheless, these data-acquiring methods are not suitable for situations where training data are unavailable. In this paper, we turn to solutions that need no access to training data and propose a data-free debiasing (Free-D) method that serves as a plug-and-play module for any standard classification model. Specifically, our method adjusts both the feature representation via feature representation shifting and the classifier weight via class prior compensation in a data-free manner. We evaluate and compare our method on four long-tailed visual recognition datasets, i.e., long-tailed CIFAR-10/-100, ImageNet-LT, and Places-LT. Extensive experiments demonstrate that the proposed data-free method achieves results comparable to those of data-acquiring methods. © 2022 IEEE.
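Compensating a classifier for a skewed class prior can be illustrated with post-hoc logit adjustment, a well-known technique that is swapped in here for illustration and is not claimed to be Free-D's procedure. The sketch assumes the class prior is simply given. How Free-D obtains prior information without training data is not described in the abstract above, and the `tau` parameter and the sample numbers are hypothetical.

```python
import numpy as np


def compensate_class_prior(logits: np.ndarray, class_prior, tau: float = 1.0) -> np.ndarray:
    """Subtract tau * log(prior) from each class logit (post-hoc logit adjustment).

    Frequent classes are pushed down and rare classes are pushed up.
    """
    prior = np.asarray(class_prior, dtype=float)
    prior = prior / prior.sum()  # normalise in case raw counts are passed
    return logits - tau * np.log(prior)


# Hypothetical logits for one sample over a head, a medium, and a tail class.
logits = np.array([[2.0, 1.5, 1.4]])
prior = [0.90, 0.08, 0.02]

adjusted = compensate_class_prior(logits, prior)
print(logits.argmax(axis=1), adjusted.argmax(axis=1))
```

With these numbers the raw prediction is the head class, while the adjusted prediction moves to the tail class because its small prior yields the largest correction.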

Keywords: Classification (of information)

Retrieval Augmented Recipe Generation

IEEE Workshop on Applications of Computer Vision (WACV)

Authors: Guoshan Liu; Hailong Yin; Bin Zhu; Jingjing Chen; Chong-Wah Ngo; Yu-Gang Jiang. Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing; Singapore Management University

ISBN (digital): 9798331510831

ISBN (print): 9798331510848

Generating recipes from food images has drawn substantial research attention in recent years. Existing works on recipe generation primarily use a two-stage training method: first predicting ingredients from a food image and then generating instructions from both the image and the ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, make it possible to generate both ingredients and instructions directly from images. Nevertheless, LMMs still face the common issue of hallucinations during recipe generation, leading to suboptimal performance. To tackle this issue, we propose a retrieval augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing datastore as a supplement, integrating them into the prompt to add diverse and rich context to the input image. Additionally, a Self-Consistency Ensemble Voting mechanism is proposed to select the most confident predicted recipe as the final output. It calculates the consistency among generated recipe candidates, each of which uses different retrieved recipes as context for generation. Extensive experiments validate the effectiveness of our proposed method, which demonstrates state-of-the-art (SOTA) performance in recipe generation on the Recipe1M dataset.
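The voting step described above, which picks the candidate most consistent with the other candidates, can be sketched as follows. The abstract does not state how consistency is measured, so token-level Jaccard similarity is used here purely as an assumption. The function names and the candidate strings are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two texts."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def select_most_consistent(candidates):
    """Return the candidate with the highest mean similarity to all the others."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) == 1:
        return candidates[0]

    def agreement(i: int) -> float:
        others = [jaccard(candidates[i], c) for j, c in enumerate(candidates) if j != i]
        return sum(others) / len(others)

    best = max(range(len(candidates)), key=agreement)
    return candidates[best]


# Hypothetical recipes generated with three different retrieved contexts.
candidates = [
    "mix flour and eggs then bake",
    "mix flour and eggs then fry",
    "mix flour and sugar then bake",
]
chosen = select_most_consistent(candidates)
print(chosen)
```

The first candidate shares the most tokens with each of the other two, so it is selected. A learned text-similarity score could replace `jaccard` without changing the voting logic.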

Keywords: Training; Computer vision; Accuracy; Computational modeling; Stochastic processes; Predictive models; Reliability; Faces

Causal-IQA: Towards the Generalization of Image Quality Assessment Based on Causal Inference

41st International Conference on Machine Learning, ICML 2024

Authors: Zhong, Yan; Wu, Xingyu; Zhang, Li; Yang, Chenxi; Jiang, Tingting. Affiliations: School of Mathematical Sciences, Peking University, Beijing, China; National Engineering Research Center of Visual Technology, National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong; Hefei Institute of Physical Science, Chinese Academy of Sciences, University of Science and Technology of China, Hefei, China; National Biomedical Imaging Center, Peking University, Beijing, China

Due to the high cost of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent deep learning-based IQA *** address this, this paper proposes a novel end-to-end blind IQA method: ***, we first analyze the causal mechanisms in IQA tasks and construct a causal graph to understand the interplay and confounding effects between distortion types, image contents, and subjective human ***, through shifting the focus from correlations to causality, Causal-IQA aims to improve the estimation accuracy of image quality scores by mitigating the confounding effects using a causality-based optimization *** optimization strategy is implemented on the sample subsets constructed by a Counterfactual Division process based on the Backdoor *** experiments illustrate the superiority of Causal-IQA. Copyright 2024 by the author(s)

Keywords: Image correlation

Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Yang, Jiarui; Wang, Chuan; Zhang, Jun; Wu, Shuyi; Jinjing, Zhao; Liu, Zeming; Yang, Liang. Affiliations: Shanghai Key Lab of Intell. Info. Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; Institute of Information Engineering, CAS, China; School of Computer Science and Technology, Beijing JiaoTong University, China; Guangdong Provincial Key Lab of Intell. Info. Processing & Shenzhen Key Lab of Media Security, Shenzhen University, China; Information Research Center of Military Science, PLA Academy of Military Science, China; National Key Laboratory of Science and Technology on Information System Security, China; School of Computer Science and Engineering, Beihang University, China; School of Artificial Intelligence, Hebei University of Technology, China

ISBN (print): 157735897X

Scene Graph Generation (SGG) aims to detect all objects in a scene and identify their pairwise relationships. Because of the substantial human labor costs, existing scene graph annotations are often sparse and biased, which results in confusion when training with low-frequency predicates. In this work, we design a Semi-Supervised Clustering framework for Scene Graph Generation (SSC-SGG) that uses the sparse labeled data to guide the generation of effective pseudo-labels from unlabeled object pairs, thus enriching the labeled sample space, especially for low-frequency interaction samples. We approach the task from the perspective of clustering, reducing the problem of confirmation bias in self-training. Specifically, we first enhance the model’s robustness in feature extraction via prototype-based clustering, aggregating different augmented features of a relationship onto the same prototype. Secondly, we design a dynamic pseudo-label assignment algorithm based on a mini-batch, which adjusts the detection sensitivity to samples of different frequencies based on the historical assignment. Finally, we conduct joint training on the pseudo-labels and the labeled data. We conduct experiments on various SGG models and achieve substantial overall performance improvements, demonstrating the effectiveness of SSC-SGG. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

arXiv, 2024

Authors: Yang, Haibo; Chen, Yang; Pan, Yingwei; Yao, Ting; Chen, Zhineng; Wu, Zuxuan; Jiang, Yu-Gang; Mei, Tao. Affiliations: School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; HiDream.ai Inc., China

Learning radiance fields (NeRF) with powerful 2D diffusion models has garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D representations of NeRF lack explicit modeling of meshes and textures over surfaces, and such a surface-undefined approach may suffer from issues such as noisy surfaces with ambiguous texture details or cross-view inconsistency. To alleviate this, we present DreamMesh, a novel text-to-3D architecture that pivots on well-defined surfaces (triangle meshes) to generate a high-fidelity explicit 3D model. Technically, DreamMesh capitalizes on a distinctive coarse-to-fine scheme. In the coarse stage, the mesh is first deformed by text-guided Jacobians, and then DreamMesh textures the mesh with an interlaced use of 2D diffusion models in a tuning-free manner from multiple viewpoints. In the fine stage, DreamMesh jointly manipulates the mesh and refines the texture map, leading to high-quality triangle meshes with high-fidelity textured materials. Extensive experiments demonstrate that DreamMesh significantly outperforms state-of-the-art text-to-3D methods in faithfully generating 3D content with richer textual details and enhanced geometry. Our project page is available at https://***. Copyright © 2024, The Authors. All rights reserved.

Keywords: Mesh generation

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

arXiv, 2025

Authors: Han, Feng; Chen, Kai; Gong, Chao; Wei, Zhipeng; Chen, Jingjing; Jiang, Yu-Gang. Affiliations: Shanghai Key Lab of Intell. Info. Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China

The exceptional generative capability of text-to-image models has raised substantial safety concerns regarding the generation of Not-Safe-For-Work (NSFW) content and potential copyright infringement. To address these concerns, previous methods safeguard the models by eliminating inappropriate concepts. Nonetheless, these models alter the parameters of the backbone network and exert considerable influence on the structural (low-frequency) components of the image, which undermines the model’s ability to retain non-target concepts. In this work, we propose our Dual encoder Modulation network (DuMo), which achieves precise erasure of inappropriate target concepts with minimal impairment to non-target concepts. In contrast to previous methods, DuMo employs the Eraser with PRior Knowledge (EPR) module, which modifies the skip connection features of the U-NET and primarily achieves concept erasure on the detail (high-frequency) components of the image. To minimize the damage to non-target concepts during erasure, the parameters of the backbone U-NET are frozen and the prior knowledge from the original skip connection features is introduced into the erasure process. Meanwhile, we observe that the EPR shows distinct erasing preferences for image structure and details at different timesteps and layers. Therefore, we adopt a novel Time-Layer MOdulation process (TLMO) that adjusts the erasure scale of the EPR module’s outputs across different layers and timesteps, automatically balancing the erasure effects and the model’s generative ability. Our method achieves state-of-the-art performance on Explicit Content Erasure (detecting only 34 nude parts), Cartoon Concept Removal (with an average LPIPSda of 0.428, 0.113 higher than SOTA at 0.315), and Artistic Style Erasure (with an average LPIPSda of 0.387, 0.088 higher than SOTA at 0.299), clearly outperforming alternative methods. Code is available at https://***/Maplebb/DuMo. Copyright © 2025, The Author

Keywords: HTTP

Mixture of experts for audio-visual learning

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Ying Cheng; Yang Li; Junjie He; Rui Feng. Affiliations: School of Computer Science, Fudan University and Shanghai Key Laboratory of Intelligent Information Processing and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; School of Computer Science, Fudan University and Shanghai Key Laboratory of Intelligent Information Processing

ISBN (print): 9798331314385

With the rapid development of multimedia technology, audio-visual learning has emerged as a promising research topic within the field of multimodal analysis. In this paper, we explore parameter-efficient transfer learning for audio-visual learning and propose the Audio-visual Mixture of Experts (AVMoE) to inject adapters into pre-trained models flexibly. Specifically, we introduce unimodal and cross-modal adapters as multiple experts that specialize in intra-modal and inter-modal information, respectively, and employ a lightweight router to dynamically allocate the weights of each expert according to the specific demands of each task. Extensive experiments demonstrate that our proposed approach AVMoE achieves superior performance across multiple audio-visual tasks, including AVE, AVVP, AVS, and AVQA. Furthermore, visual-only experimental results also indicate that our approach can tackle challenging scenes where modality information is missing. The source code is available at https://***/yingchengy/AVMOE.
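The routing idea in the abstract above, a lightweight router that weights several adapter experts, can be sketched with plain NumPy. This is a generic mixture-of-adapters layer, not AVMoE itself: every expert here is an ordinary bottleneck adapter on a single token stream, whereas the paper's cross-modal adapters also consume the other modality. The class name, layer sizes, and random weights are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


class AdapterMoE:
    """Softmax-routed mixture of bottleneck adapters with a residual connection."""

    def __init__(self, dim: int, bottleneck: int, num_experts: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0.0, 0.02, (num_experts, dim, bottleneck))
        self.up = rng.normal(0.0, 0.02, (num_experts, bottleneck, dim))
        self.router = rng.normal(0.0, 0.02, (dim, num_experts))

    def __call__(self, tokens: np.ndarray):
        # Router: one weight per expert for every token, shape (n, experts).
        weights = softmax(tokens @ self.router)
        # Each expert: down-projection, ReLU, up-projection, shape (experts, n, dim).
        hidden = np.maximum(np.einsum("nd,edb->enb", tokens, self.down), 0.0)
        expert_out = np.einsum("enb,ebd->end", hidden, self.up)
        # Weighted sum over experts, added back to the frozen backbone features.
        mixed = np.einsum("ne,end->nd", weights, expert_out)
        return tokens + mixed, weights


layer = AdapterMoE(dim=8, bottleneck=2, num_experts=3)
tokens = np.random.default_rng(1).normal(size=(5, 8))
output, weights = layer(tokens)
print(output.shape, weights.shape)
```

Only the adapter and router parameters would be trained in a parameter-efficient setup, which is why the backbone features enter solely through the residual term.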

Fast Peer Adaptation with Context-aware Exploration

41st International Conference on Machine Learning, ICML 2024

Authors: Ma, Long; Wang, Yuanfei; Zhong, Fangwei; Zhu, Song-Chun; Wang, Yizhou. Affiliations: Academy for Advanced Interdisciplinary Studies, Peking University, China; Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU, China; Center on Frontiers of Computing Studies, School of Computer Science, Peking University, China; School of Intelligence Science and Technology, Peking University, China; Inst. for Artificial Intelligence, Peking University, China; Nat'l Eng. Research Center of Visual Technology, Peking University, China

Adapting quickly to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observations over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods. Copyright 2024 by the author(s)

Keywords: Adversarial machine learning
