Assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This study inve...
详细信息
Modern microservice systems have become increasingly complicated due to the dynamic and complex interactions and runtime environment. It leads to the system vulnerable to performance issues caused by a variety of reas...
详细信息
ISBN:
(数字)9798400702174
ISBN:
(纸本)9798350382143
Modern microservice systems have become increasingly complicated due to the dynamic and complex interactions and runtime environment. It leads to the system vulnerable to performance issues caused by a variety of reasons, such as the runtime environments, communications, coordinations, or implementations of services. Traces record the detailed execution process of a request through the system and have been widely used in performance issues diagnosis in microservice systems. By identifying the execution processes and attribute value combinations that are common in anomalous traces but rare in normal traces, engineers may localize the root cause of a performance issue into a smaller scope. However, due to the complex structure of traces and the large number of attribute combinations, it is challenging to find the root cause from the huge search space. In this paper, we propose TraceContrast, a trace-based multidimensional root cause localization approach. TraceContrast uses a sequence representation to describe the complex structure of a trace with attributes of each span. Based on the representation, it combines contrast sequential pattern mining and spectrum analysis to localize multidimensional root causes efficiently. Experimental studies on a widely used microservice benchmark show that TraceContrast outperforms existing approaches in both multidimensional and instance-dimensional root cause localization with significant accuracy advantages. Moreover, Trace-Contrast is efficient and its efficiency can be further improved by parallel execution.
Real-world data is often unbalanced and exhibits long-tailed distribution over classes. Vanilla classification models trained on imbalanced datasets inherently exhibit bias towards dominant classes. Existing debiasing...
详细信息
The growing interest in generating recipes from food images has drawn substantial research attention in recent years. Existing works for recipe generation primarily utilize a two-stage training method—first predictin...
详细信息
ISBN:
(数字)9798331510831
ISBN:
(纸本)9798331510848
The growing interest in generating recipes from food images has drawn substantial research attention in recent years. Existing works for recipe generation primarily utilize a two-stage training method—first predicting ingredients from a food image and then generating instructions from both the image and ingredients. Large Multi-modal Models (LMMs), which have achieved notable success across a variety of vision and language tasks, shed light on generating both ingredients and instructions directly from images. Nevertheless, LMMs still face the common issue of hallu-cinations during recipe generation, leading to suboptimal performance. To tackle this issue, we propose a retrieval augmented large multimodal model for recipe generation. We first introduce Stochastic Diversified Retrieval Augmentation (SDRA) to retrieve recipes semantically related to the image from an existing datastore as a supplement, integrating them into the prompt to add diverse and rich context to the input image. Additionally, Self-Consistency Ensemble Voting mechanism is proposed to determine the most confident prediction recipes as the final output. It calculates the consistency among generated recipe candidates, which use different retrieval recipes as context for generation. Extensive experiments validate the effectiveness of our proposed method, which demonstrates state-of-the-art (SOTA) performance in recipe generation on the Recipe1M dataset.
Due to the high cost of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent deep learning-based IQA *** address this, this paper proposes a novel end-to-end blind...
详细信息
Due to the high cost of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent deep learning-based IQA *** address this, this paper proposes a novel end-to-end blind IQA method: ***, we first analyze the causal mechanisms in IQA tasks and construct a causal graph to understand the interplay and confounding effects between distortion types, image contents, and subjective human ***, through shifting the focus from correlations to causality, Causal-IQA aims to improve the estimation accuracy of image quality scores by mitigating the confounding effects using a causality-based optimization *** optimization strategy is implemented on the sample subsets constructed by a Counterfactual Division process based on the Backdoor *** experiments illustrate the superiority of Causal-IQA. Copyright 2024 by the author(s)
Scene Graph Generation (SGG) aims to detect all objects and identify their pairwise relationships existing in the scene. Considering the substantial human labor costs, existing scene graph annotations are often sparse...
Learning radiance fields (NeRF) with powerful 2D diffusion models has garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D representations of NeRF lack explicit modeling of meshes and textures ...
详细信息
The exceptional generative capability of text-to-image models has raised substantial safety concerns regarding the generation of Not-Safe-For-Work (NSFW) content and potential copyright infringement. To address these ...
详细信息
With the rapid development of multimedia technology, audio-visual learning has emerged as a promising research topic within the field of multimodal analysis. In this paper, we explore parameter-efficient transfer lear...
ISBN:
(纸本)9798331314385
With the rapid development of multimedia technology, audio-visual learning has emerged as a promising research topic within the field of multimodal analysis. In this paper, we explore parameter-efficient transfer learning for audio-visual learning and propose the Audio-visual Mixture of Experts (AVMoE) to inject adapters into pre-trained models flexibly. Specifically, we introduce unimodal and cross-modal adapters as multiple experts to specialize in intra-modal and intermodal information, respectively, and employ a lightweight router to dynamically allocate the weights of each expert according to the specific demands of each task. Extensive experiments demonstrate that our proposed approach AVMoE achieves superior performance across multiple audio-visual tasks, including AVE, AVVP, AVS, and AVQA. Furthermore, visual-only experimental results also indicate that our approach can tackle challenging scenes where modality information is missing. The source code is available at https://***/yingchengy/AVMOE.
Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently...
详细信息
Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods1 Copyright 2024 by the author(s)
暂无评论