The number of vulnerabilities reported in open source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In prac...
Many unsupervised visual anomaly detection methods train an auto-encoder to reconstruct normal samples and then leverage the reconstruction error map to detect and localize the anomalies. However, due to the powerful ...
Communication efficiency is one of the key bottlenecks in Federated Learning (FL). Compression techniques, such as sparsification and quantization, are used to reduce communication overhead. However, joint designs of these techniques under communication constraints are not well-explored. This paper investigates the joint uplink compression problem in communication-constrained FL systems. We propose a Block-TopK sparsification scheme to reduce the proportion of bits used for locating entries of a sparsified vector. Considering the communication constraints, an optimization formulation is proposed to minimize the compression error. By solving the optimization problem, our joint compression method provides a better trade-off between sparsity budget and bit width. Numerical results demonstrate that our approach achieves 99.96% of baseline accuracy with only 1.56% of the baseline communication overhead when training ResNet-18 on CIFAR-10.
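The idea of spending fewer bits on locating kept entries can be illustrated with a minimal sketch. It assumes, hypothetically, that Block-TopK keeps whole fixed-size blocks ranked by squared L2 energy, so that one index is sent per block rather than per entry. The function names `block_topk` and `index_bits`, the ranking criterion, and the bit-counting model are illustrative assumptions, not the paper's actual formulation.

```python
import math


def block_topk(vec, block_size, k_blocks):
    """Keep the k_blocks blocks with the largest squared L2 energy; zero the rest.

    Assumption: blocks are contiguous, fixed-size slices of the update vector.
    Returns the sparsified vector and the sorted indices of the kept blocks.
    """
    n_blocks = math.ceil(len(vec) / block_size)
    blocks = [vec[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
    energy = [sum(x * x for x in b) for b in blocks]
    # Rank block indices by energy, largest first, and keep the top k_blocks.
    ranked = sorted(range(n_blocks), key=lambda i: energy[i], reverse=True)
    keep = set(ranked[:k_blocks])
    sparse = []
    for i, b in enumerate(blocks):
        sparse.extend(b if i in keep else [0.0] * len(b))
    return sparse, sorted(keep)


def index_bits(n, kept_entries, block_size=1):
    """Bits needed to address the kept positions (hypothetical cost model).

    One index of ceil(log2(number of blocks)) bits is sent per kept block;
    block_size=1 corresponds to plain entry-wise Top-K.
    """
    n_blocks = math.ceil(n / block_size)
    kept_blocks = math.ceil(kept_entries / block_size)
    return kept_blocks * math.ceil(math.log2(n_blocks))


update = [0.1, -0.2, 3.0, 2.5, 0.0, 0.3, -4.0, 1.0]
sparse, kept = block_topk(update, block_size=2, k_blocks=2)
print(sparse, kept)
print(index_bits(8, 4, block_size=1), index_bits(8, 4, block_size=2))
```

Under this cost model, keeping four of eight entries costs 12 index bits entry-wise but only 4 bits with blocks of two. The price is that a block may carry small entries along with large ones, which is the kind of trade-off the abstract's optimization over sparsity budget and bit width would have to balance.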
Mixed polarity Reed-Muller (MPRM) circuit area optimization has become a research hotspot in the field of integrated circuit design. It is a combinatorial optimization, aiming at finding the MPRM expression with the l...
ISBN (digital): 9798350349184
ISBN (print): 9798350349191
Emotion recognition in conversation (ERC) is a prominent research topic in natural language processing, widely applicable in various scenarios. However, accessing external commonsense knowledge and related dialogue topics, along with the complexity of fusing utterances, presents significant challenges. In this paper, we introduce a Topic-based Multilayer Knowledge Filtering (TMKF) model to enhance dialog emotion recognition accuracy. TMKF acquires commonsense knowledge for each utterance and employs two Variational Autoencoder (VAE) modules to extract global and speaker-level dialogue topics. We utilize Global Knowledge Filtering (GKF) and Local Knowledge Filtering (LKF) to obtain topic-specific knowledge representations after filtering commonsense information through global and local topic hierarchies. Subsequently, we leverage relational graphs to convolve commonsense knowledge with utterances, resulting in knowledge-constrained utterance representations for classification. Our proposed model is evaluated through extensive experiments on four widely used benchmark datasets for conversational emotion recognition. The results underscore the effectiveness of TMKF, which significantly outperforms other methods in standard metric evaluations.
ISBN (digital): 9798350349399
ISBN (print): 9798350349405
The localization of image splicing involves identifying pixels in an image that have been spliced from other images, necessitating the discernment of splicing features. Despite significant advancements driven by the rise of social media and deep learning, existing methods exhibit limitations, often neglecting the integration of coarse and precise features and lacking the ability to understand objects. This leads to erroneous predictions in identifying spliced regions. This paper proposes Segment Anything Model with Integrated Compression and Edge artifacts (SAM-ICE) for the localization of image splicing, addressing these limitations by fusing forged edge features and compression artifact features. Leveraging SAM’s object understanding ability, our method identifies spliced regions using the fused features as guidance. Specifically, we employ an Edge Artifact Extractor (EAE) to extract fine high-frequency edge splicing features and a Compression Artifact Extractor (CAE) to extract coarse compression artifact features. By combining these features, our method utilizes coarse-fine features to accurately pinpoint the spliced portions of the image. Experimental results demonstrate the superior accuracy, robustness, and generalizability of our method compared with state-of-the-art methods.
End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and from low training efficiency. Besides, they overlook the exploration of multilingual text spotting, which requires an extra script identification task. In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection, recognition, and script identification simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries encode the requisite text semantics and locations and can thus be further decoded to the center line, boundary, script, and confidence of text via very simple prediction heads in parallel. Furthermore, we show the surprisingly good extensibility of our method in terms of character class, language type, and task. On the one hand, our method not only performs well in English scenes but also handles transcription of scripts with complex font structures and character classes numbering in the thousands, such as Chinese. On the other hand, our DeepSolo++ achieves better performance on the additionally introduced script identification task with a simpler training pipeline compared with previous methods. Extensive experiments on public benchmarks demonstrate that our simple approach achieves better training efficiency compared with Transformer-based models and outperforms the previous state-of-the-art. For example, on ICDAR 2019 ReCTS for Chinese text, our method boosts the 1-NED metric to a new record of 78.3%. On ICDAR 2019 MLT, DeepSolo++ achieves absolute 5.5% H-mean and 8.0% AP improvements on joint detection an
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
Multimodal sentiment analysis (MSA) aims to synthesize multiple pieces of information, including textual, visual, and auditory information, to more accurately infer sentiment polarity. Complex and diverse affective causal relationships exist between different modalities, which interact with each other and constitute a multidimensional expression of human emotions. In the face of unbalanced data, i.e., when there are significant differences in the number of samples between different categories, the attention mechanism of the model tends to focus on spurious correlations in the training data, seriously impairing the model’s generalization ability and leading to bias, unbalanced prediction, and false correlation identification problems. To address the above problems, this study proposes a new multimodal fusion method based on the "within-sample sampling" and "cross-sample sampling" front-door adjustment methods in causal inference strategies: the multi-head causal attention fusion method (MCAF). This method aims to capture potential causal relationships between different modalities through different attention heads, helping the model focus more on key causal features and avoid paying too much attention to false correlations in the training data. This helps improve the generalization performance of the model, makes the decision-making process of the model more interpretable, and enhances the accuracy and robustness of the model for multimodal sentiment analysis tasks, making it more suitable for practical application scenarios.
The low-altitude economy (LAE), as a new economic paradigm, plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post-disaster communication. Specifically, unmanned aerial vehicles (UAVs), as one of the core technologies of the LAE, can be deployed to provide communication coverage, facilitate data collection, and relay data for trapped users, thereby significantly enhancing the efficiency of post-disaster response efforts. However, conventional UAV self-organizing networks exhibit low reliability in long-range cases due to their limited onboard energy and transmission capability. Therefore, in this paper, we design an efficient and robust UAV-swarm enabled collaborative self-organizing network to facilitate post-disaster communications. Specifically, a ground device transmits data to UAV swarms, which then use the collaborative beamforming (CB) technique to form virtual antenna arrays and relay the data to a remote access point (AP) efficiently. Then, we formulate a rescue-oriented post-disaster transmission rate maximization optimization problem (RPTRMOP), aimed at maximizing the transmission rate of the whole network. Given the challenges of solving the formulated RPTRMOP with traditional algorithms, we propose a two-stage optimization approach to address it. In the first stage, the optimal traffic routing and the theoretical upper bound on the transmission rate of the network are derived. In the second stage, we transform the formulated RPTRMOP into a variant named V-RPTRMOP based on the obtained optimal traffic routing, aimed at making the actual transmission rate closely approach its theoretical upper bound by optimizing the excitation current weight and the placement of each participating UAV via a diffusion model-enabled particle swarm optimization (DM-PSO) algorithm. Simulation results show the effectiveness of the proposed two-stage optimization approach in improving the transmission rate of the construct
ISBN (digital): 9798350349184
ISBN (print): 9798350349191
The most critical task in personalized news recommendation is to perform exact matching between candidate news and users’ interests. Existing news recommendation methods usually model users’ interests from historical clicked news items without considering candidate news, which makes it difficult to precisely match candidate news with users’ interests. Moreover, it is challenging to distinguish similar news when forming a vector representation. Simply adding candidate news to the user interest modeling process without considering word-level information does not provide sufficient discrimination for recommending appropriate news. In this paper, we propose a news recommendation model with candidate-aware fine-grained interaction information (CAFI). In our approach, we propose a fine-grained interaction module that matches candidate news items and text fragments of each historical news item at each semantic granularity to achieve interaction between the two at the word level and help similar historical news form more specific and accurate representations. In addition, our proposed gated self-attention mechanism utilizes the candidate news features as channel moderation gates to implement the process of filtering out information that is irrelevant to the candidate news for learning user interest representations, thereby better matching the candidate news to specific user interests. Experiments conducted on the large real-world Microsoft News dataset (MIND) show that our model significantly outperforms the previously developed models in terms of all metrics. Our code is published at the following URL: https://***/liyiweneven/CAFI.
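The gating idea in this abstract can be sketched in a few lines, under loud assumptions. The candidate news vector is turned into a per-channel gate, each historical news vector is multiplied by that gate, and the gated vectors are pooled with candidate-aware attention. The abstract does not specify the actual CAFI architecture, so the function `candidate_gated_user_vector`, the use of a plain sigmoid with no learned projection, and the scaled dot-product scoring are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_gated_user_vector(history, candidate):
    """Pool historical news vectors into a user vector, gated by the candidate.

    Assumptions (not from the paper): the gate is sigmoid(candidate) with no
    learned weights, and attention scores are scaled dot products.
    """
    d = len(candidate)
    gate = [sigmoid(c) for c in candidate]  # one gate value per channel
    # Suppress channels of each historical vector that the candidate down-weights.
    gated = [[h_j * g_j for h_j, g_j in zip(h, gate)] for h in history]
    scores = [sum(a * b for a, b in zip(h, candidate)) / math.sqrt(d) for h in gated]
    weights = softmax(scores)
    user = [sum(w * h[j] for w, h in zip(weights, gated)) for j in range(d)]
    return user, weights


history = [[1.0, 0.0], [0.0, 1.0]]  # two clicked news items, toy 2-d features
candidate = [2.0, -2.0]             # candidate emphasizes the first channel
user, weights = candidate_gated_user_vector(history, candidate)
print(user, weights)
```

In this toy case the first clicked item receives the larger attention weight, because it shares the channel that the candidate emphasizes. A trained model would learn the gate and scoring parameters rather than fix them as done here.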