The transformer architecture [1] has been widely used for natural language processing (NLP) tasks. Inspired by its excellent performance in NLP, transformer-based models [2, 3] have set many new records in various computer vision tasks. However, most vision transformers (ViTs) suffer from large model sizes, high run-time memory consumption, and high computational costs. Therefore, there is a pressing need to develop and deploy lightweight and efficient vision transformers.
With the rapid development of deep learning, current deep models can learn a fixed number of classes with high performance. However, in our ever-changing world, data often come from an open environment, arriving as a stream or available only temporarily due to privacy concerns. As a result, a classification model should learn new classes incrementally instead of restarting the training process.
For Unmanned Aerial Vehicle (UAV) monitoring tasks, capturing high-quality images of target objects is important for subsequent recognition. To address this problem, many prior works study placement/trajectory planni...
Relation extraction is a pivotal task within the field of natural language processing, boasting numerous real-world applications. Prior research predominantly centers on monolingual relation extraction or cross-lingual enhancement for relation extraction. However, there exists a notable gap in understanding relation extraction within mix-lingual (or code-switching) scenarios, in which individuals blend content from different languages within sentences, generating mix-lingual content. The effectiveness of existing relation extraction models in such scenarios remains largely unexplored due to the absence of dedicated datasets. To address this gap, we introduce the Mix-Lingual Relation Extraction (MixRE) task and construct a human-annotated dataset MixRED to support this task. Moreover, we propose a hierarchical training approach for the mix-lingual scenario named Mix-Lingual Training (MixTrain), designed to enhance the performance of large language models (LLMs) when capturing relational dependencies from mix-lingual content spanning different semantic levels. Our experiments involve evaluating state-of-the-art supervised models and LLMs on the constructed dataset, with results indicating that MixTrain notably improves model performance. Additionally, we investigate the effectiveness of using mix-lingual content as a tool to transfer learned relational dependencies across different languages. Finally, we delve into factors influencing model performance for both supervised models and LLMs in the novel MixRE task.
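To make the setting concrete, the following is a hypothetical illustration of what a mix-lingual relation-extraction instance could look like; the field names and the sentence are invented for exposition and are not drawn from the MixRED dataset.

# Hypothetical MixRED-style instance (illustrative only): English entities
# embedded in a Chinese carrier sentence, annotated with one relation.
sample = {
    # "Steve Jobs founded Apple in California in 1976."
    "sentence": "Steve Jobs 于 1976 年在加州创立了 Apple。",
    "head": "Steve Jobs",
    "tail": "Apple",
    "relation": "founder_of",
}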
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs while remaining interpretable, since the intermediate outputs of the five stages can be inspected. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://***/BradyFU/Woodpecker.
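As a rough illustration of how the five stages could fit together, here is a minimal Python sketch; the llm, vqa_model, and detector interfaces and all their method names are assumptions for exposition, not the released implementation.

def woodpecker_correct(image, generated_text, llm, vqa_model, detector):
    # Stage 1: key concept extraction -- pull the main objects mentioned in
    # the generated text (e.g., "dog", "frisbee"). The interface is assumed.
    concepts = llm.extract_key_concepts(generated_text)

    # Stage 2: question formulation -- turn each concept into verification
    # questions such as "Is there a dog in the image?".
    questions = [q for c in concepts for q in llm.formulate_questions(c)]

    # Stage 3: visual knowledge validation -- answer the questions against the
    # image with a VQA model and ground the concepts with an object detector.
    answers = [vqa_model.answer(image, q) for q in questions]
    boxes = {c: detector.detect(image, c) for c in concepts}

    # Stage 4: visual claim generation -- compile the validated answers and
    # detections into structured claims about the image.
    claims = llm.generate_claims(questions, answers, boxes)

    # Stage 5: hallucination correction -- rewrite the original text so every
    # statement is consistent with the visual claims, and return it.
    return llm.correct(generated_text, claims)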
Emotion-cause pair extraction (ECPE) aims to extract all the pairs of emotions and corresponding causes in a document. It generally contains three subtasks: emotion extraction, cause extraction, and detection of causal relations between emotions and causes. Existing works adopt pipelined approaches or multi-task learning to address the ECPE task. However, the pipelined approaches easily suffer from error propagation in real-world scenarios, and multi-task learning cannot optimize all tasks globally, which may lead to suboptimal extraction results. To address these issues, we propose a novel framework, the Pairwise Tagging Framework (PTF), tackling the complete emotion-cause pair extraction in one unified tagging task. Unlike prior works, PTF innovatively transforms all subtasks of ECPE, i.e., emotion extraction, cause extraction, and causal relation detection between emotions and causes, into one unified clause-pair tagging task. Through this unified tagging task, we can optimize the ECPE task globally and extract more accurate emotion-cause pairs. To validate the feasibility and effectiveness of PTF, we design an end-to-end PTF-based neural network and conduct experiments on the ECPE benchmark dataset. The experimental results show that our method significantly outperforms both pipelined approaches and typical multi-task learning approaches.
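For intuition, the sketch below shows how a single clause-pair tag table can yield emotions, causes, and pairs in one decoding pass; the two-tag scheme ("EC"/"O") is a simplified assumption, not the paper's exact tagging scheme.

def decode_pairs(tag_matrix):
    """tag_matrix[i][j] is the predicted tag for clause pair (i, j):
    "EC" if clause i expresses an emotion caused by clause j, else "O"."""
    pairs = []
    n = len(tag_matrix)
    for i in range(n):          # candidate emotion clause
        for j in range(n):      # candidate cause clause
            if tag_matrix[i][j] == "EC":
                pairs.append((i, j))
    # All three subtask outputs fall out of the one tagging table.
    emotions = sorted({i for i, _ in pairs})
    causes = sorted({j for _, j in pairs})
    return emotions, causes, pairs

# Example: clause 2 is an emotion caused by clauses 0 and 1.
tags = [["O", "O", "O"],
        ["O", "O", "O"],
        ["EC", "EC", "O"]]
print(decode_pairs(tags))  # ([2], [0, 1], [(2, 0), (2, 1)])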
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training datasets usually follow long-tailed label distributions. Many previous studies have treated head and tail labels equally, resulting in unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that combines arbitrary models with two steps. The first step is a "diverse ensemble" that encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can improve the generalization of tail labels. The second is an "error correction" step that takes advantage of the base model's accurate predictions on head labels and approximates its residual errors on tail labels. Thus, it enables the "diverse ensemble" to focus on optimizing tail label performance. The overall procedure is called the residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and can scale up to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
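A conceptual sketch of the two steps, assuming a pretrained base model whose scores are given; the paper's diversity-encouraging objective is approximated here by random seeds, and the propensity scoring is omitted.

import numpy as np
from sklearn.neural_network import MLPRegressor

def train_rde(X, Y, base_scores, n_members=3, hidden=512):
    # "Error correction": the ensemble fits the base model's residual errors,
    # which concentrate on tail labels that the base model handles poorly.
    residual_target = Y - base_scores
    members = []
    for seed in range(n_members):  # different seeds stand in for the
        m = MLPRegressor(          # paper's diversity-encouraging loss
            hidden_layer_sizes=(hidden,),  # single hidden layer, as in RDE
            random_state=seed,
            max_iter=30,           # the paper reports convergence < 30 epochs
        )
        m.fit(X, residual_target)
        members.append(m)
    return members

def rde_predict(base_scores, members, X):
    # Final score = head-accurate base prediction + averaged tail correction.
    return base_scores + np.mean([m.predict(X) for m in members], axis=0)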
Multi-label image classification is a critical task in computer vision, in which modern classifiers typically exploit the correlations between labels for effective classification. In this study, we propose...
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods train two structurally identical networks on the same task, i.e., they are symmetric, rendering them insufficient for correcting each other due to their similar limitations. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
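One ingredient can be made concrete: constructing noisy pairwise similarity labels from the disambiguation network's label confidence. The rule below (agreement of the most confident classes) is an illustrative assumption, not necessarily the paper's exact construction.

import torch

def pairwise_similarity_labels(confidence):
    """confidence: (N, C) tensor of per-instance label confidence learned
    by the disambiguation network."""
    pseudo = confidence.argmax(dim=1)  # most confident class per instance
    # Two instances are labeled "similar" (1) if they share the same
    # pseudo-label, "dissimilar" (0) otherwise -- noisy supervision that
    # gives the auxiliary network a different view of the same data.
    return (pseudo.unsqueeze(0) == pseudo.unsqueeze(1)).float()

conf = torch.softmax(torch.randn(4, 5), dim=1)
print(pairwise_similarity_labels(conf))  # 4x4 matrix of 0/1 similarities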
The large-scale multi-objective optimization algorithm (LSMOA), based on the grouping of decision variables, is an advanced method for handling high-dimensional decision variables. However, in practical problems, the interaction among decision variables is intricate, leading to large group sizes and suboptimal optimization effects; hence, a large-scale multi-objective optimization algorithm based on weighted overlapping grouping of decision variables (MOEAWOD) is proposed in this paper. First, the decision variables are perturbed and categorized into convergence and diversity variables; subsequently, the convergence variables are subdivided into groups based on the interactions among different decision variables. If the size of a group surpasses the set threshold, that group undergoes a process of weighting and overlapping grouping. Specifically, the interaction strength is evaluated based on the interaction frequency and the number of objectives involved among various decision variables. The decision variable with the highest interaction in the group is identified and set aside, and the remaining variables are then reclassified into groups. Finally, the decision variable with the strongest interaction is added to each group. This minimizes the interactivity between different groups and maximizes the interactivity of decision variables within groups, which contributes to optimizing the directions of convergence and diversity exploration with different groups. MOEAWOD was tested on 18 benchmark large-scale optimization problems, and the experimental results demonstrate the effectiveness of our method. Compared with other algorithms, our method remains advantageous.
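An illustrative sketch of the weighted overlapping grouping step, under simplified assumptions: the perturbation-based interaction test is replaced by a given interaction-strength table, and the greedy grouping and splitting rules are stand-ins for the paper's procedure.

def overlapping_groups(interaction, threshold=4):
    """interaction: dict mapping variable index -> {neighbor: strength}."""
    strength = {v: sum(nb.values()) for v, nb in interaction.items()}
    # Initial grouping: directly interacting variables go together (greedy).
    groups, assigned = [], set()
    for v in sorted(interaction):
        if v in assigned:
            continue
        group = {v} | set(interaction[v])
        groups.append(group)
        assigned |= group
    refined = []
    for g in groups:
        if len(g) > threshold:
            # Set aside the most interactive variable, split the rest...
            pivot = max(g, key=lambda v: strength.get(v, 0.0))
            rest = sorted(g - {pivot})
            halves = [set(rest[: len(rest) // 2]), set(rest[len(rest) // 2:])]
            # ...then add it back to every subgroup, creating the overlap
            # that preserves within-group interactivity.
            refined += [h | {pivot} for h in halves if h]
        else:
            refined.append(g)
    return refined

# Example: variable 1 interacts strongly and ends up shared across groups.
inter = {0: {1: 2.0}, 1: {0: 2.0, 2: 1.0}, 2: {1: 1.0, 3: 0.5}, 3: {2: 0.5}}
print(overlapping_groups(inter, threshold=2))  # [{0, 1}, {1, 2}, {1, 3}]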