Multiarmed bandit(MAB) models are widely used for sequential decision-making in uncertain environments, such as resource allocation in computer communication systems.A critical challenge in interactive multiagent syst...
Multiarmed bandit(MAB) models are widely used for sequential decision-making in uncertain environments, such as resource allocation in computer communication systems.A critical challenge in interactive multiagent systems with bandit feedback is to explore and understand the equilibrium state to ensure stable and tractable system performance.
Video question answering(VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts...
详细信息
Video question answering(VideoQA) is a challenging yet important task that requires a joint understanding of low-level video content and high-level textual semantics. Despite the promising progress of existing efforts, recent studies revealed that current VideoQA models mostly tend to over-rely on the superficial correlations rooted in the dataset bias while overlooking the key video content, thus leading to unreliable results. Effectively understanding and modeling the temporal and semantic characteristics of a given video for robust VideoQA is crucial but, to our knowledge, has not been well investigated. To fill the research gap, we propose a robust VideoQA framework that can effectively model the cross-modality fusion and enforce the model to focus on the temporal and global content of videos when making a QA decision instead of exploiting the shortcuts in datasets. Specifically, we design a self-supervised contrastive learning objective to contrast the positive and negative pairs of multimodal input, where the fused representation of the original multimodal input is enforced to be closer to that of the intervened input based on video perturbation. We expect the fused representation to focus more on the global context of videos rather than some static keyframes. Moreover, we introduce an effective temporal order regularization to enforce the inherent sequential structure of videos for video representation. We also design a Kullback-Leibler divergence-based perturbation invariance regularization of the predicted answer distribution to improve the robustness of the model against temporal content perturbation of videos. Our method is model-agnostic and can be easily compatible with various VideoQA backbones. Extensive experimental results and analyses on several public datasets show the advantage of our method over the state-of-the-art methods in terms of both accuracy and robustness.
Point cloud completion aims to infer complete point clouds based on partial 3D point cloud *** previous methods apply coarseto-fine strategy networks for generating complete point ***,such methods are not only relativ...
详细信息
Point cloud completion aims to infer complete point clouds based on partial 3D point cloud *** previous methods apply coarseto-fine strategy networks for generating complete point ***,such methods are not only relatively time-consuming but also cannot provide representative complete shape features based on partial *** this paper,a novel feature alignment fast point cloud completion network(FACNet)is proposed to directly and efficiently generate the detailed shapes of *** aligns high-dimensional feature distributions of both partial and complete point clouds to maintain global information about the complete *** its decoding process,the local features from the partial point cloud are incorporated along with the maintained global information to ensure complete and time-saving generation of the complete point *** results show that FACNet outperforms the state-of-theart on PCN,Completion3D,and MVP datasets,and achieves competitive performance on ShapeNet-55 and KITTI ***,FACNet and a simplified version,FACNet-slight,achieve a significant speedup of 3–10 times over other state-of-the-art methods.
Researchers have recently achieved significant advances in deep learning techniques, which in turn has substantially advanced other research disciplines, such as natural language processing, image processing, speech r...
详细信息
Researchers have recently achieved significant advances in deep learning techniques, which in turn has substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many studies have also been presented in top conferences and journals, demonstrating the applications of deep learning techniques in resolving various software engineering tasks. However,although several surveys have provided overall pictures of the application of deep learning techniques in software engineering,they focus more on learning techniques, that is, what kind of deep learning techniques are employed and how deep models are trained or fine-tuned for software engineering tasks. We still lack surveys explaining the advances of subareas in software engineering driven by deep learning techniques, as well as challenges and opportunities in each subarea. To this end, in this study, we present the first task-oriented survey on deep learning-based software engineering. It covers twelve major software engineering subareas significantly impacted by deep learning techniques. Such subareas spread out through the whole lifecycle of software development and maintenance, including requirements engineering, software development, testing, maintenance, and developer collaboration. As we believe that deep learning may provide an opportunity to revolutionize the whole discipline of software engineering, providing one survey covering as many subareas as possible in software engineering can help future research push forward the frontier of deep learning-based software engineering more systematically. For each of the selected subareas,we highlight the major advances achieved by applying deep learning techniques with pointers to the available datasets i
The integration of artificial intelligence(AI)technology,particularly large language models(LLMs),has become essential across various sectors due to their advanced language comprehension and generation *** their trans...
详细信息
The integration of artificial intelligence(AI)technology,particularly large language models(LLMs),has become essential across various sectors due to their advanced language comprehension and generation *** their transformative impact in fields such as machine translation and intelligent dialogue systems,LLMs face significant *** challenges include safety,security,and privacy concerns that undermine their trustworthiness and effectiveness,such as hallucinations,backdoor attacks,and privacy *** works often conflated safety issues with security *** contrast,our study provides clearer and more reasonable definitions for safety,security,and privacy within the context of *** on these definitions,we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety,security,and privacy in ***,we explore the unique research challenges posed by LLMs and suggest potential avenues for future research,aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.
Nowadays, the proliferation of open Internet of Things (IoT) devices has made IoT systems increasingly vulnerable to cyber attacks. It is of great practical significance to solve the security issues of IoT systems. Dr...
详细信息
With the rapid development of information technologies,industrial Internet has become more open,and security issues have become more *** endogenous security mechanism can achieve the autonomous immune mechanism withou...
详细信息
With the rapid development of information technologies,industrial Internet has become more open,and security issues have become more *** endogenous security mechanism can achieve the autonomous immune mechanism without prior ***,endogenous security lacks a scientific and formal definition in industrial ***,firstly we give a formal definition of endogenous security in industrial Internet and propose a new industrial Internet endogenous security architecture with cost ***,the endogenous security innovation mechanism is clearly ***,an improved clone selection algorithm based on federated learning is ***,we analyze the threat model of the industrial Internet identity authentication scenario,and propose cross-domain authentication mechanism based on endogenous key and zero-knowledge *** conduct identity authentication experiments based on two types of blockchains and compare their experimental *** on the experimental analysis,Ethereum alliance blockchain can be used to provide the identity resolution services on the industrial *** of Things Application(IOTA)public blockchain can be used for data aggregation analysis of Internet of Things(IoT)edge ***,we propose three core challenges and solutions of endogenous security in industrial Internet and give future development directions.
Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone...
详细信息
Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade the classification performance. To address the above issues, we propose the multi-label weighted broad learning system(MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system(MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network(AugFCN) by aggr...
详细信息
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network(AugFCN) by aggregating content-and position-based object contexts for semantic ***, motivated because each deep feature map is a global, class-wise representation of the input,we first propose an augmented nonlocal interaction(AugNI) to aggregate the global content-based contexts through all feature map interactions. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance and maintain translation equivariance, a learnable,relative position embedding branch is then supportably installed in AugNI to capture the global positionbased contexts. AugFCN is built on a fully convolutional network as the backbone by deploying AugNI before the segmentation head network. Experimental results on two challenging benchmarks verify that AugFCN can achieve a competitive 45.38% mIoU(standard mean intersection over union) and 81.9% mIoU on the ADE20K val set and Cityscapes test set, respectively, with little computational overhead. Additionally, the results of the joint implementation of AugNI and existing context modeling schemes show that AugFCN leads to continuous segmentation improvements in state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
Although lots of research has been done in recognizing facial expressions,there is still a need to increase the accuracy of facial expression recognition,particularly under uncontrolled *** use of Local Directional Pa...
详细信息
Although lots of research has been done in recognizing facial expressions,there is still a need to increase the accuracy of facial expression recognition,particularly under uncontrolled *** use of Local Directional Patterns(LDP),which has good characteristics for emotion detection has yielded encouraging *** innova-tive end-to-end learnable High Response-based Local Directional Pattern(HR-LDP)network for facial emotion recognition is implemented by employing fixed convolutional filters in the proposed *** combining learnable convolutional layers with fixed-parameter HR-LDP layers made up of eight Kirsch filters and derivable simulated gate functions,this network considerably minimizes the number of network *** cost of the parameters in our fully linked layers is up to 64 times lesser than those in currently used deep learning-based detection *** seven well-known databases,including JAFFE,CK+,MMI,SFEW,OULU-CASIA and MUG,the recognition rates for seven-class facial expression recognition are 99.36%,99.2%,97.8%,60.4%,91.1%and 90.1%,*** results demonstrate the advantage of the proposed work over cutting-edge techniques.
暂无评论