First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot *** complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction manipulation (i.e., manipulation trajectories and hand pose with contact). Existing research typically anticipates only interaction intention while neglecting manipulation, resulting in incomplete predictions and an increased likelihood of intention errors due to the lack of manipulation constraints. To address this issue, we propose a novel model, PEAR (phrase-based hand-object interaction anticipation), which jointly anticipates interaction intention and manipulation. To handle interaction uncertainty, we employ a twofold approach. First, we perform cross-alignment of verbs, nouns, and images to reduce the diversity of hand movement patterns and object functional attributes, thereby mitigating intention uncertainty. Second, we establish bidirectional constraints between intention and manipulation using dynamic integration and residual connections, ensuring consistency among elements and thus overcoming manipulation uncertainty. To rigorously evaluate the performance of the proposed model, we collect a new task-relevant dataset, EGO-HOIP, with comprehensive annotations. Extensive experimental results demonstrate the superiority of our method.
Blockchain technology provides a technical solution for the challenges faced by e-government, such as low efficiency, excessive energy consumption, and lack of trust mechanisms. It can promote the establishment of a m...
Nowadays, the proliferation of open Internet of Things (IoT) devices has made IoT systems increasingly vulnerable to cyber attacks. It is of great practical significance to solve the security issues of IoT systems. Dr...
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impair the model's ability to generate diverse outputs that are close to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation experiments, we demonstrate that MMInstruct can significantly improve the performance of VLLMs, e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data will be available at https://***/yuecao0119/MMInstruct.
Due to the security and scalability features of hybrid cloud architecture, it can better meet the diverse requirements of users for cloud *** a reasonable resource allocation solution is the key to adequately utilizing the hybrid ***, most previous studies have not comprehensively optimized the performance of hybrid cloud task scheduling, even ignoring the conflicts between its security privacy features and other *** on the above problems, a many-objective hybrid cloud task scheduling optimization model (HCTSO) is constructed combining risk rate, resource utilization, total cost, and task completion ***, an opposition-based learning knee point-driven many-objective evolutionary algorithm (OBL-KnEA) is proposed to improve the performance of model *** algorithm uses opposition-based learning to generate initial populations for faster ***, a perturbation-based multipoint crossover operator and a dynamic range mutation operator are designed to extend the search *** comparing the experiments with other excellent algorithms on HCTSO, OBL-KnEA achieves excellent results in terms of evaluation metrics, initial populations, and model optimization effects.
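As a rough illustration of the opposition-based learning step mentioned in the abstract above, the following minimal sketch generates a random population, mirrors each individual inside its bounds, and keeps the better half. It is not the authors' OBL-KnEA: the function names `opposite_point` and `obl_initial_population` are hypothetical, and a single scalar `fitness` (lower is better) stands in for the four objectives of HCTSO, whose many-objective selection is not described here.

```python
import random


def opposite_point(x, lower, upper):
    """Return the opposite of x inside the box [lower, upper] (per dimension)."""
    return [lo + up - v for v, lo, up in zip(x, lower, upper)]


def obl_initial_population(pop_size, lower, upper, fitness, seed=0):
    """Opposition-based initialization (assumed scalar fitness, lower is better).

    1. Sample pop_size random individuals uniformly inside the bounds.
    2. Build the opposite of each individual.
    3. Keep the pop_size best individuals from the merged set.
    """
    rng = random.Random(seed)
    population = [
        [rng.uniform(lo, up) for lo, up in zip(lower, upper)]
        for _ in range(pop_size)
    ]
    opposites = [opposite_point(ind, lower, upper) for ind in population]
    merged = population + opposites
    merged.sort(key=fitness)  # best (smallest fitness) first
    return merged[:pop_size]
```

Calling `obl_initial_population(6, [0.0, 0.0], [10.0, 10.0], lambda x: sum(v * v for v in x))` returns six individuals ordered from best to worst under the toy sphere function. In a many-objective setting the scalar sort would be replaced by a dominance-based or knee-point-based selection.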
With the rapid development of information technologies, the industrial Internet has become more open, and security issues have become more *** endogenous security mechanism can achieve the autonomous immune mechanism without prior ***, endogenous security lacks a scientific and formal definition in industrial ***, firstly we give a formal definition of endogenous security in industrial Internet and propose a new industrial Internet endogenous security architecture with cost ***, the endogenous security innovation mechanism is clearly ***, an improved clone selection algorithm based on federated learning is ***, we analyze the threat model of the industrial Internet identity authentication scenario, and propose a cross-domain authentication mechanism based on endogenous key and zero-knowledge *** conduct identity authentication experiments based on two types of blockchains and compare their experimental *** on the experimental analysis, Ethereum alliance blockchain can be used to provide the identity resolution services on the industrial *** of Things Application (IOTA) public blockchain can be used for data aggregation analysis of Internet of Things (IoT) edge ***, we propose three core challenges and solutions of endogenous security in industrial Internet and give future development directions.
Overlooking the issue of false alarm suppression in heterogeneous change detection leads to inferior detection *** paper proposes a method to handle false alarms in heterogeneous change detection. A lightweight network of two channels is built based on the combination of convolutional neural network (CNN) and graph convolutional network (GCN). CNNs learn feature difference maps of multitemporal images, and attention modules adaptively fuse CNN-based and graph-based features for different *** with a new kernel filter adaptively distinguish between nodes with the same and those with different labels, generating change *** evaluation on two datasets validates the efficacy of the proposed method in addressing false alarms.
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network (AugFCN) by aggregating content- and position-based object contexts for semantic ***, motivated by the fact that each deep feature map is a global, class-wise representation of the input, we first propose an augmented nonlocal interaction (AugNI) to aggregate the global content-based contexts through all feature map interactions. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance and maintain translation equivariance, a learnable, relative position embedding branch is then installed in AugNI to capture the global position-based contexts. AugFCN is built on a fully convolutional network as the backbone by deploying AugNI before the segmentation head network. Experimental results on two challenging benchmarks verify that AugFCN can achieve a competitive 45.38% mIoU (standard mean intersection over union) and 81.9% mIoU on the ADE20K val set and Cityscapes test set, respectively, with little computational overhead. Additionally, the results of the joint implementation of AugNI and existing context modeling schemes show that AugFCN leads to continuous segmentation improvements in state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
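For readers unfamiliar with the mIoU metric quoted in the abstract above, here is a minimal sketch of how mean intersection over union can be computed from flattened label lists. The function name `mean_iou` is hypothetical, and the sketch makes one simplifying assumption: classes absent from both prediction and ground truth are skipped. Benchmark toolkits for ADE20K and Cityscapes typically accumulate counts over the whole dataset and handle ignore labels, which this sketch does not.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equal-length label sequences.

    pred, target: flat sequences of integer class ids (one per pixel).
    Classes that appear in neither sequence are skipped (assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.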
Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade the classification performance. To address the above issues, we propose the multi-label weighted broad learning system (MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system (MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.
Although much research has been done in recognizing facial expressions, there is still a need to increase the accuracy of facial expression recognition, particularly under uncontrolled *** use of Local Directional Patterns (LDP), which has good characteristics for emotion detection, has yielded encouraging *** innovative end-to-end learnable High Response-based Local Directional Pattern (HR-LDP) network for facial emotion recognition is implemented by employing fixed convolutional filters in the proposed *** combining learnable convolutional layers with fixed-parameter HR-LDP layers made up of eight Kirsch filters and derivable simulated gate functions, this network considerably minimizes the number of network *** cost of the parameters in our fully connected layers is up to 64 times lower than that of currently used deep learning-based detection *** seven well-known databases, including JAFFE, CK+, MMI, SFEW, OULU-CASIA and MUG, the recognition rates for seven-class facial expression recognition are 99.36%, 99.2%, 97.8%, 60.4%, 91.1% and 90.1%, *** results demonstrate the advantage of the proposed work over cutting-edge techniques.
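The eight Kirsch filters mentioned in the abstract above are fixed 3x3 compass kernels. The sketch below builds them by rotating the border values 5, 5, 5, -3, -3, -3, -3, -3 around a zero center, then computes a classical LDP code for one 3x3 patch by setting the bits of the k strongest absolute responses. This illustrates classical LDP only, not the HR-LDP layer or the gate functions of the paper, which the abstract does not specify; the names `kirsch_masks` and `ldp_code`, the bit ordering, and the default `k=3` are assumptions.

```python
# Border cells of a 3x3 kernel, clockwise starting at the top-left corner.
BORDER = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
BASE = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_masks():
    """Return the eight Kirsch compass kernels as 3x3 nested lists."""
    masks = []
    for shift in range(8):
        # Rotate the border values by `shift` positions (clockwise).
        values = [BASE[(i - shift) % 8] for i in range(8)]
        mask = [[0, 0, 0] for _ in range(3)]
        for (r, c), v in zip(BORDER, values):
            mask[r][c] = v
        masks.append(mask)
    return masks


def ldp_code(patch, k=3):
    """Classical LDP code of a 3x3 patch: bits of the k largest |responses|."""
    responses = [
        abs(sum(m[r][c] * patch[r][c] for r in range(3) for c in range(3)))
        for m in kirsch_masks()
    ]
    # Indices of the k strongest responses; ties keep the lower index first.
    top = sorted(range(8), key=lambda i: responses[i], reverse=True)[:k]
    return sum(1 << i for i in top)
```

Because the kernels are fixed, a layer built on them contributes no trainable weights, which is consistent with the abstract's claim that fixed-parameter layers reduce the number of network parameters.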