ISBN (digital): 9798350349399
ISBN (print): 9798350349405
Video action segmentation aims to identify and localize actions. Existing models achieve impressive performance with pre-extracted frame-level features, but this reliance may limit zero-shot learning and cross-dataset inference, especially for new actions or scenes. To overcome this problem, we propose a novel end-to-end network designed for robust performance across both familiar and novel action segmentation scenarios. Our approach combines a plug-and-play visual prompt module that enhances the temporal understanding of CLIP features with a learnable text prompt that enriches label semantics and refines the model’s focus, significantly boosting performance. Our results demonstrate that CLIP features can assist in action segmentation tasks and that prompts can improve task effectiveness. Furthermore, our findings show that CLIP features contain information that I3D features do not. We evaluate the proposed method on several video datasets, including Georgia Tech Egocentric Activities (GTEA), 50Salads, and Breakfast, and the results show that the proposed model outperforms existing state-of-the-art (SOTA) models.
Objective and Impact *** this work, we develop a universal anatomical landmark detection model which learns once from multiple datasets corresponding to different anatomical *** with the conventional model trained on a single dataset, this universal model is not only more lightweight and easier to train but also improves the accuracy of the anatomical landmark *** accurate and automatic localization of anatomical landmarks plays an essential role in medical image ***, recent deep learning-based methods only utilize limited data from a single *** is promising and desirable to build a model learned from different regions which harnesses the power of big *** model consists of a local network and a global network, which capture local features and global features, *** local network is a fully convolutional network built up with depth-wise separable convolutions, and the global network uses dilated convolution to enlarge the receptive field to model global *** evaluate our model on four 2D X-ray image datasets totaling 1710 images and 72 landmarks in four anatomical *** experimental results show that our model improves the detection accuracy compared to the state-of-the-art *** model makes the first attempt to train a single network on multiple datasets for landmark *** results qualitatively and quantitatively show that our proposed model performs better than other models trained on multiple datasets and even better than models trained on a single dataset separately.
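The two building blocks named in this abstract can be illustrated with simple arithmetic. The sketch below is not the authors' network. It only shows, under assumed layer settings (3x3 kernels, stride 1, no biases, and hypothetical dilation rates and channel counts), why dilated convolutions enlarge the receptive field and why depth-wise separable convolutions need fewer parameters.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A kernel of size k with dilation d spans d * (k - 1) + 1 pixels,
        # so each layer widens the field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf


def conv_params(c_in: int, c_out: int, k: int, separable: bool) -> int:
    """Weight count of one k x k convolution layer (biases ignored)."""
    if separable:
        # Depth-wise k x k filter per input channel, then a 1 x 1 point-wise mix.
        return c_in * k * k + c_in * c_out
    return c_in * c_out * k * k


# Four 3x3 layers: plain versus exponentially growing dilation (assumed rates).
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31

# Hypothetical 64 -> 128 channel layer.
print(conv_params(64, 128, 3, separable=False))  # 73728
print(conv_params(64, 128, 3, separable=True))   # 8768
```

With the same four layers, the dilated stack covers 31 pixels per axis instead of 9, and the separable layer uses roughly an eighth of the weights of the standard one.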
Most existing research on relation extraction focuses on binary flat relations like the BornIn relation between a Person and a *** a large portion of objective facts described in natural language are complex, especially in professional documents in fields such as finance and biomedicine that require precise *** example, “the GDP of the United States in 2018 grew 2.9% compared with 2017” describes a growth rate relation between two other relations about the economic index, which is beyond the expressive power of binary flat ***, we propose the nested relation extraction problem and formulate it as a directed acyclic graph (DAG) structure extraction ***, we propose a solution using the Iterative Neural Network which extracts relations layer by *** proposed solution achieves 78.98 and 97.89 F1 scores on two nested relation extraction tasks, namely semantic cause-and-effect relation extraction and formula ***, we observe that nested relations are usually expressed in long sentences where entities are mentioned repetitively, which makes the annotation difficult and ***, we extend our model to incorporate a mention-insensitive mode that only requires annotations of relations on entity concepts (instead of exact mentions) while preserving most of its *** mention-insensitive model performs better than the mention-sensitive model when the random level in mention selection is higher than 0.3.
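The GDP example in this abstract can be written down as a small DAG to make the "layer by layer" idea concrete. This is only a data-structure sketch, not the Iterative Neural Network itself: the relation labels `IndexValue` and `GrowthRate`, the node names `r1` to `r3`, and the argument layout are all hypothetical. A relation's layer is one more than the deepest of its arguments, so relations over plain entities sit in layer 1 and relations over other relations sit above them.

```python
from functools import lru_cache

# Hypothetical encoding: relation name -> (label, argument names).
# Arguments are either entity strings or names of other relations.
relations = {
    "r1": ("IndexValue", ["GDP", "United States", "2018"]),
    "r2": ("IndexValue", ["GDP", "United States", "2017"]),
    "r3": ("GrowthRate", ["r1", "r2", "2.9%"]),
}


@lru_cache(maxsize=None)
def layer(node: str) -> int:
    """Entities are layer 0; a relation is 1 + the max layer of its arguments."""
    if node not in relations:
        return 0
    _, args = relations[node]
    return 1 + max(layer(arg) for arg in args)


def extraction_order() -> list[list[str]]:
    """Group relations by layer (assumes the graph is acyclic)."""
    grouped: dict[int, list[str]] = {}
    for name in relations:
        grouped.setdefault(layer(name), []).append(name)
    return [grouped[depth] for depth in sorted(grouped)]


print(extraction_order())  # [['r1', 'r2'], ['r3']]
```

A layer-wise extractor would first recover `r1` and `r2` from the entities, then treat them as arguments when recovering `r3`.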
To port the Linux distributions to a new Instruction Set Architecture (ISA), developers have to rebuild the software packages of the distributions. The complex dependencies of the software packages bring a great chall...
Active disturbance-rejection methods are effective in estimating and rejecting disturbances in both transient and steady-state *** paper presents an in-depth examination and comparison of two of those methods, the generalized extended-state observer (GESO) and the equivalent input disturbance (EID), in terms of assumptions, system configurations, stability conditions, system design, disturbance-rejection performance, and extensibility. A time-domain index is introduced to assess the disturbance-rejection performance. A detailed examination of the disturbance-suppression mechanisms reveals the superiority of the EID approach over the GESO method. The comparison shows that, for the EID approach, the assumptions on disturbances are more practical and the adjustment of disturbance-rejection performance is easier than for the GESO method.
The transformational and spatial proximities are important cues for identifying inliers from an appearance based match set because correct matches generally stay close in input images and share similar local transform...
Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. As the visual-language models (VLMs) can provide essential general knowledge o...
Technical station dispatching system plays an important role in cargo operation, but due to the large number of dispatching systems and complex operation, dispatchers rely on manual experience to complete the task, an...
High-speed trains are inevitably affected by emergencies in their daily operation, which may cause the trains to fail to run according to the original planned timetable. Therefore, how to adjust the timetable of subse...
Due to its open-source nature, the Android operating system has been a main target for attackers to exploit. Malware creators routinely apply different code obfuscations to their apps to hide malicious activities. Features extracted from these obfuscated samples through program analysis contain many useless and disguised features, which leads to many false negatives. To address this issue, we demonstrate in this paper that obfuscation-resilient malware family analysis can be achieved through contrastive learning. The key insight behind our analysis is that contrastive learning can be used to reduce the difference introduced by obfuscation while amplifying the difference between malware and other types of malware. Based on the proposed analysis, we design a system, IFDroid, that can achieve robust and interpretable classification of Android malware. To achieve robust classification, we perform contrastive learning on malware samples to learn an encoder that can automatically extract robust features from them. To achieve interpretable classification, we transform the function call graph of a sample into an image by centrality analysis. The corresponding heatmaps can then be obtained by visualization techniques. These heatmaps can help users understand why a sample is classified into a particular family. We implement IFDroid and perform extensive evaluations on two datasets. Experimental results show that IFDroid is superior to state-of-the-art Android malware familial classification systems. Moreover, IFDroid maintains a 98.4% F1 score when classifying 69,421 obfuscated malware samples. IEEE
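The contrastive objective described in this abstract can be illustrated with a generic InfoNCE-style loss. The abstract does not state which loss IFDroid uses, so the formula, the temperature of 0.1, and the toy 2-D embeddings below are all assumptions for illustration only. The loss is small when an anchor sample is close to its positive (for example, an obfuscated variant of the same malware) and far from negatives (samples from other families).

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1) -> float:
    """Generic InfoNCE loss for one anchor (assumed form, not IFDroid's exact loss)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: the anchor and two candidate positives, plus two negatives.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce(anchor, [0.9, 0.1], negatives)      # positive near the anchor
misaligned = info_nce(anchor, [0.1, 0.9], negatives)   # positive far from the anchor
print(aligned < misaligned)  # True
```

Minimizing such a loss pushes an encoder to map obfuscated variants of a sample close together while keeping different families apart, which matches the intuition stated in the abstract.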