We rethink the segment anything model (SAM) and propose a novel multiprompt network called COMPrompter for camouflaged object detection (COD). SAM has zero-shot generalization ability beyond other models and can provide an ideal framework for COD. Our network aims to extend the single-prompt strategy in SAM to a multiprompt strategy. To achieve this, we propose an edge gradient extraction module, which generates a mask containing gradient information about the boundaries of camouflaged objects. This gradient mask is then used as a novel boundary prompt, enhancing the segmentation process. Thereafter, we design a box-boundary mutual guidance module, which fosters more precise and comprehensive feature extraction via mutual guidance between a boundary prompt and a box prompt. This collaboration enhances the model's ability to accurately detect camouflaged objects. Moreover, we employ the discrete wavelet transform to extract high-frequency features from image embeddings. The high-frequency features serve as a supplementary component to the multiprompt ***, our COMPrompter guides the network to achieve enhanced segmentation results, thereby advancing the development of SAM in terms of COD. Experimental results across COD benchmarks demonstrate that COMPrompter achieves cutting-edge performance, surpassing the current leading model by an average positive metric of 2.2% on COD10K. In the specific application of COD, the experimental results in polyp segmentation show that our model is superior to top-tier methods as well. The code will be made available at https://***/guobaoxiao/COMPrompter.
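As a rough illustration of how a discrete wavelet transform separates high-frequency content, the following sketch applies a single-level 2D Haar transform to a small grid. The Haar wavelet, the function name `haar_dwt2`, and the band names are assumptions made for illustration; the abstract does not state which wavelet COMPrompter uses or how the transform is applied to image embeddings.

```python
def haar_dwt2(image):
    """Single-level 2D orthonormal Haar transform (illustrative assumption).

    `image` is a list of rows with even height and width. Returns four
    half-resolution bands: (approx, row_detail, col_detail, diag_detail).
    The three detail bands carry the high-frequency content.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, rd_row, cd_row, dd_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block being transformed:
            #   a b
            #   d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)   # low-frequency average
            rd_row.append((a + b - d - e) / 2.0)  # change between the two rows
            cd_row.append((a - b + d - e) / 2.0)  # change between the two columns
            dd_row.append((a - b - d + e) / 2.0)  # diagonal change
        approx.append(a_row)
        row_detail.append(rd_row)
        col_detail.append(cd_row)
        diag_detail.append(dd_row)
    return approx, row_detail, col_detail, diag_detail


def high_frequency_energy(image):
    """Sum of squared coefficients over the three detail bands."""
    _, rd, cd, dd = haar_dwt2(image)
    return sum(v * v for band in (rd, cd, dd) for row in band for v in row)
```

A flat region produces zero detail coefficients, while a block containing an edge produces a non-zero response in the matching detail band, which is why such bands are a plausible source of boundary-related features.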
Advancements in technology, including the Internet of Things (IoT) revolution, have enabled individuals and businesses to use systems and devices that connect, exchange data, and provide real-time information from far...
In this study, we review the fundamentals of IoT architecture and we thoroughly present the communication protocols that have been invented especially for IoT technology. Moreover, we analyze security threats, and gen...
Robot manipulation with simulation has recently become a mainstream approach in the robotics field. It entails lower risk and cost compared to directly training a real robot. Various physics engines, such as MuJoCo, off...
Radiology Report Generation (RRG) seeks to leverage deep learning techniques to automate the reporting process of radiologists. Current methods typically model RRG as an image-to-text generation task that take...
In many elections or competitions, a set of voters assign points to the candidates in a way that indicates their preferences, with the winning candidate being the candidate with the highest total score. When it comes ...
Gender bias has been widely studied by the NLP community. However, other more subtle variations of it, such as mansplaining, have so far received little attention. Mansplaining is a discriminatory behaviour that consists...
With the global population aging, there is a growing need for innovative assistive technologies to support unpaid carers in maintaining older adults’ quality of life. Socially Assistive Robots (SARs) offer a potentia...
Lip-reading is a process of interpreting speech by visually analysing lip ***research in this area has shifted from simple word recognition to lip-reading sentences in the ***paper attempts to use phonemes as a classification schema for lip-reading sentences to explore an alternative schema and to enhance system ***classification schemas have been investigated, including character-based and viseme-based ***visual front-end model of the system consists of a Spatial-Temporal (3D) convolution followed by a 2D ***utilise multi-headed attention for phoneme recognition ***the language model, a Recurrent Neural Network is ***performance of the proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark ***with the state-of-the-art approaches in lip-reading sentences, the proposed system has demonstrated improved performance, with a 10% lower word error rate on average under varying illumination ratios.
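The word error rate quoted above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below follows that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a hypothesis that drops one word from a six-word reference has a word error rate of 1/6.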
The cross-view matching of local image features is a fundamental task in visual localization and 3D ***study proposes FilterGNN, a transformer-based graph neural network (GNN), aiming to improve the matching efficiency and accuracy of visual ***on high matching sparseness and coarse-to-fine covisible area detection, FilterGNN utilizes cascaded optimal graph-matching filter modules to dynamically reject outlier ***, we successfully adapted linear attention in FilterGNN with post-instance normalization support, which significantly reduces the complexity of complete graph learning from O(N^2) to O(N). Experiments show that FilterGNN requires only 6% of the time cost and 33.3% of the memory cost compared with SuperGlue under a large-scale input size and achieves competitive performance in various tasks, such as pose estimation, visual localization, and sparse 3D reconstruction.
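The reduction from O(N^2) to O(N) is characteristic of kernel-based linear attention, where the key-value summary is computed once and reused for every query. The sketch below shows that general idea with an `elu(x) + 1` feature map; the feature map, the function names, and the omission of post-instance normalization are all assumptions, since the abstract does not describe FilterGNN's exact formulation.

```python
import math


def feature_map(x):
    """elu(x) + 1, a positive feature map (an assumed choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Attention in O(N) time with respect to the number of tokens N.

    Keys and values are summarized once into a d_k x d_v matrix and a
    d_k vector, so no N x N attention matrix is ever formed.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]
    k_sum = [0.0] * d_k
    for key, value in zip(keys, values):
        phi_k = [feature_map(x) for x in key]
        for a in range(d_k):
            k_sum[a] += phi_k[a]
            for b in range(d_v):
                kv[a][b] += phi_k[a] * value[b]

    outputs = []
    for query in queries:
        phi_q = [feature_map(x) for x in query]
        norm = sum(phi_q[a] * k_sum[a] for a in range(d_k))
        outputs.append(
            [sum(phi_q[a] * kv[a][b] for a in range(d_k)) / norm
             for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Same result computed the O(N^2) way, for comparison only."""
    d_v = len(values[0])
    outputs = []
    for query in queries:
        phi_q = [feature_map(x) for x in query]
        weights = [
            sum(q * feature_map(k) for q, k in zip(phi_q, key))
            for key in keys
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total
             for b in range(d_v)]
        )
    return outputs
```

Both functions return the same values up to floating-point rounding; only the order of the multiplications differs, which is where the saving comes from.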