ISBN (digital): 9798331510831
ISBN (print): 9798331510848
Vision transformers have been applied successfully to image recognition tasks. They have been based either on multi-headed self-attention (ViT [12], DeiT [54]), similar to the original work on textual models, or, more recently, on spectral layers (FNet [29], GFNet [46], AFNO [15]). We hypothesize that spectral layers capture high-frequency information such as lines and edges, while attention layers capture token interactions. We investigate this hypothesis in this work and observe that mixing spectral and multi-headed attention layers indeed provides a better transformer architecture. We thus propose the novel SpectFormer architecture for vision transformers, which has initial spectral layers and deeper multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately, and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-H-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for the small version). Further, SpectFormer-H-L achieves 85.7%, which is the state of the art for the comparable base version of the transformers. We further validated SpectFormer's performance in other scenarios, such as transfer learning on standard datasets including CIFAR-10, CIFAR-100, Oxford-IIIT Flower, and Stanford Cars. We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. The source code is available at https://***/badripatro/SpectFormers.
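The layer ordering described in this abstract (spectral layers first, attention layers deeper) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 1D FFT over the token axis, and the omission of layer normalization and MLP sub-blocks are all simplifying assumptions made here for illustration.

```python
import numpy as np

def spectral_gating(x, filt):
    """Mix tokens in the frequency domain (assumed 1D FFT over the token axis).

    x:    real array of shape (tokens, channels)
    filt: complex array of shape (tokens // 2 + 1, channels), a stand-in
          for a learnable frequency filter
    """
    freq = np.fft.rfft(x, axis=0)
    return np.fft.irfft(freq * filt, n=x.shape[0], axis=0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv, num_heads):
    """Plain multi-headed self-attention without an output projection."""
    n, d = x.shape
    hd = d // num_heads

    def split(t):
        # (tokens, channels) -> (heads, tokens, head_dim)
        return t.reshape(n, num_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))
    return (weights @ v).transpose(1, 0, 2).reshape(n, d)

def hybrid_forward(x, spectral_filters, attention_weights, num_heads):
    """Apply the spectral blocks first, then the attention blocks (residual form)."""
    for filt in spectral_filters:
        x = x + spectral_gating(x, filt)
    for wq, wk, wv in attention_weights:
        x = x + self_attention(x, wq, wk, wv, num_heads)
    return x
```

With an all-ones filter the spectral block reduces to the identity on its input, which is a convenient sanity check before substituting trained filters.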
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
Camouflaged object detection (COD), the task of identifying objects concealed within their surroundings, is often quite challenging due to the similarity that exists between the foreground and background. By incorporating an additional referring image where the target object is clearly visible, we can leverage the similarities between the two images to detect the camouflaged object. In this paper, we propose a novel problem setup: referring camouflaged object discovery (RCOD). In RCOD, segmentation occurs only when the object in the referring image is also present in the camouflaged image; otherwise, a blank mask is returned. This setup is particularly valuable when searching for specific camouflaged objects. Current COD methods are often generic, leading to numerous false positives in applications focused on specific objects. To address this, we introduce a new framework called Co-Saliency Inspired Referring Camouflaged Object Discovery (CIRCOD). Our approach consists of two main components: Co-Saliency-Aware Image Transformation (CAIT) and Co-Salient Object Discovery (CSOD). The CAIT module reduces the appearance and structural variations between the camouflaged and referring images, while the CSOD module utilizes the similarities between them to segment the camouflaged object, provided the images are semantically similar. Covering all semantic categories in current COD benchmark datasets, we collected over 1,000 referring images to validate our approach. Our extensive experiments demonstrate the effectiveness of our method and show that it achieves superior results compared to existing methods. Code is available at https://***/avigupta2798/CIRCOD/.
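The RCOD output contract stated in this abstract (return a segmentation only when the referred object is present, otherwise a blank mask) can be sketched as follows. The cosine-similarity gate, the embedding inputs, and the threshold value are hypothetical choices for illustration only; the abstract does not say how CIRCOD decides that the two images are semantically similar.

```python
import numpy as np

def rcod_output(candidate_mask, ref_embedding, cam_embedding, threshold=0.5):
    """Return the candidate mask only if the two images look semantically similar.

    candidate_mask: array holding a predicted segmentation of the camouflaged image
    ref_embedding, cam_embedding: 1D feature vectors (hypothetical inputs)
    threshold: hypothetical cosine-similarity cutoff
    """
    denom = np.linalg.norm(ref_embedding) * np.linalg.norm(cam_embedding) + 1e-12
    similarity = float(ref_embedding @ cam_embedding) / denom
    if similarity < threshold:
        # Referred object judged absent: return a blank mask of the same shape.
        return np.zeros_like(candidate_mask)
    return candidate_mask
```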
ISBN (print): 0818658274
The proceedings contain 165 papers on computer vision and pattern recognition. Topics discussed include recognition systems, image processing, computational methods, algorithms and information use.
The proceedings contain 144 papers. Topics discussed include motion tracking, face detection and recognition, pattern analysis, two-dimensional and low-level vision, real-time systems, computer vision, medical image analysis, deformable models and shape, video imaging, shape analysis, video libraries, physics-based vision, face recognition, computer graphics and pattern recognition.
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
Text-driven diffusion models have shown remarkable capabilities in editing images. However, when editing 3D scenes, existing works mostly rely on training a NeRF for 3D editing. Recent NeRF editing methods perform edit operations by deploying 2D diffusion models and projecting these edits into 3D space. They require strong positional priors alongside the text prompt to identify the edit location. These methods operate on small 3D scenes and tend to be tied to a particular scene. They require training for each specific edit and cannot be used for real-time edits. To address these limitations, we propose a novel method, FreeEdit, which makes edits in a training-free manner using mesh representations as a substitute for NeRF. Training-free methods are now possible because of advances in foundation models. We leverage these models to provide a training-free alternative and introduce solutions for insertion, replacement and deletion. We treat insertion, replacement and deletion as basic building blocks that can be combined to perform intricate edits. Given a text prompt and a 3D scene, our model is capable of identifying which object should be inserted, replaced or deleted and the location where the edit should be performed. We also introduce a novel algorithm as part of FreeEdit to find the optimal location on the grounding object for placement. We evaluate our model by comparing it with baseline models on a wide range of scenes using quantitative and qualitative metrics and showcase the merits of our method with respect to others. Project page: https://***/FreeEdit_page/
ISBN (digital): 9798331506520
ISBN (print): 9798331506537
Line segment detection is a fundamental procedure in computer vision, pattern recognition, and image analysis applications. The paper proposes a novel method for wide line segment detection, with an emphasis on endpoint determination, based on the Guided Scale Space Radon Transform and Hessian orientations. The method begins by determining the centerlines of wide lines and then exploits the image Hessian orientations around these lines to define a binary support region for the line segments and to detect their endpoints. The method is shown to be robust against blur and noise on synthetic images, where evaluation of the outcomes confirms the correctness of the detection through low errors. In addition, results on real images are very promising.
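As an illustration of what a per-pixel Hessian orientation is, the sketch below estimates the image Hessian with finite differences and returns the angle of the principal eigenvector at each pixel. It is only a generic sketch under stated assumptions: it omits the Gaussian smoothing that scale-space methods normally apply, it does not implement the Guided Scale Space Radon Transform, and the function name is made up here.

```python
import numpy as np

def hessian_orientation(image):
    """Angle (radians) of the Hessian eigenvector with the largest eigenvalue.

    The Hessian is estimated with central differences via np.gradient.
    Angles are measured from the x axis (columns) toward the y axis (rows).
    """
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)        # first derivatives along rows, columns
    iyy, _ = np.gradient(gy)           # d2I/dy2
    ixy, ixx = np.gradient(gx)         # d2I/dxdy and d2I/dx2
    # Closed form for a symmetric 2x2 matrix [[ixx, ixy], [ixy, iyy]].
    return 0.5 * np.arctan2(2.0 * ixy, ixx - iyy)
```

For an image that curves only along x, the returned angle away from the borders is 0, and for one that curves only along y its magnitude is pi/2. Border pixels are less reliable because np.gradient falls back to one-sided differences there.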
The proceedings contain 158 papers from the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. The topics discussed include image indexing, image segmentation, computer vision, image coding, pattern recognition systems, image magnification, video inpainting, visual tracking, motion estimation, face recognition, imaging systems, character recognition and feature clustering.
This Volume 1 of 2 of the conference proceedings contains 117 papers. Topics discussed include stereo vision, tracking, image retrieval, illumination, faces, shape and segmentation, shape pattern and statistics, motion and motion rendering, color and light, matching and recognition.
The proceedings contain 137 papers. Topics discussed include image segmentation and restoration, object recognition and pose recovery, human motion and articulated motion, target recognition, contour recognition, active and real-time vision, face detection and recognition, graph matching and calibration, shape representation, motion estimation and structure, stereo matching, texture and shading, image database and document analysis, feature extraction and detection, pattern recognition, and low-level vision.