In this paper, a new method for signature identification and verification based on the contourlet transform (CT) is proposed. This method uses contourlet coefficients as the feature extractor and Support Vector Machine (SV...
This paper explores the effectiveness of deep features for document image segmentation. The document image segmentation problem is modelled as a pixel labeling task where each pixel in the document image is classified...
ISBN (print): 9798400710759
Retinal edema caused by fluid buildup is linked to vision-threatening retinal diseases. Spectral-domain optical coherence tomography (SD-OCT) is currently the imaging method of choice for evaluating retinal health and tracking treatment progress. Automated methods for fluid segmentation in retinal OCT B-scans are critical for ocular disease diagnosis. The segmentation task in retinal OCT B-scans with pathological manifestations is challenging because the layered structure of the retina varies significantly with the severity of the underlying disease. In this paper, we present a Contextual Self-Attention based U-Net (CoSAUNet) architecture for fluid segmentation in pathological OCT B-scans. The proposed model replaces the basic convolutional layers of the standard U-Net architecture with contextual self-attention layers. The contextual self-attention mechanism unifies contextual information mining among neighboring keys and conventional self-attention in a single architecture to learn long-range feature dependencies. The robustness of the proposed CoSAUNet model is evaluated through 3-fold cross-validation on the RETOUCH and AROI datasets, yielding average Dice scores (DS) of 88.16% and 76.67%, respectively. Furthermore, CoSAUNet performs comparably to state-of-the-art methods while using only half as many parameters.
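The Dice score (DS) reported above measures the overlap between a predicted and a reference fluid mask. A minimal sketch of the standard definition, DS = 2|P ∩ T| / (|P| + |T|), for a single pair of binary masks follows. The function name `dice_score`, the nested-list mask layout, and the `eps` smoothing term are illustrative assumptions; the paper may aggregate per fluid class, per B-scan, or per fold differently. Multiplying by 100 gives the percentage form used in the results.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary 2D masks.

    pred, target: equally sized nested lists (rows of 0/1 pixels), e.g. one B-scan.
    DS = 2 * |P ∩ T| / (|P| + |T|)
    """
    # Flatten the 2D masks into 1D pixel sequences.
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(round(dice_score(pred, target) * 100, 2))  # prints 66.67
```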
ISBN (print): 9798400710759
Automated segmentation of medical image volumes promises to reduce the time that costly medical experts spend on annotation. However, using machine learning for this task is challenging due to variations in imaging modalities and the scarcity of patient data. While interactive image segmentation methods and foundation models that incorporate user-provided prompts to refine segmentation masks have shown promise, they overlook crucial sequential information between slices in 3D medical image volumes and videos, resulting in discontinuities in the segmentation results. This paper proposes a new framework that dynamically updates model parameters during inference, in a test-time training setting, using user-provided scribbles. Our framework preserves knowledge acquired from previous slices of the current medical volume and from the training dataset via student-teacher learning. We evaluate our method on diverse CT, MRI, and microscopic cell datasets. Our framework significantly reduces user annotation time, by a factor of 6.72. Compared with other interactive segmentation methods, it reduces the time by a factor of 2.64. Our method also outperforms prompted foundation models for segmentation, achieving a Dice score of 0.9 in 3-4 interactions compared with 5-8 user interactions for the foundation model, which significantly reduces annotation time for the CT and MRI volumes.
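The abstract states that knowledge from earlier slices and from the training set is preserved via student-teacher learning, but it does not say how the teacher is updated. One common choice, assumed here and not confirmed by the source, is a mean-teacher scheme in which the teacher's weights follow an exponential moving average (EMA) of the student's weights after each test-time update. The sketch shows only that update rule; the dict-of-lists parameter layout, the name `ema_update`, and the default `momentum` are hypothetical.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter toward the student: t <- m * t + (1 - m) * s.

    teacher, student: dicts mapping parameter names to equal-length lists of floats
    (a hypothetical stand-in for real model weights). Returns a new dict.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must be in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must have the same parameter names")
    updated = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]
        if len(t_vals) != len(s_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # A high momentum keeps the teacher close to what it already knows,
        # so one noisy slice-level update cannot overwrite it.
        updated[name] = [momentum * t + (1.0 - momentum) * s for t, s in zip(t_vals, s_vals)]
    return updated


if __name__ == "__main__":
    teacher = {"w": [1.0, 0.0]}
    student = {"w": [0.0, 1.0]}  # e.g. weights after adapting to one slice's scribbles
    print(ema_update(teacher, student, momentum=0.9))
```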
Given two views of a static scene, estimation of correspondences between them is required for various computer vision tasks, such as 3D reconstruction and registration, motion and structure estimation, and object reco...
ISBN (print): 9780769537894
This paper proposes a web-based BML contents management system. BML, short for Broadcast Markup Language, is a scripting language for the data broadcasting contents included in Japan's digital TV broadcasting services. The scripting style of BML is very similar to that of HTML, and it also supports style sheets and JavaScript. However, BML imposes many restrictions on display area size, font sizes, color types, and so on, because a TV's BML browser is much less flexible than a web browser. Although a couple of dedicated software packages exist for creating BML contents, they are very expensive and difficult to use, so creating BML contents is not easy for end users. To make it easier, the authors have been developing a web-based BML contents management system. This paper explains the fundamental functionalities provided by the proposed system. Many digital contents have already been created and stored, and images of them can easily be obtained by capturing screen snapshots. This paper therefore also proposes a method for extracting style sheets for BML contents from such existing digital contents using image processing techniques.
This paper presents a novel rendering algorithm based on depth image warping to support virtual pan-tilt-zoom (PTZ) functionalities during 3D view generation. A method based on a "3D-ness" knob is proposed for...
ISBN (electronic): 9783031113499
ISBN (print): 9783031113482
This two-volume set (CCIS 1567-1568) constitutes the refereed proceedings of the 6th International Conference on Computer Vision and Image Processing, CVIP 2021, held in Rupnagar, India, in December 2021. The 70 full papers and 20 short papers were carefully reviewed and selected from 260 submissions. The papers present recent research on topics such as biometrics, forensics, content protection, image enhancement/super-resolution/restoration, motion and tracking, image or video retrieval, image/video processing for autonomous vehicles, video scene understanding, human-computer interaction, document image analysis, face, iris, emotion, sign language and gesture recognition, 3D image/video processing, action and event detection/recognition, medical image and video analysis, vision-based human gait analysis, remote sensing, and more.
ISBN (print): 9798400710759
End-to-end automatic speech recognition (ASR) systems achieve promising performance on large-scale speech datasets. However, these systems suffer performance degradation when there is a domain mismatch between the training and test datasets. This paper addresses the domain adaptation problem by employing unsupervised adversarial learning alongside ASR training. We propose frame-level and character-level domain adversarial training, which reduce the domain shift between source and target data. Frame-level adversarial training selects all source and target speech frames and tries to classify them into the two domains. In contrast, character-level training generates pseudo-labels for source and target batches and finds the feature distribution for each pseudo-character label. A random feature is selected for each character from the source and target domains, and this feature set of all characters is used for domain classification. Experiments on the LibriAdapt and LibriSpeech clean datasets show that our approaches achieve word error rate (WER) reductions similar to those of state-of-the-art approaches with lower time complexity. The proposed approaches are expected to yield promising results for other speech adaptation applications, which will be analyzed in future work.
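The two sampling strategies described above can be sketched as follows: frame-level training labels every source frame 0 and every target frame 1, while character-level training groups frames by pseudo-character label and draws one random feature per label from each domain. The function names, the list-based feature layout, and the use of frame-wise pseudo-labels (for example, the argmax of the ASR model's per-frame output) are assumptions for illustration. The abstract specifies none of them, nor how the adversarial signal reaches the encoder (a gradient reversal layer is a common choice, not shown here).

```python
import random
from collections import defaultdict


def frame_level_domain_set(src_feats, tgt_feats):
    """Frame-level: every frame is a domain-classification example (0 = source, 1 = target)."""
    features = list(src_feats) + list(tgt_feats)
    domains = [0] * len(src_feats) + [1] * len(tgt_feats)
    return features, domains


def character_level_domain_set(src_feats, src_labels, tgt_feats, tgt_labels, rng=None):
    """Character-level: for each pseudo-character label, sample one random frame feature per domain.

    *_feats:  per-frame feature vectors
    *_labels: per-frame pseudo-character labels (hypothetical, e.g. argmax of ASR output)
    """
    rng = rng or random.Random()
    features, domains = [], []
    for domain_id, feats, labels in ((0, src_feats, src_labels), (1, tgt_feats, tgt_labels)):
        if len(feats) != len(labels):
            raise ValueError("each frame needs exactly one pseudo-label")
        # Group this domain's frames by pseudo-character label.
        by_char = defaultdict(list)
        for feat, char in zip(feats, labels):
            by_char[char].append(feat)
        # Keep one randomly chosen feature per character seen in this domain.
        for char in sorted(by_char):
            features.append(rng.choice(by_char[char]))
            domains.append(domain_id)
    return features, domains


if __name__ == "__main__":
    src = [[0.1], [0.2], [0.3]]
    tgt = [[1.0], [1.1]]
    print(frame_level_domain_set(src, tgt))
    print(character_level_domain_set(src, ["a", "a", "b"], tgt, ["a", "c"], rng=random.Random(0)))
```

Sampling one feature per character keeps the domain classifier from being dominated by frequent characters, which is one plausible reading of why the character-level set is cheaper than using every frame.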
ISBN (print): 9781728155845
This paper presents the background and methodologies used in programming and teaching the humanoid robot NAO to play the game of "Simon Says" with human players. Choregraphe programming was used to provide the overall game logic and incorporate NAO's sensory capabilities. OpenPose pose detection and the OpenCV APIs were used to develop the image processing components that prepare the raw images captured by NAO for pose classification, and the Keras APIs were used to build the convolutional neural network that classifies and recognizes the players' poses.