ISBN (print): 9781467385640
As imaging is a 2D projection of a 3D scene, depth information is lost at the time of image capture by a conventional camera. This depth information can be inferred back from a set of visual cues present in the image. In this work, we present a model that combines two monocular depth cues, namely texture and defocus. Depth is related to the spatial extent of the defocus blur under the assumption that the more an object is blurred, the farther it is from the camera. First, we estimate the amount of defocus blur present at the edge pixels of an image; this is referred to as the sparse defocus map. Using the sparse defocus map, we generate the full defocus map. However, such defocus maps always contain hole regions and depth ambiguity. To handle this problem, an additional depth cue, in our case texture, is integrated to generate a better defocus map. This integration mainly focuses on correcting erroneous regions in the defocus map using the texture energy present in those regions. The sparse defocus map is corrected using texture-based rules. Hole regions, where there are no significant edges or texture, are detected and corrected in the sparse defocus map. We use region-wise propagation for better defocus map generation; the accuracy of the full defocus map is increased by the region-wise propagation.
ISBN (print): 9781467385640
Traditional 2D face recognition systems fail drastically under pose variation and poor illumination. Many techniques have been introduced, but with limited success. An expensive 3D setup can be used to deal with this problem. In this work, a low-cost, low-computation, fast, good-quality 3D reconstruction approach that assists 2D face recognition systems is proposed. The proposed system is a fast, automatic 3D face reconstruction approach from rectified stereo images. An automatic synthesis of training images of various face poses is proposed. Three enhancements are used to improve system performance: adaptive histogram equalization (AHE) to improve the contrast of face images, the horizontal gradient ordinal relationship pattern (HGORP) to handle poor illumination, and steerable filters (SF) for noise reduction and illumination invariance. SURF-based matching is then performed with score-level fusion of all three enhancements. A database of 107 subjects has been collected to evaluate system performance. It is observed that the proposed system handles large pose variations and poor illumination very well.
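Of the three enhancements, AHE is the most standard: histogram equalization applied per tile rather than globally. The NumPy sketch below shows a naive version (no CLAHE-style clip limit or bilinear blending between tiles, which production implementations add); the tile size and test image are illustrative.

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # map to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def adaptive_hist_equalize(img, tile=8):
    """Naive AHE: equalize each tile independently."""
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            out[i:i+tile, j:j+tile] = hist_equalize(img[i:i+tile, j:j+tile])
    return out

# Low-contrast test image: intensities squeezed into [100, 140)
rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(32, 32), dtype=np.uint8)
eq = adaptive_hist_equalize(img)
```

After equalization the local contrast is stretched across the full 8-bit range, which is what makes subsequent SURF matching more robust to lighting.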
ISBN (print): 9798400710759
With the rapid development of modern generative models, the need for an automated synthetic image detection process has never been greater. Recent work in the field of synthetic image detection focuses on improving out-of-distribution (OoD) classification performance and robustness to common image pre-processing techniques. In this work, however, we explore the nature of an intricate counter-forensic attack, the reconstruction of real images with Diffusion Model autoencoders, which could be used to adversely affect the performance of modern synthetic image detection algorithms. We present a variety of experiments to study the nature of this counter-forensic attack and use the inferences from these experiments to develop multiple algorithms to detect such reconstructed images while still detecting real and purely synthetic images accurately. To do so, we make use of trained classifiers that can distinguish real images, autoencoder-reconstructed images, and purely synthetic images. Furthermore, we combine these techniques to build a novel ensemble algorithm that competes with state-of-the-art (SoTA) algorithms on the 'Real vs. Fake' image detection task while detecting autoencoder-reconstructed images accurately, attaining an accuracy of 99.2% in the multiclass setting.
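The abstract does not describe how the ensemble combines its member classifiers; a generic scheme consistent with the three-class setup (real / reconstructed / synthetic) is soft voting over per-classifier probabilities, sketched below. The probability vectors are invented for illustration.

```python
import numpy as np

# Hypothetical per-classifier class probabilities for one image, over
# (real, autoencoder-reconstructed, purely synthetic).
p_clf_a = np.array([0.20, 0.30, 0.50])
p_clf_b = np.array([0.10, 0.70, 0.20])
p_clf_c = np.array([0.25, 0.45, 0.30])

probs = np.stack([p_clf_a, p_clf_b, p_clf_c])
avg = probs.mean(axis=0)                  # soft voting: average probabilities
labels = ["real", "reconstructed", "synthetic"]
pred = labels[int(avg.argmax())]
```

Soft voting lets a classifier that is confident about reconstructions dominate that class even when the other members are undecided, which is the intuition behind combining specialized detectors.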
ISBN (print): 9781467385640
This paper presents a novel spectral filtering based deep learning algorithm (SFDL) for detecting logos and stamps in a scanned document image. In a document image, textual content is the main source of high-spatial-frequency components; accordingly, high-frequency filtering is used to suppress the text symbols. In the next step, a segmentation process localizes candidate regions of interest such as logos and stamps. Preprocessing of these candidate regions is essential before classification; the proposed preprocessing includes region fusion, resizing, and key-point-based pooling. Finally, the preprocessed candidate regions are classified using a deep convolutional neural network. The main advantage of the SFDL is its capability to detect logos without prior information or assumptions about their locations in a document. The performance of the proposed SFDL algorithm is evaluated on the publicly accessible document image database StaVer. It is observed that SFDL performs satisfactorily in detecting logos and stamps. The precision and recall of the proposed SFDL are compared with existing techniques. Experimental results show that recall and precision of logo detection are 86.8% and 97.2%, respectively; similarly, recall and precision for stamp detection are 85.3% and 94.8%.
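The suppression of high-frequency text strokes can be sketched as an ideal low-pass filter in the Fourier domain. This is only a minimal illustration of the spectral-filtering idea with an assumed circular cutoff; the paper's actual filter design is not given in the abstract.

```python
import numpy as np

def suppress_high_frequencies(img, keep_radius=10):
    """Zero spectral components beyond keep_radius (ideal low-pass),
    attenuating fine text-like strokes while keeping larger structures."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    y, x = np.ogrid[:h, :w]
    mask = (y - h // 2) ** 2 + (x - w // 2) ** 2 <= keep_radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# Synthetic "document": alternating columns mimic dense text strokes,
# i.e. energy at the highest horizontal frequency.
img = np.zeros((64, 64))
img[:, ::2] = 1.0
smooth = suppress_high_frequencies(img)
```

The stripe pattern sits at the Nyquist frequency, so only the DC term survives the mask and the output flattens to the image mean; on a real scan, logos and stamps (low-frequency blobs) would survive while text is attenuated.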
ISBN (print): 9798400710759
Deepfake technology generates counterfeit facial images by replicating the characteristics of genuine ones. Sophisticated manipulation tactics have enabled criminals to incite social panic or accrue illegal gains by creating deceptive media, such as forged facial images. Consequently, numerous techniques for detecting deepfakes have been developed to evaluate the authenticity of images. Existing deepfake detection techniques perform well if the training and testing datasets are similar; however, they do not provide good results when the training and testing datasets are dissimilar, and thus lack generalization ability. In this work, we propose the Continual Learning Based Enhanced Vision Transformer (CLEViT) model for deepfake detection. The CLEViT model leverages a continual learning mechanism to enhance the model's generalization ability. Continual learning enables deepfake detection models to continuously adapt and update their knowledge from new data, allowing them to identify manipulation techniques effectively. A vision transformer is then used to extract rich contextual information from images using self-attention mechanisms. To validate the efficacy of the model, we have created a multi-task dataset comprising the FakeAVCeleb, RFF, RFFD, and DFDC datasets. The proposed CLEViT model outperforms existing deepfake detection techniques, yielding superior results while providing generalization ability.
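The abstract does not say which continual-learning mechanism CLEViT uses; one widely used family is rehearsal, where a small exemplar memory from earlier tasks is replayed while training on a new dataset so the detector does not forget previously seen manipulations. A minimal stdlib sketch (class name, capacity, and sample labels are all illustrative):

```python
import random

class RehearsalBuffer:
    """Fixed-size exemplar memory for continual training.

    Reservoir sampling keeps a uniform random subset of everything
    seen so far, regardless of how many samples have streamed past.
    """
    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.memory = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            i = self.rng.randrange(self.seen)
            if i < self.capacity:
                self.memory[i] = sample   # evict uniformly at random

    def replay(self, k):
        """Draw k old exemplars to mix into the current training batch."""
        return self.rng.sample(self.memory, min(k, len(self.memory)))

samples = ["FakeAVCeleb-0", "FakeAVCeleb-1", "DFDC-0", "DFDC-1", "DFDC-2"]
buf = RehearsalBuffer(capacity=4)
for s in samples:
    buf.add(s)
old = buf.replay(2)
```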
ISBN (print): 9781479915880
Breast cancer, the most common type of cancer in women, is one of the leading causes of cancer deaths. Early detection of cancer is therefore the major concern for cancer treatment. The most common screening test, mammography, is useful for the early detection of cancer. It has been shown that consecutive reading of mammograms can raise the number of cancers detected, but this approach is not monetarily viable. There is therefore a significant need for computer-aided detection systems that can produce the intended results and assist medical staff in accurate diagnosis. In this research we attempt to build a classification system for mammograms using association rule mining based on texture features. The proposed system uses the most relevant GLCM-based texture features of mammograms. A new method is proposed to form associations among different texture features by judging the importance of each feature; the resulting associations can be used for classification of mammograms. Experiments are carried out using the MIAS image database. The performance of the proposed method is compared with the standard Apriori algorithm and is found to be better, owing to a reduction in repeated scanning of the database, which results in less computation time. We also investigate the use of association rules in medical image analysis for the problem of mammogram classification.
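GLCM-based texture features are well defined even though the abstract does not list which ones are used. The sketch below computes a gray-level co-occurrence matrix for a single offset and two classic Haralick features, contrast and energy; the offset, number of gray levels, and test image are illustrative.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix for one (dx, dy) offset,
    normalized to a joint probability distribution."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            P[img[i, j], img[i + dy, j + dx]] += 1
    return P / P.sum()

def contrast(P):
    """Weighted intensity-difference: high for abrupt local changes."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def energy(P):
    """Sum of squared entries: high for uniform, ordered texture."""
    return float((P ** 2).sum())

rng = np.random.default_rng(1)
img = rng.integers(0, 8, size=(32, 32))   # image quantized to 8 gray levels
P = glcm(img)
```

Features like these, computed over mammogram regions, would be the items from which the association rules are mined.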
ISBN (print): 9798400710759
Point cloud forecasting is a crucial task for the success of motion planning and state estimation problems. However, due to its complex nature and the challenges of integrating it with existing architectures, it remains an extremely difficult and interesting problem. LiDAR-based 3D sensing is one of the key components of modern autonomous driving systems and leads to scalability challenges, as it yields large-scale point cloud data at a high frame rate. The range image representation of a point cloud offers a compact representation of LiDAR data, which enables applying powerful convolutional network architectures to predict future range images (and thereby point clouds). In this paper, we use the range image representation and propose two different non-recurrent deep network models for point cloud forecasting. More specifically, we predict future point clouds from past observed point clouds by introducing a spatio-temporal convolution (STC) block in the latent space of range images, thereby avoiding the use of RNNs and achieving much faster inference. The STC has two variants that use temporal attention and Inception-Net blocks, respectively. We perform experiments on two publicly available datasets, KITTI and nuScenes, and report superior quantitative and qualitative results in comparison to SOTA methods, while offering a compact model with faster inference speed.
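The range image representation itself is standard: each LiDAR point is mapped to a pixel by its azimuth and elevation, and the pixel stores the range. A minimal NumPy sketch of this spherical projection (image size and vertical field of view are assumed, loosely following common LiDAR configurations):

```python
import numpy as np

def to_range_image(points, h=32, w=512, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) point cloud into an (h, w) range image.
    Columns index azimuth, rows index elevation; pixels hold range."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                     # elevation angle
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    v = np.clip(v, 0, h - 1)
    img = np.zeros((h, w))
    img[v, u] = r                                # last point wins on collision
    return img

pts = np.array([[10.0, 0.0, 0.0],
                [0.0, 5.0, -1.0]])
ri = to_range_image(pts)
```

Because the result is a dense 2D grid, ordinary (spatio-temporal) convolutions can be applied to stacks of such images, which is what makes the non-recurrent STC design feasible.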
ISBN (print): 9798400710759
Brain tumour segmentation is a fundamental task in medical image analysis, where each year a number of deep learning models are introduced to delineate tumour regions with high precision. However, most of these works rely on a large number of parameters and a high computational cost, rendering them ineffective in real-world applications. It is therefore important to devise efficient models that can be easily deployed on resource-constrained devices and perform on par with existing large models. In this paper, we propose a semi-decoupled distillation technique that trains a lightweight "student" model using the features extracted from the decoder of the nnU-Net "teacher" model and its predictions on these features using a single point-wise convolution layer. The final classification layer remains the same for both models and is kept frozen while training the "student" network. Our approach follows a two-stage training procedure in which the tumour regions are detected and extracted in the first stage, and then sent to second-stage training to segment fine-grained classes, including edema, enhancing tumour, and tumour core. Our extensive experimentation shows that a lightweight distilled model performs competitively with large models on brain tumour segmentation.
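The core ingredient of any distillation setup is a loss that pulls the student's class distribution toward the teacher's. The paper's semi-decoupled scheme is more elaborate, but the classic temperature-softened KL term (Hinton-style distillation) conveys the idea; the logits and temperature below are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened probabilities,
    scaled by T^2 so gradients keep a consistent magnitude."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Four classes: background, edema, enhancing tumour, tumour core.
teacher = np.array([[4.0, 1.0, 0.5, 0.2]])
loss_match = distillation_loss(teacher, teacher)              # identical logits
loss_off = distillation_loss(np.array([[0.2, 0.5, 1.0, 4.0]]), teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is exactly the training signal the student network receives per voxel.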
ISBN (print): 9781450347532
Compressed sensing magnetic resonance imaging (CSMRI) has demonstrated that it is possible to accelerate MRI scan time by reducing the number of measurements in the k-space without significant loss of anatomical detail. The number of k-space measurements required is roughly proportional to the sparsity of the MR signal under consideration. Recently, a few works on CSMRI have revealed that the sparsity of the MR signal can be enhanced by suitable weighting of different regularization priors. In this paper, we propose an efficient adaptive weighted reconstruction algorithm for enhancing the sparsity of the MR image. Experimental results show that the proposed algorithm gives better reconstructions with fewer measurements, without a significant increase in computation time compared to existing algorithms in this line.
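The abstract does not give the reconstruction algorithm; a generic stand-in for "adaptive weighted reconstruction" is iterative soft-thresholding (ISTA) with reweighted l1 penalties, where coefficients that are already large receive a smaller threshold, enhancing effective sparsity. The toy problem below (random sensing matrix instead of a k-space operator, hand-picked sparse signal) is an assumption for illustration only.

```python
import numpy as np

def weighted_ista(A, y, lam=0.01, inner=300, outer=4, eps=1e-2):
    """ISTA with adaptively reweighted l1: after each inner solve,
    per-coefficient weights 1/(|x|+eps) shrink small coefficients
    harder, promoting sparser solutions."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    w = np.ones(A.shape[1])
    for _ in range(outer):
        for _ in range(inner):
            z = x - A.T @ (A @ x - y) / L            # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)
        w = 1.0 / (np.abs(x) + eps)                  # adaptive reweighting
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)    # 40 measurements, 100 unknowns
x_true = np.zeros(100)
x_true[[5, 30, 77]] = [1.0, -0.8, 0.6]
y = A @ x_true                                      # noiseless measurements
x_hat = weighted_ista(A, y)
```

In CSMRI the dense matrix `A` would be replaced by an undersampled Fourier operator and the signal by wavelet coefficients of the image, but the weighting mechanism is the same.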
ISBN (print): 9789897584022
Making large sets of digitized cultural heritage data accessible is a key task for digitization projects. While the amount of data available through print media in the humanities is vast, common issues arise because the information available for the digitization process is typically fragmented. One reason is the physical distribution of data through print media, which has to be collected and merged. Merging in particular causes issues due to differences in terminology, hampering automatic processing. Hence, digitizing musicological data raises a broad range of challenges. In this paper, we present the current state of the ongoing musiXplora project, including a multi-faceted database and a visual exploration system for persons, places, objects, terms, media, events, and institutions of musicological interest. A particular focus of the project is using visualizations to overcome traditional problems of handling both the vast amount and the anomalies of information induced by the historicity of the data. We present several use cases that highlight the capabilities of the system in supporting musicologists in their daily workflows.