Tobacco is a valuable crop in the agricultural and commercial industries. Any disease infecting the plant may lower the harvest and disrupt the operation of the market supply chain. Image-based deep learning method...
ISBN (print): 9781450398220
Deep Neural Network (DNN) inference has been proven highly susceptible to carefully engineered adversarial perturbations, presenting a pivotal hindrance to real-world computer vision tasks. Most existing defenses have poor generalization ability because they depend on a relatively limited set of Adversarial Examples (AEs). Furthermore, existing adversarial training requires continually retraining a target network with the type of attack it must repel. Defense strategies based primarily on processing the perturbed image eventually fall short against constantly evolving threats. Protecting DNNs against adversarial attacks remains difficult on datasets such as Fashion-MNIST and CIFAR-10. This paper proposes a GAN-based two-stage adversarial training model named Globally Connected and Trainable Hierarchical Fine Attention (GCTHFA). In the first stage, the proposed GCTHFA GAN creates a reconstructed image that is a purified version of an adversarial example. The approach uses a trainable, globally connected attention map to teach the Generator the different representations an image can take in convolutional layers at different depths of the network. The Discriminator relies on feature vectors produced by transfer learning, eliminating the traditional dependence on raw image pixels. The second stage adversarially trains a target classifier to resist such attacks. Extensive testing on the MNIST, Fashion-MNIST, and CIFAR-10 datasets with different classifiers and attacks shows that the proposed model can handle adversarial attack settings for various target models. The proposed model uses only one type of adversarial training, with no need for retraining based on the type of attack.
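As an illustration only, the sketch below shows how such a two-stage purify-then-train pipeline can be wired up in PyTorch: a generator reconstructs a clean image from an adversarial example, a discriminator scores feature vectors from a frozen pretrained backbone rather than raw pixels, and the target classifier is then trained on purified images. The module definitions, losses, and optimizer handling are assumptions for the sketch and omit the paper's hierarchical attention maps; this is not the authors' GCTHFA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Purifier(nn.Module):
    """Stand-in for the attention-guided Generator that reconstructs a clean image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, x_adv):
        return self.net(x_adv)

class FeatureDiscriminator(nn.Module):
    """Judges transfer-learned feature vectors instead of raw pixels."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats):
        return self.head(feats)

def stage1_step(gen, disc, feat_extractor, x_adv, x_clean, opt_g, opt_d):
    """One GAN step of the purification stage; feat_extractor is a frozen pretrained backbone
    that maps image batches to flat feature vectors."""
    # Discriminator: real = features of clean images, fake = features of purified images.
    with torch.no_grad():
        x_fake = gen(x_adv)
    real_logit = disc(feat_extractor(x_clean))
    fake_logit = disc(feat_extractor(x_fake))
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator while staying close to the clean image.
    x_fake = gen(x_adv)
    g_logit = disc(feat_extractor(x_fake))
    g_loss = (F.binary_cross_entropy_with_logits(g_logit, torch.ones_like(g_logit))
              + F.l1_loss(x_fake, x_clean))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

def stage2_step(classifier, gen, x_adv, y, opt_c):
    """Second stage: train the target classifier on purified adversarial examples."""
    with torch.no_grad():
        x_pure = gen(x_adv)
    loss = F.cross_entropy(classifier(x_pure), y)
    opt_c.zero_grad(); loss.backward(); opt_c.step()
```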
ISBN (print): 9781450398220
Segmentation of the liver and of the tumors within it is an essential task for liver cancer diagnosis and surgical treatment planning. While most prior work aims to improve segmentation accuracy, there is high demand for lightweight architectures that can be deployed in clinical settings with limited computational resources. Developing such resource-efficient models while maintaining the performance of state-of-the-art architectures is challenging because of the high class imbalance and the small observable differences between tumorous and non-tumorous regions, which would generally require large and complex deep learning architectures. The contributions of this paper are twofold. First, it proposes to encode information extracted at different scales using a "Pyramid Pooling Module" (PPM). While variants of pyramidal architectures are widely used for scene parsing in computer vision, this paper explores the PPM for liver and tumor segmentation. Second, it proposes using the popular lightweight EfficientNet as the backbone of PSP-Net for feature extraction, thereby reducing the overall model size while achieving performance comparable to state-of-the-art architectures. The proposed model is evaluated on the MICCAI 2017 LiTS dataset and achieves a DSC of 95.06% for the liver and 79.08% for tumors, with only 1.76M parameters.
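For orientation, the sketch below attaches a standard Pyramid Pooling Module to a torchvision EfficientNet-B0 feature extractor with a small head for background/liver/tumor masks. The bin sizes, channel widths, and decoder head are common PSP-Net defaults assumed for illustration, and a recent torchvision is assumed; this is not necessarily the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import efficientnet_b0

class PyramidPoolingModule(nn.Module):
    """Pool the feature map at several bin sizes, project, upsample, and concatenate."""
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),                  # pool to bin_size x bin_size
                nn.Conv2d(in_channels, reduced, 1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True))
            for bin_size in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [x] + [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                        align_corners=False) for stage in self.stages]
        return torch.cat(pyramids, dim=1)

backbone = efficientnet_b0(weights=None).features            # 1280-channel feature maps
ppm = PyramidPoolingModule(1280)
head = nn.Conv2d(1280 * 2, 3, kernel_size=1)                  # background, liver, tumor

x = torch.randn(2, 3, 256, 256)                               # toy batch of CT slices
with torch.no_grad():
    feats = backbone(x)                                       # (2, 1280, 8, 8)
    logits = F.interpolate(head(ppm(feats)), size=x.shape[2:],
                           mode="bilinear", align_corners=False)
```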
ISBN (print): 9781450398220
State-of-the-art empirical work has shown that visual representations learned by deep neural networks are robust and capable of performing classification tasks on diverse datasets. For example, CLIP demonstrated zero-shot transfer performance on multiple classification datasets using a joint embedding space of image-text pairs. However, it showed negative transfer on standard datasets such as Birdsnap, RESISC45, and MNIST. In this paper, we propose ContextCLIP, a contextual and contrastive learning framework for the contextual alignment of image-text pairs, which learns robust visual representations on the Conceptual Captions dataset. Our framework improves image-text alignment by aligning text and image representations contextually in the joint embedding space. ContextCLIP showed good qualitative performance on text-to-image retrieval and improved classification accuracy. We evaluated the model quantitatively with zero-shot transfer and fine-tuning experiments on the CIFAR-10, CIFAR-100, Birdsnap, RESISC45, and MNIST classification datasets.
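The sketch below shows the CLIP-style symmetric contrastive objective over a joint image-text embedding space that such a framework builds on; the contextual-alignment components specific to ContextCLIP are not reproduced, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (N, d) embeddings of N matched image-text pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)              # match each image to its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)          # and each caption to its image
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```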
ISBN (print): 9781450398220
Laparoscopic cholecystectomy is a widely performed minimally invasive surgical procedure that poses many challenges to the operating surgeon. As we strive to understand and automate such surgeries, the key is to identify the actions involved in them. An action involves a set of tools and a target anatomy, which together form an action triplet. However, the relations between the triplets and their constituents are sparse, making them challenging to learn. In this paper, we propose a graph neural network based approach to exploit these underlying sparse relations in the data. We demonstrate the proposed method's ability to uniformly learn multiple tasks and classify triplets with an mAP of 0.261. In addition, we experimentally show the inability of fully connected and convolutional layers to learn these sparse relations when trained on 40 laparoscopic videos and validated on five videos. Code will be available at: https://***/iitkliv/groot.
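The sketch below illustrates, under assumed placeholder dimensions, how a simple message-passing layer can relate tool, verb, and target nodes of an action triplet through an adjacency matrix, in contrast to fully connected or convolutional layers; it is not the authors' released architecture, and the actual triplet graph construction and multi-task heads are omitted.

```python
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """One round of mean-aggregation message passing over an adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, adj):
        # adj: (N, N) binary adjacency between triplet components
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        messages = adj @ node_feats / deg                    # average neighbour features
        return torch.relu(self.update(torch.cat([node_feats, messages], dim=-1)))

# Toy graph: tool, verb, and target nodes connected when they co-occur in a triplet.
node_feats = torch.randn(3, 128)                             # [tool, verb, target] embeddings
adj = torch.tensor([[0., 1., 0.],                            # tool  <-> verb
                    [1., 0., 1.],                            # verb  <-> target
                    [0., 1., 0.]])
layer = SimpleGraphLayer(128)
out = layer(node_feats, adj)                                  # refined node representations
```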
Advances in networking and digital technologies have led to the widespread usage of Online Signature Verification (OSV) frameworks in real-time settings to validate a user's identity. Because of the superior perfo...
ISBN (print): 9781450398220
Over the past few years, there has been significant improvement in the domain of few-shot learning. This learning paradigm has shown promising results for the challenging problem of anomaly detection, where the general task is to deal with heavy class imbalance. Our paper presents a new approach to few-shot classification that employs the knowledge of multiple pre-trained convolutional models acting as backbones for the proposed few-shot framework. The framework uses a novel ensembling technique that boosts accuracy while drastically reducing the total parameter count, paving the way for real-time implementation. We perform an extensive hyperparameter search using a power-line defect detection dataset and obtain an accuracy of 92.30% on the 5-way 5-shot task. Without further tuning, we evaluate our model against existing state-of-the-art methods on standard benchmarks and outperform them.
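As a rough sketch only, the code below shows prototype-style few-shot classification with an ensemble of frozen pre-trained backbones, averaging cosine similarities to class prototypes across backbones. The particular backbones, the paper's novel ensembling and parameter-reduction scheme, and the power-line defect data are assumptions or omissions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18, mobilenet_v3_small

def embed(backbone, images):
    """Frozen backbone -> L2-normalised feature vectors."""
    with torch.no_grad():
        feats = backbone(images)
    return F.normalize(feats.flatten(1), dim=-1)

def episode_logits(backbones, support, support_labels, query, n_way=5):
    """Average cosine similarity to class prototypes across all backbones."""
    scores = 0.0
    for backbone in backbones:
        s, q = embed(backbone, support), embed(backbone, query)
        protos = torch.stack([s[support_labels == c].mean(0) for c in range(n_way)])
        scores = scores + q @ F.normalize(protos, dim=-1).t()   # (n_query, n_way)
    return scores / len(backbones)

# Toy 5-way 5-shot episode with two illustrative backbones (random weights here).
backbones = [nn.Sequential(*list(resnet18(weights=None).children())[:-1]),
             nn.Sequential(mobilenet_v3_small(weights=None).features, nn.AdaptiveAvgPool2d(1))]
for b in backbones:
    b.eval()
support = torch.randn(25, 3, 224, 224)
support_labels = torch.arange(5).repeat_interleave(5)
query = torch.randn(10, 3, 224, 224)
logits = episode_logits(backbones, support, support_labels, query)
```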
Multi-label classification is a generalization of single-label classification, where an unseen sample is automatically assigned a subset of semantically relevant labels from a given vocabulary. In parallel, recent res...
ISBN (print): 9781450398220
Saliency detection methods are central to several real-world applications such as robot navigation and satellite imagery. However, the performance of existing methods deteriorates under low-light conditions because training datasets mostly comprise well-lit images. One possible solution is to collect a new dataset for low-light conditions, but this involves pixel-level annotation, which is not only tedious and time-consuming but also infeasible if a huge training corpus is required. We propose a technique that performs classical band-pass filtering in Fourier space to transform well-lit images into low-light images and uses them as proxies for real low-light images. Unlike popular deep learning approaches, which require learning thousands of parameters and enormous amounts of training data, the proposed transformation is fast, simple, and easy to extend to other tasks such as low-light depth estimation. Our experiments show that state-of-the-art saliency detection and depth estimation networks trained on our proxy low-light images perform significantly better on real low-light images than networks trained using existing strategies.
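A minimal sketch of the central transformation, assuming a radial band-pass mask in the centred Fourier spectrum: frequencies inside the band pass through while everything else, including the DC term, is attenuated, which lowers overall brightness. The band limits, attenuation factor, and colour handling are illustrative guesses, not the paper's exact settings.

```python
import numpy as np

def fourier_bandpass_darken(channel, low=0.05, high=0.5, keep=0.3):
    """channel: single-channel float image in [0, 1]; apply per channel for RGB."""
    spectrum = np.fft.fftshift(np.fft.fft2(channel))          # centre the zero frequency
    h, w = channel.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)  # normalised frequency radius
    # Pass the chosen band untouched; attenuate everything else, including the DC
    # term, so the reconstruction looks like a low-light proxy of the input.
    mask = np.where((radius >= low) & (radius <= high), 1.0, keep)
    dark = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    return np.clip(dark, 0.0, 1.0)

proxy = fourier_bandpass_darken(np.random.rand(256, 256))     # toy well-lit input
```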
ISBN (print): 9781450398220
Unconstrained low-resolution (LR) face recognition is still a challenging problem in computer vision. In real-world scenarios, gallery images are generally of high resolution (HR), while probe images may be of low resolution. In surveillance applications, matching LR to HR is especially common because probe images are captured at low resolution while gallery images are of high resolution. LR-to-HR face matching is challenging because the LR and HR embeddings must be mapped into a common subspace of the embedding space. LR-to-LR face matching, where both probe and gallery are of low resolution, is even more difficult because very little visual information is present in the images, and it becomes harder still when the faces are tiny. In this paper, we implement a deep learning pipeline for both LR-to-HR and LR-to-LR face matching. Because real-world datasets lack LR and HR images of the same identity, we also generate LR images from HR images using a synthetic data approach. Extensive experimental analyses compare the performance with other state-of-the-art models.
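For illustration, the sketch below generates synthetic LR probes from HR faces by bicubic down- and up-sampling and matches them to an HR gallery by cosine similarity in a shared embedding space produced by some face embedding network (embed_net, assumed here); the degradation factor and matching protocol are assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def make_synthetic_lr(hr_batch, scale=8):
    """hr_batch: (N, 3, H, W) in [0, 1]; returns a degraded version at the original size."""
    h, w = hr_batch.shape[2:]
    lr = F.interpolate(hr_batch, size=(h // scale, w // scale),
                       mode="bicubic", align_corners=False)
    return F.interpolate(lr, size=(h, w), mode="bicubic", align_corners=False).clamp(0, 1)

def match_lr_to_hr(embed_net, lr_probes, hr_gallery):
    """Return, for each LR probe, the index of the closest HR gallery embedding."""
    with torch.no_grad():
        p = F.normalize(embed_net(lr_probes), dim=-1)
        g = F.normalize(embed_net(hr_gallery), dim=-1)
    return (p @ g.t()).argmax(dim=1)                          # cosine-similarity nearest neighbour
```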