ISBN (print): 9781665448994
Natural language-based vehicle retrieval is the task of finding a target vehicle within a given image based on a natural language description used as a query. This technology can be applied to various areas, including police searches for a suspect vehicle. However, it is challenging due to the ambiguity of language descriptions and the difficulty of processing multi-modal data. To tackle this problem, we propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval. We also propose two task-specific modules to improve performance: a substitution module that helps features from different domains to be embedded in the same space, and a future prediction module that learns temporal information. SBNet was trained on the CityFlow-NL dataset, which contains 2,498 vehicle tracks with three unique natural language descriptions each, and tested on 530 unique vehicle tracks and their corresponding query sets. SBNet achieved a significant improvement over the baseline in the natural language-based vehicle tracking track of the AI City Challenge 2021. Source Code: https://***/lsrock1/nlp_search
ISBN (print): 9781728136134
This paper proposes a GPU-based near-data-processing (NDP) architecture together with a well-matched programming model that considers both the characteristics of image applications and NDP constraints. First, data allocation to the processing units is handled so as to preserve data locality given the memory access pattern. Second, this predictable allocation enables the design of a compact but efficient NDP architecture. By applying a prefetcher that leverages the pattern-aware data allocation, the number of active warps and the on-chip SRAM size of the NDP are significantly reduced. This makes it possible to satisfy the NDP constraints and increases the opportunity to integrate more processing units on a memory logic die. Evaluation results for various image processing benchmarks show that the proposed NDP GPU improves performance compared to the baseline GPU.
Transformer models have gained much success in natural language processing. In the computer vision field, transformer-based backbones recently compete with CNN-based backbones in many tasks. The success of transformer...
ISBN (print): 9781665448994
Cervical cancer is one of the leading causes of cancer death in women aged 20 to 39 years, which emphasizes the importance of cervical precancerous diagnosis and treatment. Although there have been many attempts at medical image processing, research on the automatic diagnosis of cervical precancerous pathology is still scarce. This paper addresses a challenging end-to-end automatic segmentation task for cervical precancerous diagnosis. Specifically, considering that the diagnosis of cervical lesions relies heavily on spatial information, a hierarchical spatial pyramid network (HSP-Net) is proposed to enhance the representation ability of cervical structural features. First, a vertical hierarchical spatial pyramid (V-HSP) network is devised to aggregate multiscale information during feature extraction in the encoder. Second, a horizontal hierarchical spatial pyramid (H-HSP) network is designed to fuse information from multiscale receptive fields before and after cascading features from different branches. Experiments on the public dataset MTCHI demonstrate that HSP-Net achieves state-of-the-art performance, reflecting its potential to assist doctors and patients clinically.
ISBN (print): 9781665448994
In this paper, we propose a novel layer based on the fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace 1 x 1 convolution layers in deep neural networks. In the WHT domain, we denoise the transform domain coefficients using the new smooth-thresholding non-linearity, a smoothed version of the well-known soft-thresholding operator. We also introduce a family of multiplication-free operators derived from the basic 2 x 2 Hadamard transform to implement 3 x 3 depthwise separable convolution layers. Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy. For example, by replacing the final third bottleneck layers, we reduce the number of parameters from 2.270M to 947K. This reduces the accuracy from 95.21% to 92.88% on the CIFAR-10 dataset. Our approach significantly improves the speed of data processing. The fast Walsh-Hadamard transform has a computational complexity of $O(m \log_2 m)$. As a result, it is computationally more efficient than the 1 x 1 convolution layer. The fast Walsh-Hadamard layer processes a tensor in $\mathbb{R}^{10 \times 32 \times 32 \times 1024}$ about 2 times faster than a 1 x 1 convolution layer on an NVIDIA Jetson Nano board.
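To make the transform-domain denoising idea in the abstract above concrete, here is a minimal sketch in plain Python. It is not the authors' layer: it uses the classic soft-thresholding operator rather than the paper's smooth-thresholding variant (whose exact form is not given in the abstract), and the function names `fwht`, `soft_threshold`, and `wht_denoise` are hypothetical.

```python
import math


def fwht(values):
    """Fast Walsh-Hadamard transform (unnormalized, Hadamard ordering).

    The input length m must be a power of two. Each of the log2(m) stages
    performs m additions/subtractions and no multiplications.
    """
    m = len(values)
    if m == 0 or m & (m - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < m:
        for start in range(0, m, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return out


def soft_threshold(x, t):
    """Classic soft-thresholding: shrink |x| by t and zero small values.

    Stand-in for the paper's smooth-thresholding (assumption).
    """
    return math.copysign(max(abs(x) - t, 0.0), x)


def wht_denoise(values, t):
    """Transform, threshold the coefficients, and transform back."""
    m = len(values)
    coeffs = [soft_threshold(c, t) for c in fwht(values)]
    # The unnormalized WHT is its own inverse up to a factor of 1/m.
    return [c / m for c in fwht(coeffs)]
```

With a threshold of zero, `wht_denoise` returns its input unchanged, which is a quick check that the forward and inverse transforms agree. In a trainable layer the threshold would presumably be a learned parameter per coefficient, but that detail is an assumption here.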
ISBN (print): 9781665448994
Advances in remote sensing technology have led to the capture of massive amounts of data. Increased image resolution, more frequent revisit times, and additional spectral channels have created an explosion in the amount of data that is available to provide analyses and intelligence across domains, including agriculture. However, the processing of this data comes with a cost in terms of computation time and money, both of which must be considered when the goal of an algorithm is to provide real-time intelligence to improve efficiencies. Specifically, we seek to identify nutrient-deficient areas from remotely sensed data to alert farmers to regions that require attention; detection of nutrient-deficient areas is a key task in precision agriculture, as farmers must quickly respond to struggling areas to protect their harvests. Past methods have focused on pixel-level classification (i.e., semantic segmentation) of the field to achieve these tasks, often using deep learning models with tens of millions of parameters. In contrast, we propose a much lighter graph-based method to perform node-based classification. We first use Simple Linear Iterative Clustering (SLIC) to produce superpixels across the field. Then, to perform segmentation across the non-Euclidean domain of superpixels, we leverage a Graph Convolutional Neural Network (GCN). This model has four orders of magnitude fewer parameters than a CNN model and trains in a matter of minutes.
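The node-based classification described above can be illustrated with one graph-convolution step over a superpixel adjacency graph. This is a sketch under stated assumptions: the abstract does not specify the GCN variant, so the widely used symmetric-normalization rule ReLU(D^-1/2 (A + I) D^-1/2 X W) is assumed, the SLIC step is omitted, and the names `matmul` and `gcn_layer` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: n x n matrix, 1.0 where two superpixels touch (assumed input).
    features:  n x f matrix, one feature row per superpixel node.
    weights:   f x c matrix of layer parameters.
    """
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
             for j in range(n)]
            for i in range(n)]
    projected = matmul(matmul(norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in projected]
```

The weight matrix has f x c entries regardless of how many superpixels the field contains, which is one reason a model built from such layers can stay small compared with a pixel-level CNN.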
ISBN (print): 9781665448994
The proceedings contain 516 papers. The topics discussed include: OmniLayout: room layout reconstruction from indoor spherical panoramas; boosting adversarial robustness using feature level stochastic smoothing; beyond joint demosaicking and denoising: an image processing pipeline for a pixel-bin image sensor; assessment of deep learning based blood pressure prediction from PPG and rPPG signals; towards domain-specific explainable AI: model interpretation of a skin image classifier using a human approach; DAMSL: domain agnostic meta score-based learning; deep learning based spatial-temporal in-loop filtering for versatile video coding; automated tackle injury risk assessment in contact-based sports - a rugby union example; two-stage network for single image super-resolution; and ***: dataset for automatic mapping of buildings, woodlands, water and roads from aerial imagery.
Previous works on multi-label image recognition (MLIR) usually use CNNs as a starting point for research. In this paper, we take pure Vision Transformer (ViT) as the research base and make full use of the advantages o...
Handwritten letter classification of any given language has the potential to be used in various fields such as literature, educational institutions, digitization of government records etc. Bengali language with its co...
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
A significant challenge in the field of object detection lies in the system’s performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces ‘Feature Corrective Transfer Learning’, a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in these challenging scenarios without the need to convert non-ideal images into their RGB counterparts. In our methodology, we initially train a comprehensive model on a pristine RGB image dataset. Subsequently, non-ideal images are processed by comparing their feature maps against those from the initial ideal RGB model. This comparison employs the Extended Area Novel Structural Discrepancy Loss (EANSDL), a novel loss function designed to quantify similarities and integrate them into the detection loss. This approach refines the model’s ability to perform object detection across varying conditions through direct feature map correction, encapsulating the essence of Feature Corrective Transfer Learning. Experimental validation on variants of the KITTI dataset demonstrates a significant improvement in mean Average Precision (mAP): a 3.8-8.1% relative improvement in detection under non-ideal conditions compared to the baseline model, with performance within 1.3% of the mAP@[0.5:0.95] achieved under ideal conditions by the standard Faster RCNN algorithm.
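The idea of folding a feature-map comparison into the detection loss can be sketched as follows. The abstract does not define EANSDL, so this example substitutes a plain mean-squared difference as a stand-in; the function names, the `weight` parameter, and the additive combination are all assumptions for illustration, not the authors' formulation.

```python
def feature_discrepancy(ideal_features, nonideal_features):
    """Mean squared difference between two equally shaped 2-D feature maps.

    Stand-in for EANSDL, whose definition is not given in the abstract.
    """
    total, count = 0.0, 0
    for ideal_row, nonideal_row in zip(ideal_features, nonideal_features):
        for a, b in zip(ideal_row, nonideal_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def corrective_loss(detection_loss, ideal_features, nonideal_features,
                    weight=1.0):
    """Detection loss plus a weighted feature-map discrepancy term.

    ideal_features would come from the frozen model trained on clean RGB
    images; nonideal_features from the model being trained on degraded
    inputs (hypothetical setup).
    """
    discrepancy = feature_discrepancy(ideal_features, nonideal_features)
    return detection_loss + weight * discrepancy
```

When the two feature maps are identical the extra term vanishes and the total reduces to the ordinary detection loss, so the correction only acts where the non-ideal features drift from the ideal ones.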