ISBN:
(Print) 9798350353006
When only a few annotated samples are available, the performance of learning-based object detection drops heavily. Many few-shot object detection (FSOD) methods have been proposed to tackle this issue by adopting image-level augmentations in linear manners. Nevertheless, those handcrafted enhancements often suffer from limited diversity and lack of semantic awareness, resulting in unsatisfactory performance. To this end, we propose a Semantic-guided Non-linear Instance-level data Augmentation method (SNIDA) for FSOD, which decouples the foreground and background to increase their diversities respectively. We design a semantic awareness enhancement strategy to separate objects from backgrounds. Concretely, masks of instances are extracted by an unsupervised semantic segmentation module. The diversity of samples is then improved by fusing instances into different backgrounds. Considering that existing traditional data augmentation methods only transform images within a limited space, we introduce an object reconstruction enhancement module. The aim of this module is to generate sufficiently diverse, non-linear training data at the instance level through a semantic-guided masked autoencoder. In this way, the potential of data can be fully exploited in various object detection scenarios. Extensive experiments on PASCAL VOC and MS-COCO demonstrate that the proposed method outperforms baselines by a large margin and achieves new state-of-the-art results under different shot settings.
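The instance-background fusion step described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual SNIDA pipeline (which uses an unsupervised segmentation module and a masked autoencoder); the function name, array shapes, and toy data below are illustrative assumptions showing only the basic mask-based compositing idea:

```python
import numpy as np

def fuse_instance(instance_img, instance_mask, background):
    """Paste a segmented foreground instance onto a new background.

    instance_img, background: (H, W, 3) uint8 arrays of equal shape.
    instance_mask: (H, W) boolean array, True where the object is.
    """
    out = background.copy()
    # Boolean indexing copies only the masked object pixels.
    out[instance_mask] = instance_img[instance_mask]
    return out

# Toy example: a 4x4 "image" holding a 2x2 object in the top-left corner.
img = np.full((4, 4, 3), 200, dtype=np.uint8)   # bright object pixels
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True                             # object occupies top-left 2x2
bg = np.zeros((4, 4, 3), dtype=np.uint8)        # new (black) background
fused = fuse_instance(img, mask, bg)            # object on the new background
```

In the actual method, the mask would come from the unsupervised segmentation module and the pasted instance from the semantic-guided reconstruction, but the compositing itself reduces to this masked copy.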
ISBN:
(Digital) 9781665469463
ISBN:
(Print) 9781665469463
Face models are widely used in image processing and other domains. The input data used to create a 3D face model ranges from accurate laser scans to simple 2D RGB photographs. These input data types are typically deficient, either because regions are missing or because the problem is underconstrained. As a result, reconstruction methods embed priors encoding the valid domain of faces. System designers must choose a source of input data and then choose a reconstruction method to obtain a usable 3D face. If a particular application domain requires accuracy X, which kinds of input data are suitable? Does the input data need to be 3D, or will 2D data suffice? This paper takes a step toward answering these questions using synthetic data. A ground-truth dataset is used to analyze the accuracy obtainable from 2D landmarks, 3D landmarks, low-quality 3D, high-quality 3D, texture color, normals, and dense 2D image data, as well as when regions of the face are missing. Since the data is synthetic, it can be analyzed both with and without measurement error. This idealized synthetic analysis is then compared to real results from several methods for constructing 3D faces from 2D photographs. The experimental results suggest that accuracy is severely limited when only 2D raw input data exists.
Realistic geo-referenced electrical distribution grid (DG) models are of great importance for power system analysis and resilience studies. However, DG data are usually not publicly available. In this study, we develo...
Transformer-based methods have improved the quality of hyperspectral images (HSIs) reconstructed from RGB by effectively capturing their remote relationships. The self-attention mechanisms in existing Transformer mode...
High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of r...
Archaeological research often relies on the meticulous reconstruction of historical structures and artifacts. However, this process is frequently hindered by the absence of detailed 3D data and the limited availabilit...
Style transfer technology has been widely applied in the field of image processing. Most of the current style transfer methods obtain style information from a single mode. This will cause the model to lose or incomple...
Localizing Ground Penetrating Radar (LGPR) offers the distinct advantage of being unaffected by weather and light conditions changes. As a novel auxiliary driving localizing system, LGPR enhances the robustness of the...
ISBN:
(Print) 9783031510229; 9783031510236
Wearable sensors are miniature and affordable devices used for monitoring human motion in daily life. Data-driven models applied to wearable sensor data can enhance the accuracy of movement analysis outside of controlled settings. However, obtaining a large and representative database for training these models is challenging due to the specialised motion laboratories and expensive equipment required. To address this limitation, this study proposes a data augmentation approach using generative deep learning to enhance biomechanical datasets. A novel conditional generative adversarial network (GAN) was developed to synthesise biomechanical data during gait. The GAN takes into account the subject's anthropometric measures, so that the generated data represents specific body types, as well as information about the gait cycle for reconstruction back into the time domain. The proposed model was evaluated on generating biomechanical data for unseen subjects and on fine-tuning with small percentages (1%, 2% and 5%) of the test dataset. By synthesising realistic and diverse data, researchers and practitioners can overcome the limitations of obtaining large training datasets from human participants. This paper outlines the methodology and experimental setup for developing and evaluating the GAN and discusses its potential impact on the field of biomechanics and human motion analysis.
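The conditioning idea in the abstract, concatenating a subject's anthropometric measures with the latent noise before generation, can be sketched as a single forward pass. This is a minimal illustrative stand-in, not the paper's architecture; the layer sizes, the 101-sample gait-cycle length, and the three anthropometric features are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(noise, condition, W1, W2):
    """Minimal conditional-generator forward pass: concatenate latent
    noise with the conditioning vector (subject anthropometrics) and
    map it through a tiny MLP to one synthetic, time-normalised
    joint-angle curve."""
    x = np.concatenate([noise, condition])  # condition steers the output
    h = np.tanh(W1 @ x)                     # hidden layer
    return W2 @ h                           # synthetic gait curve

latent_dim, cond_dim, hidden, out_len = 16, 3, 32, 101
W1 = rng.normal(0.0, 0.1, (hidden, latent_dim + cond_dim))
W2 = rng.normal(0.0, 0.1, (out_len, hidden))

z = rng.normal(size=latent_dim)
# Hypothetical anthropometrics: height (m), mass (kg), leg length (m).
anthro = np.array([1.75, 70.0, 0.9])
curve = generator(z, anthro, W1, W2)   # one 101-sample synthetic curve
```

In a trained conditional GAN the weights would be learned adversarially against a discriminator that also receives the condition vector; here they are random, since the point is only to show where the anthropometric conditioning enters.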
ISBN:
(Print) 9781510671577; 9781510671560
Radiology workflow automation requires knowledge of the exam contents in an image series, such as anatomy region, injected contrast phase, and presence of metals, so that appropriate post-processing steps and analysis can be invoked automatically. This paper investigates the applicability of deep learning (DL) to the task of classifying an entire image series into one of fourteen common exam types. A total of 2300 independent computed tomography (CT) image series, each manually labeled for its exam category by clinical experts, was used to train DL models. An additional 593 series were labeled and used as an independent test set. Each CT image series containing a 3D volume acquisition is converted to a special 2D multiplanar-reconstruction (MPR) image. A DL-based classifier was trained to classify the image series based on this 2D representation, which could be an AP view, a lateral view, or both. Different convolutional neural network architectures with varying block depths were compared. A global average pooling (GAP) layer was used in the final classification block. The impact of the depth of the feature extraction layers, input image type and view, data augmentation techniques, and learning rates was studied. The best single-class prediction accuracy achieved was 97%, and the top-two classification accuracy exceeded 99%. This method avoids the cost of inferencing each image in a 3D series while still providing very high classification accuracy.
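The key efficiency trick above is collapsing a 3D volume into 2D views before classification, with GAP reducing the final feature maps to one vector per image. A minimal sketch of both steps, assuming mean-intensity projection as a stand-in for the paper's MPR construction (the actual MPR details are not given in the abstract, and all shapes here are illustrative):

```python
import numpy as np

def mpr_views(volume):
    """Reduce a CT volume (D, H, W) to two 2D projection views, a
    stand-in for the paper's MPR image: an AP-like view (averaged
    over the W axis) and a lateral-like view (averaged over H)."""
    ap = volume.mean(axis=2)       # (D, H)
    lateral = volume.mean(axis=1)  # (D, W)
    return ap, lateral

def global_average_pool(feature_maps):
    """GAP: collapse each (H, W) feature map to a single scalar,
    giving one fixed-length feature vector for the classifier head."""
    return feature_maps.mean(axis=(1, 2))  # (C,)

vol = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
ap, lat = mpr_views(vol)           # two cheap 2D views of the volume

feats = np.ones((8, 5, 5))         # pretend CNN output: 8 channels of 5x5
pooled = global_average_pool(feats)  # 8-dim vector, independent of H, W
```

Because the classifier sees only one 2D representation per series, inference cost no longer scales with the number of slices, which is the saving the abstract highlights.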