ISBN (print): 9798350359329; 9798350359312
Image aesthetics assessment is commonly treated as a classification or regression task, and its performance depends mainly on how effectively aesthetic features are used. Traditional methods tend to use only subjective or only objective features. Relying on a single type of feature overlooks the inherent unity of the two in aesthetic characteristics. To use image aesthetic characteristics comprehensively, we propose an image aesthetics assessment method that integrates subjective and objective features. Specifically, the proposed method comprises two modules: a feature extraction module and a feature fusion module. In the feature extraction module, a dual-branch neural network extracts subjective and objective aesthetic features from the image. In the feature fusion module, a deep feedforward neural network combines the two types of features and generates the final image aesthetic score. The experimental results indicate that our method achieves high performance in aesthetics assessment.
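The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature sizes, the random stand-in vectors for the two branch outputs, the helper names (`dense`, `init_layer`, `fuse_and_score`), and the 1 to 10 score range are all assumptions chosen for illustration, and the network is untrained. It uses only the Python standard library to stay self-contained.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_in, n_out):
    """Randomly initialised (untrained) layer, for illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_score(subjective, objective, layers):
    """Concatenate both feature vectors and pass them through a feedforward net."""
    h = list(subjective) + list(objective)
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    (logit,) = dense(h, weights, bias)  # final layer has a single output
    # Assumption: squash to (0, 1) with a sigmoid, then rescale to a 1-10 score.
    return 1.0 + 9.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
subjective = [rng.random() for _ in range(8)]  # stand-in for the subjective branch output
objective = [rng.random() for _ in range(4)]   # stand-in for the objective branch output
layers = [init_layer(rng, 12, 16), init_layer(rng, 16, 1)]
score = fuse_and_score(subjective, objective, layers)
```

Concatenation followed by a small feedforward network is the simplest fusion scheme consistent with the abstract; the paper may weight or align the two feature types differently.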
The practical application of artificial intelligence in the field of natural language processing is becoming more and more extensive, and the technology is changing day by day. With the rapid update and development of...
Satellite images have found a variety of applications in the last two decades, including crop health monitoring, disaster management, and urban planning. With a rapid growth in the development of artificial in...
Medical image segmentation plays a pivotal role in computer-aided diagnosis by facilitating the extraction of essential features necessary for disease detection and treatment strategies. The continuous progress in ima...
ISBN (print): 1577358872
Convolutional neural networks (CNNs) have exhibited great performance in discriminative feature learning for complex visual tasks. Besides discrimination power, interpretability is another important yet under-explored property of CNNs. One difficulty in CNN interpretability is that filters and image classes are entangled. In this paper, we introduce a novel pathway to alleviate the entanglement between filters and image classes. The proposed pathway groups the filters in a late convolutional layer of a CNN into class-specific clusters. Clusters and classes are in a one-to-one relationship. Specifically, we use Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix. To enable end-to-end optimization, we develop a novel reparameterization trick for handling the non-differentiable Bernoulli sampling. We evaluate the effectiveness of our method on ten widely used network architectures (including nine CNNs and a ViT) and five benchmark datasets. Experimental results demonstrate that our method, PICNN (the combination of standard CNNs with our proposed pathway), exhibits greater interpretability than standard CNNs while achieving higher or comparable discrimination power.
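The sampling step described above can be sketched as follows. This is only one plausible reading of the abstract: each entry of a filter-class correspondence matrix is treated as the probability that a filter belongs to a class's cluster, and a binary assignment matrix is drawn from it. The matrix values, the helper names, and the use of the sampled column as a mask on filter responses are assumptions, and the paper's reparameterization trick for back-propagating through the sampling is not reproduced here.

```python
import random


def sample_assignment(correspondence, rng):
    """Draw a binary filter-cluster matrix.

    Entry [k][c] is 1 with probability correspondence[k][c], else 0.
    """
    return [[1 if rng.random() < p else 0 for p in row] for row in correspondence]


def class_specific_mask(assignment, label):
    """Column of the assignment matrix for one class: which filters are in its cluster."""
    return [row[label] for row in assignment]


def apply_mask(filter_responses, mask):
    """Keep the responses of filters in the cluster and zero out the rest."""
    return [r * m for r, m in zip(filter_responses, mask)]


rng = random.Random(0)
# Hypothetical correspondence matrix: 4 filters (rows) x 2 classes (columns).
correspondence = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.0, 1.0],
]
assignment = sample_assignment(correspondence, rng)
responses = [0.5, 1.2, 0.7, 2.0]  # stand-in per-filter activations
masked = apply_mask(responses, class_specific_mask(assignment, label=1))
```

Because the draw is a hard 0/1 decision, gradients cannot flow through it directly, which is the non-differentiability the abstract refers to.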
ISBN (print): 9798350359329; 9798350359312
To address the challenges of target detection in the increasingly popular field of drone aerial photography, this paper follows the design paradigm of RT-DETR and makes targeted improvements for issues in drone aerial images, such as the high proportion of small targets, complex backgrounds, and the performance impact of threshold filtering and post-processing operations in traditional CNN-based target detection models. First, by redesigning the feature fusion part of the model, a Global Feature Aggregation module is proposed on the basis of the traditional PAN to enhance the feature fusion capability and reduce feature loss. Second, by incorporating the Bi-Level Routing Attention from BiFormer to construct the encoder structure of DETR, the model employs a dynamic sparse attention mechanism to focus on small targets in the images, strengthening its ability to detect minute targets. We propose RT-DETR-UAVs, an end-to-end drone aerial image target detection model based on the Transformer architecture. Extensive experiments demonstrate the effectiveness and precision of the model. Furthermore, the model is applied to the task of aerial solar panel detection, and we discuss the diversity and potential of drone technology in the solar energy field, especially in terms of improving efficiency, reducing costs, and supporting sustainable development.
ISBN (print): 9798350359329; 9798350359312
Text image super-resolution is a unique and vital task aimed at enhancing the readability of text images for humans. It frequently serves as a pre-processing step in scene text recognition. Nevertheless, due to the complex degradation in natural scenes, recovering high-resolution text from low-resolution inputs is ambiguous and challenging. Existing methods predominantly employ deep neural networks trained with pixel-wise losses that are tailored for natural image reconstruction and neglect the unique characteristics intrinsic to text. While a limited number of studies have proposed content-based losses, these primarily concentrate on the accuracy of text recognizers, resulting in reconstructed images that may still be ambiguous to humans. Moreover, these approaches typically generalize poorly to cross-language cases. To this end, we present TATSR, a Text-Aware Text Super-Resolution framework, which effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss. The CCTB, consisting of two orthogonal transformers, is designed to extract both vertical and horizontal content information from text images. The CP Loss supervises text reconstruction by integrating content semantics through multi-scale text recognition features, thereby embedding content awareness effectively into the framework. Extensive experiments on datasets in different languages demonstrate that TATSR outperforms state-of-the-art methods in terms of both recognition accuracy and human perception. Codes are released at https://***/Imalne/***.
ISBN (print): 9789464593617; 9798331519773
Spiking neural networks (SNNs), recognized for their dynamic and event-driven capabilities, offer a viable, energy-efficient alternative to conventional artificial neural networks (ANNs), emulating aspects of the human brain's processing power. This paper provides a comparative study of deterministic SNNs (DSNNs) and probabilistic SNNs (PSNNs), examining their ability to interpret data from event cameras, which activate only upon significant changes in pixel brightness. By leveraging SNNs, we can directly process sporadic, asynchronous, event-based data, thus fully utilizing the high temporal resolution, extensive dynamic range, and robustness to motion blur offered by event cameras. Our investigation aims to deepen the understanding of the operational strengths and weaknesses of these SNN architectures, particularly in detecting and precisely tracking visual events, a critical function for real-time applications such as autonomous vehicle navigation. We created and employed a dataset obtained from a DVXplorer event camera for this evaluation.
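The difference between deterministic and probabilistic spiking can be sketched with a single leaky integrate-and-fire neuron. This is a generic textbook-style model, not the DSNN or PSNN architecture evaluated in the paper: the leak factor, threshold, sigmoid steepness, reset-to-zero rule, and the input sequence are all assumptions for illustration.

```python
import math
import random


def run_neuron(inputs, leak=0.9, threshold=1.0, rng=None):
    """Simulate one leaky integrate-and-fire neuron over a sequence of input currents.

    With rng=None the neuron fires deterministically once the membrane potential
    reaches the threshold. With an rng it fires stochastically, with a probability
    given by a sigmoid of the distance between potential and threshold.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current  # leak the old potential, then integrate the input
        if rng is None:
            fired = v >= threshold
        else:
            p = 1.0 / (1.0 + math.exp(-5.0 * (v - threshold)))
            fired = rng.random() < p
        spikes.append(1 if fired else 0)
        if fired:
            v = 0.0  # reset after a spike
    return spikes


# Stand-in for a sparse stream of event-driven input currents.
events = [0.0, 0.6, 0.6, 0.0, 0.0, 1.2, 0.0, 0.3]
deterministic = run_neuron(events)
probabilistic = run_neuron(events, rng=random.Random(0))
```

The deterministic neuron always produces the same spike train for the same input, while the probabilistic one can fire below threshold or stay silent above it, which is the kind of behavioral difference a comparison of the two families would examine.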
Memristive devices are promising candidates for the role of analog synapses in the hardware implementation of energy-efficient artificial neural networks (ANNs). However, variations in the resistance of memristive dev...
ISBN (print): 9798350362923; 9798350362916
Artificial intelligence (AI) is only as good as its training data. Large training sets with variants on the same classifier improve AI performance and accuracy, especially in image processing systems. Obtaining the large amounts of training data required for training AI and deep neural networks is labor-intensive, expensive, and in some cases not possible. This article explores creating a synthetic image dataset of basic electronic components by using the Blender 3D software package to automatically generate large amounts of synthetic images, and by using image augmentation to expand the synthetic dataset. A YOLOv5 classifier model was trained on the resulting synthetic data, and the performance of the model was evaluated using a set of real-world and synthetic testing images. The results show that good-quality synthetic data that accurately represent real-world electronic components can be used to successfully train a deep learning classifier, leading to cost and time savings in the data acquisition process. However, they also show that synthetic data that do not accurately represent real-world electronic components are of no use and reduce the overall performance of the classifier.
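The augmentation step mentioned above can be sketched in a few lines. The abstract does not say which augmentations were used, so the horizontal flip, the brightness jitter range of 0.7 to 1.3, and the tiny grayscale image stored as nested lists are all assumptions for illustration; the Blender rendering stage is not shown.

```python
import random


def hflip(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping the result to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, n_variants, rng):
    """Produce `n_variants` randomly flipped and brightness-jittered copies."""
    variants = []
    for _ in range(n_variants):
        out = hflip(image) if rng.random() < 0.5 else [row[:] for row in image]
        out = adjust_brightness(out, rng.uniform(0.7, 1.3))
        variants.append(out)
    return variants


# Stand-in for one rendered synthetic image (2 rows x 3 columns).
image = [[0, 64, 128], [192, 255, 32]]
variants = augment(image, n_variants=4, rng=random.Random(0))
```

For a detector such as YOLOv5, any geometric augmentation like the flip would also have to be applied to the bounding-box labels, which this sketch leaves out.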