Neural rendering approaches enable photo-realistic rendering on novel view synthesis tasks, but their per-scene optimization remains an obstacle to scalability. Recent methods introduce neural radiance field (NeRF) frameworks that generalize to unseen scenes on the fly by combining multi-view stereo with differentiable volume rendering. These generalizable NeRF methods synthesize the colors of 3D ray points by learning the consistency of image features projected from given nearby views. Since this consistency is computed in the 2D projected image space, it is vulnerable to occlusion and to local shape variation with viewing direction. To solve this problem, we present a dense depth-guided generalizable NeRF that leverages depth as the signed distance between the ray point and the object surface of the scene. We first generate dense depth maps from the sparse 3D points produced by structure from motion (SfM), an unavoidable step in obtaining camera poses. The dense depth maps are then exploited both as complementary features invariant to the sparsity of nearby views and as masks for occlusion handling. Experiments demonstrate that our approach outperforms existing generalizable NeRF methods on widely used real and synthetic datasets.
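As a rough illustration of the densification step described above (not the authors' method), the sketch below fills a dense depth map from sparse SfM point projections by nearest-neighbour interpolation; `densify_sfm_depth` and its toy inputs are hypothetical names and data.

```python
import numpy as np
from scipy.interpolate import griddata

def densify_sfm_depth(sparse_uv, sparse_depth, height, width):
    """Fill a dense depth map from sparse SfM point projections.

    sparse_uv: (N, 2) array of (row, col) pixel coordinates.
    sparse_depth: (N,) array of depths at those pixels.
    Nearest-neighbour interpolation is a deliberately simple stand-in
    for the learned densification described in the abstract.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    return griddata(sparse_uv, sparse_depth, (rows, cols), method="nearest")

# Toy example: 4 sparse depths scattered on an 8x8 image.
uv = np.array([[1, 1], [1, 6], [6, 1], [6, 6]], dtype=float)
d = np.array([2.0, 3.0, 4.0, 5.0])
dense = densify_sfm_depth(uv, d, 8, 8)
print(dense.shape)   # (8, 8)
print(dense[1, 1])   # 2.0 (known point preserved)
```

A real pipeline would use a learned or edge-aware completion; nearest-neighbour merely shows the interface (sparse points in, dense map out).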
A synthetic aperture radar (SAR) system is a notable source of information, recognized for its capability to operate day and night and in all weather conditions, making it essential for various applications. SAR image formation is a pivotal step in radar imaging, essential for transforming complex raw radar data into interpretable, usable imagery. Advances in SAR sensor design, which now yield very wide swaths, generate a massive volume of data that requires extensive processing. Traditional SAR image formation often involves resource-intensive and time-consuming postprocessing, so there is a vital need to automate the process in near real time, enabling fast responses for applications such as image classification and object detection. We present an SAR processing pipeline comprising a complex 2D autofocus SARNet followed by a CNN-based classification model. The complex 2D autofocus SARNet is employed for image formation, using an encoder-decoder architecture such as U-Net or a modified ResU-Net, while the image classification task is handled by the CNN-based classifier. This framework yields near-real-time results, specifically for quick image viewing and scene classification. Several experiments were conducted using real SAR raw data collected by the European Remote Sensing (ERS) satellite to validate the proposed pipeline. Performance was evaluated both visually and quantitatively using standard metrics such as the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR). The experimental results demonstrate the pipeline's robustness, efficiency, reliability, and responsiveness as an integrated neural-network-based SAR processing pipeline.
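The quantitative metrics named in the abstract can be reproduced in a few lines. The sketch below is illustrative: the SSIM here is a simplified global (single-window) variant of the usual sliding-window formulation, and both function names are assumptions, not the authors' code.

```python
import numpy as np

def psnr(ref, img, data_range=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

def ssim_global(ref, img, data_range=255.0):
    """SSIM computed from global image statistics (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    x, y = ref.astype(float), img.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.full((16, 16), 100.0)
print(psnr(a, a))           # inf (identical images)
print(ssim_global(a, a))    # 1.0
```

Identical images give infinite PSNR and unit SSIM, which is a quick sanity check for any metric implementation.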
ISBN: (Print) 9798350343557
In this study, various machine learning and image analysis approaches, including template matching, HOG, SVM, Faster R-CNN, and YOLO, are examined and compared for the symbol recognition problem in color maps. Difficulties were identified concerning the shapes of the symbols, the complexity of the maps, and the placement of the symbols on the map. Observations on the success or failure of each method against these difficulties are presented. Methods based on artificial neural networks proved more successful at symbol recognition on color maps; the highest accuracy, 91%, was obtained with Faster R-CNN.
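A minimal zero-mean normalized cross-correlation matcher, the classical template-matching baseline such studies compare against, can be sketched as follows; `match_template_ncc` is an illustrative name, and the brute-force loop favors clarity over speed.

```python
import numpy as np

def match_template_ncc(image, template):
    """Return ((row, col), score) of the best zero-mean NCC match."""
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    tn = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = tn * np.sqrt((w ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

# Embed the template in an empty map and recover its position.
img = np.zeros((10, 10))
tpl = np.array([[1.0, 2.0], [3.0, 4.0]])
img[4:6, 5:7] = tpl
pos, score = match_template_ncc(img, tpl)
print(pos)   # (4, 5), with score ~ 1.0
```

This baseline illustrates why template matching struggles on color maps: any rotation, scaling, or background clutter breaks the pixelwise correlation that CNN detectors like Faster R-CNN tolerate.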
Spectral unmixing is central to the analysis of hyperspectral data. Physics-based methods have become popular for this task because, with their explicit mixing models, they offer a clear interpretation. Nevertheless, because of their limited modeling capabilities, especially when analyzing real scenes with unknown, complex physical properties, these methods may not be accurate. Data-driven methods, deep learning in particular, have developed rapidly in recent years thanks to their superior capability to model complex nonlinear systems; yet simply transferring them as black boxes to unmixing may lead to low interpretability and poor generalization. To bring together the best of both worlds, recent research has focused on combining the advantages of physics-based models and data-driven methods. In this article, we present an overview of recent advances on this topic from several perspectives, including deep neural network (DNN) design, prior capturing, and loss selection. We summarize these methods within a common optimization framework and discuss ways of enhancing our understanding of them. The related source code is made publicly available at http://***/xiuheng-wang/awesome-hyperspectral-image-unmixing.
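The explicit mixing model underlying physics-based unmixers is, in its simplest linear form, a nonnegative least-squares problem per pixel. The sketch below (illustrative, not from the article's code release) recovers abundances with `scipy.optimize.nnls` and renormalizes them to sum to one.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(endmembers, pixel):
    """Nonnegative least-squares abundances for one pixel.

    endmembers: (bands, n_endmembers) spectral library.
    pixel: (bands,) observed spectrum.
    Abundances are renormalized to sum to one after solving.
    """
    a, _ = nnls(endmembers, pixel)
    s = a.sum()
    return a / s if s > 0 else a

# Two toy endmembers over three bands; the pixel is a 30/70 mixture.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = 0.3 * E[:, 0] + 0.7 * E[:, 1]
print(unmix_pixel(E, y))   # recovers ~ [0.3, 0.7]
```

The hybrid methods surveyed in the article replace parts of this pipeline (the mixing operator, the prior, or the loss) with learned components while keeping the abundance interpretation intact.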
Conventional feature extraction methods for speech emotion recognition often suffer from unidimensionality and inadequacy in capturing the full range of emotional cues, limiting their effectiveness. To address these challenges, this paper introduces a novel network model named Multi-Modal Speech Emotion Recognition Network (MMSERNet). This model leverages the power of multimodal and multiscale feature fusion to significantly enhance the accuracy of speech emotion recognition. MMSERNet is composed of three specialized sub-networks, each dedicated to the extraction of distinct feature types: cepstral coefficients, spectrogram features, and textual features. It integrates audio features derived from Mel-frequency cepstral coefficients and Mel spectrograms with textual features obtained from word vectors, thereby creating a rich, comprehensive representation of emotional content. The fusion of these diverse feature sets facilitates a robust multimodal approach to emotion recognition. Extensive empirical evaluations of the MMSERNet model on benchmark datasets such as IEMOCAP and MELD demonstrate not only significant improvements in recognition accuracy but also an efficient use of model parameters, ensuring scalability and practical applicability.
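A crude stand-in for the fusion stage, assuming the three sub-networks emit fixed-length vectors, is per-modality L2 normalization followed by concatenation; `fuse_features` and the vector sizes (13 MFCCs, 64 spectrogram bins, 300-d word vectors) are illustrative, not MMSERNet's actual configuration.

```python
import numpy as np

def fuse_features(mfcc_feat, spec_feat, text_feat):
    """Early fusion: L2-normalize each modality, then concatenate.

    A deliberately minimal stand-in for a learned fusion layer; the
    three inputs play the roles of the cepstral, spectrogram, and
    word-vector sub-network outputs.
    """
    def l2(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(mfcc_feat), l2(spec_feat), l2(text_feat)])

fused = fuse_features(np.ones(13), np.ones(64), np.ones(300))
print(fused.shape)   # (377,)
```

Normalizing each modality before concatenation keeps one high-magnitude feature stream (e.g. the 300-d text vector) from dominating the fused representation.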
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. The study positions radar-generated maps as a form of visual data, bridging the radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, both essential for real-time applications. Among the configurations tested, MobileNetV2 combined with STFT preprocessing offered the best balance, achieving high computational efficiency and an accuracy of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
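The time-Doppler maps described above can be sketched with an off-the-shelf STFT; the function name and parameter choices (128-sample windows, 96-sample overlap) are illustrative, not those of the study.

```python
import numpy as np
from scipy.signal import stft

def doppler_spectrogram(iq, fs, nperseg=128, noverlap=96):
    """Time-Doppler map (dB magnitude) from a complex radar slow-time signal."""
    f, t, z = stft(iq, fs=fs, nperseg=nperseg, noverlap=noverlap,
                   return_onesided=False)  # complex input -> two-sided spectrum
    return f, t, 20 * np.log10(np.abs(z) + 1e-12)

# Synthetic target: a constant 100 Hz Doppler tone at 1 kHz sampling.
fs = 1000.0
n = np.arange(2048)
iq = np.exp(2j * np.pi * 100.0 * n / fs)
f, t, s = doppler_spectrogram(iq, fs)
peak_bin = np.argmax(s.mean(axis=1))
print(abs(f[peak_bin]))   # near 100 Hz
```

Such a map, rendered as an image, is exactly what the CNNs in the study consume; a real micro-Doppler signature would show time-varying tones rather than a single line.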
When processing text images with traditional binarization methods, the image background noise often causes the results to become blurred or leads to the loss of edge details. To solve this problem, this paper proposes...
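For reference, a standard global binarization baseline of the kind the abstract critiques is Otsu's method, sketched here in plain NumPy (illustrative, not the paper's proposed method):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold for an 8-bit grayscale image.

    Picks the threshold that maximizes the between-class variance
    of the foreground and background intensity distributions.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist /= hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (levels[:t] * hist[:t]).sum() / w0
        m1 = (levels[t:] * hist[t:]).sum() / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Bimodal toy image: dark text-like region vs. bright background.
img = np.array([[10] * 5, [10] * 5, [200] * 5, [200] * 5], dtype=np.uint8)
t = otsu_threshold(img)
print(10 < t <= 200)   # True: the threshold separates the two modes
```

A single global threshold like this is precisely what fails under nonuniform background noise, motivating the adaptive or learning-based alternatives such papers propose.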
With the rapid urbanization process, waste management has become a significant environmental issue globally. Waste sorting, as an effective method of resource recycling and environmental protection, has gradually become a key solution to the waste pollution problem. Traditional waste classification methods rely on manual labor, which is inefficient and prone to errors, making them inadequate for modern urban waste management. In recent years, image recognition and artificial intelligence (AI)-based methods for waste classification have gained widespread attention, with deep learning techniques, particularly convolutional neural networks (CNNs), showing great potential in waste sorting. However, existing waste classification models face challenges such as imperfect network structures, insufficient training data, and poor environmental adaptability, which limit their application in complex environments. This study proposes a waste classification model based on image recognition and AI to enhance classification accuracy and efficiency. First, an improved PCANet is combined with an SDenseNet network structure to propose a new feature extraction and representation method, enhancing the model's feature learning ability. Second, a layered learning strategy, combined with the traditional backpropagation algorithm, is used to optimize the training process and improve learning efficiency. Finally, experimental results demonstrate that the proposed model significantly outperforms traditional models in classification accuracy and processing capability across various environments, providing a new solution for the advancement of waste classification technologies.
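PCANet's core idea, learning convolution filters as principal components of image patches, can be sketched as follows; `pca_filters`, the patch size, and the filter count are illustrative choices, not the paper's configuration.

```python
import numpy as np

def pca_filters(images, patch=5, n_filters=4):
    """Learn PCANet-style convolution filters as the leading principal
    components of zero-mean image patches.

    images: list of 2-D float arrays. Returns (n_filters, patch, patch).
    """
    patches = []
    for img in images:
        h, w = img.shape
        for r in range(0, h - patch + 1, patch):
            for c in range(0, w - patch + 1, patch):
                p = img[r:r + patch, c:c + patch].ravel()
                patches.append(p - p.mean())   # remove patch mean
    X = np.stack(patches)                      # (n_patches, patch*patch)
    # Right singular vectors = eigenvectors of the patch covariance,
    # ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)

rng = np.random.default_rng(0)
filters = pca_filters([rng.standard_normal((20, 20)) for _ in range(3)])
print(filters.shape)   # (4, 5, 5)
```

Unlike backprop-trained CNN kernels, these filters come from a closed-form decomposition, which is what makes PCANet-style front ends cheap to train on modest waste-image datasets.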
Electromagnetic imaging methods mainly utilize converted sampling, dimensional transformation, and coherent processing to obtain spatial images of targets, and they often suffer from accuracy and efficiency problems. Deep neural network (DNN)-based high-resolution imaging methods have achieved impressive results in improving resolution and reducing computational cost. However, previous works exploit only single-modality information from electromagnetic data, so their performance is limited. In this article, we propose an electromagnetic image generation network (EMIG-Net) that translates electromagnetic data of multiview 1-D range profiles (1DRPs) directly into bird's-eye-view 2-D high-resolution images under cross-modal supervision. We construct an adversarial generative framework with visual images as supervision to significantly improve imaging accuracy. Moreover, the network structure is carefully designed to optimize computational efficiency. Experiments on self-built synthetic data and experimental data from an anechoic chamber show that our network can generate high-resolution images whose visual quality is superior to that of both traditional imaging methods and DNN-based methods, while consuming less computation. Compared with the backprojection (BP) algorithm, EMIG-Net gains significant improvements in entropy (72%), peak signal-to-noise ratio (PSNR, 150%), and structural similarity (SSIM, 153%). Our work shows the broad prospects of deep learning for radar data representation and high-resolution imaging and provides a path toward electromagnetic imaging based on learning theory.
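Image entropy, one of the metrics reported above, is simply the Shannon entropy of the intensity histogram; the function name and the 256-bin choice below are illustrative.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram.

    In radar imaging, lower entropy usually indicates a sharper,
    better-focused image.
    """
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((8, 8))               # single intensity level
print(image_entropy(flat))            # 0.0
```

A uniform image has zero entropy, while a defocused image spreads energy over many intensity levels and scores higher, which is why the reported entropy reduction indicates better focusing.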
With the rapid development of entity recognition technology, animal recognition has gradually become essential in modern society, supporting labour-intensive agriculture and animal husbandry tasks. Serious challenges such as maintaining biodiversity can also benefit from animal identification technology. However, certain invasive recognition systems have caused permanent harm to animals, while noninvasive identification methods also exhibit drawbacks. This paper conducts a systematic literature review (SLR), presenting a comprehensive overview of various animal recognition technologies and their applications. Specifically, it examines methodologies such as deep learning, image processing, and acoustic analysis applied to different animal characteristics and identification purposes. The contribution of machine learning to animal feature extraction is highlighted, emphasising its significance for animal taxonomy and wild-species monitoring. Additionally, the review addresses the challenges and limitations of current technologies, including data scarcity, model accuracy, and computational requirements, and suggests opportunities for future research to overcome these obstacles.