Sketch-to-image synthesis aims to generate realistic images that exactly match input sketches or edge maps. Most known sketch-to-image synthesis methods use various generative adversarial networks (GANs) trained on numerous pairs of sketches and real images. Because of the locality of convolution, the low-level layers of the generators in these GANs lack global perception, so the feature maps they produce easily overlook global cues. Since a global receptive field is crucial for capturing the non-local structures and features of sketches, the absence of global context degrades the quality of the generated images. Some recent models turn to self-attention to build global dependencies; however, self-attention is impractical for large feature maps because its computational complexity is quadratic in the feature-map size. To address these problems, we propose Sketch2Photo, a new image synthesis approach that captures global contexts as well as local features to generate photo-realistic images from weak or partial sketches or edge maps. We employ fast Fourier convolution (FFC) residual blocks to create global receptive fields in the bottom layers of the network and incorporate Swin Transformer block (STB) units to efficiently obtain long-range global contexts for large feature maps. We also present an improved spatial attention pooling (ISAP) module to relax the strict alignment requirements between incomplete sketches and generated images. Quantitative and qualitative experiments on multiple public datasets demonstrate the superiority of the proposed approach over many other sketch-to-image synthesis methods. The project code is available at https://***/hengliusky/Skecth2Photo.
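To illustrate how a fast Fourier convolution block provides an image-wide receptive field in low-level layers, here is a minimal PyTorch sketch: a local 3x3 branch is combined with a global branch that applies a 1x1 convolution in the frequency domain. This is a simplified illustration under assumed channel layouts, not the authors' implementation; channel splitting, normalisation, and the Swin Transformer blocks of Sketch2Photo are omitted.

```python
# Minimal FFC-style residual block: a local conv branch plus a spectral
# (global) branch, so every output pixel can see the whole feature map.
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """1x1 convolution applied to the real-FFT spectrum of the feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")                 # (b, c, h, w//2+1), complex
        spec = torch.cat([spec.real, spec.imag], dim=1)         # (b, 2c, h, w//2+1)
        spec = self.relu(self.conv(spec))
        real, imag = spec.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


class FFCResBlock(nn.Module):
    """Residual block combining a local conv branch and a spectral (global) branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.global_branch = SpectralTransform(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.local_branch(x), self.global_branch(x)], dim=1)
        return x + self.fuse(out)


if __name__ == "__main__":
    block = FFCResBlock(64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```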
Visual scene understanding mainly depends on pixel-wise classification obtained from a deep convolutional neural network. However, existing semantic segmentation models often face difficulties in real-time applications due to their large network architectures. Although real-time semantic segmentation models are available, their shallow backbones can degrade performance considerably. This paper introduces SDBNetV2, a lightweight semantic segmentation model designed to improve real-time performance without increasing computational cost. A key contribution is a novel Short-term Dense Bottleneck (SDB) module in the encoder, which provides varied fields-of-view to capture the different geometrical objects in a complex scene. Additionally, we propose dense feature refinement and improved semantic aggregation modules at the decoder end to enhance contextualization and object localization. We evaluate the proposed model's performance on several indoor and outdoor datasets in structured and unstructured environments. The results show that SDBNetV2 achieves superior segmentation performance over other real-time models with fewer than 2 million parameters.
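A rough PyTorch sketch of a bottleneck that mixes dilated branches with short dense connections to obtain varied fields-of-view, in the spirit of the SDB module described above. The branch counts, dilation rates, and connection pattern here are assumptions for illustration, not the published SDBNetV2 design.

```python
# Illustrative dilated, densely connected bottleneck: each branch sees the
# reduced input plus all previous branch outputs and uses a larger dilation.
import torch
import torch.nn as nn


class DilatedDenseBottleneck(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int = 32, dilations=(1, 2, 4)):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branches = nn.ModuleList()
        for i, d in enumerate(dilations):
            self.branches.append(
                nn.Sequential(
                    nn.Conv2d(mid_ch * (i + 1), mid_ch, kernel_size=3,
                              padding=d, dilation=d),
                    nn.BatchNorm2d(mid_ch),
                    nn.ReLU(inplace=True),
                )
            )
        self.project = nn.Conv2d(mid_ch * (len(dilations) + 1), in_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.reduce(x)]
        for branch in self.branches:
            # Dense ("short-term") connections: concatenate everything so far.
            feats.append(branch(torch.cat(feats, dim=1)))
        return x + self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    m = DilatedDenseBottleneck(64)
    print(m(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```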
Wall segmentation is a special case of semantic segmentation in which each pixel is classified into one of two classes: wall and no-wall. The segmentation model returns a mask showing where walls are located, as well as objects such as windows and furniture. This article proposes a module structure for the semantic segmentation of walls in 2D images that effectively addresses the wall segmentation problem. The proposed model achieves higher accuracy and faster execution than other solutions. An encoder-decoder architecture is used for the segmentation module. A dilated ResNet50/101 network serves as the encoder, i.e., a ResNet50/101 network in which the last convolutional layers are replaced by dilated convolutional layers. A subset of the ADE20K dataset containing only interior images was used for model training, and a further subset of it was used for model evaluation. Three different approaches to model training were analyzed. On the validation dataset, the best approach, based on the proposed structure with the ResNet101 network, achieved an average pixel accuracy of 92.13% and an intersection over union (IoU) of 72.58%. Moreover, all proposed approaches can be applied to recognize other objects in images to solve specific tasks.
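As a concrete illustration of the "dilated ResNet101" encoder idea, the torchvision backbone can keep spatial resolution in its last stages by replacing stride with dilation. The decoder head below is a placeholder for demonstration only, not the article's module.

```python
# Dilated ResNet101 encoder (output stride 8) with a toy per-pixel head.
import torch
import torch.nn as nn
from torchvision.models import resnet101


class WallSegmenter(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = resnet101(weights=None,
                             replace_stride_with_dilation=[False, True, True])
        # Drop the classification head; keep the convolutional trunk.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(2048, num_classes, kernel_size=1)  # wall / no-wall

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(x))
        # Upsample back to the input resolution for per-pixel prediction.
        return nn.functional.interpolate(logits, size=x.shape[-2:],
                                         mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = WallSegmenter()
    print(model(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 2, 256, 256])
```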
Recently, unmanned aerial vehicle (UAV) synthetic aperture radar (SAR) has become a highly sought-after topic for its wide applications in target recognition, detection, and tracking. However, SAR automatic target recognition (ATR) models based on deep neural networks (DNNs) suffer from adversarial examples. Non-cooperators rarely disclose any SAR-ATR model information, which makes adversarial attacks challenging. To tackle this issue, we propose a novel attack method called the Transferable Adversarial Network (TAN). It crafts highly transferable adversarial examples in real time and attacks SAR-ATR models without any prior knowledge, which is of great significance for real-world black-box attacks. The proposed method improves transferability via a two-player game in which two encoder-decoder models are trained simultaneously: a generator that crafts malicious samples through a one-step forward mapping from the original data, and an attenuator that weakens the effectiveness of malicious samples by capturing the most harmful deformations. In particular, compared with traditional iterative methods, the encoder-decoder model maps original samples to adversarial examples in a single step, enabling real-time attacks. Experimental results indicate that our approach achieves state-of-the-art transferability with acceptable adversarial perturbations and minimal time cost compared with existing attack methods, making real-time black-box attacks without any prior knowledge a reality.
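A hedged PyTorch sketch of the two-player game described above: a generator crafts a bounded perturbation in a single forward pass, while an attenuator tries to undo it, and both are trained against a surrogate classifier. The loss terms, weights, and perturbation budget are assumptions for illustration, not the TAN paper's exact formulation.

```python
# One training step of the generator-vs-attenuator game against a surrogate model.
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_step(generator, attenuator, surrogate, x, y, opt_g, opt_a, eps=0.05):
    # --- Generator step: mislead the surrogate even after attenuation. ---
    delta = eps * torch.tanh(generator(x))          # one-step, bounded perturbation
    x_adv = torch.clamp(x + delta, 0.0, 1.0)
    x_cleaned = attenuator(x_adv)
    loss_g = (-F.cross_entropy(surrogate(x_adv), y)         # fool on the raw adversarial input
              - F.cross_entropy(surrogate(x_cleaned), y))   # ...and survive the attenuator
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # --- Attenuator step: restore the correct prediction and the clean image. ---
    delta = eps * torch.tanh(generator(x)).detach()
    x_cleaned = attenuator(torch.clamp(x + delta, 0.0, 1.0))
    loss_a = F.cross_entropy(surrogate(x_cleaned), y) + F.mse_loss(x_cleaned, x)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    return loss_g.item(), loss_a.item()


if __name__ == "__main__":
    tiny = lambda: nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 3, padding=1))
    clf = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in surrogate
    g, a = tiny(), tiny()
    og = torch.optim.Adam(g.parameters(), lr=1e-3)
    oa = torch.optim.Adam(a.parameters(), lr=1e-3)
    x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
    print(train_step(g, a, clf, x, y, og, oa))
```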
Image captioning, the process of generating natural language descriptions based on image content, has garnered attention in AI research for its implications for scene understanding and human-computer interaction. While much prior research has focused on caption generation for English, addressing low-resource languages such as Bengali presents challenges, particularly in producing coherent captions that link visual objects with the corresponding words. This paper proposes a context-aware attention mechanism over semantic attention to accurately identify objects for image captioning in Bengali. The proposed architecture consists of an encoder and a decoder block. We chose ResNet-50 over other pre-trained models for encoding image features because of its ability to mitigate the vanishing gradient problem and recognize complex object features. For decoding the generated captions, a bidirectional Gated Recurrent Unit (GRU) architecture combined with an attention mechanism captures contextual dependencies in both directions, resulting in more accurate captions. The paper also highlights the challenge of transferring knowledge between domains, especially with culturally specific images. Evaluation on three Bengali benchmark datasets, namely BAN-Cap, BanglaLekhaImageCaption, and Bornon, demonstrates improvements in METEOR score over existing methods of approximately 30%, 18%, and 45%, respectively. The proposed context-aware, attention-based image captioning system significantly outperforms current state-of-the-art models in Bengali caption generation despite limitations in the reference captions of certain datasets.
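A minimal PyTorch sketch of the encoder-decoder pattern described above: ResNet-50 grid features are attended to at every GRU decoding step. The vocabulary size, hidden sizes, and the exact bidirectional GRU/attention wiring of the Bengali model are assumptions made for this illustration.

```python
# ResNet-50 grid features + one attention-guided GRU decoding step.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class CaptionDecoderStep(nn.Module):
    def __init__(self, vocab_size: int, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.Linear(feat_dim + hidden, 1)           # additive-style scoring
        self.gru = nn.GRUCell(hidden + feat_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, prev_word, feats, h):
        # feats: (B, N, feat_dim) grid features; h: (B, hidden) decoder state.
        h_rep = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        scores = self.attn(torch.cat([feats, h_rep], dim=-1))
        context = (torch.softmax(scores, dim=1) * feats).sum(dim=1)   # attended glimpse
        h = self.gru(torch.cat([self.embed(prev_word), context], dim=-1), h)
        return self.out(h), h


if __name__ == "__main__":
    cnn = resnet50(weights=None)
    encoder = nn.Sequential(*list(cnn.children())[:-2])        # (B, 2048, 7, 7)
    feats = encoder(torch.randn(2, 3, 224, 224)).flatten(2).transpose(1, 2)
    decoder = CaptionDecoderStep(vocab_size=5000)
    logits, h = decoder(torch.tensor([1, 1]), feats, torch.zeros(2, 512))
    print(logits.shape)                                         # torch.Size([2, 5000])
```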
This paper focuses on the problem of reconstructing face images from short audio segments. Built in PyTorch, the speech-to-face pipeline retains the core methodology presented in previous works, but introduces a few key modifications. Leveraging a comprehensive dataset of internet audio recordings, a deep neural network is trained to discern correlations between voice and facial features. Through self-supervised learning, physical attributes such as age, gender, and ethnicity are captured without explicit feature analysis. The evaluation process quantifies the fidelity of the reconstructions, measuring their resemblance to the actual facial images through numerical metrics computed solely from the audio-driven outputs.
ISBN (print): 9781450397810
In this paper, we improve natural scene text detection and recognition technology based on 2D attention and an encoder-decoder framework. First, related work on text detection and recognition in different natural scenes is discussed. Second, we build on the encoder-decoder framework and a two-dimensional attention module and improve them through aggregation and hybridisation. Finally, we discuss and analyze the results and identify possible shortcomings of the model.
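For context, here is a heavily simplified PyTorch sketch of 2D attention over convolutional feature maps, the general mechanism such encoder-decoder text recognizers build on. All dimensions and module choices are illustrative assumptions, not the paper's model.

```python
# 2D attention: score every spatial location against the decoder state.
import torch
import torch.nn as nn


class TwoDAttention(nn.Module):
    def __init__(self, feat_ch: int = 256, hidden: int = 256):
        super().__init__()
        self.score = nn.Conv2d(feat_ch + hidden, 1, kernel_size=1)

    def forward(self, feats, state):
        # feats: (B, C, H, W) encoder feature map; state: (B, hidden) decoder state.
        b, c, h, w = feats.shape
        s = state[:, :, None, None].expand(-1, -1, h, w)
        attn = self.score(torch.cat([feats, s], dim=1)).flatten(2).softmax(dim=-1)
        context = torch.bmm(feats.flatten(2), attn.transpose(1, 2)).squeeze(-1)
        return context, attn.view(b, 1, h, w)     # (B, C) glimpse and the 2D attention map


if __name__ == "__main__":
    att = TwoDAttention()
    ctx, amap = att(torch.randn(2, 256, 8, 32), torch.randn(2, 256))
    print(ctx.shape, amap.shape)  # torch.Size([2, 256]) torch.Size([2, 1, 8, 32])
```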
As a natural and convenient interaction modality, voice input has become indispensable to smart devices (e.g., mobile phones and smart appliances). However, voice input is strongly constrained by surroundings and may cause privacy leakage in public areas. In this paper, we present SoundLip, an end-to-end interaction system that enables users to interact with smart devices via silent voice input. The key insight is to use inaudible acoustic signals to capture users' lip movements when they issue commands. Previous works have treated lip reading as a naive classification task and thus can only recognize individual words. In contrast, our proposed system enables lip reading at both the word and sentence levels, which is more suitable for daily use. We exploit the built-in speakers and microphones of smart devices to emit acoustic signals and listen to their reflections, respectively. To better abstract representations from multi-frequency, multi-modality acoustic signals, we design a hierarchical convolutional neural network (HCNN) that serves as the front-end and recognizes individual word commands. For sentence-level recognition, we exploit a multi-task encoder-decoder network to avoid explicit temporal segmentation and output sentences in an end-to-end way. We evaluate SoundLip on 20 individual words and 70 sentences from 12 participants. Our system achieves an accuracy of 91.2% at the word level and a word error rate of 7.1% at the sentence level in both user-independent and environment-independent settings. Given its innovative solution and promising performance, we believe SoundLip makes a significant contribution to the advancement of silent voice input technology.
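A hedged PyTorch sketch of the multi-task idea described above: a shared encoder over the acoustic feature sequence feeds both a word-classification head and an autoregressive decoder that emits sentences without explicit segmentation. The real SoundLip front-end is a hierarchical CNN over multi-frequency echo profiles; the layers and dimensions below are assumptions.

```python
# Shared encoder with a word-level head and a sentence-level decoder head.
import torch
import torch.nn as nn


class MultiTaskLipReader(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, num_words=20, vocab=40):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.word_head = nn.Linear(2 * hidden, num_words)            # word-level task
        self.decoder = nn.GRU(vocab, 2 * hidden, batch_first=True)   # sentence-level task
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, acoustic, prev_tokens_onehot):
        enc, _ = self.encoder(acoustic)                 # (B, T, 2*hidden)
        pooled = enc.mean(dim=1)                        # acoustic summary
        word_logits = self.word_head(pooled)            # individual word command
        dec, _ = self.decoder(prev_tokens_onehot, pooled.unsqueeze(0))
        return word_logits, self.out(dec)               # per-step token logits


if __name__ == "__main__":
    model = MultiTaskLipReader()
    acoustic = torch.randn(2, 100, 64)     # 100 frames of echo features (assumed shape)
    prev = torch.zeros(2, 15, 40)          # teacher-forced previous tokens (one-hot)
    w, s = model(acoustic, prev)
    print(w.shape, s.shape)                # (2, 20) (2, 15, 40)
```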
Deep-learning-based semantic segmentation is a research focus for unmanned aerial vehicle (UAV) aerial image analysis. However, segmenting small and narrow objects and boundary regions is problematic, due to the large size differences between objects and the unbalanced class data in aerial images. A network named SEC-BRNet is proposed for the boundary refinement problem. First, semantic embedding connections and a progressive upsampling decoder are used to obtain spatial details and generate fused feature maps, which are then concatenated level by level in the decoding process to recover boundary details. Second, a multi-loss training strategy is developed for the data imbalance and boundary roughness problems, combining cross-entropy loss, Dice loss, and active boundary loss. In extensive experiments, our network achieves 84.8% mIoU and 89.04% Boundary IoU on the AeroScapes dataset, and 62.81% mIoU and 90.78% Boundary IoU on the Semantic Drone Dataset. The experimental results indicate that the proposed SEC-BRNet performs well in semantic segmentation of UAV aerial images.
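A short PyTorch sketch of the multi-loss training idea mentioned above: cross-entropy for per-pixel classification plus a soft Dice term for class imbalance. The paper additionally uses an active boundary loss, which is not reproduced here, and the equal weighting below is an assumption.

```python
# Combined cross-entropy + soft Dice segmentation loss.
import torch
import torch.nn.functional as F


def segmentation_loss(logits: torch.Tensor, target: torch.Tensor,
                      dice_weight: float = 1.0) -> torch.Tensor:
    """logits: (B, C, H, W); target: (B, H, W) integer class labels."""
    ce = F.cross_entropy(logits, target)

    # Soft Dice over predicted probabilities and one-hot targets.
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2 * inter + 1e-6) / (union + 1e-6)).mean()

    return ce + dice_weight * dice


if __name__ == "__main__":
    logits = torch.randn(2, 5, 64, 64, requires_grad=True)
    target = torch.randint(0, 5, (2, 64, 64))
    print(segmentation_loss(logits, target))
```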
An automatic video captioning system describes the content of a video by analysing its visual aspects with regard to space and time and producing a meaningful caption that explains the video. A decade of research in this area has resulted in steep growth in the quality and appropriateness of generated captions compared with the expected results. The research has progressed from very basic methods to the most advanced transformer-based methods. A machine-generated caption for a video must adhere to many expected standards. For humans this task may be trivial, but it is not as easy for a machine to analyse the content and generate a semantically coherent description for it. The caption, generated in a natural language, must also adhere to that language's lexical and syntactic structure. The video captioning process is a culmination of computer vision and natural language processing tasks. Commencing with conventional template-based approaches, the field has moved through statistical methods and traditional deep learning approaches and is now in the trend of using transformers. This work makes an extensive study of the literature and proposes an improved transformer-based architecture for the video captioning process. The transformer architecture uses an encoder and a decoder model with two and three sublayers, respectively. Multi-head self-attention and cross-attention are part of the model and bring about very beneficial results. The decoder is auto-regressive and uses a masked layer to prevent the model from foreseeing future words in the caption. An enhanced encoder-decoder Transformer model with a CNN for feature extraction is used in our work. This model captures long-range dependencies and temporal relationships more effectively. The model has been evaluated on benchmark datasets, compared with state-of-the-art methods, and found to perform slightly better. The performance scores vary slightly for BLEU, METEOR, ROUGE a...
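A minimal PyTorch sketch of the encoder-decoder Transformer pattern described above: CNN frame features pass through a Transformer encoder, and an autoregressive decoder with a causal (masked) self-attention generates the caption. The layer counts, dimensions, and feature extractor are assumptions for illustration, not the paper's exact configuration.

```python
# Transformer encoder-decoder over frame features with a causal decoder mask.
import torch
import torch.nn as nn


class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, d_model=512, vocab=10000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)           # per-frame CNN features -> d_model
        self.embed = nn.Embedding(vocab, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=2, num_decoder_layers=3,
                                          batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, frame_feats, caption_tokens):
        # Causal mask so position i cannot attend to future caption words.
        causal = nn.Transformer.generate_square_subsequent_mask(caption_tokens.size(1))
        dec = self.transformer(self.proj(frame_feats), self.embed(caption_tokens),
                               tgt_mask=causal)
        return self.out(dec)


if __name__ == "__main__":
    model = VideoCaptioner()
    frames = torch.randn(2, 16, 2048)            # 16 frames of pooled CNN features
    tokens = torch.randint(0, 10000, (2, 12))    # teacher-forced caption prefix
    print(model(frames, tokens).shape)           # torch.Size([2, 12, 10000])
```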