Search Results - Inner Mongolia University Library

LucIE: Language-guided local image editing for fashion images

Computational Visual Media, 2025, Vol. 11, No. 1, pp. 179-194

Authors: Huanglu Wen; Shaodi You; Ying Fu. Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Computer Vision Research Group in the Institute of Informatics, University of Amsterdam, Amsterdam, the Netherlands

Language-guided fashion image editing is challenging, as fashion image editing is local and requires high precision, while natural language cannot provide precise visual information for *** this paper, we propose LucIE, a novel unsupervised language-guided local image editing method for fashion *** adopts and modifies recent text-to-image synthesis network, DF-GAN, as its ***, the synthesis backbone often changes the global structure of the input image, making local image editing *** increase structural consistency between input and edited images, we propose Content-Preserving Fusion Module (CPFM). Different from existing fusion modules, CPFM prevents iterative refinement on visual feature maps and accumulates additive modifications on RGB *** achieves local image editing explicitly with language-guided image segmentation and mask-guided image blending while only using image and text *** on the DeepFashion dataset shows that LucIE achieves state-of-the-art *** with previous methods, images generated by LucIE also exhibit fewer *** provide visualizations and perform ablation studies to validate LucIE and the *** also demonstrate and analyze limitations of LucIE, to provide a better understanding of LucIE.
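The abstract names mask-guided image blending but does not give its formula. As a rough illustration only, the sketch below assumes the common per-pixel form `output = mask * edited + (1 - mask) * original`; the function name `blend_with_mask` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def blend_with_mask(original, edited, mask):
    """Blend `edited` into `original` wherever `mask` is close to 1.

    Assumed (not from the paper): images are H x W x C arrays and the mask
    is H x W (or H x W x 1) with values in [0, 1].
    """
    original = np.asarray(original, dtype=np.float64)
    edited = np.asarray(edited, dtype=np.float64)
    mask = np.clip(np.asarray(mask, dtype=np.float64), 0.0, 1.0)
    if mask.ndim == original.ndim - 1:
        mask = mask[..., None]  # add a channel axis so the mask broadcasts
    # Pixels with mask 1 come from the edit, pixels with mask 0 stay untouched.
    return mask * edited + (1.0 - mask) * original
```

Under this assumption, pixels outside the mask are copied from the input unchanged, which is one simple way to keep an edit local.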

Keywords: deep learning; language-guided image editing; local image editing; content preservation; fashion images

TextFormer: A Query-based End-to-end Text Spotter with Mixed Supervision

Machine Intelligence Research, 2024, Vol. 21, No. 4, pp. 704-717

Authors: Yukun Zhai; Xiaoqiang Zhang; Xiameng Qin; Sanyuan Zhao; Xingping Dong; Jianbing Shen. Affiliations: School of Computer Science, Beijing Institute of Technology, Beijing 100081, China; Department of Computer Vision Technology, Baidu Inc., Beijing 100193, China

End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified *** methods heavily rely on region-of-interest (RoI) operations to extract local features and complex post-processing steps to produce final *** address these limitations, we propose TextFormer, a query-based end-to-end text spotter with a transformer ***, using query embedding per text instance, TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multitask *** allows for mutual training and optimization of classification, segmentation and recognition branches, resulting in deeper feature sharing without sacrificing flexibility or ***, we design an adaptive global aggregation (AGG) module to transfer global features into sequential features for reading arbitrarily-shaped texts, which overcomes the suboptimization problem of RoI ***, potential corpus information is utilized from weak annotations to full labels through mixed supervision, further improving text detection and end-to-end text spotting *** experiments on various bilingual (i.e., English and Chinese) benchmarks demonstrate the superiority of our *** on the TDA-ReCTS dataset, TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.
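The abstract reports results in terms of 1-NED without defining it. The metric is usually read as one minus the average normalized edit distance between predicted and reference strings. The sketch below assumes each Levenshtein distance is normalized by the longer of the two strings; the benchmark's exact normalization is not stated here, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def one_minus_ned(predictions, references):
    """Return 1 - mean normalized edit distance over paired strings.

    Assumption: each distance is divided by max(len(pred), len(ref)).
    """
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for pred, ref in zip(predictions, references):
        longest = max(len(pred), len(ref))
        if longest:  # two empty strings contribute zero distance
            total += edit_distance(pred, ref) / longest
    return 1.0 - total / len(predictions)
```

A score of 1.0 means every prediction matches its reference exactly; a single wrong character in a four-character string gives 0.75 for that pair.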

Keywords: end-to-end text spotting; arbitrarily-shaped texts; transformer; mixed supervision; multitask modeling

Multiple palm features extraction method based on vein and palmprint

Journal of Ambient Intelligence and Humanized Computing, 2024, Vol. 15, No. 2, pp. 1465-1479

Authors: Li, Wei; Yuan, Wei-Qi. Affiliation: Computer Vision Group, Shenyang University of Technology, Shenyang 110870, China

Contactless acquisition makes palm identification devices more acceptable to people who worry about hygiene. However, the uncertain position of the palm relative to the device leads to horizontal rotation, translation, and scaling of the palm image. This paper proposes a novel feature extraction method that reflects the geometric features of the palmprint and palm vein without being affected by scaling, rotation, or translation. First, the inscribed circle of the palm is obtained, and several radial segments are drawn between the circle center and the circumference. The direction from the center to the root points of the middle finger and ring finger is taken as the direction of the first radial segment. Second, the gradient values of the pixels in the inscribed circle are calculated using a template. The feature vector space is established from the relative radii of the radial segments' centroids. Finally, a palm image database is established based on the application. Feature stability for different sub-template sizes and recognition performance for different numbers of feature line segments are analyzed. When the sub-template size is three and the number of feature line segments is 60, the equal error rate is less than 0.4% and the feature extraction time is 0.0019 s. The experimental results show that the method extracts stable features whether the hand image is scaled, rotated, or translated. © Springer-Verlag GmbH Germany, part of Springer Nature 2018.

Keywords: Extraction

Ensemble deep learning for high-precision classification of 90 rice seed varieties from hyperspectral images

Journal of Ambient Intelligence and Humanized Computing, 2024, Vol. 15, No. 6, pp. 2883-2899

Authors: Taheri, AmirMasoud; Ebrahimnezhad, Hossein; Sedaaghi, Mohammadhossein. Affiliation: Computer Vision Research Laboratory, Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran

To develop rice varieties with better nutritional qualities, it is important to classify rice seeds accurately. Hyperspectral imaging can be used to extract spectral information from rice seeds, which can then be used to classify them into different varieties. The challenges of precise classification increase when there are many classes and few training samples. In this paper, we present a novel method for high-precision Hyperspectral Image (HSI) classification of 90 different classes of rice seeds using ensemble deep learning. Our method first employs band selection techniques to select the optimal hyperspectral bands for rice seed classification. Then, a deep neural network is trained with the selected hyperspectral and RGB data from rice seed images to obtain different models for different bands. Finally, an ensemble of deep learning models is employed to classify rice seed images and improve classification accuracy. The proposed method achieves an overall precision ranging from 92.73 to 96.17% despite a large number of classes and low data samples for each class and with only 15 selected hyperspectral bands. This precision is significantly higher than the state-of-the-art classical machine learning methods like random forest, confirming the effectiveness of the proposed method in classifying hyperspectral images of rice seeds. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Keywords: Image classification

DNACoder: a CNN-LSTM attention-based network for genomic sequence data compression

Neural Computing and Applications, 2024, Vol. 36, No. 29, pp. 18363-18376

Authors: Sheena, K.S.; Nair, Madhu S. Affiliation: Artificial Intelligence & Computer Vision Lab, Department of Computer Science, Cochin University of Science and Technology, Kochi 682022, Kerala, India

Genomic sequencing has become increasingly prevalent, generating massive amounts of data and facing a significant challenge in long-term storage and transmission. A solution that reduces the storage and transfer requirements without compromising data integrity is needed. The effectiveness of neural networks has already been endorsed in tasks like image and speech compression. Adapting them to recognize the intricate patterns in genomic sequences could help to find more redundancies and reduce storage requirements. The proposed method, called DNACoder, leverages deep learning techniques to achieve significant compression ratios while preserving the essential information in genomic data and offers a high-performance compression for genomic sequences in any data format. The results of the experiments clearly demonstrate the effectiveness of the method and its potential applications in genomic data storage. Our proposed method improves compression by 21.1% on bits per base compared to existing compressors on the benchmarked dataset. By using a deep learning prediction model that is structured as a convolutional layer followed by an attention-based long short-term memory network, we propose a novel lossless and reference-free compression approach (DNACoder), which can also be utilized as a reference-based compressor. The experimental outcome on the tested data illustrates that the advocated compression algorithm’s CNN-LSTM model makes generalizations effectively for genomic sequence data and outperforms the state-of-the-art methods. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

Keywords: Long short-term memory

Small object detection in diverse application landscapes: a survey

Multimedia Tools and Applications, 2024, Vol. 83, No. 41, pp. 88645-88680

Authors: Iqra; Giri, Kaisar J.; Javed, Mohammed. Affiliations: Department of Computer Science, Islamic University of Science & Technology, Pulwama, India; Computer Vision & Biometrics Lab, Dept. of IT, Indian Institute of Information Technology Allahabad, India

The importance of object detection within computer vision, especially in the context of detecting small objects, has notably increased. This thorough survey extensively examines small object detection across various applications, consolidating and outlining the available methodologies. Traditional papers on small object detection have focused on specific domains. However, this survey paper incorporates insights from a multitude of domains, providing a comprehensive understanding of the versatility and applicability of small object detection techniques. This paper sheds light on the key challenges faced and delves into potential solutions to address the challenges, offering insights into viable solutions to enhance small object detection performance, setting it apart from existing literature. The strategies identified in our survey encompass a spectrum of approaches, categorized as transformer-based, CNN, and traditional methods. Also, this paper collates prevalent datasets relevant to small object detection, simplifying access to these resources. Further, it provides a succinct overview of diverse evaluation metrics used for performance assessment in this field, enhancing understanding of the effectiveness and proficiency of these methods. This survey paper not only consolidates established knowledge but also highlights innovative viewpoints, providing a comprehensive and enlightening compilation that contributes to the advancement of small object detection in the field of computer vision. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Feature extraction

Transformer-Based Generative Adversarial Networks in computer vision: A Comprehensive Survey

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 10, pp. 4851-4867

Authors: Dubey, Shiv Ram; Singh, Satish Kumar. Affiliation: Indian Institute of Information Technology Allahabad, Computer Vision and Biometrics Laboratory, Prayagraj 211015, Uttar Pradesh, India

Generative adversarial networks (GANs) have been very successful for synthesizing the images in a given dataset. The artificially generated images by GANs are very realistic. The GANs have shown potential usability in several computer vision applications, including image generation, image-to-image translation, and video synthesis. Conventionally, the generator network is the backbone of GANs, which generates the samples, and the discriminator network is used to facilitate the training of the generator network. The generator and discriminator networks are usually a convolutional neural network (CNN). The convolution-based networks exploit the local relationship in a layer, which requires the deep networks to extract the abstract features. However, recently developed transformer networks are able to exploit the global relationship with tremendous performance improvement for several problems in computer vision. Motivated from the success of transformer networks and GANs, recent works have tried to exploit the transformers in GAN framework for the image/video synthesis. This article presents a comprehensive survey on the developments and advancements in GANs utilizing the transformer networks for computer vision applications. The performance comparison for several applications on benchmark datasets is also performed and analyzed. The conducted survey will be very useful to understand the research trends and gaps related with transformer-based GANs and to develop the advanced GAN architectures by exploiting the global and local relationships for different applications. © 2020 IEEE.

Keywords: Generative adversarial networks

Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation

Frontiers of Computer Science, 2024, Vol. 18, No. 5, pp. 97-108

Authors: Zizhang WU; Yuanzhu GAN; Tianhao XU; Fan WANG. Affiliations: Computer Vision Perception Department of ZongMu Technology, Shanghai 201203, China; Faculty of Electrical Engineering, Information Technology, Physics, Technical University of Braunschweig, Braunschweig 38106, Germany

The transformer-based semantic segmentation approaches, which divide the image into different regions by sliding windows and model the relation inside each window, have achieved outstanding ***, since the relation modeling between windows was not the primary emphasis of previous work, it was not fully *** address this issue, we propose a Graph-Segmenter, including a graph transformer and a boundary-aware attention module, which is an effective network for simultaneously modeling the more profound relation between windows in a global view and various pixels inside each window as a local one, and for substantial low-cost boundary ***, we treat every window and pixel inside the window as nodes to construct graphs for both views and devise the graph *** introduced boundary-aware attention module optimizes the edge information of the target objects by modeling the relationship between the pixel on the object's *** experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k and PASCAL Context) demonstrate that our proposed network, a Graph Transformer with Boundary-aware Attention, can achieve state-of-the-art segmentation performance.

Keywords: graph transformer; graph relation network; boundary-aware attention; semantic segmentation

Spotlight Control for Real-Time Targeting

32nd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, WSCG 2024

Authors: Heinemann, Mika Benjamin; Kernbauer, Thomas; Fleck, Philipp; Arth, Clemens. Affiliation: Institute of Computer Graphics and Vision, University of Technology, Graz, Austria

Off-road heavy machinery such as snow groomers or excavators, often operate in low-light and hazardous environments. In this work, we explore the development of an intelligent camera-spotlight system with automatic and manual control to illuminate points of interest, such as obstacles or individuals at risk. We implement a prototype as proof of concept and integrate our workflow using a standard lighting protocol and a single-board computer. The presented calibration of the camera and spotlight ensures high precision in the desired use cases. In addition to testing the prototype on a real snow groomer, we evaluated its performance in terms of accuracy and repeatability. Overall, we showcase the usability of a commercially available spotlight in the context of spatial augmented reality in heavy machinery applications. © 2024 University of West Bohemia. All rights reserved.

Keywords: Augmented reality

FG-PIH: A Fusion of Fresnelet Transform and Gradient Directional Pattern for Perceptual Image Hashing

2nd International Conference on Recent Trends in Microelectronics, Automation, Computing, and Communications Systems, ICMACC 2024

Authors: Meesala, Pavani; Thounaojam, Dalton Meitei. Affiliation: Computer Science and Engineering, National Institute of Technology Silchar, Computer Vision Laboratory, Silchar, India

ISBN: (print) 9798350366570

Perceptual image hashing is pivotal in various image processing applications, including image authentication, content-based image retrieval, tampered image detection, and copyright protection. This paper proposes a novel approach for perceptual image hashing by combining the Fresnelet Transform with Gradient Directional Patterns. Using the FG-PIH technique, the proposed method achieves superior robustness against common image processing attacks while maintaining perceptual similarity for near-duplicate images. Experimental results on standard benchmark datasets demonstrate the effectiveness and efficiency of the proposed Fresnelet Transform-based perceptual image hashing scheme. Furthermore, comparative analysis against state-of-the-art methods underscores the competitiveness of our approach in terms of hash quality and computational complexity. © 2024 IEEE.

Keywords: Hamming distance
