Search results - Inner Mongolia University Library

28th International Conference on Neural Information Processing

Authors: Chen, Cheng; Gu, Xiaodong. Affiliation: Fudan Univ Dept Elect Engn Shanghai 200433 Peoples R China

ISBN: (print) 9783030923099; 9783030923105

This paper addresses the problem of visual dialog, which aims to answer multi-round questions based on the dialog history and image content. This is a challenging task because a question may be answered in relation to any previous dialog turn and visual clues in the image. Existing methods mainly focus on the discriminative setting and design various attention mechanisms to model the interaction between answer candidates and multi-modal context. Despite the impressive results of attention-based models for visual dialog, a universal encoder-decoder for both answer understanding and generation remains challenging. In this paper, we propose UED, a unified framework that exploits answer candidates to jointly train discriminative and generative tasks. UED is unified in that (1) it fully exploits the interaction between different modalities to support answer ranking and generation in a single transformer-based model, and (2) it uses the answers as anchors to facilitate both settings. We evaluate the proposed UED on the VisDial dataset, where our model outperforms the state-of-the-art.

Keywords: Visual dialog; Cross modal learning; encoder decoder network

Real-Time 3D Face Alignment Using an encoder-decoder network With an Efficient Deconvolution Layer

IEEE SIGNAL PROCESSING LETTERS, 2020, vol. 27, pp. 1944-1948

Authors: Ning, Xin; Duan, Pengfei; Li, Weijun; Zhang, Shaolin. Affiliations: Chinese Acad Sci Inst Semicond Beijing 100083 Peoples R China; Beijing Key Lab Semicond Neural Network Intellige Beijing 100083 Peoples R China; Wave Grp Cognit Comp Technol Joint Lab Beijing 102208 Peoples R China; Shenzhen Wave Kingdom Co Ltd Shenzhen 518102 Peoples R China

In the field of 3D face alignment, most researchers have focused on improving the prediction accuracy of algorithms and ignored their portability for practical applications. To this end, this study presents a real-time 3D face-alignment method that uses an encoder-decoder network with an efficient deconvolution layer. The fusion of the encoding and decoding features adds more abundant features to this network. An efficient deconvolution layer at the decoding stage applies the L1 norm to select useful features and generate abundant ones through linear operations. Experimental results on the standard AFLW2000-3D and AFLW-LFPA datasets show that our algorithm has low prediction errors with real-time applicability.

Keywords: Three-dimensional displays; Decoding; Face recognition; Deconvolution; Faces; Encoding; Real-time systems; 3D face alignment; deconvolution; encoder decoder network; real time application

A novel hybrid attention gate based on vision transformer for the detection of surface defects

SIGNAL IMAGE AND VIDEO PROCESSING, 2024, vol. 18, no. 10, pp. 6835-6851

Authors: Uzen, Hueseyin; Turkoglu, Muammer; Ozturk, Dursun; Hanbay, Davut. Affiliations: Bingol Univ Dept Comp Engn TR-12000 Bingol Turkiye; Samsun Univ Dept Software Engn TR-55000 Samsun Turkiye; Bingol Univ Dept Elect & Elect Engn TR-12000 Bingol Turkiye; Inonu Univ Dept Comp Engn TR-44000 Malatya Turkiye

Many advanced models have been proposed for automatic surface defect inspection. Although CNN-based methods have achieved superior performance among these models, they are limited in extracting global semantic details due to the locality of the convolution operation. Yet global semantic details can be highly effective for detecting surface defects. Recently, inspired by the success of the Transformer, which has powerful abilities to model global semantic details with global self-attention mechanisms, some researchers have started to apply Transformer-based methods to many computer-vision challenges. However, as many researchers have noticed, transformers lose spatial details while extracting semantic features. To alleviate these problems, this paper proposes a transformer-based Hybrid Attention Gate (HAG) model to extract both global semantic features and spatial features. The HAG model consists of a Transformer (Trans), channel Squeeze-spatial Excitation (sSE), and a merge process. The Trans model extracts global semantic features and the sSE extracts spatial features. The merge process, which comes in different versions such as concat, add, max, and mul, allows these two different models to be combined effectively. Finally, four versions of the HAG-Feature Fusion network (HAG-FFN) were developed using the proposed HAG model for the detection of surface defects. Four different datasets were used to test the performance of the proposed HAG-FFN versions. In the experimental studies, the proposed model produced 83.83%, 79.34%, 76.53%, and 81.78% mIoU scores for the MT, MVTec-Texture, DAGM, and AITEX datasets, respectively. These results show that the proposed HAGmax-FFN model provided better performance than the state-of-the-art models.

Keywords: Defects detection; Vision transformers; Squeeze and excitation; encoder decoder network; Convolutional neural network
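
The four merge variants named in the abstract above (concat, add, max, mul) can be illustrated with a small sketch. This is not the authors' implementation: the function name `merge_features`, the nested-list layout (channels, then positions) and the toy values are assumptions made for illustration, and the two inputs only stand in for the outputs of the Transformer and sSE branches.

```python
from typing import Callable, Dict, List

# Assumed layout: a feature map is a list of channels,
# each channel a flat list of activation values.
Feature = List[List[float]]


def merge_features(trans: Feature, sse: Feature, mode: str) -> Feature:
    """Combine two feature maps with one of four merge modes (illustrative only)."""
    if mode == "concat":
        # Channel-wise concatenation: the output has more channels.
        return [list(ch) for ch in trans] + [list(ch) for ch in sse]

    elementwise: Dict[str, Callable[[float, float], float]] = {
        "add": lambda x, y: x + y,
        "max": lambda x, y: max(x, y),
        "mul": lambda x, y: x * y,
    }
    if mode not in elementwise:
        raise ValueError(f"unknown merge mode: {mode}")

    # Element-wise modes need identical shapes.
    if len(trans) != len(sse) or any(len(a) != len(b) for a, b in zip(trans, sse)):
        raise ValueError("element-wise merge requires equal shapes")

    op = elementwise[mode]
    return [[op(x, y) for x, y in zip(a, b)] for a, b in zip(trans, sse)]


# Toy inputs: 2 channels with 2 positions each (hypothetical values).
trans_out = [[1.0, 2.0], [3.0, 4.0]]
sse_out = [[0.5, 3.0], [2.0, 1.0]]
for mode in ("concat", "add", "max", "mul"):
    print(mode, merge_features(trans_out, sse_out, mode))
```

Concatenation changes the channel count, so any layer that follows it has to accept the wider input, while the three element-wise modes keep the shape unchanged.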

A novel hybrid loss-based encoder–decoder model for accurate Pulmonary Embolism segmentation

International Journal of Information Technology (Singapore), 2025, vol. 17, no. 3, pp. 1663-1677

Authors: Vadhera, Renu; Sharma, Meghna. Affiliation: Department of Computer Science and Engineering The NorthCap University Haryana Gurugram 122017 India

Pulmonary embolism (PE) must be diagnosed early and accurately to minimize danger at an advanced stage. This approach builds on advanced preprocessing techniques, including normalization, slice filtering and resizing. It combines an architecture with skip connections and upsampling to capture extensive, detailed contextual information. The loss function used in the model is a combination of SSIM and Dice loss, balancing consistency of structural detail against optimization of pixel overlap. The model is evaluated on a PE challenge dataset (CT scans), where the mean Dice coefficient reached 0.9407, Jaccard similarity 0.9286, and sensitivity 0.9324. This methodology outperforms the state-of-the-art models. These results show that the model has good potential for being applied in clinical practice to automate PE detection. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2025.

Keywords: CNN based model; Deep learning; encoder decoder network; Hybrid loss; Pulmonary embolism; Segmentation; UNET
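
The metrics reported in the abstract above (Dice, Jaccard, sensitivity) and the idea of a combined SSIM and Dice loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the masks are flat lists of values in [0, 1], SSIM is computed once over the whole mask rather than over local windows as in the usual definition, and the equal weighting `alpha=0.5` is a hypothetical choice that the abstract does not specify.

```python
from typing import Sequence

EPS = 1e-7  # small constant to avoid division by zero on empty masks


def dice(pred: Sequence[float], target: Sequence[float]) -> float:
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + EPS) / (sum(pred) + sum(target) + EPS)


def jaccard(pred: Sequence[float], target: Sequence[float]) -> float:
    """Jaccard similarity (IoU): |A∩B| / |A∪B|."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return (inter + EPS) / (union + EPS)


def sensitivity(pred: Sequence[float], target: Sequence[float]) -> float:
    """True positives divided by all actual positives."""
    tp = sum(p * t for p, t in zip(pred, target))
    return (tp + EPS) / (sum(target) + EPS)


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM over the whole input (a simplification)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def hybrid_loss(pred: Sequence[float], target: Sequence[float], alpha: float = 0.5) -> float:
    """Weighted sum of an SSIM loss and a Dice loss; alpha is an assumed weight."""
    return alpha * (1 - global_ssim(pred, target)) + (1 - alpha) * (1 - dice(pred, target))


# Toy 4-pixel masks (hypothetical values).
pred_mask = [1.0, 1.0, 0.0, 0.0]
true_mask = [1.0, 0.0, 0.0, 0.0]
print(dice(pred_mask, true_mask), jaccard(pred_mask, true_mask), sensitivity(pred_mask, true_mask))
print(hybrid_loss(pred_mask, true_mask), hybrid_loss(true_mask, true_mask))
```

In a training setting the same quantities would be computed on differentiable tensors; plain lists are used here only to keep the arithmetic visible.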

Hand Gesture Sequence Recognition using Inertial Motion Units (IMUs)

4th IAPR Asian Conference on Pattern Recognition (ACPR)

Authors: Kavarthapu, Dilip Chakravarthy; Mitra, Kaushik. Affiliation: Indian Inst Technol Madras Dept Elect Engn Madras Tamil Nadu India

ISBN: (print) 9781538633540

Unlike approaches that classify a single gesture at a time, we propose a deep learning based technique that can classify multiple gestures in one shot. This is especially suitable for applications that involve seamless gesture sequences, such as sign language recognition, touch-less car assistance systems and gaming systems. We propose a Long Short Term Memory (LSTM) based deep network along the lines of an encoder-decoder architecture that classifies a gesture sequence accurately in one go. We also show an empirical training strategy for our architecture which can achieve good results even with a limited amount of collected data. Results from the experiments performed on labelled datasets from Inertial Motion Units (IMU) prove the efficiency and usefulness of the proposed method.

Keywords: gesture recognition; IMU; LSTM; encoder decoder network; deep learning
