This paper studies multi-turn text-to-SQL generation, which is a new but important task in semantic parsing. To deal with its two challenges, i.e., multi-turn interaction and cross-domain evaluation, this paper proposes a multiple-integration encoder, which derives the vector representations of user utterances and database schemas using three custom-designed modules for information integration. First, an utterance representation enhancing module is built to integrate the information of history utterances into the representation of each token in the current utterance by attentive selection. Second, a schema discrepancy enhancing module is designed to integrate the previously predicted SQL query into the representations of schema items. Third, a latent schema linking module is employed to integrate schema information into utterance representations to better deal with unseen database schemas. These three modules are all implemented with a lightweight multi-head attention mechanism, which reduces the number of parameters compared with conventional multi-head attention. Experimental results on the SParC dataset show that our method achieves higher multi-turn text-to-SQL generation accuracy than state-of-the-art baselines. Further ablation studies and analyses also demonstrate the effectiveness of the three information-integration modules in the encoder.
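A minimal sketch of what a parameter-reduced multi-head attention layer could look like, assuming the reduction comes from a single fused Q/K/V projection shared across heads (the paper's exact parameterization may differ); the class name, interfaces, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LightweightMultiHeadAttention(nn.Module):
    """Hypothetical lightweight MHA: one fused projection produces Q, K, V,
    which is then split across heads, instead of separate per-head layers."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)  # assumption
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, query, context):
        # query:   (B, Lq, d_model) e.g. current-utterance tokens
        # context: (B, Lc, d_model) e.g. history utterances or schema items
        B, Lq, _ = query.shape
        Lc = context.size(1)
        d = self.n_heads * self.d_head
        q = self.qkv(query)[..., :d]
        k, v = self.qkv(context)[..., d:].chunk(2, dim=-1)
        def split(x, L):
            return x.view(B, L, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q, Lq), split(k, Lc), split(v, Lc)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, Lq, -1)
        return self.out(out)  # query tokens enriched with context information
```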
ISBN (Print): 9798350364866; 9798350364873
Segmenting the kidney and kidney tumours from CT scans is crucial for addressing the challenges of early kidney cancer detection. Several segmentation methods are available to segment the kidney and kidney tumours from 3D CT scans. However, these methods have several drawbacks, including dependency on pixel-wise classification, limited generalisation, and manual annotation requirements. Hence, this paper introduces a novel kidney and kidney tumour segmentation approach employing an encoder-decoder-based architecture. The proposed segmentation approach is assessed against two encoder-decoder-based architectures, namely U-Net and DeepLabv3+. The proposed approach precisely identifies the kidney and kidney tumours in a 3D CT scan. Its performance is analysed using the 2023 Kidney and Kidney Tumour Segmentation Challenge (KiTS23) dataset. Evaluation metrics such as the Dice coefficient and Intersection over Union (IoU) are used to assess performance. Our results on the KiTS23 dataset show that DeepLabv3+ outperforms U-Net, so the paper discusses the DeepLabv3+ approach in detail. DeepLabv3+ achieves an average improvement over U-Net of 0.82% in Dice coefficient, 1.60% in IoU, and 39.28% in loss during training, and 0.94% in Dice coefficient, 1.82% in IoU, and 44.88% in loss during validation.
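For reference, the Dice coefficient and IoU for a binary segmentation mask can be computed as in the sketch below; this is a generic illustration, not the KiTS23 challenge's official evaluation script.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute Dice coefficient and IoU for binary masks (values 0/1)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou

# Example: compare a predicted kidney mask against ground truth.
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_and_iou(pred, gt))  # approx. (0.667, 0.5)
```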
ISBN (Print): 9781665473583
Deep neural networks have been widely used in medical image analysis, and medical image segmentation is one of the most important tasks. U-shaped encoder-decoder networks are prevailing and have achieved great success in various segmentation tasks. While CNNs treat an image as a grid of pixels in Euclidean space and Transformers recognize an image as a sequence of patches, a graph-based representation is more general and can construct connections between any parts of an image. In this paper, we propose ViG-UNet, a novel graph neural network-based U-shaped architecture with an encoder, a decoder, a bottleneck, and skip connections. The downsampling and upsampling modules are also carefully designed. Experimental results on the ISIC 2016, ISIC 2017 and Kvasir-SEG datasets demonstrate that our proposed architecture outperforms most existing classic and state-of-the-art U-shaped networks.
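A minimal sketch of the graph-based view of an image that ViG-style models rely on: patch features are connected to their k nearest neighbours in feature space and aggregated with a max-relative update. The aggregation rule and tensor layout here are assumptions for illustration, not ViG-UNet's exact grapher block.

```python
import torch

def knn_graph_aggregate(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    """x: (B, N, C) patch features. Connect each patch to its k nearest
    neighbours and aggregate with a max-relative update, returning (B, N, 2C)."""
    dist = torch.cdist(x, x)                                   # (B, N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[..., 1:]     # (B, N, k), drop self
    B, N, C = x.shape
    neighbours = torch.gather(
        x.unsqueeze(1).expand(B, N, N, C), 2,
        idx.unsqueeze(-1).expand(B, N, k, C)
    )                                                          # (B, N, k, C)
    relative = neighbours - x.unsqueeze(2)                     # neighbour minus centre
    return torch.cat([x, relative.max(dim=2).values], dim=-1)
```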
ISBN (Print): 9789819916412; 9789819916429
Generating textual descriptions of images by describing them in words is a fundamental problem that connects computer vision and natural language processing. A single image may include several entities, with their orientations, appearance, and position in a scene as well as their complex spatial interactions, leading to many possible captions for an image. Beam search has been employed for sentence generation for the last couple of decades, although it returns roughly the same captions with only minor changes in wording. Another search strategy, Diverse M-Best (where M denotes the number of independent, diverse beam searches), runs M beam searches from diverse starting statements, keeps the best output from each beam search, and discards the remaining (B-1) captions. This method mostly yields many diverse generated sequences, but running beam search M times is computationally expensive. Building on this prior work in vision, we have devised and implemented a novel algorithm, Modified Beam Search (MBS), for generating diverse and better captions, with an increase in computational complexity compared to beam search. We obtained improvements of 1-3% in BLEU-3 and BLEU-4 scores over the top-2 predicted captions from the original beam search.
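A compact sketch of standard beam search over a step function that returns log-probabilities for the next token; the `step_fn` interface and the toy bigram table are assumptions meant only to illustrate the baseline that diverse variants such as Diverse M-Best and MBS build on.

```python
import math

def beam_search(step_fn, bos, eos, beam_size=3, max_len=20):
    """step_fn(prefix) -> dict {token: log_prob} for the next token."""
    beams = [([bos], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                candidates.append((seq, score))  # finished hypotheses carry over
                continue
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams

# Toy usage: a fixed bigram model over a tiny vocabulary.
table = {"<s>": {"a": math.log(0.6), "the": math.log(0.4)},
         "a": {"cat": math.log(0.7), "dog": math.log(0.3)},
         "the": {"cat": math.log(0.5), "dog": math.log(0.5)},
         "cat": {"</s>": 0.0}, "dog": {"</s>": 0.0}}
print(beam_search(lambda seq: table[seq[-1]], "<s>", "</s>", beam_size=2))
```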
The encoder-decoder architecture is widely used as a lightweight semantic segmentation network. However, its performance is limited compared to well-designed Dilated-FCN models for two major reasons. First, commonly used upsampling methods in the decoder, such as interpolation and deconvolution, suffer from a local receptive field and are unable to encode global contexts. Second, low-level features may introduce noise into the decoder through skip connections because early encoder layers lack adequate semantic concepts. To tackle these challenges, a Global Enhancement Method is proposed to aggregate global information from high-level feature maps and adaptively distribute it to different decoder layers, alleviating the shortage of global contexts in the upsampling process. Besides, a Local Refinement Module is developed that uses the decoder features as semantic guidance to refine the noisy encoder features before the two are fused. Then, the two methods are integrated into a Context Fusion Block, and based on that, a novel Attention-guided Global enhancement and Local refinement Network (AGLN) is elaborately designed. Extensive experiments on the PASCAL Context, ADE20K, and PASCAL VOC 2012 datasets demonstrate the effectiveness of the proposed approach. In particular, with a vanilla ResNet-101 backbone, AGLN achieves a state-of-the-art result (56.23% mean IoU) on the PASCAL Context dataset. The code is available at https://***/zhasen1996/AGLN.
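A minimal sketch of the idea behind global enhancement: pool global context from a high-level feature map and redistribute it to a decoder feature map through channel-wise gating. This is an illustrative simplification with assumed module and argument names, not AGLN's actual Context Fusion Block.

```python
import torch
import torch.nn as nn

class GlobalEnhancement(nn.Module):
    """Aggregate global context from high-level features and inject it
    into a decoder feature map via channel-wise gating (illustrative)."""
    def __init__(self, high_channels: int, dec_channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(high_channels, dec_channels),
            nn.Sigmoid(),
        )

    def forward(self, high_feat, dec_feat):
        # high_feat: (B, Ch, H, W) deepest encoder features
        # dec_feat:  (B, Cd, H', W') decoder features at some stage
        context = high_feat.mean(dim=(2, 3))          # global average pooling
        gate = self.gate(context)[:, :, None, None]   # (B, Cd, 1, 1)
        return dec_feat + dec_feat * gate             # globally re-weighted decoder features
```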
Routine visual inspection of concrete structures is essential to maintain safe conditions. Therefore, studies of concrete crack segmentation using deep learning methods have been conducted extensively in recent years. However, insufficient performance remains a major challenge in diverse field-inspection scenarios. In this study, a novel SegCrack model for pixel-level crack segmentation is therefore proposed, using a hierarchically structured Transformer encoder to output multiscale features and a top-down pathway with lateral connections to progressively upsample and fuse features from the deepest layer of the encoder. Furthermore, an online hard example mining strategy was adopted to strengthen the detection of hard samples and improve model performance. The effect of dataset size on segmentation performance was then investigated. The results indicated that SegCrack achieved a precision, recall, F1 score, and mean intersection over union of 96.66%, 95.46%, 96.05%, and 92.63%, respectively, on the test set.
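Online hard example mining for pixel-wise cross-entropy can be sketched as below: only the hardest pixels (those with the highest loss) contribute to the averaged loss. The keep ratio and ignore index are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, keep_ratio=0.25, ignore_index=255):
    """logits: (B, C, H, W), target: (B, H, W). Average the loss only over
    the hardest `keep_ratio` fraction of pixels."""
    loss = F.cross_entropy(logits, target, reduction="none",
                           ignore_index=ignore_index)   # per-pixel loss (B, H, W)
    loss = loss.flatten()
    n_keep = max(1, int(keep_ratio * loss.numel()))
    hard_loss, _ = loss.topk(n_keep)                    # hardest pixels only
    return hard_loss.mean()
```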
In this study, we aimed to develop and assess a hydrological model using a deep learning algorithm for improved water management. Single-output long short-term memory (LSTM SO) and encoder-decoder long short-term memory (LSTM ED) models were developed, and their performances were compared using different input variables. We used water-level and rainfall data from 2018 to 2020 for the Takayama Reservoir (Nara Prefecture, Japan) to train, test, and assess both models. The root-mean-square error and Nash-Sutcliffe efficiency were estimated to compare the model performances. The results showed that the LSTM ED model had better accuracy. Using both water levels and water-level changes as inputs produced better results than using water levels alone. However, the accuracy of the model was significantly lower when predicting water levels outside the range of the training datasets. Within this range, the developed model could be used for water management to reduce the risk of downstream flooding while ensuring sufficient water storage for irrigation, because it can determine an appropriate amount of water to release from the reservoir before rainfall events.
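A minimal encoder-decoder LSTM for multi-step water-level forecasting might look like the sketch below; the layer sizes, input variables, and the autoregressive decoding scheme are assumptions for illustration, not the study's exact configuration.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Encode a past window of (rainfall, water level) and decode a
    multi-step forecast of future water levels (illustrative sizes)."""
    def __init__(self, n_features=2, hidden=64, horizon=24):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, past):                 # past: (B, T_in, n_features)
        _, (h, c) = self.encoder(past)
        y = past[:, -1:, -1:]                # seed with the last observed water level
        outputs = []
        for _ in range(self.horizon):        # feed back the previous prediction
            out, (h, c) = self.decoder(y, (h, c))
            y = self.head(out)               # (B, 1, 1)
            outputs.append(y)
        return torch.cat(outputs, dim=1)     # (B, horizon, 1) forecast
```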
Water body segmentation is an important tool for the hydrological monitoring of the Earth. With the rapid development of convolutional neural networks, semantic segmentation techniques have been used on remote sensing images to extract water bodies. However, some difficulties need to be overcome to achieve good results in water body segmentation, such as complex backgrounds, huge scale variation, water connectivity, and rough edges. In this study, a water body segmentation model (DUPnet) with dense connectivity and multi-scale pyramid pooling is proposed to rapidly and accurately extract water bodies from Gaofen satellite and Landsat 8 OLI (Operational Land Imager) images. The proposed method includes three parts: (1) a multi-scale spatial pyramid pooling module (MSPP) is introduced to combine shallow and deep features for small water bodies and to compensate for the feature loss caused by the sampling process; (2) dense blocks are used in DUPnet's backbone to extract more spatial features, increasing feature propagation and reuse; (3) a regression loss function is proposed to train the network to deal with the unbalanced dataset caused by small water bodies. The experimental results show that the F1, MIoU, and FWIoU of DUPnet on the 2020 Gaofen dataset are 97.67%, 88.17%, and 93.52%, respectively, and on the Landsat River dataset they are 96.52%, 84.72%, and 91.77%, respectively.
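The multi-scale spatial pyramid pooling idea can be sketched as pooling a feature map at several grid sizes, projecting each pooled map, upsampling back, and concatenating with the input; the pooling sizes and channel counts below are illustrative assumptions, not DUPnet's exact MSPP module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePyramidPooling(nn.Module):
    """Pool the input at several grid sizes, reduce channels, upsample,
    and concatenate with the input (illustrative MSPP-style block)."""
    def __init__(self, in_channels, out_channels, sizes=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(in_channels, out_channels, 1))
            for s in sizes
        )

    def forward(self, x):                    # x: (B, C, H, W)
        h, w = x.shape[2:]
        pyramids = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear",
                          align_corners=False)
            for stage in self.stages
        ]
        return torch.cat(pyramids, dim=1)    # (B, C + len(sizes)*out_channels, H, W)
```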
This paper presents a novel two-stage 3D point cloud object detector named ASCNet for autonomous driving. Most current works project 3D point clouds to 2D space, where quantization loss in the transformation is inevitable. A Pillar-wise Spatial Context Feature Encoding (PSCFE) module is proposed in this paper to drive the learning of discriminative features and reduce the loss of detailed information. The inhomogeneity inherent in 3D object detection from point clouds, such as the inconsistent number of points in the pillars and the diverse sizes of Regions of Interest (RoI), should be treated carefully due to the sparsity and individual specificity of the data. We introduce a length-adaptive RNN-based module to address this inhomogeneity. A novel backbone combining an encoder-decoder structure and shortcut connections is designed to learn multi-scale features for 3D object detection. Additionally, we utilize multiple RoI heads and class-wise NMS to deal with the class imbalance in scenes. Extensive experiments on the KITTI dataset demonstrate that our algorithm achieves competitive performance in 3D bounding box detection and BEV detection. (c) 2021 Elsevier B.V. All rights reserved.
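Class-wise NMS, mentioned above, simply runs non-maximum suppression independently per class. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) and is a generic illustration, not ASCNet's implementation for rotated 3D boxes.

```python
import torch
from torchvision.ops import nms

def classwise_nms(boxes, scores, labels, iou_thresh=0.5):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,); labels: (N,).
    Run NMS independently for each class and return the kept indices."""
    keep = []
    for cls in labels.unique():
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        kept = nms(boxes[idx], scores[idx], iou_thresh)  # per-class suppression
        keep.append(idx[kept])
    return torch.cat(keep) if keep else torch.empty(0, dtype=torch.long)
```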
Face parsing refers to the labeling of each facial component in a face image and has been employed in facial simulation, expression recognition, and makeup applications, effectively providing a basis for further analysis, computation, animation, modification, and numerous other applications. Although existing face parsing methods have demonstrated good performance, they fail to extract rich features and recover accurate segmentation maps, particularly for faces with large variations in expression and highly similar appearances. Moreover, these approaches neglect the semantic gaps and dependencies between facial categories and their boundaries. To address these drawbacks, we propose an efficient dilated convolution network with different aspect ratios to attain accurate face parsing output by exploiting its feature extraction capability. The proposed multiscale dilated encoder-decoder convolution model obtains rich component information and efficiently improves the capture of global information by obtaining low- and high-level semantic features. To achieve delicate parsing of the face components along their borders and to analyze the connections between face categories and their border edges, a semantic edge map is learned using a conditional random field, which aims to distinguish border and non-border pixels during modeling. We conducted experiments using three well-known publicly available face databases. The recorded results demonstrate the high accuracy and capacity of the proposed method in comparison with previous state-of-the-art methods. Our proposed model achieved a mean accuracy of 90% on the CelebAMask-HQ dataset for the category case and 81.43% for the accessory case, and achieved accuracies of 91.58% and 92.44% on the HELEN and LaPa datasets, respectively, thereby demonstrating its effectiveness. (C) 2022 The Author(s). Published by Elsevier B.V.
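One way to realize dilated convolutions with different aspect ratios is to run parallel 3x3 branches with asymmetric dilation rates, as in the sketch below; the specific rates, channel counts, and fusion step are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AspectRatioDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with asymmetric dilation rates so each
    branch covers a receptive field with a different aspect ratio."""
    def __init__(self, in_ch, out_ch, rates=((1, 1), (1, 3), (3, 1), (2, 2))):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):                        # x: (B, in_ch, H, W)
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))  # fused multi-aspect features
```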