Search Results - Inner Mongolia University Library

MSPNet: Multi-stage progressive network for image denoising

NEUROCOMPUTING, 2023, Vol. 517, pp. 71-80

Authors: Bai, Yu; Liu, Meiqin; Yao, Chao; Lin, Chunyu; Zhao, Yao. Affiliations: Beijing Jiaotong Univ Inst Informat Sci Beijing 100044 Peoples R China; Beijing Jiaotong Univ Beijing Key Lab Adv Informat Sci & Network Technol Beijing 100044 Peoples R China; Univ Sci & Technol Beijing Sch Comp & Commun Engn Beijing 100083 Peoples R China

Image denoising, which aims to restore a high-quality image from its noisy version, is one of the most challenging tasks in low-level computer vision. In this paper, we propose a multi-stage progressive denoising network (MSPNet) and decompose the denoising task into several sub-tasks to progressively remove noise. Specifically, MSPNet is composed of three denoising stages. Each stage combines a feature extraction module (FEM) and a mutual-learning fusion module (MFM). In the feature extraction module, an encoder-decoder architecture is employed to learn non-local contextualized features, and channel attention blocks (CAB) are utilized to retain the local information of the image. In the mutual-learning fusion module, criss-cross attention is introduced to balance the image spatial details and the contextualized information. Experimental results show that MSPNet achieves notable improvements over state-of-the-art works on both objective and subjective evaluations. (c) 2022 Elsevier B.V. All rights reserved.

Keywords: Multi-stage Image denoising Criss-cross attention encoder-decoder


SDSCNet: an instance segmentation network for efficient monitoring of goose breeding conditions


APPLIED INTELLIGENCE, 2023, Vol. 53, No. 21, pp. 25435-25449

Authors: Li, Jiao; Su, Houcheng; Li, Jianing; Xie, Tianyu; Chen, Yijie; Yuan, Jianan; Jiang, Kailin; Duan, Xuliang. Affiliations: Sichuan Agr Univ Coll Informat Engn 46 Xinkang Rd Yaan 625000 Sichuan Peoples R China; Univ Macau Inst Collaborat Innovat Ave Univ Taipa 999077 Macau Peoples R China; Tech Univ Munich Sch Computat Informat & Technol Arcisstr 21 D-80333 Munich Bayern Germany; Sichuan Agr Univ Coll Sci 46 Xinkang Rd Yaan 625000 Sichuan Provinc Peoples R China

This work aims to improve the scientific level of the goose breeding industry and support the development of intelligent agriculture. Instance segmentation plays a pivotal role when breeders make decisions about goose breeding. It can be used for disease prevention, body size estimation, behavioural prediction, etc. However, instance segmentation requires high-performance computing devices to run smoothly due to its rich output. To ameliorate this problem, this paper constructs a novel encoder-decoder module and proposes the SDSCNet model. The reasonable use of depth-separable convolution in the module reduces the number and size of model parameters and increases execution speed. Finally, the SDSCNet model enables real-time identification and segmentation of individual geese with the accuracy reached *** compare this model with numerous mainstream instance segmentation models, and the final results demonstrate the excellent performance of our ***, deploying the SDSCNet model on the embedded device Raspberry Pi 4 Model B can achieve effective detection of continuous moving scenes.
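
The parameter saving that the abstract attributes to depth-separable convolution can be illustrated with a small calculation. The sketch below is not taken from the SDSCNet paper: it assumes the common depthwise-plus-pointwise factorization, ignores bias terms, and uses hypothetical layer sizes (`c_in`, `c_out`, `k`) chosen only for illustration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For these assumed sizes the separable layer needs 8768 weights instead of 73728, roughly 12% of the standard layer.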

Keywords: Deep learning Intelligent agriculture Goose breeding industry Instance segmentation encoder-decoder Real-time


Dual attention-based deep learning network for multi-class object semantic segmentation of tunnel point clouds


AUTOMATION IN CONSTRUCTION, 2023, Vol. 156

Authors: Ji, Ankang; Zhang, Limao; Fan, Hongqin; Xue, Xiaolong; Dou, Yudan. Affiliations: Hong Kong Polytech Univ Dept Bldg & Real Estate Hong Kong 999077 Peoples R China; Hong Kong Polytech Univ Shenzhen Res Inst Shenzhen 518057 Guangdong Peoples R China; Huazhong Univ Sci & Technol Natl Ctr Technol Innovat Digital Construct Sch Civil & Hydraul Engn Wuhan 430074 Peoples R China; Guangzhou Univ Sch Management Guangzhou 510006 Guangdong Peoples R China; Dalian Univ Technol Dept Construct Management Dalian 116024 Liaoning Peoples R China

Aiming to automatically segment multi-class objects on the tunnel point cloud, a deep learning network named dual attention-based point cloud network (DAPCNet) is developed in this paper to act on point clouds for segmentation. In the developed model, data normalization and feature aggregation are first processed to eliminate data discrepancies and enhance local features, after which the processed data are input into the built network layers based on the encoder-decoder architecture coupled with an improved 3D dual attention module to extract and learn features. Furthermore, a custom loss function called Facal Cross-Entropy ("FacalCE") is designed to enhance the model's ability to extract and learn features while addressing imbalanced data distribution. To validate the effectiveness and feasibility of the developed model, a dataset of tunnel point clouds collected from a real engineering project in China is employed. The experimental results indicate that (1) the developed model has excellent performance with Mean Intersection over Union (MIoU) of 0.8597, (2) the improved 3D dual attention module and "FacalCE" contribute to the model performance, respectively, and (3) the developed model is superior to other state-of-the-art methods, such as PointNet and DGCNN. In summary, the DAPCNet model exhibits exceptional performance, offering effective and accurate results for segmenting multi-class objects within tunnel point clouds.
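
The abstract reports its headline result as Mean Intersection over Union (MIoU). As a minimal sketch of how that metric is usually computed, assuming a per-class confusion matrix of point counts (rows are true classes, columns are predicted classes), the function below averages the per-class IoU. The matrix values are hypothetical and are not data from the paper.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a square confusion matrix (rows: true class, columns: predicted class)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 3-class point counts, for illustration only.
confusion = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 10, 40],
]
miou = mean_iou(confusion)
print(round(miou, 3))
```

With these assumed counts the per-class IoUs are 50/70, 30/55 and 40/55, giving an MIoU of about 0.662.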

Keywords: Deep learning Semantic segmentation Tunnel point cloud encoder-decoder 3D dual attention module


A Framework for Image Captioning Based on Relation Network and Multilevel Attention Mechanism


NEURAL PROCESSING LETTERS, 2023, Vol. 55, No. 5, pp. 5693-5715

Authors: Sharma, Himanshu; Srivastava, Swati. Affiliation: GLA Univ Mathura Dept Comp Engn & Applicat Mathura India

Understanding different semantic concepts, such as objects and their relationships in an image, and integrating them to produce a natural language description is the goal of the image captioning task. Thus, it needs an algorithm to understand the visual content of a given image and translate it into a sequence of output words. In this paper, a local relation network is designed over the objects and image regions which not only discovers the relationship between the object and the image regions but also generates significant context-based features corresponding to every region in the image. Inspired by the transformer model, we have employed a multilevel attention comprising self-attention and guided attention to focus on a given image region and its related image regions, thus enhancing the image representation capability of the proposed method. Finally, a variant of the traditional long short-term memory, which uses an attention mechanism, is employed to focus on relevant contextual information, spatial locations, and deep visual features. With these measures, the proposed model encodes an image in an improved way, which gives the model significant cues and thus leads to improved caption generation. Extensive experiments have been performed on three benchmark datasets: Flickr30k, MSCOCO, and Nocaps. On Flickr30k, the obtained evaluation scores are 31.2 BLEU@4, 23.5 METEOR, 51.5 ROUGE, 65.6 CIDEr and 17.2 SPICE. On MSCOCO, the proposed model has attained 42.4 BLEU@4, 29.4 METEOR, 59.7 ROUGE, 125.7 CIDEr and 23.2 SPICE. The overall CIDEr score on the Nocaps dataset achieved by the proposed model is 114.3. The above scores clearly show the superiority of the proposed method over the existing methods.

Keywords: Relation network Semantic Attention encoder-decoder


A Holistically-Guided decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection


IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, Vol. 45, No. 10, pp. 11390-11406

Authors: Liu, Jianbo; He, Junjun; Zheng, Yuanjie; Yi, Shuai; Wang, Xiaogang; Li, Hongsheng. Affiliations: Chinese Univ Hong Kong Dept Elect Engn Hong Kong Peoples R China; SenseTime Res Shanghai 200233 Peoples R China; Shandong Normal Univ Sch Informat Sci & Engn Jinan 250014 Shandong Peoples R China

Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps, such as the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. To balance performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from a feature encoder, such as the FPN architecture for object detection and the U-Net for semantic segmentation. Although more efficient, existing encoder-decoder methods for semantic segmentation perform far worse than the dilatedFCN-based methods. In this paper, we propose a novel holistically-guided decoder to obtain high-resolution, semantic-rich feature maps via the multi-scale features from the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. The EfficientFCN achieves comparable or even better performance than state-of-the-art methods with only 1/3 of their computational costs for semantic segmentation on the PASCAL Context, PASCAL VOC, and ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves > 2% higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.

Keywords: Semantic segmentation object detection encoder-decoder dilated convolution holistic features feature pyramids


Hierarchical Graph Pattern Understanding for Zero-Shot Video Object Segmentation


IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, Vol. 32, pp. 5909-5920

Authors: Pei, Gensheng; Shen, Fumin; Yao, Yazhou; Chen, Tao; Hua, Xian-Sheng; Shen, Heng-Tao. Affiliations: Nanjing Univ Sci & Technol Sch Comp Sci & Engn Nanjing 210094 Peoples R China; Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 610056 Peoples R China; DAMO Acad Alibaba Grp Hangzhou 100080 Peoples R China

The optical flow guidance strategy is ideal for obtaining motion information of objects in the video. It is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained model can be found at https://***/NUST-Machine-Intelligence-Laboratory/HGPU.

Keywords: Video object segmentation graph neural network zero-shot encoder-decoder optical flow


A Sentence Retrieval Generation Network Guided Video Captioning


Computers, Materials & Continua, 2023, Vol. 75, No. 6, pp. 5675-5696

Authors: Ou Ye; Mimi Wang; Zhenhua Yu; Yan Fu; Shun Yi; Jun Deng. Affiliations: College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China; College of Safety and Engineering, Xi’an University of Science and Technology, Xi’an 710054, China

Currently, the video captioning models based on an encoder-decoder mainly rely on a single video input *** contents of video captioning are limited since few studies employed external corpus information to guide the generation of video captioning, which is not conducive to the accurate description and understanding of video *** address this issue, a novel video captioning method guided by a sentence retrieval generation network (ED-SRG) is proposed in this ***, a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model are integrated to construct an encoder-decoder, which is utilized to extract the 2D features, 3D features, and object features of video data *** features are decoded to generate textual sentences that conform to video content for sentence ***, a sentence-transformer network model is employed to retrieve different sentences in an external corpus that are semantically similar to the above textual *** candidate sentences are screened out through similarity ***, a novel GPT-2 network model is constructed based on GPT-2 network *** model introduces a designed random selector to randomly select predicted words with a high probability in the corpus, which is used to guide and generate textual sentences that are more in line with human natural language *** proposed method in this paper is compared with several existing works by *** results show that the indicators BLEU-4, CIDEr, ROUGE_L, and METEOR are improved by 3.1%, 1.3%, 0.3%, and 1.5% on a public dataset MSVD and 1.3%, 0.5%, 0.2%, 1.9% on a public dataset MSR-VTT *** can be seen that the proposed method in this paper can generate video captioning with richer semantics than several state-of-the-art approaches.
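
The abstract mentions a random selector that picks among high-probability predicted words but does not describe its design. As a loose illustration of the general idea only, the sketch below uses plain top-k sampling, a standard decoding technique that is swapped in here as a stand-in and is not the paper's selector. The function name `sample_top_k`, the toy vocabulary, and the probabilities are all hypothetical.

```python
import random
from typing import Dict


def sample_top_k(probs: Dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k most probable words and sample one in proportion to its probability."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words = [w for w, _ in top]
    weights = [p for _, p in top]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical next-word distribution, for illustration only.
probs = {"dog": 0.5, "cat": 0.3, "car": 0.15, "tree": 0.05}
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
word = sample_top_k(probs, k=2, rng=rng)
print(word)
```

With `k=2` only "dog" and "cat" can ever be returned, so low-probability words are excluded while some randomness is kept.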

Keywords: Video captioning encoder-decoder sentence retrieval external corpus RS GPT-2 network model


Dense Multi-Scale Convolutional Network for Plant Segmentation


IEEE ACCESS, 2023, Vol. 11, pp. 82640-82651

Authors: Tran, Thi Hoang Yen; Phan, Tran Dang Khoa. Affiliations: Univ Danang Fac Econ Univ Econ Da Nang 550000 Vietnam; Univ Danang Univ Sci & Technol Fac Elect & Telecommun Engn Da Nang 550000 Vietnam

Plant segmentation is a critical task in precision agriculture as related to crop management and weed treatment. Plants can exhibit very large scale changes, which presents a great challenge for accurate crop/weed segmentation. Recent works have shown that multi-scale features are useful to segment objects with different scales. In this work, we propose a Dense Multi-scale Convolutional Network (DMSCN) for pixel-wise crop/weed segmentation. Our network has an encoder-decoder structure. The encoder comprises a Dense Convolutional Network (DCN) and a Dense Multi-Scale Atrous Pooling (DMSAP) module. DCN is composed of standard and atrous convolutions with dense connections. The architecture of DCN allows the encoder to increase the density of feature maps while avoiding signal decimation due to dimension reduction. The proposed DMSAP connects a set of standard and atrous convolutional layers with different dilation rates in a densely cascaded manner. DMSAP is able to capture features with dense scale sampling and a large receptive field. A simple yet effective decoder is used to refine the segmentation results by combining high- and low-level features of the encoder. Extensive experiments are performed on four crop/weed datasets. One of these datasets was collected and annotated by us. We conduct an ablation study to show the advantages of different modules of DMSCN. The comparative study demonstrates the advantages of our model compared with the previous methods in terms of accuracy and complexity.
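
The link between dilation rates and receptive field that the abstract relies on can be made concrete with a short calculation. The sketch below assumes stride-1 convolutions stacked in a simple cascade and uses hypothetical dilation rates. It does not reproduce the DMSAP module, whose dense connections and actual rates are not given in the abstract.

```python
from typing import Iterable, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are spaced `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stride-1 convolutions in cascade, given (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1  # each layer widens the field by (span - 1)
    return rf


# Hypothetical cascade of 3x3 layers with dilation rates 1, 2 and 4.
cascade = [(3, 1), (3, 2), (3, 4)]
rf = receptive_field(cascade)
print(rf)
```

Under these assumptions the three layers cover a 15 x 15 window, whereas three undilated 3 x 3 layers would cover only 7 x 7 with the same number of weights.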

Keywords: Atrous convolution dense connection encoder-decoder semantic segmentation spatial pyramid pooling


Cerebro: Static Subsuming Mutant Selection


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, Vol. 49, No. 1, pp. 24-43

Authors: Garg, Aayush; Ojdanic, Milos; Degiovanni, Renzo; Chekam, Thierry Titcheu; Papadakis, Mike; Le Traon, Yves. Affiliations: Univ Luxembourg Esch Sur Alzette Luxembourg; SES Betzdorf Luxembourg

Mutation testing research has indicated that a major part of its application cost is due to the large number of low-utility mutants that it introduces. Although previous research has identified this issue, no previous study has proposed any effective solution to the problem. Thus, it remains unclear how to mutate and test a given piece of code in a best-effort way, i.e., achieving a good trade-off between invested effort and test effectiveness. To achieve this, we propose Cerebro, a machine learning approach that statically selects subsuming mutants, i.e., the set of mutants that resides on the top of the subsumption hierarchy, based on the mutants' surrounding code context. We evaluate Cerebro using 48 and 10 programs written in C and Java, respectively, and demonstrate that it preserves the mutation testing benefits while limiting application cost, i.e., reduces all cost application factors such as equivalent mutants, mutant executions, and the mutants requiring analysis. We demonstrate that Cerebro has strong inter-project prediction ability, which is significantly higher than two baseline methods, i.e., supervised learning on features proposed by state-of-the-art, and random mutant selection. More importantly, our results show that Cerebro's selected mutants lead to strong tests that are respectively capable of killing 2 times higher than the number of subsuming mutants killed by the baselines when selecting the same number of mutants. At the same time, Cerebro reduces the cost-related factors, as it selects, on average, 68% fewer equivalent mutants, while requiring 90% fewer test executions than the baselines.
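
The notion of subsuming mutants can be illustrated with the usual test-based definition: mutant A subsumes mutant B if A is killed by at least one test and every test that kills A also kills B. The sketch below computes the mutants at the top of that hierarchy from a kill matrix. This is the dynamic definition only, not Cerebro's method, which predicts such mutants statically from code context. The mutant and test names are hypothetical.

```python
from typing import Dict, Set


def subsumes(kills_a: Set[str], kills_b: Set[str]) -> bool:
    """A subsumes B if A is killed at least once and every test killing A also kills B."""
    return bool(kills_a) and kills_a <= kills_b


def subsuming_mutants(kill_sets: Dict[str, Set[str]]) -> Set[str]:
    """Killed mutants that no other mutant strictly subsumes."""
    result = set()
    for name, kills in kill_sets.items():
        if not kills:
            continue  # never killed (possibly equivalent), so not part of the hierarchy
        strictly_subsumed = any(
            other != name
            and subsumes(other_kills, kills)
            and not subsumes(kills, other_kills)
            for other, other_kills in kill_sets.items()
        )
        if not strictly_subsumed:
            result.add(name)
    return result


# Hypothetical kill matrix: mutant name -> tests that kill it.
kill_sets = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},
    "m3": {"t3"},
    "m4": set(),
}
selected = subsuming_mutants(kill_sets)
print(sorted(selected))
```

Here `m2` is dropped because any test that kills `m1` also kills it, and `m4` is dropped because no test kills it, leaving `m1` and `m3` as the mutants worth targeting.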

Keywords: Testing Java Codes Machine learning Costs Supervised learning Reliability Mutant mutation mutation testing subsuming mutant mutant prediction static selection static mutant selection static subsuming mutant selection static subsuming mutant prediction encoder-decoder machine translation tf-seq2seq


STP-TrellisNets+: Spatial-Temporal Parallel TrellisNets for Multi-Step Metro Station Passenger Flow Prediction


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, Vol. 35, No. 7, pp. 7526-7540

Authors: Ou, Junjie; Sun, Jiahui; Zhu, Yichen; Jin, Haiming; Liu, Yijuan; Zhang, Fan; Huang, Jianqiang; Wang, Xinbing. Affiliations: Shanghai Jiao Tong Univ Informat & Commun Engn Shanghai 200240 Peoples R China; Shanghai Jiao Tong Univ Comp Sci & Technol Shanghai 200240 Peoples R China; Shanghai Jiao Tong Univ John Hopcroft Ctr Comp Sci Shanghai 200240 Peoples R China; Shanghai Jiao Tong Univ Dept Elect Engn Shanghai 200240 Peoples R China; Shanghai Jiao Tong Univ Elect Engn Shanghai 200240 Peoples R China; Chinese Acad Sci SIAT Beijing 100045 Peoples R China; Alibaba Damo Acad Hangzhou Zhejiang Peoples R China

The drastic increase of metro passengers in recent years inevitably causes the overcrowdedness in the metro systems. Accurately predicting passenger flows at metro stations is critical for efficient metro system management, which helps alleviate such overcrowdedness. Compared to the prevalent next-step prediction, multi-step passenger flow prediction could prominently increase the prediction duration and reveal finer-grained passenger flow variations, which better helps metro system management. Thus, in this paper, we address the problem of multi-step metro station passenger (MSP) flow prediction. In light of MSP flows' unique spatial-temporal characteristics, we propose STP-TrellisNets+, which for the first time augments the newly-emerged temporal convolutional framework TrellisNet for multi-step MSP flow prediction. The temporal module of STP-TrellisNets+ (named CP-TrellisNetsED) employs a Closeness TrellisNet followed by a Periodicity TrellisNets-based encoder-decoder (P-TrellisNetsED) to jointly capture the short- and long-term temporal correlation of MSP flows. In parallel to CP-TrellisNetsED, its spatial module (named GC-TrellisNetsED) adopts a novel transfer flow-based metric to characterize the spatial correlation among MSP flows, and implements another TrellisNetsED on multiple diffusion graph convolutional networks (DGCNs) in time-series order to capture the dynamics of such spatial correlation. Extensive experiments with two large-scale real-world automated fare collection datasets demonstrate that STP-TrellisNets+ outperforms the state-of-the-art baselines.

Keywords: Metro station passenger flow multi-step prediction TrellisNet encoder-decoder diffusion graph convolution

