Image denoising, which aims to restore a high-quality image from its noisy version, is one of the most challenging tasks in low-level computer vision. In this paper, we propose a multi-stage progressive denoising network (MSPNet) that decomposes the denoising task into sub-tasks to remove noise progressively. Specifically, MSPNet is composed of three denoising stages. Each stage combines a feature extraction module (FEM) and a mutual-learning fusion module (MFM). In the feature extraction module, an encoder-decoder architecture is employed to learn non-local contextualized features, and channel attention blocks (CAB) are used to retain the local information of the image. In the mutual-learning fusion module, criss-cross attention is introduced to balance image spatial details and contextualized information. Experimental results show that MSPNet achieves notable improvements over state-of-the-art works in both objective and subjective evaluations. (c) 2022 Elsevier B.V. All rights reserved.
This work aims to improve the scientific level of the goose breeding industry and support the development of intelligent agriculture. Instance segmentation plays a pivotal role when breeders make decisions about goose breeding: it can be used for disease prevention, body size estimation, and behavioural prediction. However, because of its rich output, instance segmentation requires high-performance computing devices to run smoothly. To ameliorate this problem, this paper constructs a novel encoder-decoder module and proposes the SDSCNet model. The judicious use of depthwise separable convolution in the module reduces the number of parameters and the model size and increases execution speed. Finally, the SDSCNet model enables real-time identification and segmentation of individual geese with the accuracy reached *** compare this model with numerous mainstream instance segmentation models, and the final results demonstrate the excellent performance of our ***, deploying the SDSCNet model on the embedded device Raspberry Pi 4 Model B can achieve effective detection of continuous moving scenes.
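To make the parameter saving from depthwise separable convolution concrete, the following is a minimal sketch that counts weights for a standard convolution and for its depthwise-plus-pointwise factorization. The layer sizes are hypothetical and are not taken from SDSCNet; biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = depthwise_separable_params(3, 128, 256)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.2f}x")
```

For a 3x3 kernel the reduction approaches a factor of 9 as the number of output channels grows, which is why this factorization is common in models aimed at embedded devices.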
To automatically segment multi-class objects in tunnel point clouds, this paper develops a deep learning network named dual attention-based point cloud network (DAPCNet). In the developed model, data normalization and feature aggregation are first applied to eliminate data discrepancies and enhance local features. The processed data are then fed into network layers built on an encoder-decoder architecture coupled with an improved 3D dual attention module to extract and learn features. Furthermore, a custom loss function called Facal Cross-Entropy ("FacalCE") is designed to enhance the model's ability to extract and learn features while addressing imbalanced data distribution. To validate the effectiveness and feasibility of the developed model, a dataset of tunnel point clouds collected from a real engineering project in China is employed. The experimental results indicate that (1) the developed model performs well, with a Mean Intersection over Union (MIoU) of 0.8597, (2) the improved 3D dual attention module and "FacalCE" each contribute to model performance, and (3) the developed model is superior to other state-of-the-art methods, such as PointNet and DGCNN. In summary, the DAPCNet model offers effective and accurate results for segmenting multi-class objects within tunnel point clouds.
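The MIoU metric reported above can be illustrated with a minimal sketch that computes it from a confusion matrix using the standard per-class definition TP / (TP + FP + FN). The function name and the toy matrix are hypothetical and do not come from the paper's evaluation code.

```python
def mean_iou(confusion):
    """Mean Intersection over Union from a square confusion matrix.

    confusion[i][j] is the number of points whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (hypothetical counts).
toy = [[8, 2],
       [1, 9]]
print(mean_iou(toy))
```

Averaging per-class IoU gives rare classes the same weight as frequent ones, which matters when the class distribution is imbalanced, as the abstract notes for tunnel data.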
Understanding different semantic concepts, such as objects and their relationships in an image, and integrating them to produce a natural language description is the goal of the image captioning task. Thus, it needs an algorithm to understand the visual content of a given image and translate it into a sequence of output words. In this paper, a local relation network is designed over the objects and image regions, which not only discovers the relationship between the object and the image regions but also generates significant context-based features corresponding to every region in the image. Inspired by the transformer model, we employ a multilevel attention comprising self-attention and guided attention to focus on a given image region and its related image regions, thus enhancing the image representation capability of the proposed method. Finally, a variant of the traditional long short-term memory that uses an attention mechanism is employed to focus on relevant contextual information, spatial locations, and deep visual features. With these measures, the proposed model encodes an image in an improved way, which gives the model significant cues and thus leads to improved caption generation. Extensive experiments have been performed on three benchmark datasets: Flickr30k, MSCOCO, and Nocaps. On Flickr30k, the obtained evaluation scores are 31.2 BLEU@4, 23.5 METEOR, 51.5 ROUGE, 65.6 CIDEr and 17.2 SPICE. On MSCOCO, the proposed model attains 42.4 BLEU@4, 29.4 METEOR, 59.7 ROUGE, 125.7 CIDEr and 23.2 SPICE. The overall CIDEr score achieved by the proposed model on the Nocaps dataset is 114.3. These scores show the superiority of the proposed method over existing methods.
Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps, as in the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. To balance performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from a feature encoder, such as the FPN architecture for object detection and the U-Net for semantic segmentation. Although more efficient, existing encoder-decoder methods for semantic segmentation perform far worse than the dilatedFCN-based methods. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. EfficientFCN achieves comparable or even better performance than state-of-the-art methods with only 1/3 of their computational costs for semantic segmentation on the PASCAL Context, PASCAL VOC, and ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves > 2% higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.
The optical flow guidance strategy is ideal for obtaining motion information of objects in a video and is widely used in video segmentation tasks. However, existing optical flow-based methods depend heavily on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained models can be found at https://***/NUST-Machine-Intelligence-Laboratory/HGPU.
Currently, the video captioning models based on an encoder-decoder mainly rely on a single video input *** contents of video captioning are limited since few studies employed external corpus information to guide the generation of video captioning, which is not conducive to the accurate description and understanding of video *** address this issue, a novel video captioning method guided by a sentence retrieval generation network (ED-SRG) is proposed in this ***, a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model are integrated to construct an encoder-decoder, which is utilized to extract the 2D features, 3D features, and object features of video data *** features are decoded to generate textual sentences that conform to video content for sentence ***, a sentence-transformer network model is employed to retrieve different sentences in an external corpus that are semantically similar to the above textual *** candidate sentences are screened out through similarity ***, a novel GPT-2 network model is constructed based on GPT-2 network *** model introduces a designed random selector to randomly select predicted words with a high probability in the corpus, which is used to guide and generate textual sentences that are more in line with human natural language *** proposed method in this paper is compared with several existing works by *** results show that the indicators BLEU-4, CIDEr, ROUGE_L, and METEOR are improved by 3.1%, 1.3%, 0.3%, and 1.5% on a public dataset MSVD and 1.3%, 0.5%, 0.2%, 1.9% on a public dataset MSR-VTT *** can be seen that the proposed method in this paper can generate video captioning with richer semantics than several state-of-the-art approaches.
Plant segmentation is a critical task in precision agriculture, where it supports crop management and weed treatment. Plants can exhibit very large scale changes, which presents a great challenge for accurate crop/weed segmentation. Recent works have shown that multi-scale features are useful for segmenting objects of different scales. In this work, we propose a Dense Multi-scale Convolutional Network (DMSCN) for pixel-wise crop/weed segmentation. Our network has an encoder-decoder structure. The encoder comprises a Dense Convolutional Network (DCN) and a Dense Multi-Scale Atrous Pooling (DMSAP) module. DCN is composed of standard and atrous convolutions with dense connections. The architecture of DCN allows the encoder to increase the density of feature maps while avoiding signal decimation due to dimension reduction. The proposed DMSAP connects a set of standard and atrous convolutional layers with different dilation rates in a densely cascaded manner. DMSAP is able to capture features with dense scale sampling and a large receptive field. A simple yet effective decoder is used to refine the segmentation results by combining high- and low-level features of the encoder. Extensive experiments are performed on four crop/weed datasets, one of which was collected and annotated by us. We conduct an ablation study to show the advantages of the different modules of DMSCN. The comparative study demonstrates the advantages of our model over previous methods in terms of accuracy and complexity.
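As a rough illustration of why cascading atrous convolutions with different dilation rates enlarges the receptive field, the sketch below applies the standard effective-kernel formula d * (k - 1) + 1 to a plain stride-1 cascade. The dilation rates are hypothetical, and the dense connections of DMSAP are not modeled; this covers only the deepest path through such a cascade.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1


def cascade_receptive_field(kernel: int, dilations) -> int:
    """Receptive field of stride-1 convolutions applied one after another."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(kernel, d) - 1  # each layer widens the field by (k_eff - 1)
    return rf


# Hypothetical cascade of 3x3 convolutions with dilation rates 1, 2 and 4.
rates = [1, 2, 4]
print([effective_kernel(3, d) for d in rates])
print(cascade_receptive_field(3, rates))
```

The field grows with the sum of the dilation rates while the number of weights per layer stays fixed, which is the trade-off that atrous designs exploit.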
Mutation testing research has indicated that a major part of its application cost is due to the large number of low-utility mutants that it introduces. Although previous research has identified this issue, no previous study has proposed an effective solution to the problem. Thus, it remains unclear how to mutate and test a given piece of code in a best-effort way, i.e., achieving a good trade-off between invested effort and test effectiveness. To achieve this, we propose Cerebro, a machine learning approach that statically selects subsuming mutants, i.e., the set of mutants that reside at the top of the subsumption hierarchy, based on the mutants' surrounding code context. We evaluate Cerebro using 48 and 10 programs written in C and Java, respectively, and demonstrate that it preserves the benefits of mutation testing while limiting application cost, i.e., it reduces all application cost factors such as equivalent mutants, mutant executions, and the mutants requiring analysis. We demonstrate that Cerebro has strong inter-project prediction ability, which is significantly higher than that of two baseline methods, i.e., supervised learning on features proposed by the state of the art, and random mutant selection. More importantly, our results show that Cerebro's selected mutants lead to strong tests that kill twice as many subsuming mutants as those killed by either baseline when selecting the same number of mutants. At the same time, Cerebro reduces the cost-related factors, as it selects, on average, 68% fewer equivalent mutants, while requiring 90% fewer test executions than the baselines.
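The notion of subsuming mutants can be illustrated with a minimal sketch that derives them dynamically from a kill matrix. This is not Cerebro, which predicts them statically with machine learning. It assumes the usual dynamic definition: a killed mutant A subsumes B when every test that kills A also kills B, and subsuming mutants are the killed mutants that no other mutant strictly subsumes. The function name and the mutant and test identifiers are hypothetical.

```python
def subsuming_mutants(kills):
    """Return groups of subsuming mutants.

    kills maps a mutant id to the set of test ids that kill it.
    Mutants with identical kill sets are indistinguishable and are
    returned together as one group.
    """
    groups = {}
    for mutant, tests in kills.items():
        if tests:  # unkilled mutants (possibly equivalent) are ignored
            groups.setdefault(frozenset(tests), []).append(mutant)

    result = []
    for kill_set, members in groups.items():
        # A proper subset kill set means another mutant strictly subsumes this group.
        strictly_subsumed = any(other < kill_set for other in groups)
        if not strictly_subsumed:
            result.append(sorted(members))
    return sorted(result)


# Hypothetical kill matrix.
kills = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},  # subsumed by m1: any test killing m1 also kills m2
    "m3": {"t3"},
    "m4": set(),         # never killed
    "m5": {"t1"},        # indistinguishable from m1
}
print(subsuming_mutants(kills))
```

Tests written to kill one mutant from each returned group also kill every subsumed mutant, which is why selecting subsuming mutants reduces effort without weakening the resulting test suite.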
The drastic increase in metro passengers in recent years inevitably causes overcrowding in metro systems. Accurately predicting passenger flows at metro stations is critical for efficient metro system management, which helps alleviate such overcrowding. Compared to the prevalent next-step prediction, multi-step passenger flow prediction can markedly extend the prediction horizon and reveal finer-grained passenger flow variations, which better supports metro system management. Thus, in this paper, we address the problem of multi-step metro station passenger (MSP) flow prediction. In light of MSP flows' unique spatial-temporal characteristics, we propose STP-TrellisNets+, which for the first time augments the newly emerged temporal convolutional framework TrellisNet for multi-step MSP flow prediction. The temporal module of STP-TrellisNets+ (named CP-TrellisNetsED) employs a Closeness TrellisNet followed by a Periodicity TrellisNets-based encoder-decoder (P-TrellisNetsED) to jointly capture the short- and long-term temporal correlation of MSP flows. In parallel to CP-TrellisNetsED, its spatial module (named GC-TrellisNetsED) adopts a novel transfer flow-based metric to characterize the spatial correlation among MSP flows, and implements another TrellisNetsED on multiple diffusion graph convolutional networks (DGCNs) in time-series order to capture the dynamics of such spatial correlation. Extensive experiments with two large-scale real-world automated fare collection datasets demonstrate that STP-TrellisNets+ outperforms the state-of-the-art baselines.
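The difference between next-step and multi-step prediction comes down to how training targets are cut from the flow series. The sketch below builds (history, future) pairs with a sliding window, where `horizon=1` gives the next-step setting. The function name, window sizes, and passenger counts are hypothetical and unrelated to the STP-TrellisNets+ data pipeline.

```python
def make_windows(series, history: int, horizon: int):
    """Slice a series into (input window, target window) training pairs."""
    samples = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        x = series[start:start + history]                      # past observations
        y = series[start + history:start + history + horizon]  # future steps to predict
        samples.append((x, y))
    return samples


# Hypothetical passenger counts per time slot at one station.
flows = [10, 12, 15, 11, 9, 14]
for x, y in make_windows(flows, history=3, horizon=2):
    print(x, "->", y)
```

A longer horizon yields fewer samples from the same series and asks the model to predict steps further from its last observation, which is part of what makes the multi-step setting harder.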