With the increasing usage of wearable electrocardiogram (ECG) monitoring devices, it is necessary to develop models and algorithms that can analyze the large amounts of ECG data obtained in real time. Accurate ECG delineation is key to assisting cardiologists in diagnosing cardiac diseases. The main objective of this study is to design a delineation model based on the encoder-decoder structure to detect different heartbeat waveforms, including P-waves, QRS complexes, T-waves, and No waves (NW), as well as the onset and offset of these waveforms. First, a standard dilated convolution module (SDCM) was introduced into the encoder path so that the model could extract more informative ECG signal features. Subsequently, bidirectional long short-term memory (BiLSTM) was added to the encoding structure to capture temporal features. Moreover, the feature sets of the ECG signals at each level in the encoder path were connected to the decoder part for multi-scale decoding, to mitigate the information loss caused by the pooling operation in the encoding process. Finally, the proposed model was trained and tested on both the QT and LU databases, and it achieved accurate results compared to other state-of-the-art methods. On the QT database, the average accuracy of ECG waveform classification was 96.90%, and an average classification accuracy of 95.40% was obtained on the LU database. In addition, average F1 values of 99.58% and 97.05% were achieved in the ECG delineation task on the QT and LU databases, respectively. The results show that the proposed ECG_SegNet model has good flexibility and reliability when applied to ECG delineation, and it is a reliable method for analyzing ECG signals in real time.
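The delineation task described above ends with onset and offset positions for each waveform. As a minimal sketch, assuming the model emits one label per sample from the set P, QRS, T, NW, the boundaries could be recovered by grouping consecutive identical labels. The function name `delineate` and the label strings are illustrative assumptions, not part of ECG_SegNet.

```python
from itertools import groupby


def delineate(labels, background="NW"):
    """Convert per-sample labels into (wave, onset, offset) segments.

    Onset and offset are inclusive sample indices. Runs of the
    background label ("NW", an assumed name) are skipped.
    """
    segments = []
    index = 0
    for wave, run in groupby(labels):
        length = sum(1 for _ in run)  # number of samples in this run
        if wave != background:
            segments.append((wave, index, index + length - 1))
        index += length
    return segments


# Hypothetical per-sample prediction for a single heartbeat.
labels = (["NW"] * 3 + ["P"] * 4 + ["NW"] * 2 + ["QRS"] * 5
          + ["NW"] * 2 + ["T"] * 6 + ["NW"] * 2)
segments = delineate(labels)
for wave, onset, offset in segments:
    print(wave, onset, offset)
```

Dividing the sample indices by the sampling rate would give onset and offset times in seconds.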
Regional rainfall-runoff modeling is a classic and significant research topic in hydrological sciences. Currently, the predominant modeling approach is developing data-driven models. This study proposes a rainfall-runoff model named ED-TimesNet (encoder-decoder-based TimesNet), which consists of convolutional neural networks. It transforms a one-dimensional time series into a two-dimensional matrix based on frequency-domain partitioning rules and subsequently employs a two-dimensional visual backbone to learn both local and global features of the hydrological time series. Compared to LSTM-based models and Transformer models, this model learns both intra-period and inter-period variations in hydrological series, simultaneously focusing on the relationships between adjacent and non-adjacent time points. It alleviates the temporal ambiguity problem inherent in attention mechanisms. This research validates the performance of the ED-TimesNet model in regional rainfall-runoff modeling tasks using the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset. The model achieves a median and mean NSE of 0.8049 and 0.7808, respectively, across 448 basins, outperforming the benchmark LSTM, VIC, and mHM models, and achieving comparable performance to the Transformer model. This paper does not address the model's performance on ungauged basins. The method of predicting runoff based on the periodic features of hydrological data provides a novel perspective for hydrological sciences.
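Two ideas in this abstract can be illustrated in a few lines: folding a one-dimensional series into a two-dimensional matrix by period, and the Nash-Sutcliffe efficiency (NSE) used as the score. The sketch below assumes the period is already known, whereas the abstract says it is derived from frequency-domain rules. Truncating the incomplete tail is a simplification chosen here, not necessarily what ED-TimesNet does. Both function names are hypothetical.

```python
def fold_by_period(series, period):
    """Reshape a 1-D series into rows of one period each.

    Each row holds intra-period variation; each column lines up the
    same phase across periods (inter-period variation). Samples that
    do not fill a whole period are dropped (a simplification).
    """
    usable = len(series) - len(series) % period
    return [series[i:i + period] for i in range(0, usable, period)]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.

    1.0 is a perfect fit; 0.0 means the simulation is no better than
    predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


matrix = fold_by_period(list(range(7)), period=3)
observed = [1.0, 2.0, 3.0, 4.0]
perfect_score = nse(observed, observed)
mean_score = nse(observed, [2.5] * 4)
```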
Regular detection and repair for lining cracks are necessary to guarantee the safety and stability of *** development of computer vision has greatly promoted structural health *** study proposes a novel encoder–decoder structure, CrackRecNet, for semantic segmentation of lining segment cracks by integrating improved VGG-19 into the U-Net *** image acquisition equipment is designed based on a camera, 3-dimensional printing (3DP) bracket and two laser rangefinders. A tunnel concrete structure crack (TCSC) image data set, containing images collected from a double-shield tunnel boring machine (TBM) tunnel in China, was *** data preprocessing operations, such as brightness adjustment, pixel resolution adjustment, flipping, splitting and annotation, 2880 image samples with pixel resolution of 448×448 were *** model was implemented by PyTorch in PyCharm processed with 4 NVIDIA TITAN V *** the experiments, the proposed CrackRecNet showed better prediction performance than U-Net, TernausNet, and *** paper also discusses the GPU parallel acceleration effect and the quantification of maximum crack width.
Traveling shortest path planning, encompassing the Traveling Salesman Problem (TSP) in graph theory, holds profound significance. The motivation for addressing the TSP stems from its critical application in real-world scenarios, such as logistics, where optimal routing can substantially reduce costs and improve service efficiency. Furthermore, TSP-like challenges play a pivotal role in helping travelers chart the optimal itinerary, one that covers all landmarks in the least distance or time and ends at the departure site. This optimization not only streamlines travel routes but also saves time and energy, ensuring maximal sightseeing within a confined timeframe. Recognizing the limitations of current solutions in achieving high efficiency and accuracy simultaneously, we propose an innovative Association & Integration-based encoder-decoder structure tailored for solving the Traveling Salesman Problem, i.e., A&I-ED-TSP. The proposed structure comprises four blocks: the information linkage space, dual-path integration encoder, node encoder, and representation decoder. Specifically, the information linkage space constructs associations among hidden information between input sequence samples. The dual-path integration encoder extracts and merges the original representations of the sequence with associated representations. The node encoder extracts current sequence representations, while the representation decoder block computes the probability distribution of sequence samples, completing the combinatorial optimization of the entire sequence. In the experimental evaluation, we utilized three different metrics: Average Tour Length (ATL), Optimality Gap (OG), and Evaluation Time (ET). We compared the proposed method with classical approximation methods and various state-of-the-art deep learning approaches. The experimental results show that our A&I-ED-TSP structure achieved the best ATLs of 5.704, 12.770, 17.981, 21.979, and 25.293 for TSP instances of TSP50, TS
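For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how a tour length and an optimality gap are commonly computed. It assumes Euclidean distances on 2-D coordinates and a closed tour that returns to the start. The function names and the toy instance are hypothetical and are not taken from the paper.

```python
import math


def tour_length(coords, tour):
    """Length of a closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def optimality_gap(length, reference_length):
    """Relative excess over a reference (e.g. optimal) tour, in percent."""
    return (length - reference_length) / reference_length * 100.0


# Toy instance: the four corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best = tour_length(coords, [0, 1, 2, 3])     # walks the perimeter
crossed = tour_length(coords, [0, 2, 1, 3])  # uses both diagonals
gap = optimality_gap(crossed, best)
```

The Average Tour Length would then be the mean of `tour_length` over a set of test instances of the same size.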
In road extraction from remote sensing images, the road environment is complex and roads are often occluded by trees, buildings, and other objects, which makes it difficult to extract practical (continuous and complete) road information. To address these problems, we propose a joint attention encoder-decoder network (JAED-Net) for road extraction from remote sensing images. First, JAED-Net uses a modified residual network as the encoder backbone for road feature extraction. A joint attention module is added to the encoder to enhance the network's ability to learn and express road features. Then, strip convolution is added to the decoder, so the network retains more spatial features, such as the width and connectivity of roads, during upsampling. Finally, because the ratio of road to background pixels in remote sensing images is unbalanced, a hybrid weighted loss function is introduced to train the network and ensure stability. Experimental validation of the proposed network is performed on three publicly available datasets.
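The abstract does not specify the form of its hybrid weighted loss. One common way to handle road/background imbalance is to combine a class-weighted binary cross-entropy with a Dice term. The sketch below shows that generic combination only, as an assumption. The name `hybrid_loss` and the parameters `pos_weight` and `dice_weight` are hypothetical.

```python
import math


def hybrid_loss(probs, targets, pos_weight=5.0, dice_weight=0.5, eps=1e-7):
    """Weighted BCE plus Dice loss over flattened pixel lists.

    probs: predicted road probabilities in [0, 1]
    targets: ground-truth labels (1 = road, 0 = background)
    pos_weight: extra weight on the rarer road class (assumed value)
    """
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        bce -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(targets)

    # Dice compares overlap with total mass, so it is less sensitive
    # to how many background pixels there are.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return (1.0 - dice_weight) * bce + dice_weight * dice


targets = [0, 0, 0, 1]
good_loss = hybrid_loss([0.1, 0.1, 0.1, 0.9], targets)
bad_loss = hybrid_loss([0.9, 0.9, 0.9, 0.1], targets)
```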
ISBN: (print) 9798350359329; 9798350359312
Encoders are widely used in image captioning, but the sentences generated by current image captioning methods may miss target objects, and the generated descriptions may not fit the image content well. To solve these problems, we propose a coarse-to-fine image captioning method based on a dual encoder-decoder framework, which provides a mechanism for discovering and correcting omissions and enables the model to generate a complete image description. First, an image feature extractor based on global and local information is designed, which extracts both kinds of information from the image and obtains a richer image representation. Second, a dual encoder-decoder framework is designed, which consists of a coarse-grained encoder-decoder and a fine-grained encoder-decoder. The coarse-grained encoder-decoder requires only the original image features as input, which are processed by a transformer to produce a coarse text description. In addition, an image feature auto-enhancement module is proposed to detect objects missing from the coarse text and enhance their feature expression. Finally, the fine-grained encoder-decoder takes both the image features and the coarse text caption as input and generates the final fine-grained caption after multi-modal information fusion. Experimental results on the MSCOCO dataset show that the proposed method outperforms previous image captioning methods, achieving a BLEU-4 score of 39.7 and a CIDEr-A score of 121.6.
Enabled by hierarchical convolutions and nonlinear mappings, recent action recognition studies have continuously boosted performance with spatiotemporal modelling. In general, motion clues are essential in video-oriented tasks, while existing approaches aggregate the spatial and temporal signatures via specially designed modules in the middle or output stages. To highlight the privilege provided by temporal motions, in this paper, we propose a simple but effective MOTion Estimator (MOTE) to generate the motion patterns from every single frame, avoiding complex dense-frame input. In particular, MOTE follows an encoder-decoder structure, which takes the short-term motion features generated by the pretrained dense-frame network as the learning target. The spatial information of a single frame is utilized to estimate the instantaneous motion appearance. It can support the expression of vulnerable regions, such as the 'hand' in 'waving hands,' which would otherwise be suppressed in the feature maps as the 'hand' suffers from motion blur. The training process of MOTE is independent of the action recognition system. Therefore, the trained MOTE can be transplanted to the input-end of existing action recognition methods to provide instantaneous motion estimation as feature enhancement according to practical requirements. Our experiments performed on Something-Something V1, V2, Kinetics-400, and Diving48 verify the effectiveness of the proposed method.
The non-local network provides a pioneering approach for capturing long-range dependency by aggregating query-specific global context into each query location; however, it applies an identical weight to each channel of the feature maps and ignores the differences between feature channels. We design a novel tensor attention module (TAM), which integrates context information along the spatial and channel dimensions by introducing a learnable bias parameter tensor, so that the feature at each location of each channel can aggregate the features from all other locations. Motivated by SE-Net, we propose a novel second-order covariance attention module (SCAM) to enhance the feature correlation between different channel maps through second-order statistics and a local cross-channel interaction strategy. We take the encoder-decoder segmentation network DeepLabv3+ as the baseline and add the attention modules TAM and SCAM to the encoder for semantic segmentation (TCNet). Experimental results on the PASCAL VOC 2012 and Cityscapes datasets show that our proposed network performs better than other state-of-the-art segmentation networks.
Crowd counting is an important research topic in the fields of computer vision and image processing, with monitoring and management of crowded scenes becoming an increasingly prominent issue. Existing methods still suffer from the problem of severe overlap in density maps within dense areas, leading to inadequate counting and localization accuracy. This paper presents innovative research on crowd counting and localization. Firstly, addressing the limitations of density maps in localization performance in existing algorithms, we optimize the generation method of FIDT maps, decoupling the counting and localization tasks. By avoiding the problem of overlap in dense areas, the optimized label maps achieve a good balance between counting accuracy and localization, with MAE and MSE reaching 64.1 and 103.9 in SHHA, and 10.9 and 17.4 in SHHB, ***, to address the scale insensitivity of the encoder and the potential loss of critical features during the encoding process, we propose the Adaptive Feature Fusion Module and the Multi-Scale Global Attention Upsampling Module, constructing the CALNET network. By reducing redundant features inside and outside the separable branch, the model achieves global fusion of shallow features during the decoding process. The F1-m scores obtained on the SHHA and SHHB datasets reach 72.9% and 79.4% respectively, significantly improving the model's ***, this paper extends the application of crowd counting and localization algorithms to different domains such as citrus orchards, vehicles, and campus crowds. Through experiments, the robustness and transferability of the network are validated, expanding the application areas of crowd counting and localization algorithms and providing a broader space for future research.
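The counting metrics quoted in this abstract can be illustrated with a short sketch. In much of the crowd-counting literature, the value reported as "MSE" is the square root of the mean squared count error. Whether this abstract follows that convention is an assumption, so the function below returns the root form under an explicit name. The function name and sample counts are hypothetical.

```python
import math


def counting_errors(predicted_counts, true_counts):
    """Return (MAE, RMSE) over per-image crowd counts."""
    n = len(true_counts)
    diffs = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, rmse


# Hypothetical counts for three test images.
mae, rmse = counting_errors([10, 20, 30], [12, 20, 27])
```

Because squaring emphasizes large errors, the root-mean-square value is never smaller than the MAE on the same data.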
As an essential aspect of semantic segmentation, real-time semantic segmentation poses a significant challenge in achieving a trade-off between segmentation accuracy and inference speed. The standard non-local block can effectively capture the long-range dependencies that are critical to semantic segmentation, but its huge computational cost is unacceptable for real-time semantic segmentation. To address this issue, we propose a fast non-local attention network (FNANet) with an encoder-decoder structure for real-time semantic segmentation. FNANet relies on a fast non-local attention module and a fast non-local attention fusion module. These modules serve the dual purpose of reducing computational demands and capturing essential contextual information, thereby balancing segmentation accuracy against computational overhead. Furthermore, improved non-local attention is incorporated to augment feature representation, which facilitates precise class label prediction. Experimental results demonstrate that FNANet outperforms state-of-the-art methods in terms of segmentation accuracy and speed on Cityscapes and CamVid.
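The cost problem mentioned above comes from the pairwise affinity matrix in a standard non-local block, which has one entry per (query position, key position) pair and therefore grows quadratically with the number of spatial positions. The sketch below only counts those entries. It also shows one generic way to reduce the count, subsampling the key positions, which is an illustrative assumption and not a description of FNANet's modules. The function name and feature-map size are hypothetical.

```python
def affinity_entries(height, width, key_stride=1):
    """Number of query-key pairs in a non-local affinity matrix.

    key_stride > 1 models spatially subsampling the keys by that
    factor along each axis (an assumed, generic reduction).
    """
    queries = height * width
    keys = (height // key_stride) * (width // key_stride)
    return queries * keys


# Hypothetical 64 x 128 feature map.
full = affinity_entries(64, 128)
reduced = affinity_entries(64, 128, key_stride=4)
print(full, reduced, full // reduced)
```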