Increasing municipal waste generation puts a growing share of municipal water resources at high risk. Accurate prediction of water quality is therefore critical for effective protection of these resources. Because water quality data from municipal water resources are nonlinear and non-stationary, it is challenging to achieve high prediction accuracy, especially for medium-term and long-term predictions. To address this issue, we propose a novel hybrid deep learning model to predict water quality multiple steps ahead. The proposed model adopts an encoder-decoder structure in the form of two long short-term memory (LSTM) networks, integrated with an attention mechanism and a convolutional neural network (CNN). The model extracts the complex correlations between multiple water quality features through the CNN, and uses the two LSTM networks to transfer historical information to predictions, with an attention layer assigning different weights to different parts of the historical information. Using three years of water quality data collected from an urban river, we experimentally show that the proposed model outperforms the baseline models by 11%-34% in root mean squared error (RMSE) when predicting dissolved oxygen multiple steps ahead, and by 1%-7% when predicting total phosphorus. Similar improvements are also found in Nash-Sutcliffe efficiency (NSE) and mean absolute error (MAE). The proposed model is a feasible solution for multi-step medium-term water quality prediction.
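The three metrics named in this abstract (RMSE, MAE and NSE) have standard definitions, and a minimal sketch of how they are usually computed may help when comparing the reported figures. The function names and the sample values below are illustrative assumptions, not data or code from the paper.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared error
    to the variance of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / spread

# Hypothetical dissolved oxygen readings (mg/L), invented for illustration.
observed = [8.0, 7.5, 7.0, 6.5]
predicted = [7.8, 7.6, 7.1, 6.9]

print(round(rmse(observed, predicted), 4))
print(round(mae(observed, predicted), 4))
print(round(nse(observed, predicted), 4))
```

Lower RMSE and MAE are better, while NSE is better the closer it is to 1; an NSE of 0 means the model is no more accurate than always predicting the observed mean.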
Crowd counting is an important research topic in computer vision and image processing, as the monitoring and management of crowded scenes becomes an increasingly prominent issue. Existing methods still suffer from severe overlap in density maps within dense areas, leading to inadequate counting and localization accuracy. This paper studies crowd counting and localization. Firstly, addressing the limited localization performance of density maps in existing algorithms, we optimize the generation method of FIDT maps, decoupling the counting and localization tasks. By avoiding the problem of overlap in dense areas, the optimized label maps achieve a good balance between counting accuracy and localization, with MAE and MSE reaching 64.1 and 103.9 on SHHA, and 10.9 and 17.4 on SHHB, ***, to address the scale insensitivity of the encoder and the potential loss of critical features during the encoding process, we propose the Adaptive Feature Fusion Module and the Multi-Scale Global Attention Upsampling Module, constructing the CALNET network. By reducing redundant features inside and outside the separable branch, the model achieves global fusion of shallow features during the decoding process. The F1-m scores obtained on the SHHA and SHHB datasets reach 72.9% and 79.4% respectively, significantly improving the model's ***, this paper extends the application of crowd counting and localization algorithms to different domains such as citrus orchards, vehicles, and campus crowds. Experiments validate the robustness and transferability of the network, expanding the application areas of crowd counting and localization algorithms and providing a broader space for future research.
Content-aware image retargeting (CAIR) techniques are crucial in multimedia processing for displaying images on various devices while preserving visually salient content with desirable visual effects. Existing algorithms are either discrete or continuous. For the former, artefacts appear when the foreground proportion is larger than the retargeting ratio; for the latter, the salient regions are prone to being squeezed. In this paper, we reformulate the retargeting process as sampling the salient signal and reconstructing it under aesthetic supervision, which we call the supervised multi-class image retargeting reconstruction (SMART) framework. The target images can be decomposed into complementary parts, the masked and unmasked ones, according to the saliency influences in the encoder phase. A long-range sampling algorithm is proposed to calculate similarities through an 8-connected planar path while considering spatial distance and feature correlation. The sampled embeddings in latent space reconstruct the retargeted images under supervised signals for aesthetic quality. The semantic loss, Lsem, from the pretrained CLIP model maintains consistency for both content and semantics. The supervised loss, Lir, is introduced to ensure the retargeted qualities are close to the preferred labels. We also release a new retargeting dataset comprising seven image classes (animal, building, car, flower, indoor, landscape and people) with supervised labels collected from designers for further aesthetic retargeting study. Ablation studies confirm the effectiveness of the new dataset, and comparative experiments with state-of-the-art baselines demonstrate the advantages of the proposed method.
As an essential aspect of semantic segmentation, real-time semantic segmentation poses a significant challenge in achieving a trade-off between segmentation accuracy and inference speed. The standard non-local block can effectively capture the long-range dependencies that are critical to semantic segmentation, but its huge computational cost is unacceptable for real-time semantic segmentation. To confront this issue, we propose a fast non-local attention network (FNANet) with an encoder-decoder structure for real-time semantic segmentation. FNANet relies on a fast non-local attention module and a fast non-local attention fusion module. These modules both reduce computational demands and capture essential contextual information, thereby balancing improved segmentation accuracy against computational overhead. Furthermore, improved non-local attention is incorporated to augment feature representation, facilitating precise class label prediction. Experimental results demonstrate that FNANet outperforms state-of-the-art methods in terms of segmentation accuracy and speed on Cityscapes and CamVid.
While the encoder-decoder structure is widely used in recent neural construction methods for learning to solve vehicle routing problems (VRPs), these methods are less effective in searching for solutions due to deterministic feature embeddings and deterministic probability distributions. In this article, we propose the feature embedding refiner (FER) with a novel and generic encoder-refiner-decoder structure to boost the existing encoder-decoder structured deep models. It is model-agnostic in that the encoder and the decoder can come from any pretrained neural construction method. Regarding the introduced refiner network, we design its architecture by combining the standard gated recurrent unit (GRU) cell with two new layers, i.e., an accumulated graph attention (AGA) layer and a gated nonlinear (GNL) layer. The former extracts dynamic graph topological information of historical solutions stored in a diversified solution pool to generate aggregated pool embeddings that are further improved by the GRU, and the latter adaptively refines the feature embeddings from the encoder with the guidance of the improved pool embeddings. As a result, our FER allows current neural construction methods not only to iteratively refine the feature embeddings for a broader search range but also to dynamically update the probability distributions for a more diverse search. We apply FER to two prevailing neural construction methods, the attention model (AM) and policy optimization with multiple optima (POMO), to solve the traveling salesman problem (TSP) and the capacitated VRP (CVRP). Experimental results show that our method achieves lower gaps and better generalization than the original methods and also exhibits performance competitive with state-of-the-art neural improvement methods.
The demand to implement semantic segmentation networks on mobile devices has increased dramatically. However, existing real-time semantic segmentation methods still suffer from a large number of network parameters, making them unsuitable for mobile devices with limited memory resources. The main reason is that most existing methods take backbone networks (e.g., ResNet-18 and MobileNet) as the encoder. To alleviate this problem, we propose a novel Reparameterizable Channel & Dilation (RCD) block and construct a considerably lightweight yet effective encoder by stacking several RCD blocks according to three guidelines. The proposed encoder not only extracts discriminative feature representations via channel convolutions and dilated convolutions, but also reduces the computational burden while maintaining segmentation accuracy with the help of the re-parameterization technique. In addition to the encoder, we also present a simple but effective decoder that adopts an across-resolution fusion strategy to fuse multi-scale feature maps generated from the encoder instead of a bottom-up pathway fusion. With such an encoder and decoder, we provide a Reparameterizable Across-resolution Fusion Network (RAFNet) for real-time semantic segmentation. Extensive experiments demonstrate that our RAFNet achieves a promising trade-off between segmentation accuracy, inference speed and network parameters. Specifically, our RAFNet with only 0.96M parameters obtains 75.3% mIoU at 107 FPS and 75.8% mIoU at 195 FPS on Cityscapes and CamVid test sets for full-resolution inputs, respectively. After quantization and deployment on a Xilinx ZCU104 device, our RAFNet obtains a favorable segmentation performance with only 1.4 W power.
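The abstract does not describe the internals of the RCD block, so the following is only a generic sketch of the idea behind structural re-parameterization: because convolution is linear, parallel branches used during training can be folded into a single kernel for inference. The 1D setting, the two-branch layout and all names here are assumptions chosen for brevity, not the authors' design.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1D correlation with an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]

def merge_branches(k3, k1):
    """Fold a 1-tap branch into the centre tap of a 3-tap kernel."""
    merged = list(k3)
    merged[len(k3) // 2] += k1[0]
    return merged

x = [1.0, 2.0, 3.0, 4.0]   # toy input signal
k3 = [0.5, 1.0, -0.5]      # hypothetical 3-tap training branch
k1 = [2.0]                 # hypothetical 1-tap training branch

# Training-time form: two parallel branches whose outputs are summed.
two_branch = [a + b for a, b in zip(conv1d_same(x, k3), conv1d_same(x, k1))]

# Inference-time form: one convolution with the merged kernel.
single = conv1d_same(x, merge_branches(k3, k1))

print(two_branch)
print(single)
```

Both forms produce the same output, but the merged form needs only one convolution at inference time, which is the source of the speed and memory savings that re-parameterized networks aim for.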
Clothing image segmentation predicts the clothing category label of each pixel in the input image. In this study, we develop an advanced ResNet50-based semantic segmentation model with an encoder-decoder as its primary structure, in order to reduce the influence of the variability of image shots, the similarity of clothing categories, and the complexity of boundaries on the segmentation accuracy of clothing images. An improved spatial pyramid pooling module, combined with a global feature extraction branch that uses a large convolution kernel, is developed to achieve multi-scale feature fusion and improve the model's ability to identify clothing and its boundary features in different shots. Furthermore, to balance the clothing shape and category information in the model, a spatial and semantic information enhancement module is proposed, which enhances the flow of information between different stages of the network through cross-stage connections. The model was trained and tested on the DeepFashion2 dataset. The comparison experiment demonstrates that the proposed model obtained the highest mIoU and Boundary IoU, 74.55% and 57.51% respectively, compared with DeepLabv3+, PSPNet, and other networks.
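For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of the usual definition: the intersection-over-union is computed per class and then averaged. The flattened label lists and the choice to skip classes absent from both maps are illustrative assumptions; the paper's exact evaluation protocol is not given in the abstract.

```python
def mean_iou(true_labels, pred_labels, num_classes):
    """Average per-class intersection-over-union over flattened label maps.

    Classes that appear in neither map are skipped (an assumption here;
    evaluation toolkits differ on this point).
    """
    pairs = list(zip(true_labels, pred_labels))
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)

# Hypothetical 6-pixel label maps with three classes.
truth = [0, 0, 1, 1, 2, 2]
prediction = [0, 1, 1, 1, 2, 0]

print(mean_iou(truth, prediction, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5 (up to floating-point rounding).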
Generative adversarial networks (GAN) have shown great potential for image quality improvement in low-dose CT (LDCT). In general, the shallow features of the generator include more shallow visual information such as edges and texture, while the deep features of the generator contain more deep semantic information such as organization structure. To improve the network's ability to handle different kinds of information separately, this paper proposes a new type of GAN with a dual-encoder-single-decoder structure. In the generator, firstly, a pyramid non-local attention module in the main encoder channel is designed to improve feature extraction by enhancing the features with self-similarity; secondly, another encoder with a shallow feature processing module and a deep feature processing module is proposed to improve the encoding capabilities of the generator; finally, the denoised CT image is generated by fusing the main encoder's features, shallow visual features, and deep semantic features. The quality of the generated images is improved due to the use of feature complementation in the generator. To improve the adversarial training ability of the discriminator, a hierarchical-split ResNet structure is proposed, which improves feature richness and reduces feature redundancy in the discriminator. The experimental results show that, compared with the traditional single-encoder-single-decoder based GAN, the proposed method performs better in both image quality and medical diagnostic acceptability. Code is available at https://***/hanzefang/DESDGAN.
The accurate prediction of building energy consumption provides technology and data support for the construction of intelligent building energy systems. Moreover, it is also a crucial means of responding to the national "Carbon Peaking and Carbon Neutrality Goals." Traditional methods can yield poor results because they fail to consider the nonlinear, nonstationary, and multi-seasonal characteristics of the building energy consumption data. To overcome these limitations, this paper proposes an asymmetric energy consumption prediction approach based on the encoder-decoder architecture. The proposed approach employs the CEEMDAN algorithm for data preprocessing to enhance the reliability of building energy consumption data. Subsequently, the convolutional gated recurrent unit (Conv-GRU) model is utilized to extract high-dimensional features and capture nonlinear relationships from the input energy consumption data. Finally, by employing the GRU-Attention algorithm to assign feature weights, this approach enhances the accuracy of building energy consumption prediction. Experimental evaluations conducted on real datasets demonstrate the superiority of the proposed approach over the existing classic methods.
Computers and electrical engineering have made great strides in steel plate manufacturing. Defect recognition techniques have also evolved. However, due to the large scale of defects, diverse features and sample imbalance problems of steel plates, general algorithms often suffer from low recognition accuracy and weak robustness in practical detection. To address these recognition problems, this study proposes an improved defect segmentation network with a coder-decoder structure to realize multi-scale interaction of features. Using the split-attention feature extraction module, defect features are learned adaptively. Meanwhile, combined with the group normalization module, a surface defect region recognition model based on deep feature fusion is established. The model was trained using a transfer learning approach and evaluated through comparative and ablation experiments. The experimental results confirm the efficiency of the technique: the method achieves 89.11% IoU and 94.24% Dice on the Severstal dataset. The research can be applied as an intelligent system for quality monitoring throughout the production process, guiding rational decision-making and control to improve strip steel product quality.
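The IoU and Dice scores reported above are closely related overlap measures, and a small sketch can make the relationship concrete. For a single pair of masks, Dice = 2·IoU / (1 + IoU). The helper names and the toy masks below are assumptions for illustration; the abstract does not say how the scores were averaged over the dataset, so the identity is not guaranteed to hold for averaged figures in general.

```python
def iou_and_dice(true_mask, pred_mask):
    """Return (IoU, Dice) for two flattened binary masks."""
    intersection = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    true_area = sum(1 for t in true_mask if t)
    pred_area = sum(1 for p in pred_mask if p)
    union = true_area + pred_area - intersection
    return intersection / union, 2 * intersection / (true_area + pred_area)

def dice_from_iou(iou):
    """Convert an IoU value to the Dice value of the same mask pair."""
    return 2 * iou / (1 + iou)

# Hypothetical 6-pixel defect masks (1 = defect, 0 = background).
truth = [1, 1, 1, 0, 0, 0]
prediction = [1, 1, 0, 1, 0, 0]

iou, dice = iou_and_dice(truth, prediction)
print(iou, dice)

# Applying the conversion to the reported IoU figure.
print(round(dice_from_iou(0.8911), 4))
```

Converting the reported 89.11% IoU this way gives about 94.24%, which agrees with the Dice figure quoted in the abstract.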