For monaural speech enhancement, contextual information is important for accurate speech estimation. However, commonly used convolutional neural networks (CNNs) are weak at capturing temporal context, since their building blocks process only one local neighborhood at a time. To address this problem, we draw on human auditory perception to introduce a two-stage trainable reasoning mechanism, referred to as the global-local dependency (GLD) block. GLD blocks capture long-term dependencies of time-frequency bins at both the global and local levels of the noisy spectrogram, helping to detect correlations among the speech part, the noise part, and the whole noisy input. Furthermore, we construct a monaural speech enhancement network called GLD-Net, which adopts an encoder-decoder architecture and consists of a speech object branch, an interference branch, and a global noisy branch. The speech features extracted at the global and local levels are efficiently reasoned over and aggregated in each branch. We compare the proposed GLD-Net with existing state-of-the-art methods on the WSJ0 and DEMAND datasets. The results show that GLD-Net outperforms the state-of-the-art methods in terms of PESQ and STOI.
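The abstract does not include code; the core idea of a global dependency over time-frequency bins can be sketched as plain dot-product attention, where every bin attends to every other bin. This is a minimal illustration in pure Python (function names are hypothetical, not the authors' implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def global_attention(feats):
    """Non-local (global) dependency sketch: every time-frequency bin
    attends to every other bin via dot-product similarity, so distant
    speech and noise regions can inform each other."""
    out = []
    for q in feats:
        # similarity of this bin's feature vector to every bin
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in feats]
        w = softmax(scores)
        # attention-weighted mixture of all bin features
        out.append([sum(wj * kj[d] for wj, kj in zip(w, feats))
                    for d in range(len(q))])
    return out
```

A local-level variant would simply restrict the inner loop to a neighborhood window; the GLD block as described combines both levels.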
Enhancement of low-light images is a challenging task due to the impact of low brightness, low contrast, and high noise. The inability to collect natural labeled data intensifies this problem further. Many researchers have attempted to solve this problem using learning-based approaches; however, most models ignore the impact of noise in low-lit images. In this paper, an encoder-decoder architecture, made up of separable convolution layers that address the issues encountered in low-light image enhancement, is proposed. The architecture is trained end-to-end on a custom low-light image dataset (LID), comprising both clean and noisy images. We introduce a unique multi-context feature extraction module (MC-FEM) where the input first passes through a feature pyramid of dilated separable convolutions for hierarchical-context feature extraction, followed by separable convolutions for feature compression. The model is optimized using a novel three-part loss function that focuses on high-level contextual features, structural similarity, and patch-wise local information. We conducted several ablation studies to determine the optimal model for low-light image enhancement under noisy and noiseless conditions. We have used performance metrics such as peak signal-to-noise ratio, structural similarity index measure, visual information fidelity, and average brightness to demonstrate the superiority of the proposed work against state-of-the-art algorithms. Qualitative results presented in this paper prove the strength and suitability of our model for real-time applications.
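The multi-context idea, running the same filter at several dilation rates and merging the results, can be sketched in one dimension. This is a hedged toy illustration of the dilation mechanism only (the actual MC-FEM operates on 2-D separable convolutions; all names here are hypothetical):

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1-D dilated convolution: taps are spaced
    `dilation` apart, enlarging the receptive field without
    adding parameters."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation  # dilated tap position
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def multi_context(signal, kernel, rates=(1, 2, 4)):
    """Feature-pyramid style fusion: run the same kernel at several
    dilation rates and sum, mixing local and wider context."""
    outs = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) for vals in zip(*outs)]
```

Larger rates see farther at the same parameter cost, which is why a pyramid of dilations captures hierarchical context.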
Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://***/beargolden/DP-LinkNet.
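The hybrid dilated convolution (HDC) design exists to avoid the "gridding" artifact of stacking equal dilation rates, where the effective receptive field has holes. A small sketch (hypothetical helper names, not the DP-LinkNet code) makes the difference checkable:

```python
def covered_offsets(rates, k=3):
    """Offsets (relative input positions) reachable by stacking
    k-tap dilated convolutions with the given dilation rates."""
    offsets = {0}
    for r in rates:
        taps = [(j - k // 2) * r for j in range(k)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets

def has_gridding(rates, k=3):
    """True if the stacked receptive field has holes -- the gridding
    artifact that HDC's mixed dilation rates are designed to avoid."""
    off = covered_offsets(rates, k)
    lo, hi = min(off), max(off)
    return any(p not in off for p in range(lo, hi + 1))
```

A hybrid schedule such as dilations 1, 2, 5 covers every position in its receptive field, whereas repeating dilation 2 three times leaves every odd offset unseen.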
Droughts pose significant challenges for accurate monitoring due to their complex spatiotemporal characteristics. Data-driven machine learning (ML) models have shown promise in detecting extreme events when enough well-annotated data is available. However, droughts do not have a unique and precise definition, which leads to noise in human-annotated events and presents an imperfect learning scenario for deep learning models. This article introduces a 3-D convolutional neural network (CNN) designed to address the complex task of drought detection, considering spatiotemporal dependencies and learning with noisy and inaccurate labels. Motivated by the shortcomings of traditional drought indices, we leverage supervised learning with labeled events from multiple sources, capturing the shared conceptual space among diverse definitions of drought. In addition, we employ several strategies to mitigate the negative effect of noisy labels (NLs) during training, including a novel label correction (LC) method that relies on model outputs, enhancing the robustness and performance of the detection model. Our model significantly outperforms state-of-the-art drought indices when detecting events in Europe between 2003 and 2015, achieving an AUROC of 72.28%, an AUPRC of 7.67%, and an ECE of 16.20%. When applying the proposed LC method, these performances improve by +5%, +15%, and +59%, respectively. Both the proposed model and the robust learning methodology aim to advance drought detection by providing a comprehensive solution to label noise and conceptual variability.
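The article's label correction method relies on model outputs; one common instantiation of that family of techniques is to flip a noisy label only when the model disagrees with it at high confidence. The sketch below shows that generic pattern for binary labels and is an assumption-laden illustration, not the paper's exact LC rule:

```python
def correct_labels(probs, labels, flip_threshold=0.9):
    """Output-based label correction sketch: when the model is highly
    confident (>= flip_threshold) and disagrees with the given,
    possibly noisy, label, replace the label with the prediction."""
    corrected = []
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        corrected.append(pred if (pred != y and conf >= flip_threshold) else y)
    return corrected
```

Training then continues on the corrected labels, which is how such schemes reduce the influence of annotation noise on the detector.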
The Temporal Convolutional Network (TCN) and the TCN combined with an encoder-decoder architecture (TCN-ED) are proposed to forecast runoff in this study. Both models are trained and tested using hourly data from the Jianxi basin, China. The results indicate that the forecast horizon has a great impact on forecast ability, and the concentration time of the basin is a critical threshold for the effective forecast horizon of both models. Both models perform poorly on low flows and well on medium and high flows at most forecast horizons, while their performance on peak flows depends on the forecast horizon. TCN-ED performs better than TCN in runoff forecasting, with higher accuracy, better stability, and insensitivity to fluctuations in the rainfall process. Therefore, TCN-ED is an effective deep learning solution for runoff forecasting within an appropriate forecast horizon.
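The defining property of a TCN layer is causality: the output at time t must depend only on inputs at t and earlier, which matters for forecasting. A minimal pure-Python sketch of that building block (not the paper's full TCN-ED model):

```python
def causal_dilated_conv(series, kernel, dilation):
    """Causal dilated convolution, the TCN building block: output at
    step t sees only inputs at t, t-d, t-2d, ..., never the future."""
    k = len(kernel)
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - (k - 1 - j) * dilation  # strictly non-future taps
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out
```

Stacking such layers with growing dilations gives a long history window, while an encoder-decoder wrapper (as in TCN-ED) separates summarizing past rainfall-runoff from emitting the multi-step forecast.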
Recent advancements in face super-resolution (FSR) have been propelled by deep learning techniques using convolutional neural networks (CNNs). However, existing methods still struggle to effectively capture global facial structure information, leading to reduced fidelity in reconstructed images, and they often require additional manual data annotation. To overcome these challenges, we introduce a content-guided frequency domain transform network (CGFTNet) for face super-resolution tasks. The network features a channel-attention-linked encoder-decoder architecture with two key components: the Frequency Domain and Reparameterized Focus Convolution Feature Enhancement module (FDRFEM) and the Content-Guided Channel Attention Fusion (CGCAF) module. FDRFEM enhances feature representation through transform domain techniques and reparameterized focus convolution (RefConv), capturing detailed facial features and improving image quality. CGCAF dynamically adjusts feature fusion based on image content, enhancing detail restoration. Extensive evaluations across multiple datasets demonstrate that the proposed CGFTNet consistently outperforms other state-of-the-art methods.
Video saliency prediction aims to simulate human visual attention by selecting the most pertinent and important components within a video frame or sequence. When evaluating video saliency, temporal and spatial data are essential, particularly in the presence of challenging features such as fast motion, shifting backgrounds, and nonrigid deformation. Current video saliency frameworks are highly prone to failure under these conditions. Moreover, it is unsuitable to perform video saliency identification by relying solely on image saliency models, disregarding the temporal information in videos. This research proposes a novel Spatiotemporal Bidirectional Network for Video Salient Object Detection using Multiscale Transfer Learning (SBMTL-Net) to address the problem of detecting important objects in videos. SBMTL-Net produces significant outcomes for a given sequence of frames by utilizing multi-scale transfer learning with an encoder-decoder technique to learn and map spatial and temporal properties. The SBMTL-Net model consists of a bidirectional LSTM (Long Short-Term Memory) network and a CNN (Convolutional Neural Network), where VGG16 and VGG19 (Visual Geometry Group) are utilized for multi-scale feature extraction from the input video frames. The performance of the proposed model has been evaluated on five publicly available challenging datasets: DAVIS-T, SegTrack-V2, ViSal, VOS-T, and DAVSOD-T, in terms of MAE, F-measure, and S-measure. The experimental results show the effectiveness of the proposed model compared with other competitive models.
Can a machine learn Machine Learning? This work trains a machine learning model to solve machine learning problems from a university undergraduate-level course. We generate a new training set of questions and answers consisting of course exercises, homework, and quiz questions from MIT's 6.036 Introduction to Machine Learning course and train a machine learning model to answer these questions. Our system demonstrates an overall accuracy of 96% for open-response questions and 97% for multiple-choice questions, compared with MIT students' average of 93%, achieving grade A performance in the course, all in real time. Questions cover all 12 topics taught in the course, excluding coding questions and questions with images. Topics include: (i) basic machine learning principles; (ii) perceptrons; (iii) feature extraction and selection; (iv) logistic regression; (v) regression; (vi) neural networks; (vii) advanced neural networks; (viii) convolutional neural networks; (ix) recurrent neural networks; (x) state machines and MDPs; (xi) reinforcement learning; and (xii) decision trees. Our system uses Transformer models within an encoder-decoder architecture with graph and tree representations. An important aspect of our approach is a data-augmentation scheme for generating new example problems. We also train a machine learning model to generate problem hints. Thus, our system automatically generates new questions across topics, answers both open-response and multiple-choice questions, classifies problems, and generates problem hints, pushing the envelope of AI for STEM education.
ISBN (print): 9783030863319
Mathematical formula recognition aims to automatically convert formula images into structured description formats. Recently, some encoder-decoder models have been presented for this task, but they seldom explicitly consider spatial relationships among symbols. In this paper, we propose a novel encoder-decoder model with a Graph Neural Network (GNN) to translate mathematical formula images into LaTeX code. In the proposed model, the symbols segmented from the raw image are used to build graphs based on their spatial connections. The encoder consists of a Convolutional Neural Network (CNN) and a GNN: the CNN extracts visual features from the whole formula or individual symbols, and the GNN propagates the spatial information embedded in the built graphs. The decoder is a Recurrent Neural Network (RNN) that implements a language model to generate the output sentences from the encoded features with an attention mechanism. Experimental results on the IM2LATEX-100K dataset demonstrate that the proposed model outperforms state-of-the-art approaches.
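Building a graph from segmented symbols "based on their spatial connections" can be illustrated with a simple proximity rule over bounding boxes. This is a hedged sketch of one plausible construction (the paper may use a different connection criterion; all names are hypothetical):

```python
def build_symbol_graph(boxes, max_gap=10.0):
    """Build an undirected graph over segmented symbols: connect two
    symbols when their bounding boxes (x0, y0, x1, y1) lie within
    `max_gap` pixels of each other, approximating spatial adjacency."""
    def gap(a, b):
        # Euclidean gap between two axis-aligned boxes (0 if they overlap)
        dx = max(a[0] - b[2], b[0] - a[2], 0.0)
        dy = max(a[1] - b[3], b[1] - a[3], 0.0)
        return (dx ** 2 + dy ** 2) ** 0.5
    edges = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if gap(boxes[i], boxes[j]) <= max_gap:
                edges.append((i, j))
    return edges
```

The resulting edge list is what a GNN message-passing layer would then operate over, letting superscript/subscript geometry inform the LaTeX decoding.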
Automatic pavement distress detection is essential to monitoring and maintaining pavement condition. Currently, many deep learning-based methods are used for pavement distress detection. However, distress segmentation remains a challenge under complex pavement conditions. In this paper, a novel deep neural network architecture, W-segnet, based on multi-scale feature fusion, is proposed for pixel-wise distress segmentation. The proposed W-segnet concatenates distress location information with distress classification features in two symmetric encoder-decoder structures. Three major distress types (crack, pothole, and patch) are segmented and the results discussed. Experimental results show that the proposed W-segnet is robust in various scenarios, achieving a mean pixel accuracy (MPA) of 87.52% and a mean intersection over union (MIoU) of 75.88%. The results demonstrate that W-segnet outperforms the other state-of-the-art semantic segmentation models U-net, SegNet, and PSPNet. A comparison of training and inference costs indicates that W-segnet has the largest number of parameters, which requires a slightly longer training time but does not increase the inference cost. Four public datasets were used to test the generalization ability of the proposed model, and the results demonstrate that W-segnet generalizes well.
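The two reported metrics, MPA and MIoU, are standard and fully determined by a per-class confusion matrix; the sketch below computes both (function name hypothetical, but the formulas are the conventional definitions):

```python
def mpa_miou(confusion):
    """Mean pixel accuracy (MPA) and mean intersection-over-union
    (MIoU) from a per-class confusion matrix, where confusion[i][j]
    counts pixels of true class i predicted as class j."""
    n = len(confusion)
    accs, ious = [], []
    for c in range(n):
        tp = confusion[c][c]
        row = sum(confusion[c])                       # pixels of true class c
        col = sum(confusion[r][c] for r in range(n))  # pixels predicted as c
        accs.append(tp / row if row else 0.0)
        denom = row + col - tp                        # union of truth and prediction
        ious.append(tp / denom if denom else 0.0)
    return sum(accs) / n, sum(ious) / n
```

MPA averages per-class recall, while MIoU penalizes false positives as well, which is why MIoU (75.88%) is lower than MPA (87.52%) in the reported results.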