ISBN (digital): 9789811922817
ISBN (print): 9789811922817; 9789811922800
One of the key technologies in autonomous vehicles is image-based lane detection. Modern deep learning methods achieve high performance, but in challenging conditions such as congested roads or poor lighting it is difficult to detect lanes accurately; global context information must be extracted from limited visual cues. Moreover, for driver-assistance functions such as lane keeping and collision avoidance, it is important to know the position of the vehicle, i.e., which lane it occupies. The large variety in the shape and colour of lane markings makes this task difficult. The first step is therefore image processing, in which the input image is prepared for pixel-level semantic segmentation. A semantic segmentation model capable of processing this data is then built; the model can come in different variants depending on the available computation and the number of parameters it can handle.
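The pixel-level decision at the heart of this approach can be illustrated with a minimal NumPy sketch. The per-pixel class logits and the class layout (background plus lane classes) are assumptions for illustration, not the paper's actual network:

```python
import numpy as np

def segment_lanes(logits):
    """Pixel-level semantic segmentation: assign each pixel the class
    with the highest score. `logits` has shape (H, W, C), where C is
    the number of lane classes plus background (class 0)."""
    return np.argmax(logits, axis=-1)

# Toy example: a 2x2 image with 3 classes (background, lane-1, lane-2).
logits = np.array([[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
                   [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]])
mask = segment_lanes(logits)
print(mask)  # [[0 1] [2 0]]
```

In a real model the logits would come from an encoder-decoder network; the argmax step shown here is the same regardless of the model variant.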
ISBN (digital): 9781665427920
ISBN (print): 9781665427920
Hyperspectral images (HSIs) are inevitably corrupted by various types of noise, including Gaussian noise and sparse noise, which degrade the HSIs and greatly limit their applications. Deep neural network (DNN)-based HSI denoising methods have been widely used in recent years. However, existing deep learning methods mainly target Gaussian noise removal, and few address mixed noise. Accordingly, we propose a two-stage cascade refined network consisting of two subnetworks for removing mixed Gaussian and sparse noise from HSIs. In the first stage, spatial-spectral features are first extracted by a feature extraction block based on an attention mechanism; the multi-band noise is then obtained by feeding the extracted features into a multi-band noise estimation subnetwork with an encoder-decoder structure. Finally, the single-band denoising subnetwork in the second stage further refines the output of the previous subnetwork to accomplish single-band noise reduction. Experiments on HSIs show the superiority of the proposed method compared with four typical methods for mixed noise removal.
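The two-stage cascade can be sketched as a pair of composed callables: a multi-band stage that estimates noise over the whole cube, followed by a per-band refinement stage. The `stage1` and `stage2` functions below are hypothetical stand-ins for the paper's subnetworks, not its actual architecture:

```python
import numpy as np

def cascade_denoise(hsi, stage1, stage2):
    """Two-stage cascade sketch. `stage1` estimates multi-band noise
    from the whole cube (first stage); `stage2` refines each band of
    the coarse result (second stage, single-band refinement)."""
    noise = stage1(hsi)               # multi-band noise estimate
    coarse = hsi - noise              # coarse mixed-noise removal
    # refine each spectral band independently in the second stage
    return np.stack([stage2(band) for band in coarse], axis=0)

# Dummy stages: stage1 predicts zero noise, stage2 is the identity.
hsi = np.random.rand(4, 8, 8)         # (bands, H, W)
out = cascade_denoise(hsi, lambda x: np.zeros_like(x), lambda b: b)
print(out.shape)  # (4, 8, 8)
```

The key design point visible even in this sketch is that the second stage only sees the residual of the first, so each subnetwork can specialize in one noise regime.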
ISBN (print): 9781665462426
The exact detection and creation of treatment regimens for conditions like diabetic retinopathy and hypertensive retinopathy depend on the segmentation of retinal blood vessels. Deep learning methods have been employed over the last ten years to segment blood vessels in fundus images. Due to the lack of large quantities of uniform data, the wide range of brightness and anatomical structures in the available fundus images, and the variety of shapes and sizes of the vessels in the tree-like vascular structure, it is still difficult to accurately segment all the vessels in a retinal fundus image. In this study, we present a lightweight CNN with an encoder-decoder structure for real-time and precise segmentation of blood vessels. The most popular retina datasets, DRIVE and CHASE, were used to train and evaluate the model. With an accuracy of 96.3% and an F1 score of 78.45% on the DRIVE dataset, and an accuracy of 97.14% and an F1 score of 82.79% on the CHASE dataset, the model is lightweight and provides comparable performance. Additionally, the proposed model runs faster, with an average inference time of 0.0059 seconds, and has fewer parameters than state-of-the-art models currently in use.
ISBN (print): 9798350346138
The Sinhala language is widely used on social media with the English alphabet representing native Sinhala words. The standard script of the English language is the Roman script; hence we refer to Sinhala texts transliterated using the English alphabet as Romanized-Sinhala texts. This process of representing texts of one language using the alphabet of another is called transliteration. Over time, Sinhala Natural Language Processing (NLP) researchers have developed many systems to process native Sinhala texts. However, it is impossible to use the existing Sinhala text processing tools on Romanized-Sinhala texts, as those systems can only process the Sinhala script. These texts therefore need to be transliterated back into the original Sinhala script before they can be processed with existing Sinhala NLP tools. Transliterating texts back into their native alphabet is referred to as back-transliteration. In this study, we present a Transliteration Unit (TU) based back-transliteration system for Romanized-Sinhala texts. We also introduce a novel method for converting Romanized-Sinhala text into TU sequences. The system was trained on a primary data set and evaluated on an unseen portion of the same data set, as well as on a secondary data set representing texts from a different context. The proposed model achieved a BLEU score of 0.81 and a METEOR score of 0.78 on the primary data set, and a BLEU score of 0.57 and a METEOR score of 0.47 on the secondary data set.
We investigate the performance of intelligent systems such as various Long Short-Term Memory (LSTM) and hybrid models to forecast electricity spot prices, considering both univariate and multivariate models. Six models are created to handle Electricity Price Forecasting (EPF). Furthermore, we propose an EPF methodology built around an LSTM univariate model, the Single in-out (Sio) model. It builds on the specificity of the Day-Ahead electricity Market (DAM) and, as a novelty, inserts each predicted value back into the sliding input vector to predict the next value, until the entire vector of 24 prices is predicted. The proposed model is further enhanced either by a convolutional reading of the input data embedded into the LSTM cell, or by a hybrid combination of LSTM and Convolutional Neural Networks (CNNs) that interprets sub-sequences of the input data and extracts features provided as a sequence to the LSTM model. The methodology is validated using data sets from the Romanian Market Operator (OPCOM) and market operators from Serbia (SEEPEX), Hungary (HUPX) and Bulgaria (IBEX). Our models improve day-ahead forecasting results over other models by 21.02% in terms of Mean Absolute Error (MAE).
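The Sio model's recursive trick, feeding each prediction back into the sliding input window until all 24 hourly prices are produced, can be sketched as follows. The `predict_one` callable is a hypothetical stand-in for the trained LSTM:

```python
def sio_forecast(history, predict_one, horizon=24):
    """Single in-out (Sio) day-ahead sketch: predict one price, slide
    it back into the input vector, and repeat until all `horizon`
    hourly prices for the next day are produced."""
    window = list(history)
    forecast = []
    for _ in range(horizon):
        nxt = predict_one(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]   # slide: drop oldest, append prediction
    return forecast

# Dummy one-step model: naive persistence (repeat the last price).
prices = sio_forecast([50.0, 52.0, 51.0], lambda w: w[-1])
print(len(prices))  # 24
```

A consequence of this design, visible in the loop, is that later hours are predicted from earlier predictions rather than observations, so one-step errors can compound across the day.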
ISBN (print): 9781665486842
Image captioning aims to produce a short textual explanation of a given image. Although it looks like a straightforward task for humans, it is difficult for computers, since it involves the ability to analyze the image and provide a human-like description. Encoder-decoder architectures have recently achieved state-of-the-art results in image captioning. With some existing datasets, e.g., Flickr_data, Flickr8k_***, and a heritage dataset, we build a model that can create captions for images related to Bangladeshi culture, tradition and historical places. Bangladesh is enriched with a great culture; many heritage places and cultural programs attract travelers to visit the country. We try to connect our culture, places, and food with machine learning techniques through appropriate captioning, and thereby spread our cultural strengths. Our image captioning tool can be very helpful for travel lovers who want to know more about Bangladesh.
ISBN (print): 9781665405409
Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as strided convolution in the encoder layers significantly decrease the feature resolution, leading to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit information at different feature scales. In the encoder stage, high-level features are obtained to produce a coarse result. Then, decoder layers with multiple refinement paths directly refine the result with fine-grained features. Refinement paths at different feature scales are combined by learnable weights. The experimental results show that the proposed multi-scale refinement structure significantly improves the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.
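The combination of refinement paths by learnable weights can be sketched as a softmax-weighted sum over per-scale outputs. The softmax normalization and the fixed weight values below are illustrative assumptions; in the network the weights would be trained parameters:

```python
import numpy as np

def combine_paths(paths, weights):
    """Combine refinement outputs at different feature scales with
    weights (fixed here; learnable in the network). A softmax keeps
    the combination a convex sum over the paths."""
    w = np.exp(weights - np.max(weights))  # stable softmax
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, paths))

# Two refinement paths of the same output shape, combined equally.
paths = [np.ones((2, 2)), 3 * np.ones((2, 2))]
out = combine_paths(paths, np.array([0.0, 0.0]))
print(out)  # equal weights -> elementwise mean: all 2.0
```

With equal weights the result is the elementwise mean of the paths; during training the weights would shift toward whichever scale contributes most to echo removal.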
ISBN (print): 9783031189098; 9783031189104
Echocardiography can capture both the global and regional functions of the heart. With the obvious benefits of non-invasiveness, visual clarity and mobility, it has become an indispensable technology for the clinical evaluation of cardiac function. However, uncertainty in the measurements of ultrasonic equipment and inter-reader variability are inevitable. Given this situation, researchers have proposed many deep learning methods for cardiac function assessment. In this paper, we propose UDeep, an encoder-decoder model for left ventricular segmentation of echocardiography, which attends to both multi-scale high-level semantic information and multi-scale low-level fine-grained information. Our model remains sensitive to semantic edges, so as to accurately segment the left ventricle. The encoder extracts multi-scale high-level semantic features through a computationally efficient backbone named Separated Xception and an Atrous Spatial Pyramid Pooling module. A new decoder module consisting of several Upsampling Fusion Modules (UPFMs) is then applied to fuse features of different levels. To improve the generalization of our model to different echocardiography images, we propose a Pseudo-Segmentation Penalty loss function. Our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.9290 on the test set of an echocardiography video dataset.
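The fusion of levels that the decoder performs can be illustrated in miniature: upsample the coarse semantic features to the fine resolution and concatenate them with the fine-grained features. Nearest-neighbour upsampling and channel concatenation are illustrative assumptions, not the paper's actual UPFM design:

```python
import numpy as np

def upsample_fuse(coarse, fine):
    """Upsampling-fusion sketch: nearest-neighbour upsample the coarse
    semantic features to the fine features' resolution, then fuse by
    concatenation along the channel axis. Arrays are (channels, H, W);
    the fine resolution is assumed an integer multiple of the coarse."""
    fh, fw = fine.shape[1:]
    ch, cw = coarse.shape[1:]
    up = coarse.repeat(fh // ch, axis=1).repeat(fw // cw, axis=2)
    return np.concatenate([up, fine], axis=0)

coarse = np.zeros((8, 4, 4))   # low-resolution high-level semantics
fine = np.ones((2, 8, 8))      # high-resolution fine-grained features
fused = upsample_fuse(coarse, fine)
print(fused.shape)  # (10, 8, 8)
```

The fused tensor carries both information streams forward, which is what lets the decoder stay sensitive to semantic edges at full resolution.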
ISBN (print): 9781665484855
Video captioning is a sequence-to-sequence task of automatically generating descriptions for given videos. Due to the diversity of video scenes, learning rich representations is critical for video captioning. However, previous works mainly exploited elaborate features and neglected the loss of information caused by frame sampling and image compression. In this paper, we propose a novel spatio-temporal super-resolution (STSR) network that is jointly trained for the video captioning task and the video super-resolution task in an end-to-end fashion. Specifically, the video super-resolution task consists of two subtasks: spatial super-resolution restores high-resolution image features, while temporal super-resolution reconstructs missing frame features between two adjacent sampled frames. By sharing multi-modal encoders across these two tasks, STSR encourages the encoders to capture salient visual content and learn context-aware representations. Experiments on two benchmark datasets demonstrate that the proposed STSR boosts video captioning performance significantly and outperforms most state-of-the-art approaches.
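The temporal super-resolution subtask, reconstructing missing frame features between adjacent sampled frames, can be sketched with linear interpolation as a simple stand-in for the learned reconstruction:

```python
import numpy as np

def temporal_sr(frames, factor=2):
    """Temporal super-resolution sketch: insert `factor - 1` frame
    features between each adjacent pair of sampled frames by linear
    interpolation (a stand-in for STSR's learned reconstruction)."""
    dense = []
    for a, b in zip(frames[:-1], frames[1:]):
        dense.append(a)
        for k in range(1, factor):
            t = k / factor
            dense.append((1 - t) * a + t * b)   # interpolated feature
    dense.append(frames[-1])
    return dense

feats = [np.zeros(3), 2 * np.ones(3)]   # two sampled frame features
dense = temporal_sr(feats)
print(len(dense))  # 3: the two originals plus one midpoint
```

In the actual network the reconstruction is learned jointly with captioning; the point of the sketch is only the shape of the subtask, from sparse sampled features to a denser sequence.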
ISBN (print): 9781450393898
In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an adversarial encoder-decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the Facial Action Coding System. A large corpus of almost 4 hours of video, involving 89 different speakers, is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The models are evaluated objectively, with measures such as density evaluation and visualisation via PCA reduction, as well as subjectively through a user perception study. Our methodology shows that on 15-second sequences, the encoder-decoder architecture drastically improves the perception of the generated behaviours on two criteria: coordination with speech and naturalness. Our code can be found in: https://***/aldelb/non-verbal-behaviours-generation.