ISBN (digital): 9789811922817
ISBN (print): 9789811922817; 9789811922800
One of the key technologies in autonomous vehicles is image-based lane detection. Modern deep learning methods achieve high performance, but in challenging conditions such as congested roads or poor lighting it is difficult to detect lanes accurately; global context information must be extracted from limited visual cues. Moreover, for driver-assistance functions such as lane keeping and collision avoidance, it is important to know the position of the vehicle, i.e., which lane it occupies. The large variety in the shape and colour of lane markings makes this task difficult. The first step is therefore image processing, in which the input image is prepared for pixel-level semantic segmentation. A semantic segmentation model capable of processing this data is then built; the model can come in different variants depending on the available computation and the number of parameters it can handle.
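The pixel-level decision at the heart of this approach can be illustrated with a minimal NumPy sketch. The per-pixel class logits and the class layout (background plus lane classes) are assumptions for illustration, not the paper's actual network:

```python
import numpy as np

def segment_lanes(logits):
    """Pixel-level semantic segmentation: assign each pixel the class
    with the highest score. `logits` has shape (H, W, C), where C is
    the number of lane classes plus background (class 0)."""
    return np.argmax(logits, axis=-1)

# Toy example: a 2x2 image with 3 classes (background, lane-1, lane-2).
logits = np.array([[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
                   [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]])
mask = segment_lanes(logits)
print(mask)  # [[0 1] [2 0]]
```

In a real model the logits would come from an encoder-decoder network; the argmax step shown here is the same regardless of the model variant.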
ISBN (digital): 9781665427920
ISBN (print): 9781665427920
Hyperspectral images (HSIs) are inevitably corrupted by various types of noise, including Gaussian noise and sparse noise, which degrade the HSIs and greatly limit their applications. Deep neural network (DNN)-based HSI denoising methods have been widely used in recent years. However, existing deep learning methods mainly target Gaussian noise removal, and few address mixed noise. Accordingly, we propose a two-stage cascade refined network consisting of two subnetworks for removing mixed Gaussian and sparse noise from HSIs. In the first stage, spatial-spectral features are first extracted by a feature extraction block based on an attention mechanism; the multi-band noise is then obtained by feeding the extracted features into a multi-band noise estimation subnetwork with an encoder-decoder structure. Finally, the single-band denoising subnetwork in the second stage further refines the output of the previous subnetwork to accomplish single-band noise reduction. Experiments on HSIs show the superiority of the proposed method compared with four typical methods for mixed noise removal.
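The two-stage cascade can be sketched as a pair of composed callables: a multi-band stage that estimates noise over the whole cube, followed by a per-band refinement stage. The `stage1` and `stage2` functions below are hypothetical stand-ins for the paper's subnetworks, not its actual architecture:

```python
import numpy as np

def cascade_denoise(hsi, stage1, stage2):
    """Two-stage cascade sketch. `stage1` estimates multi-band noise
    from the whole cube (first stage); `stage2` refines each band of
    the coarse result (second stage, single-band refinement)."""
    noise = stage1(hsi)               # multi-band noise estimate
    coarse = hsi - noise              # coarse mixed-noise removal
    # refine each spectral band independently in the second stage
    return np.stack([stage2(band) for band in coarse], axis=0)

# Dummy stages: stage1 predicts zero noise, stage2 is the identity.
hsi = np.random.rand(4, 8, 8)         # (bands, H, W)
out = cascade_denoise(hsi, lambda x: np.zeros_like(x), lambda b: b)
print(out.shape)  # (4, 8, 8)
```

The key design point visible even in this sketch is that the second stage only sees the residual of the first, so each subnetwork can specialize in one noise regime.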
ISBN (print): 9781665462426
The exact detection and creation of treatment regimens for conditions like diabetic retinopathy and hypertensive retinopathy depend on the segmentation of retinal blood vessels. Deep learning methods have been employed over the last ten years to segment blood vessels in fundus images. Due to the lack of large quantities of uniform data, the wide range of brightness and anatomical structures in the available fundus images, and the variety of shapes and sizes of the vessels in the tree-like vascular structure, it is still difficult to accurately segment all the vessels in a retinal fundus image. In this study, we present a lightweight CNN with an encoder-decoder structure for real-time and precise segmentation of blood vessels. The most popular retina datasets, DRIVE and CHASE, were used to train and evaluate the model. With an accuracy of 96.3% and an F1 score of 78.45% on the DRIVE dataset, and an accuracy of 97.14% and an F1 score of 82.79% on the CHASE dataset, the model is lightweight and provides comparable performance. Additionally, the proposed model runs faster, with an average inference time of 0.0059 seconds, and has fewer parameters than state-of-the-art models currently in use.
ISBN (print): 9798350346138
The Sinhala language is widely used on social media with the English alphabet representing native Sinhala words. The standard script of the English language is the Roman script; hence we refer to Sinhala texts transliterated using the English alphabet as Romanized-Sinhala texts. This process of representing texts of one language using the alphabet of another is called transliteration. Over time, Sinhala Natural Language Processing (NLP) researchers have developed many systems to process native Sinhala texts. However, it is impossible to use the existing Sinhala text processing tools on Romanized-Sinhala texts, as those systems can only process the Sinhala script. These texts therefore need to be transliterated back into the original Sinhala script before they can be processed with existing Sinhala NLP tools. Transliterating texts back into their native alphabet is referred to as back-transliteration. In this study, we present a Transliteration Unit (TU) based back-transliteration system for Romanized-Sinhala texts. We also introduce a novel method for converting Romanized-Sinhala text into TU sequences. The system was trained on a primary data set and evaluated on an unseen portion of the same data set, as well as on a secondary data set representing texts from a different context. The proposed model achieved a BLEU score of 0.81 and a METEOR score of 0.78 on the primary data set, and a BLEU score of 0.57 and a METEOR score of 0.47 on the secondary data set.
We investigate the performance of intelligent systems such as various Long Short-Term Memory (LSTM) and hybrid models to forecast electricity spot prices, considering both univariate and multivariate models. Six models are created to handle Electricity Price Forecasting (EPF). Furthermore, we propose an EPF methodology built around an LSTM univariate model, the Single in-out (Sio) model. It builds on the specificity of the Day-Ahead electricity Market (DAM) and, as a novelty, inserts each predicted value back into the sliding input vector to predict the next value, until the entire vector of 24 prices is predicted. The proposed model is further enhanced either by a convolutional reading of the input data embedded into the LSTM cell, or by a hybrid combination of LSTM and Convolutional Neural Networks (CNNs) that interprets sub-sequences of the input data and extracts features provided as a sequence to the LSTM model. The methodology is validated using data sets from the Romanian Market Operator (OPCOM) and market operators from Serbia (SEEPEX), Hungary (HUPX) and Bulgaria (IBEX). Our models improve day-ahead forecasting results over other models by 21.02% in terms of Mean Absolute Error (MAE).
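The Sio model's recursive trick, feeding each prediction back into the sliding input window until all 24 hourly prices are produced, can be sketched as follows. The `predict_one` callable is a hypothetical stand-in for the trained LSTM:

```python
def sio_forecast(history, predict_one, horizon=24):
    """Single in-out (Sio) day-ahead sketch: predict one price, slide
    it back into the input vector, and repeat until all `horizon`
    hourly prices for the next day are produced."""
    window = list(history)
    forecast = []
    for _ in range(horizon):
        nxt = predict_one(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]   # slide: drop oldest, append prediction
    return forecast

# Dummy one-step model: naive persistence (repeat the last price).
prices = sio_forecast([50.0, 52.0, 51.0], lambda w: w[-1])
print(len(prices))  # 24
```

A consequence of this design, visible in the loop, is that later hours are predicted from earlier predictions rather than observations, so one-step errors can compound across the day.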
ISBN (print): 9781665486842
Image captioning aims to produce a short textual explanation of a given image. Although it looks like a straightforward task for humans, it is difficult for computers, since it involves the ability to analyze the image and provide a human-like description. Encoder-decoder architectures have recently achieved state-of-the-art results in image captioning. With some existing datasets, e.g., Flickr_data, Flickr8k_***, and a heritage dataset, we build a model that can create captions for images related to Bangladeshi culture, tradition and historical places. Bangladesh is enriched with a great culture; many heritage places and cultural programs attract travelers to visit the country. We try to connect our culture, places, and food with machine learning techniques through appropriate captioning, and thereby spread our cultural strengths. Our image captioning tool can be very helpful for travel lovers who want to know more about Bangladesh.
ISBN (print): 9781665405409
Recently, deep encoder-decoder networks have shown outstanding performance in acoustic echo cancellation (AEC). However, subsampling operations such as strided convolution in the encoder layers significantly decrease the feature resolution, leading to fine-grained information loss. This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit information at different feature scales. In the encoder stage, high-level features are obtained to produce a coarse result. Then, decoder layers with multiple refinement paths directly refine the result with fine-grained features. Refinement paths at different feature scales are combined by learnable weights. The experimental results show that the proposed multi-scale refinement structure significantly improves the objective criteria. In the ICASSP 2022 Acoustic Echo Cancellation Challenge, our submitted system achieves an overall MOS score of 4.439 with 4.37 million parameters at a system latency of 40 ms.
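The combination of refinement paths by learnable weights can be sketched as a softmax-weighted sum over per-scale outputs. The softmax normalization and the fixed weight values below are illustrative assumptions; in the network the weights would be trained parameters:

```python
import numpy as np

def combine_paths(paths, weights):
    """Combine refinement outputs at different feature scales with
    weights (fixed here; learnable in the network). A softmax keeps
    the combination a convex sum over the paths."""
    w = np.exp(weights - np.max(weights))  # stable softmax
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, paths))

# Two refinement paths of the same output shape, combined equally.
paths = [np.ones((2, 2)), 3 * np.ones((2, 2))]
out = combine_paths(paths, np.array([0.0, 0.0]))
print(out)  # equal weights -> elementwise mean: all 2.0
```

With equal weights the result is the elementwise mean of the paths; during training the weights would shift toward whichever scale contributes most to echo removal.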
ISBN (print): 9783031189098; 9783031189104
Echocardiography can capture both the global and regional functions of the heart. With the obvious benefits of non-invasiveness, visual clarity and mobility, it has become an indispensable technology for the clinical evaluation of cardiac function. However, uncertainty in the measurements of ultrasonic equipment and inter-reader variability are inevitable. Given this situation, researchers have proposed many deep learning methods for cardiac function assessment. In this paper, we propose UDeep, an encoder-decoder model for left ventricular segmentation of echocardiography, which attends to both multi-scale high-level semantic information and multi-scale low-level fine-grained information. Our model remains sensitive to semantic edges, so as to accurately segment the left ventricle. The encoder extracts multi-scale high-level semantic features through a computationally efficient backbone named Separated Xception and an Atrous Spatial Pyramid Pooling module. A new decoder module consisting of several Upsampling Fusion Modules (UPFMs) is then applied to fuse features of different levels. To improve the generalization of our model to different echocardiography images, we propose a Pseudo-Segmentation Penalty loss function. Our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.9290 on the test set of an echocardiography video dataset.
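The fusion of levels that the decoder performs can be illustrated in miniature: upsample the coarse semantic features to the fine resolution and concatenate them with the fine-grained features. Nearest-neighbour upsampling and channel concatenation are illustrative assumptions, not the paper's actual UPFM design:

```python
import numpy as np

def upsample_fuse(coarse, fine):
    """Upsampling-fusion sketch: nearest-neighbour upsample the coarse
    semantic features to the fine features' resolution, then fuse by
    concatenation along the channel axis. Arrays are (channels, H, W);
    the fine resolution is assumed an integer multiple of the coarse."""
    fh, fw = fine.shape[1:]
    ch, cw = coarse.shape[1:]
    up = coarse.repeat(fh // ch, axis=1).repeat(fw // cw, axis=2)
    return np.concatenate([up, fine], axis=0)

coarse = np.zeros((8, 4, 4))   # low-resolution high-level semantics
fine = np.ones((2, 8, 8))      # high-resolution fine-grained features
fused = upsample_fuse(coarse, fine)
print(fused.shape)  # (10, 8, 8)
```

The fused tensor carries both information streams forward, which is what lets the decoder stay sensitive to semantic edges at full resolution.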
ISBN (print): 9781665484855
Video captioning is a sequence-to-sequence task of automatically generating descriptions for given videos. Due to the diversity of video scenes, learning rich representations is critical for video captioning. However, previous works mainly exploited elaborate features and neglected the loss of information caused by frame sampling and image compression. In this paper, we propose a novel spatio-temporal super-resolution (STSR) network that is jointly trained for the video captioning task and the video super-resolution task in an end-to-end fashion. Specifically, the video super-resolution task consists of two subtasks: spatial super-resolution restores high-resolution image features, while temporal super-resolution reconstructs missing frame features between two adjacent sampled frames. By sharing multi-modal encoders across these two tasks, STSR encourages the encoders to capture salient visual content and learn context-aware representations. Experiments on two benchmark datasets demonstrate that the proposed STSR boosts video captioning performance significantly and outperforms most state-of-the-art approaches.
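The temporal super-resolution subtask, reconstructing missing frame features between adjacent sampled frames, can be sketched with linear interpolation as a simple stand-in for the learned reconstruction:

```python
import numpy as np

def temporal_sr(frames, factor=2):
    """Temporal super-resolution sketch: insert `factor - 1` frame
    features between each adjacent pair of sampled frames by linear
    interpolation (a stand-in for STSR's learned reconstruction)."""
    dense = []
    for a, b in zip(frames[:-1], frames[1:]):
        dense.append(a)
        for k in range(1, factor):
            t = k / factor
            dense.append((1 - t) * a + t * b)   # interpolated feature
    dense.append(frames[-1])
    return dense

feats = [np.zeros(3), 2 * np.ones(3)]   # two sampled frame features
dense = temporal_sr(feats)
print(len(dense))  # 3: the two originals plus one midpoint
```

In the actual network the reconstruction is learned jointly with captioning; the point of the sketch is only the shape of the subtask, from sparse sampled features to a denser sequence.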
ISBN (print): 9781450393898
In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an adversarial encoder-decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated using action units based on the Facial Action Coding System. A large corpus of almost 4 hours of video, involving 89 different speakers, is used to train our models. We extract the speech and visual features automatically from these videos using existing tools. The models are evaluated objectively, with measures such as density evaluation and visualisation via PCA reduction, as well as subjectively through a user perception study. Our methodology shows that on 15-second sequences, the encoder-decoder architecture drastically improves the perception of the generated behaviours on two criteria: coordination with speech and naturalness. Our code can be found in: https://***/aldelb/non-verbal-behaviours-generation.