ISBN (print): 9798350350661; 9798350350654
This study introduces an innovative deep learning framework, the Weber Cross Information Sharing Deep Learning encoder-decoder (WCISD-ED) model, designed for emotion recognition through facial expression analysis. Recognition of emotion is a pivotal aspect of man-machine interaction, offering profound implications in areas ranging from mental health assessment to customer service and entertainment. However, because human expressions are so subtle and varied, accurately deducing emotions from facial expressions is a sophisticated task. The WCISD-ED model is crafted to address these complexities by incorporating principles derived from Weber's Law, which relates to the perception of changes in visual stimuli. This integration enhances the model's sensitivity to the minute yet critical variations in facial expressions associated with different emotions. The model features a novel cross information sharing structure within an encoder-decoder architecture, enabling the effective processing of facial features at multiple scales and depths. The encoder segment of the model focuses on the detailed extraction of facial features, while the decoder reconstructs these features into recognizable emotion categories. The cross information sharing mechanism allows for the interaction between different layers of the network, facilitating a more comprehensive and nuanced understanding of facial expressions. Extensive testing on diverse datasets demonstrates that the WCISD-ED model significantly outperforms existing emotion recognition models in terms of accuracy and reliability.
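The abstract gives no implementation details, but the core ideas, a Weber-style contrast measure on the input and encoder-decoder layers that exchange features across scales, can be illustrated with a minimal sketch. Everything below, from the Weber-contrast formula to the U-Net-style concatenation standing in for "cross information sharing" and the 48x48 grayscale input, is an assumption for illustration, not the published WCISD-ED architecture.

```python
# Minimal sketch of an encoder-decoder with cross-layer information sharing for
# facial-expression classification. Illustrative reconstruction only, NOT the
# published WCISD-ED: layer sizes, the Weber-inspired pre-processing, and the
# sharing mechanism are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def weber_contrast(x, eps=1e-3):
    # Weber-like local contrast: (I - local mean) / (local mean + eps).
    local_mean = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
    return (x - local_mean) / (local_mean + eps)

class ConvBlock(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.net(x)

class EncoderDecoderWithSharing(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = ConvBlock(1, 32), ConvBlock(32, 64), ConvBlock(64, 128)
        self.dec2, self.dec1 = ConvBlock(128 + 64, 64), ConvBlock(64 + 32, 32)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        x = weber_contrast(x)                       # assumes single-channel (grayscale) input
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        e3 = self.enc3(F.max_pool2d(e2, 2))
        # Cross information sharing: decoder layers see encoder features of the
        # matching scale via concatenation (a U-Net-style assumption).
        d2 = self.dec2(torch.cat([F.interpolate(e3, size=e2.shape[-2:]), e2], dim=1))
        d1 = self.dec1(torch.cat([F.interpolate(d2, size=e1.shape[-2:]), e1], dim=1))
        return self.head(d1.mean(dim=(2, 3)))       # global average pool -> emotion logits

logits = EncoderDecoderWithSharing()(torch.randn(4, 1, 48, 48))  # e.g. 48x48 face crops
print(logits.shape)                                              # torch.Size([4, 7])
```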
ISBN (print): 9781728185019
The work presented in this paper focuses on the use of Evolutionary Algorithms (EA) in two fields of active research interest: encoder-decoder and Generative Adversarial Network architectures. These architectures have transformed image processing and attract interest in many other areas, such as image domain mapping, interactive generation, security, and pattern detection and processing. The key aspects that determine a system's performance are its model architecture and the optimal hyperparameters that drive it. This paper proposes two new analyses with Evolutionary Algorithms: 1) hyperparameter optimization of an encoder-decoder model for the task of image inpainting, and 2) addressing the input noise vector of the Generative Adversarial Network model. The first analysis optimizes the task of semantic inpainting, whose aim is the semantic filling of missing regions in the original image. Inpainting is used across many domains: in medical imaging to reconstruct damaged body parts and tissues, in satellite imaging and surveillance to semantically fill images taken in uncertain weather conditions, and in photography, where a range of possible fillings better fulfills customer needs. The research found sets of hyperparameters that perform similarly to, or outperform, the manually hard-coded parameters; trying out all combinations is a very difficult task for a person, whereas evolutionary algorithms do it easily. The second analysis focuses on the input noise vector fed to the generator of the Generative Adversarial Network. After training, the generator is capable of mapping a random noise vector to an image. Using evolutionary algorithms, this study groups together all the vectors that map to a particular class of generation. This way the present study addressed the p
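As a rough illustration of the first analysis, the snippet below runs a tiny genetic algorithm over a hypothetical hyperparameter space for an encoder-decoder inpainting model. The search space, the GA settings, and the stand-in fitness function are all assumptions; in the paper's setting, the fitness evaluation would train and validate the inpainting network (e.g. returning negative validation loss).

```python
# Illustrative evolutionary hyper-parameter search; not the authors' implementation.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "latent_dim":    [64, 128, 256],
    "batch_size":    [16, 32, 64],
    "num_layers":    [3, 4, 5],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(ind):
    # Stand-in objective for demonstration only; a real run would train/validate the model.
    return -ind["learning_rate"] * 100 + ind["num_layers"] + ind["latent_dim"] / 256

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def evolve(pop_size=10, generations=5):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                     # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())   # best hyper-parameter set found under the stand-in objective
```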
Real-time embedded systems are quite complex in nature, owing to the interaction and interdependency between the processor's various hardware/software units and policies and the application running on it. Furthermore, smart adaptation of the supply clock and voltage is required to optimize power without compromising performance, depending on the type of application running and the tasks it involves. This is done using the Dynamic Voltage and Frequency Scaling (DVFS) technique. This paper proposes a novel DVFS technique that models frequency scaling as a recurrent-network problem, an approach that successfully captures the intricate dependencies among the various factors influencing operation. We employ an application-independent Radial Basis Function Neural Network to generate a series of predicted frequencies for the processor's current workload, followed by a seq2seq LSTM-based encoder-decoder model that decides whether the frequency generated by the ANN model is optimal for conserving the power of the embedded device. The proposed model predicts the workload and then compares the predicted frequency against the critical value, or deadline, of the current task of the running application. Experiments were conducted on a single-core processor (RPi Zero) running the "basicmath" benchmark application from the MiBench suite, and promising prediction accuracy was obtained without degrading performance.
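A minimal sketch of the second stage, a seq2seq LSTM encoder-decoder that turns a workload history into a short sequence of predicted frequencies, is shown below. The dimensions, forecast horizon, and the deadline-based acceptance check are illustrative assumptions rather than the authors' implementation.

```python
# Seq2seq (encoder-decoder) LSTM sketch for frequency-series prediction; assumptions only.
import torch
import torch.nn as nn

class Seq2SeqFrequency(nn.Module):
    def __init__(self, hidden=32, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, workload):                     # workload: (B, T, 1) utilisation history
        _, state = self.encoder(workload)
        step = workload[:, -1:, :]                   # seed the decoder with the last observation
        preds = []
        for _ in range(self.horizon):
            dec_out, state = self.decoder(step, state)
            step = self.out(dec_out)                 # next predicted (normalised) frequency
            preds.append(step)
        return torch.cat(preds, dim=1)               # (B, horizon, 1)

model = Seq2SeqFrequency()
freqs = model(torch.rand(2, 20, 1))                  # 20 past workload samples
# Hypothetical acceptance check: scale to MHz and compare against a minimum frequency
# derived from the current task's deadline before applying DVFS.
f_min_mhz = 700.0
accepted = (freqs.squeeze(-1) * 1000.0) >= f_min_mhz
print(freqs.shape, accepted.shape)
```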
Existing data-driven approaches typically capture credibility-indicative representations, such as skeptical and conflicting opinions, from relevant articles for fake news detection. However, these methods still have several drawbacks: 1) because fake news is difficult to collect, the existing datasets are relatively small; and 2) considerable unverified news lacks conflicting voices in its relevant articles, which makes it difficult for existing methods to assess credibility. In particular, the differences between true and fake news are not limited to whether conflict features appear in the relevant articles; they also include broader hidden differences at the linguistic level, such as emotional expression (e.g., extreme emotion in fake news) and writing style (e.g., the shocking titles of clickbait), which existing methods struggle to capture fully. To capture more general and wide-ranging differences between true and fake news, this paper starts directly from the different categories of news itself and proposes a Category-controlled Encoder-Decoder model (CED) that generates examples with category-differentiated features and extends dataset capacity for a data-enhancement effect, thereby improving fake news detection. Specifically, to make the generated examples enrich more news features, we develop a news-guided encoder that guides relevant articles toward news-semantic context representations. To drive the generated examples to contain more category-differentiated features, we devise a category-controlled decoder, which relies on a pattern-shared unit to capture the intra-category shared features of true and fake news respectively, and employs a restriction unit that forces the two types of shared features to differ, highlighting inter-category differences. Experimental results on three datasets demonstrate the superiority of CED.
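The category-control idea can be sketched very roughly: two learned class-pattern vectors (true/fake) steer a decoder, and a restriction term pushes them apart. The fusion by addition, the GRU decoder, and the cosine-similarity penalty below are simplifying assumptions for illustration, not the published CED design.

```python
# Heavily simplified illustration of category-controlled decoding with a restriction term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoryControlledDecoder(nn.Module):
    def __init__(self, d_model=256, vocab_size=30000):
        super().__init__()
        self.pattern = nn.Embedding(2, d_model)          # 0 = true news, 1 = fake news
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, context, category):                # context: (B, T, d_model)
        # Add the category's shared pattern to every context step (assumed fusion).
        ctrl = self.pattern(category).unsqueeze(1)
        out, _ = self.rnn(context + ctrl)
        return self.proj(out)                            # token logits for the generated example

    def restriction_loss(self):
        # Encourage the two intra-category patterns to differ (cosine-similarity penalty).
        p_true, p_fake = self.pattern.weight[0], self.pattern.weight[1]
        return F.cosine_similarity(p_true, p_fake, dim=0).clamp(min=0.0)

dec = CategoryControlledDecoder()
logits = dec(torch.randn(4, 12, 256), torch.tensor([0, 1, 0, 1]))
loss = dec.restriction_loss()
print(logits.shape, float(loss))
```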
ISBN (print): 9798350372113; 9798350372106
The integration of multiple camera-based devices designed for real-time crowd counting is explored for crowd density forecasting in an area. Crowd density forecasting in this setting, where future crowd counts over multiple time steps are predicted for each sensing region of the individual cameras, is a peculiar task. This paper employs an encoder-decoder Long Short-Term Memory network to solve the problem. Overcounting may also occur where the cameras' sensing regions overlap; this is tackled by correcting the crowd counts with a neural network-based approach before forecasting. This study therefore proposes a two-stream network for crowd density forecasting. The network is trained and evaluated on data generated from an actual setup emulating the presumed output of multiple camera-based crowd-counting devices. Overall, the proposed framework delivers promising results in the evaluation.
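A minimal sketch of the two-stream idea follows: a small correction network maps raw per-camera counts (which may double-count people where sensing regions overlap) to corrected counts, which then feed a multi-step encoder-decoder LSTM forecaster. Layer sizes, history length, and horizon are assumptions, not the authors' configuration.

```python
# Two-stream sketch: count correction followed by multi-step forecasting; assumptions only.
import torch
import torch.nn as nn

NUM_CAMERAS, HISTORY, HORIZON = 4, 12, 6

correction_net = nn.Sequential(                        # stream 1: overcount correction per step
    nn.Linear(NUM_CAMERAS, 32), nn.ReLU(), nn.Linear(32, NUM_CAMERAS))

class CountForecaster(nn.Module):                      # stream 2: encoder-decoder LSTM forecaster
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(NUM_CAMERAS, hidden, batch_first=True)
        self.decoder = nn.LSTM(NUM_CAMERAS, hidden, batch_first=True)
        self.out = nn.Linear(hidden, NUM_CAMERAS)

    def forward(self, counts):                         # (B, HISTORY, NUM_CAMERAS)
        _, state = self.encoder(counts)
        step, preds = counts[:, -1:, :], []
        for _ in range(HORIZON):
            dec, state = self.decoder(step, state)
            step = self.out(dec)
            preds.append(step)
        return torch.cat(preds, dim=1)                 # (B, HORIZON, NUM_CAMERAS)

raw = torch.rand(8, HISTORY, NUM_CAMERAS) * 50.0       # raw counts from the devices
corrected = correction_net(raw)                        # applied independently at each time step
forecast = CountForecaster()(corrected)
print(forecast.shape)                                  # torch.Size([8, 6, 4])
```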
ISBN (print): 9781538691861
Inspired by recent successes in neural machine translation and image caption generation, we present an attention-based encoder-decoder model (AED) to recognize Vietnamese handwritten text. The model is composed of three parts: a convolutional neural network (CNN) for extracting invariant features, a Bidirectional Long Short-Term Memory network (BLSTM) for encoding the extracted features (the BLSTM encoder), and a Long Short-Term Memory network (LSTM) with an attention model incorporated for generating the output text (the LSTM decoder); the CNN feeds the BLSTM encoder, which in turn feeds the LSTM decoder. The input to the CNN is a handwritten text image, and the target of the LSTM decoder is the corresponding text of the input image. Because all parts are differentiable components, the model is trained end-to-end to predict the text from a given input image. In the experiments, we evaluate the proposed AED model on the VNOnDB-Word database to verify its efficiency. The experimental results show that the model achieves a word error rate of 12.30% without using a language model. This result is competitive with the handwriting recognition system provided by Google in the Vietnamese Online Handwritten Text Recognition competition.
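The pipeline described above (CNN feature extractor, BLSTM encoder, attention LSTM decoder) can be sketched as follows. The channel counts, the additive-style attention scoring, the fixed output length, and the vocabulary size are assumptions for demonstration, not the authors' code.

```python
# CNN -> BLSTM encoder -> attention LSTM decoder sketch for text-line images.
import torch
import torch.nn as nn

class AttentionAED(nn.Module):
    def __init__(self, vocab_size=100, hidden=128, max_len=20):
        super().__init__()
        self.max_len = max_len
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.encoder = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(3 * hidden, 1)                 # [enc step ; decoder state] -> scalar
        self.cell = nn.LSTMCell(2 * hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, image):                                 # image: (B, 1, H, W)
        feat = self.cnn(image)                                # (B, 64, H/4, W/4)
        feat = feat.mean(dim=2).permute(0, 2, 1)              # collapse height -> (B, W/4, 64)
        enc, _ = self.encoder(feat)                           # (B, W/4, 2*hidden)
        B = enc.size(0)
        h = enc.new_zeros(B, self.cell.hidden_size)
        c = enc.new_zeros(B, self.cell.hidden_size)
        logits = []
        for _ in range(self.max_len):
            scores = self.score(torch.cat(
                [enc, h.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=2))
            alpha = torch.softmax(scores, dim=1)              # attention over encoder steps
            context = (alpha * enc).sum(dim=1)                # (B, 2*hidden)
            h, c = self.cell(context, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                     # (B, max_len, vocab_size)

pred = AttentionAED()(torch.randn(2, 1, 32, 128))
print(pred.shape)                                             # torch.Size([2, 20, 100])
```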
ISBN (digital): 9783030368029
ISBN (print): 9783030368029; 9783030368012
Semantic image segmentation plays a crucial role in scene understanding tasks. In autonomous driving, the motion of the vehicle causes scale changes of objects in the street scene. Although multi-scale features can be learned by concatenating multiple atrous-convolved features, it is difficult to accurately segment pedestrians from only partial feature information due to factors such as occlusion. We therefore propose a Xiphoid Spatial Pyramid Pooling (XSPP) method that integrates detailed information: while concatenating multiple atrous-convolved features, it retains image-level features carrying target boundary information. Based on this method, we design an encoder-decoder architecture called DXNet. The encoder is composed of a deep convolutional neural network and two XSPP modules, and the decoder decodes the high-level features through up-sampling operations and skip connections to gradually restore the target boundaries. We evaluate the effectiveness of our approach on the Cityscapes dataset. Experimental results show that our method performs better under occlusion, and the mean intersection-over-union score of our model outperforms some representative works.
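A generic spatial-pyramid-pooling module in the spirit described above, parallel atrous (dilated) convolutions plus an image-level branch, concatenated and fused, is sketched below. The exact XSPP design (in particular how boundary detail is retained) is not reproduced; the dilation rates and channel counts are assumptions.

```python
# ASPP-style pyramid pooling sketch; not the paper's XSPP module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, cin=256, cout=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(cin, cout, 3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates])
        self.image_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(cin, cout, 1))
        self.fuse = nn.Conv2d(cout * (len(rates) + 1), cout, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]                 # parallel atrous convolutions
        img = F.interpolate(self.image_branch(x), size=(h, w),
                            mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [img], dim=1))     # concatenate and fuse

out = PyramidPooling()(torch.randn(1, 256, 33, 33))
print(out.shape)   # torch.Size([1, 256, 33, 33])
```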
The goal of this work is to investigate the ways in which the capabilities of machine learning algorithms, specifically those of neural networks, can be leveraged to enhance the performance of design optimization algorithms, specifically topology optimization. A recent boom of interest in design optimization has coincided with the arrival and development of advanced manufacturing techniques (such as 3D printing and additive manufacturing) that are compatible with the designs these algorithms generate. Neural networks have seen an even larger boom in interest and development for their ability to act as "universal function generators"; in other words, for their ability to learn highly non-linear functions that approximate the behavior of extremely complex systems. Merging design optimization algorithms with the capabilities of neural networks opens several distinct possibilities: drastically reducing optimization time by predicting solution convergence, up-scaling solution resolution using Generative Adversarial Networks (GANs), predicting solutions with no iteration, and predicting and recognizing features in the optimized solution, to name a few. In this thesis, three neural network architectures are tested for their ability to act as solution convergence predictors for a density-based topology optimization solver. The problem is posed as an image segmentation problem, and the neural networks are all trained on a 40,000-example training set, with each example containing 100 iterations from the open-source optimization solver Topy (a data set created by Sosnovik et al. (2017)). The third network developed and tested is a novel hybrid network, an inception encoder-decoder network, which is found to outperform the other networks on the prediction task at hand.
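As a rough illustration, an inception-style block of the kind that could sit inside such an encoder-decoder convergence predictor is sketched below: parallel 1x1, 3x3, 5x5, and pooled branches concatenated along channels. The branch widths and the ten-channel input (a stack of early density-field iterations) are assumptions, not the thesis's network.

```python
# Inception-style block sketch; illustrative only.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, cin, c1=16, c3=32, c5=16, cp=16):
        super().__init__()
        self.b1 = nn.Conv2d(cin, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(cin, c3, 1), nn.ReLU(),
                                nn.Conv2d(c3, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(cin, c5, 1), nn.ReLU(),
                                nn.Conv2d(c5, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(cin, cp, 1))

    def forward(self, x):
        # Concatenate the four parallel branches along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# Input: a stack of early density-field iterations; output channels: 16+32+16+16 = 80.
block = InceptionBlock(cin=10)
print(block(torch.randn(1, 10, 40, 40)).shape)   # torch.Size([1, 80, 40, 40])
```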
Paraphrasing, a ubiquitous linguistic practice involving the rephrasing of sentences while preserving their underlying meaning, holds substantial significance across various Natural Language Processing (NLP) applications. This research focuses on Arabic Paraphrase Generation, aiming to introduce an innovative model capable of generating diverse Arabic paraphrases through experimentation with deep learning models. The proposed model extends beyond conventional baseline approaches, incorporating Transformer-based architectures and ChatGPT models to enhance the richness and variety of generated paraphrases. One notable challenge addressed in this study is the absence of an Arabic parallel paraphrase dataset. Recognizing this gap in existing resources, we propose the creation of an expanded paraphrase corpus, leveraging synthetic artificial data to bolster the paraphrase generation process. This strategic augmentation aims not only to fill a critical void in the available datasets but also to provide a robust foundation for training and evaluating the paraphrase generation model. In the experimental phase, various models, including the baseline architecture and Transformer-based models, are examined to assess their effectiveness in generating meaningful Arabic paraphrases. The results of automatic evaluation reveal that our fine-tuned GPT-3.5 model surpasses state-of-the-art methods, achieving remarkable scores of 23.69%, 88.30%, and 91.89% in BLEU, BERTScore, and COMET evaluations, respectively. Additionally, the fine-tuned AraT5v1 model shows around a 2.4% improvement in the BLEU score. Moreover, for human evaluation, Cohen's kappa reached 0.9. These findings highlight the potential of Transformer-based approaches in advancing Arabic Paraphrase Generation and affirm the effectiveness of our proposed model in elevating the quality and diversity of generated paraphrases.
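The automatic-evaluation step can be illustrated with a short corpus-BLEU computation using sacrebleu; the Arabic sentences below are placeholders, and BERTScore or COMET would be computed analogously with their respective libraries.

```python
# Scoring generated paraphrases against references with corpus BLEU (sacrebleu).
import sacrebleu

references = [["هذا مثال بسيط لإعادة الصياغة"]]          # one reference stream (placeholder text)
hypotheses = ["هذا نموذج بسيط لإعادة صياغة الجملة"]       # model-generated paraphrases (placeholder)

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```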