The number, diversity, pace, and validity of Online Customer Reviews (OCRs) make them difficult for firms to examine. The big data analytics study predicts OCR reading and its usefulness. Titles with positive...
ISBN: (print) 9789819759330; 9789819759347
The stock market is a complex and dynamic industry that has always presented challenges for stakeholders and investors due to its unpredictable nature. This unpredictability motivates the need for more accurate prediction models. Traditional prediction models have limitations in handling the dynamic nature of the stock market. Additionally, previous methods have used less relevant data, leading to suboptimal performance. This study proposes the use of Bidirectional Encoder Representations from Transformers (BERT), a pre-trained Large Language Model (LLM), to predict Dhaka Stock Exchange (DSE) market movements. We also introduce a new dataset designed specifically for this problem, capturing important characteristics and patterns that were missing in other datasets. We evaluate our new dataset of headlines and stock market indexes with various machine learning techniques, including Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Linear Support Vector Machine (LSVM), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Bidirectional Long Short-Term Memory (Bi-LSTM), BERT, Financial Bidirectional Encoder Representations from Transformers (FinBERT), and RoBERTa, and compare their predictive capabilities. Our proposed model achieves 99.83% accuracy on the training set and 99.78% accuracy on the test set, outperforming previous methods.
Keyphrase extraction is a fundamental task in information management, which is often used as a preliminary step in various information retrieval and natural language processing tasks. The main contribution of this paper lies in providing a comparative assessment of prominent multilingual unsupervised keyphrase extraction methods that build on statistical (RAKE, YAKE), graph-based (TextRank, SingleRank) and deep learning (KeyBERT) methods. For the experiments reported in this paper, we employ well-known datasets designed for keyphrase extraction from five different natural languages (English, French, Spanish, Portuguese and Polish). We use the F1 score and a partial match evaluation framework, aiming to investigate whether the number of terms in the documents and the language of each dataset affect the accuracy of the selected methods. Our experimental results reveal a set of insights about the suitability of the selected methods for texts of different sizes, as well as the performance of these methods on datasets in different languages.
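To illustrate the kind of evaluation described above, the following is a minimal sketch of an F1 score under partial matching. The abstract does not state the exact matching rule, so the sketch assumes that a predicted keyphrase counts as a match when it shares at least one lowercased token with a gold keyphrase. The function names `tokens` and `partial_match_f1` are hypothetical and are not taken from the paper.

```python
def tokens(phrase):
    """Lowercase a phrase and split it on whitespace into a set of tokens."""
    return set(phrase.lower().split())


def partial_match_f1(predicted, gold):
    """F1 score where two phrases match if they share at least one token.

    Assumed matching rule (not taken from the paper): any token overlap.
    """
    if not predicted or not gold:
        return 0.0
    pred_sets = [tokens(p) for p in predicted]
    gold_sets = [tokens(g) for g in gold]
    # Precision: share of predicted phrases that overlap some gold phrase.
    matched_pred = sum(1 for p in pred_sets if any(p & g for g in gold_sets))
    # Recall: share of gold phrases that overlap some predicted phrase.
    matched_gold = sum(1 for g in gold_sets if any(g & p for p in pred_sets))
    precision = matched_pred / len(pred_sets)
    recall = matched_gold / len(gold_sets)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = ["keyphrase extraction", "neural network"]
    gold = ["unsupervised keyphrase extraction", "graph ranking"]
    print(partial_match_f1(predicted, gold))
```

An exact-match variant would replace the overlap test with set equality; the looser rule here gives credit to predictions that are substrings or supersets of a gold phrase.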
Part-of-Speech (POS) tagging is one of the fundamental tasks in natural language processing (NLP) for analyzing human languages. It is the process of identifying how words are used in a sentence by assigning the proper POS to each word. Thus far, the most thoroughly researched POS tagging is for European languages, which are considered rich-resource languages because of their abundant linguistic resources, such as research studies and large standard corpora. However, POS tagging is arduous for low-resource languages because of their limited linguistic resources. The Malay language is considered a low-resource language. Most POS tagging studies for the Malay language use rule-based and stochastic methods, whereas exploration of Deep Learning (DL) for the Malay language is limited. Thus, studies of POS tagging methods that implement DL for other low-resource languages within South East Asia are included in this study. The aim of this study is to identify the state of the art, challenges, and future work for Malay POS taggers. This study provides a review of the different methods, datasets, and performance measures used in POS tagging studies.
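As a concrete picture of what a simple statistical tagger does, the sketch below implements a most-frequent-tag baseline: each word receives the tag it carried most often in the training data. This is a generic illustration, not a method attributed to any study in the review; the class name `MostFrequentTagger`, the tag labels, the fallback tag for unseen words, and the toy Malay sentences are all assumptions.

```python
from collections import Counter, defaultdict


class MostFrequentTagger:
    """Baseline tagger: assign each word its most frequent training tag."""

    def __init__(self, default_tag="NOUN"):
        # Hypothetical fallback tag for words never seen in training.
        self.default_tag = default_tag
        self.best_tag = {}

    def train(self, tagged_sentences):
        """Count (word, tag) pairs and keep the most common tag per word."""
        counts = defaultdict(Counter)
        for sentence in tagged_sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += 1
        self.best_tag = {
            word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()
        }

    def tag(self, words):
        """Return a list of (word, tag) pairs for a tokenized sentence."""
        return [(w, self.best_tag.get(w.lower(), self.default_tag)) for w in words]


if __name__ == "__main__":
    # Toy training data (illustrative only, not from any corpus in the review).
    training = [
        [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
        [("dia", "PRON"), ("makan", "VERB"), ("roti", "NOUN")],
    ]
    tagger = MostFrequentTagger()
    tagger.train(training)
    print(tagger.tag(["dia", "makan", "nasi"]))
```

A baseline like this ignores context entirely, which is the gap that sequence models such as the DL approaches mentioned above aim to close.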
ISBN: (print) 9783031702587; 9783031702594
The article addresses challenges in human-computer interaction through natural language, particularly in the context of collaborative conversations. The issue of overlapping audio data affects the accuracy of speech recognition and synthesis systems, especially in scenarios like meetings and negotiations. Emphasis is placed on the need for segregating and clustering speech from multiple speakers, highlighting challenges arising from diverse sound conditions in everyday life. The article then delves into the task of diarization, underscoring the importance of segmenting and processing speech data for effective voice control of devices. Subsequently, it explores the combination of GMM and i-vectors, as well as the evolution of approaches using deep learning, including convolutional and recurrent neural networks. Considering recent trends, the authors analyze the application of Transformers for handling long-term dependencies in data. The concluding section of the article provides a comprehensive overview and analysis of contemporary diarization methods, encompassing algorithms, error evaluation metrics, and descriptions of popular tools, with a focus on more modern approaches. This work constitutes a significant contribution to the field of speech diarization research, covering more current methods and trends compared to previous reviews in this domain.
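The abstract does not name the error evaluation metrics it covers. A widely used one in diarization work is the diarization error rate (DER), and the sketch below shows a simplified frame-level version of it. It assumes one speaker label per frame (so no overlapping speech), `None` for silence, and hypothesis labels that have already been mapped onto the reference speaker names; full DER implementations also search for the best speaker mapping and usually apply a forgiveness collar. The function name is hypothetical.

```python
def diarization_error_rate(reference, hypothesis):
    """Simplified frame-level DER.

    DER = (missed speech + false alarm + speaker confusion) / reference speech.
    Assumptions: equal-length frame label lists, None means silence, and
    hypothesis labels are already aligned to the reference speaker names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None and hyp is None:
            missed += 1          # speech in reference, silence in hypothesis
        elif ref is None and hyp is not None:
            false_alarm += 1     # silence in reference, speech in hypothesis
        elif ref is not None and ref != hyp:
            confusion += 1       # both speech, but wrong speaker
    speech_frames = sum(1 for ref in reference if ref is not None)
    if speech_frames == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech_frames


if __name__ == "__main__":
    reference = ["A", "A", "B", "B", None]
    hypothesis = ["A", "B", "B", None, "A"]
    print(diarization_error_rate(reference, hypothesis))
```

Because false alarms are counted against a denominator of reference speech only, the value can exceed 1.0 for a hypothesis that labels a lot of silence as speech.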
Entity alignment (EA) for knowledge graphs (KGs) plays a critical role in knowledge engineering. Existing EA methods mostly focus on utilizing the graph structures and entity attributes (including literals), but ignor...
ISBN: (print) 9798891760882
Prior work has uncovered a set of common problems in state-of-the-art context-based question answering (QA) systems: a lack of attention to the context when the latter conflicts with a model's parametric knowledge, little robustness to noise, and a lack of consistency in their answers. However, most prior work focuses on one or two of those problems in isolation, which makes it difficult to see trends across them. We aim to close this gap by first outlining a set of desiderata for QA models, some previously discussed and some novel. We then survey relevant analysis and methods papers to provide an overview of the state of the field. The second part of our work presents experiments where we evaluate 15 QA systems on 5 datasets according to all desiderata at once. We find many novel trends, including (1) systems that are less susceptible to noise are not necessarily more consistent with their answers when given irrelevant context; (2) most systems that are more susceptible to noise are more likely to correctly answer according to a context that conflicts with their parametric knowledge; and (3) the combination of conflicting knowledge and noise can reduce system performance by up to 96%. As such, our desiderata help increase our understanding of how these models work and reveal potential avenues for improvements. Code and data can be found here: https://***/Shaier/context_usage_***.
ISBN: (digital) 9798331512088
ISBN: (print) 9798331512095
In recent times, intensifying struggles for sustenance and education have sharply escalated competitive tensions among individuals, leaving many students, especially adolescents, in prolonged states of stress and apprehension due to lack of proper guidance, trauma, or other factors, which has led to a significant surge in mental health issues. Recent advances in social media such as Instagram, Facebook, and Twitter have made communication easy and have also given these teenagers a new way to release or express emotions that they might otherwise keep to themselves. This research focuses on analyzing adolescents' Twitter posts to detect depression. Textual data will be harvested from these posts and transformed into input suitable for machine learning algorithms. Different machine learning algorithms will be tested, with and without natural language processing (NLP), to design a system that recognizes patterns of depression in individuals.
System logs are widely used by engineers to record runtime status in the information technology (IT) field. The sequential anomaly detection of logs is crucial for building a secure and stable system and is beneficial...
This passage discusses a study that collected Weibo posts from different users related to China's Annual Individual Income Tax Return (AIITR), and compared the performance of six machine learning models (support ve...