In this paper, we enhance and customize the existing BERTopic framework to develop and implement an automated pipeline that delivers a more coherent and diverse set of topics even with a moderately sized dataset. More specif...
In this paper, a novel classification algorithm named GADIC, based on Data Importance (DI) reformatting and Genetic Algorithms (GA), is proposed to overcome issues related to the nature of the data which may hin...
Like air pollution, sound pollution has grown to be a major concern for city residents, designers, and developers. Detecting and recognizing sound types and sources in cities and suburban areas or any environment have...
Pipeline networks are crucial for transportation and business operations in the process industries. These vital elements, however, are highly exposed to material degradation due to corrosion that seriously impedes operational...
In this paper, we enhance and customize the existing BERTopic framework to develop and implement an automated pipeline that delivers a more coherent and diverse set of topics even with a moderately sized dataset. More specifically, the contributions of this work are threefold: (1) integrating a dynamic and advanced optimizer into the existing BERTopic framework to learn the optimal number of dimensions for different document embeddings, (2) developing a k-means-based algorithm in the optimizer to support the dimension-embedding learning, and (3) conducting an extensive experimental study on three distinct datasets, DBPedia, AG News, and Reuters, to evaluate the performance of our approach in terms of the topic quality (TQ) score, which is computed from the topic coherence and the topic diversity. The results show that our enhanced, automated BERTopic framework, with its dimension-embedding learning algorithm on documents, improves on the TQ score of the existing framework by 4.49% (before removing stop words) and 16.52% (after removing stop words) across all four representative document-embedding approaches (BERTopic's default Sentence Transformer, Google's Universal Sentence Encoder, OpenAI GPT-2, and our investigators' Context-aware Embedding Model) on all three datasets.
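As a rough illustration of the TQ score mentioned above, the sketch below computes topic diversity as the fraction of unique words among the top words of all topics and combines it with a coherence value. The abstract does not state how coherence and diversity are combined; the product used here is an assumption, and the function names `topic_diversity` and `topic_quality` are hypothetical. The coherence value is taken as an input and is not computed.

```python
from typing import List, Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean the
    topics are largely redundant.
    """
    words: List[str] = [w for topic in topics for w in topic[:top_k]]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def topic_quality(coherence: float, diversity: float) -> float:
    """Combine coherence and diversity into one score.

    Assumption: TQ is taken as the product of the two; the abstract only
    says TQ is computed from coherence and diversity.
    """
    return coherence * diversity


if __name__ == "__main__":
    # Toy topics, not taken from the paper's datasets.
    toy_topics = [["market", "stock", "trade"], ["stock", "team", "game"]]
    diversity = topic_diversity(toy_topics, top_k=3)
    print(diversity, topic_quality(0.5, diversity))
```

In practice the coherence input would come from a separate coherence measure evaluated on the same topic words.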
COVID-19 is a global pandemic that hit the world in 2019-2020 and caused massive losses. Every day, hundreds of thousands of tests are performed on possibly infected cases. It usually takes several hours to get the r...
There is a rise in car accidents due to human errors on the road. A critical task of self-driving cars that can reduce accidents on the road is traffic sign detection and recognition (TSDR), which is vital in alerting...
ISBN: (digital) 9798350351767
ISBN: (print) 9798350351774
There is a rise in car accidents due to human error on the road. A critical task for self-driving cars that can reduce accidents is traffic sign detection and recognition (TSDR), which is vital for alerting drivers to the presence of traffic signs in advance. This research separates the proposed deep ensemble learning algorithm into two stages. First, after the traffic scene is processed, the algorithm detects traffic signs in two categories with the YOLOv5s network. Then, it processes each detected traffic sign and recognizes it as one of seven classes with the MobileNet network. The detection model was trained on the Taiwan Traffic Sign Detection (TTSD) dataset collected from Taiwan roads. The recognition model was trained on the Taiwan Traffic Sign Recognition (TTSR) dataset. In experiments, the proposed algorithm showed high performance, with 95.83% accuracy, 87.34% true prediction, and an inference time of 191.3 milliseconds (ms).
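The two-stage flow described above (detect first, then recognize each detected region) can be sketched as follows. This is only a structural sketch under stated assumptions: the detector and classifier are passed in as plain callables standing in for the YOLOv5s and MobileNet models, the image is a nested list, and the names `Recognition`, `crop`, and `detect_then_recognize` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x1, y1, x2, y2), end-exclusive
Image = Sequence[Sequence[float]]    # rows of pixel values (placeholder type)


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the region covered by box out of the image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def detect_then_recognize(
    image: Image,
    detector: Callable[[Image], List[Box]],
    classifier: Callable[[Image], str],
) -> List[Recognition]:
    """Stage 1: find sign regions. Stage 2: classify each cropped region."""
    results = []
    for box in detector(image):
        patch = crop(image, box)
        results.append(Recognition(box=box, label=classifier(patch)))
    return results


if __name__ == "__main__":
    # Stand-in models for illustration only; real models would replace these.
    scene = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    fake_detector = lambda img: [(0, 0, 2, 2)]
    fake_classifier = lambda patch: "sign" if sum(map(sum, patch)) > 0 else "none"
    print(detect_then_recognize(scene, fake_detector, fake_classifier))
```

Keeping the two stages behind simple callables lets either model be swapped or timed separately without changing the surrounding flow.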
Nowadays, IoT sensors have become one of the major sources of data, from home automation to everyday devices. Focusing on handsets that have the most significant contribution factor on data blooming and booming....
The aim of this paper is to investigate the impact of social distance on people during the COVID-19 pandemic using Twitter sentiment analysis through a comparison between the k-means clustering and Mini-Batch k-means clus...
Stock price prediction is one of the most daunting tasks for day traders, investors, and data scientists. Stock prices are complex functions of a wide array of contributing factors that affect the movement dynami...