Imbalanced datasets are a common challenge in machine learning, particularly for classification. Underrepresented minority classes can lead to biased and inaccurate models. The Synthetic Minority Over-Sampling Technique (SMOTE) was developed to address this problem. Over time, several weaknesses have been identified in how SMOTE generates synthetic minority class data, such as overlapping, noise, and small disjuncts. However, existing studies generally focus on only one of these weaknesses, either noise or overlapping. This study therefore addresses both simultaneously, proposing a combined approach of filtering, clustering, and distance modification to reduce the noise and overlapping produced by SMOTE. Filtering uses the k-NN method to remove minority class data (noise) located in majority class regions. Applying this Noise Reduction (NR) step before SMOTE has a positive impact in overcoming data imbalance. Clustering establishes decision boundaries by partitioning the data into clusters, so that SMOTE with a modified distance metric generates minority class data within each cluster. This combination of clustering and distance modification aims to minimize overlap in the synthetic minority data, which could otherwise introduce noise. The proposed method, called “NR-Clustering SMOTE,” balances data in three stages: (1) filtering, which removes minority class samples close to the majority class (noise) using the k-NN method; (2) clustering with K-means, which establishes decision boundaries by partitioning the data into several clusters; (3) SMOTE oversampling with Manhattan distance within each cluster. Test results indicate that the proposed NR-Clustering SMOTE method achieves the best performance across all evaluation metrics for classification methods such as Random Forest, SVM, and Naïve Bayes, compared t
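The three stages named in the NR-Clustering SMOTE abstract above (k-NN filtering, K-means clustering, SMOTE with Manhattan distance inside each cluster) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. Several details are assumptions the abstract does not specify: the filter drops a minority sample when more than half of its k nearest neighbours belong to the majority class, K-means is run on the filtered minority samples only, the number of synthetic samples is split evenly across clusters, and all function names and default parameters are hypothetical.

```python
import numpy as np

def knn_noise_filter(X, y, minority=1, k=5):
    """Stage 1 (assumed rule): drop minority samples whose k nearest
    neighbours (Euclidean) are mostly majority-class points."""
    keep = np.ones(len(X), dtype=bool)
    for i in np.flatnonzero(y == minority):
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        d[i] = np.inf                      # exclude the point itself
        neighbours = np.argsort(d)[:k]
        if np.mean(y[neighbours] != minority) > 0.5:
            keep[i] = False
    return X[keep], y[keep]

def kmeans(X, n_clusters, rng, n_iter=50):
    """Stage 2: plain Lloyd's K-means, returns a cluster label per row."""
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def smote_manhattan(P, n_new, rng, k=5):
    """Stage 3: SMOTE interpolation, neighbours chosen by Manhattan distance."""
    k = min(k, len(P) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(P))
        d = np.abs(P - P[i]).sum(axis=1)   # Manhattan (L1) distance
        d[i] = np.inf
        j = rng.choice(np.argsort(d)[:k])
        synthetic.append(P[i] + rng.random() * (P[j] - P[i]))
    return np.array(synthetic)

def nr_clustering_smote(X, y, minority=1, n_clusters=2, k=5, seed=0):
    rng = np.random.default_rng(seed)
    X, y = knn_noise_filter(np.asarray(X, dtype=float), np.asarray(y), minority, k)
    P = X[y == minority]
    deficit = int((y != minority).sum() - len(P))
    if deficit <= 0 or len(P) < 2:
        return X, y                        # nothing to balance, or too few points
    n_clusters = min(n_clusters, len(P))
    labels = kmeans(P, n_clusters, rng)
    # Clusters with a single point cannot be interpolated; fall back to all of P.
    groups = [P[labels == c] for c in range(n_clusters)
              if (labels == c).sum() >= 2] or [P]
    quotas = [len(q) for q in np.array_split(np.arange(deficit), len(groups))]
    synth = [smote_manhattan(g, q, rng, k) for g, q in zip(groups, quotas) if q > 0]
    return np.vstack([X] + synth), np.concatenate([y, np.full(deficit, minority)])

# Toy data: 40 majority points near the origin, two small minority blobs,
# and one minority point placed inside the majority region (noise).
demo_rng = np.random.default_rng(1)
majority = demo_rng.normal(0.0, 1.0, (40, 2))
minority_pts = np.vstack([demo_rng.normal([5, 5], 0.3, (5, 2)),
                          demo_rng.normal([5, -5], 0.3, (5, 2)),
                          [[0.0, 0.0]]])
X_demo = np.vstack([majority, minority_pts])
y_demo = np.array([0] * 40 + [1] * 11)
Xb, yb = nr_clustering_smote(X_demo, y_demo)
```

On this toy data the minority point sitting inside the majority region is filtered out before oversampling, and the synthetic samples are interpolated only between the remaining minority points, so they stay away from the majority region.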
Water is an important substance for the human body. Clean water is important not just for the human body but also for the environment. In this paper, PRISMA is used to filter many of the reference papers where the pap...
Data Augmentation (DA) is an effective strategy to increase model generalisation. In Natural Language Processing (NLP), DA remains in its early stages, primarily due to the inherent sensitivity of textual data, which ...
The study proposes and describes a novel system called the Health Data Exchange (HDX). HDX is useful for identifying relevant experts in healthcare based on specific skill sets and their cohorts. Initial experience suggests that an expert database based on biographical data can be developed. Several techniques that supported the development of the system are described, covering automatic identification of the critical data, data normalization, storage, and retrieval. Such a system has the potential to facilitate efficient search and foster global collaboration among healthcare experts. 87th Annual Meeting of the Association for Information Science & Technology | Oct. 25 – 29, 2024 | Calgary, AB, Canada.
We provide several novel algorithms and lower bounds in central settings of mixed-integer (non-)linear optimization, shedding new light on classic results in the field. This includes an improvement on record running t...
Automatic Speech Recognition (ASR) is useful for converting speech into text. ASR is needed to display automatic subtitles on movies or when conducting video conferencing. The use of deep learning in ASR applications ...
This research reviews Fintech and its development on the IoT platform, as well as the risks that can be posed to the IoT network used. Finance is the most essential side of several other sectors which in...
The integration of drone technology with 5G networks presents novel opportunities for enhancing wireless communication systems. This paper explores the application of beamforming optimization techniques in dynamic env...
Data protection in databases is critical for any organization, as unauthorized access or manipulation can have severe negative *** detection systems are essential for keeping databases *** in technology will lead to significant changes in the medical field, improving healthcare services through real-time information ***, reliability and consistency still need to be *** against cyber-attacks are necessary due to the risk of unauthorized access to sensitive information and potential data ***-ruptions to data items can propagate throughout the database, making it crucial to reverse fraudulent transactions without delay, especially in the healthcare industry, where real-time data access is *** research presents a role-based access control architecture for an anomaly detection ***, the Structured Query Language (SQL) queries are stored in a new data structure called *** pentaplets allow us to maintain the correlation between SQL statements within the same transaction by employing the transaction-log entry information, thereby increasing detection accuracy, particularly for individuals within the company exhibiting unusual *** identify anomalous queries, this system employs a supervised machine learning technique called Support Vector Machine (SVM). According to experimental findings, the proposed model performed well in terms of detection accuracy, achieving 99.92% through SVM with One Hot Encoding and Principal Component Analysis (PCA).
With the development of 5G and the Internet of Vehicles, diverse in-vehicle services continue to emerge. Computation-intensive and delay-sensitive in-vehicle tasks pose significant challenges to in-vehicle devices and...