ISBN (digital): 9798331517601
ISBN (print): 9798331517618
Imbalanced datasets pose critical challenges in text classification, often leading to biased predictions and reduced model performance. This study compares three undersampling techniques: Random UnderSampler, Near Miss, and Tomek Links, to evaluate their impact on five machine learning classifiers: Logistic Regression, Support Vector Machine, Random Forest, Naive Bayes, and Neural Network. The results demonstrate that the Random Forest classifier achieves the highest performance, with precision of 97%, recall of 99%, and F1-score of 98% across all methods, showcasing its robustness against data imbalance. The Near Miss technique improves Neural Network performance to a recall of 83% and F1-score of 71%, while Tomek Links further boosts Neural Network precision to 68%, recall to 72%, and F1-score to 70%. In contrast, simpler models like Naive Bayes exhibit limited improvements, with precision and recall scores around 47-51%. These findings underscore the importance of selecting appropriate undersampling strategies to enhance text classification outcomes and address data imbalance effectively.
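As a sketch of two of the compared undersampling techniques, assuming NumPy/scikit-learn and toy 2-D data (the study itself works on text features, and these helper names are illustrative, not the study's code):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def random_undersample(X, y, seed=0):
    """Drop random majority-class samples until all classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

def tomek_links(X, y):
    """Indices of points lying in Tomek links: mutual nearest
    neighbors with opposite labels (candidates for removal)."""
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    nearest = nn.kneighbors(X, return_distance=False)[:, 1]
    return np.array([i for i in range(len(X))
                     if nearest[nearest[i]] == i and y[i] != y[nearest[i]]])

# toy imbalanced data: 20 majority vs 5 minority points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (5, 2))])
y = np.array([0] * 20 + [1] * 5)

Xb, yb = random_undersample(X, y)
links = tomek_links(X, y)
print(np.bincount(yb))  # balanced: [5 5]
```

Near Miss (not shown) works similarly but selects which majority samples to keep by their distance to minority samples rather than at random.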
In this paper, a novel dual-band MIMO monopole antenna for 5G applications is presented. The proposed MIMO antenna is made up of two rings that are T-shaped, generating two distinguished bands: the N77 band and the 6 ...
详细信息
We investigate the effect of the well-known Mycielski construction on the Shannon capacity of graphs and on one of its most prominent upper bounds, the (complementary) Lovász theta number. We prove that if the Sh...
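For concreteness, the Mycielski construction itself can be computed directly; the edge-list representation and function name below are assumptions (the Shannon capacity and Lovász theta analysis are not attempted here):

```python
def mycielskian(edges, n):
    """Mycielskian of a graph on vertices 0..n-1.

    Adds shadow vertices n..2n-1 and an apex vertex 2n: each shadow
    u_i is joined to the neighbors of v_i, and the apex to every shadow.
    """
    new_edges = list(edges)
    for (a, b) in edges:
        new_edges.append((a, n + b))      # u_b joined to neighbor a of v_b
        new_edges.append((n + a, b))      # u_a joined to neighbor b of v_a
    for i in range(n):
        new_edges.append((n + i, 2 * n))  # apex joined to every shadow
    return new_edges, 2 * n + 1

# The Mycielskian of the 5-cycle C5 is the Grotzsch graph:
c5 = [(i, (i + 1) % 5) for i in range(5)]
g_edges, g_n = mycielskian(c5, 5)
print(g_n, len(g_edges))  # 11 vertices, 20 edges
```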
On social media, there is a lot of information that can be used for further analysis, including text data such as posts, comments, and reviews. This data is unstructured. The problem addressed in this study is the difficulty of analyzing social media text to distinguish standard from non-standard words, so this research applies word representations from TF-IDF, Word2Vec, and transfer learning, combined with a CNN classifier. The study produced three classification models, which differ in their number of epochs and batch size; in the evaluation, model 3 obtained better accuracy than models 1 and 2. The TF-IDF-CNN model, with a batch size of 64 and 20 epochs, obtained 82% accuracy; the Word2Vec-CNN model, with a batch size of 128 and 60 epochs, obtained 86%; and the Transfer Learning-CNN model, with a batch size of 256 and 100 epochs, obtained 91%.
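The TF-IDF step of the first model can be sketched with scikit-learn; the mini-corpus below, mixing informal and standard Indonesian phrasings, is an invented assumption rather than the study's data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# invented mini-corpus: informal vs standard phrasings of similar posts
posts = [
    "gue mau makan",                 # non-standard
    "saya ingin makan",              # standard
    "aku lg otw",                    # non-standard
    "saya sedang dalam perjalanan",  # standard
]
vec = TfidfVectorizer()
X = vec.fit_transform(posts)  # sparse document-term matrix
print(X.shape)                # (4 documents, 11 unique terms)
```

In the study's pipeline, a representation like this matrix (or Word2Vec / pretrained embeddings) is what feeds the CNN.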
Cerebral Microbleeds (CMBs) are microhemorrhages caused by certain abnormalities of brain vessels. CMBs can be found in people with Traumatic Brain Injury (TBI), Alzheimer's disease, and in old individuals having a brain injury. Research reveals that CMBs can be highly dangerous for individuals having dementia, and CMBs seriously impact individuals' lives, which makes it crucial to recognize CMBs in their initial phase to stop deterioration and to assist individuals to have a normal life. Existing work reports good results but often ignores the false-positive perspective of this research area. In this paper, an efficient approach is presented to detect CMBs from Susceptibility Weighted Images (SWI). The proposed framework consists of four main phases: (i) making clusters of brain Magnetic Resonance Imaging (MRI) using k-means, (ii) reducing false positives for better classification results, (iii) extracting discriminative features specific to CMBs, and (iv) classification using a five-layer convolutional neural network (CNN). The proposed method is evaluated on a public dataset of 20 subjects. The proposed system shows an accuracy of 98.9% and a false-positive rate of 1.1%. The results show the superiority of the proposed work compared to existing state-of-the-art methods.
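The first phase, clustering MRI intensities with k-means, can be illustrated on synthetic data; the intensity values below are an assumption standing in for real SWI voxels:

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic stand-in for an SWI slice: two intensity populations
# (e.g. dark microbleed candidates vs brighter tissue)
rng = np.random.default_rng(1)
intensities = np.concatenate([
    rng.normal(0.2, 0.05, 500),
    rng.normal(0.8, 0.05, 500),
]).reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(intensities)
centers = sorted(c[0] for c in km.cluster_centers_)
print(centers)  # cluster centers near 0.2 and 0.8
```

In the paper's pipeline, the cluster of CMB-like candidates would then be passed to the false-positive reduction and CNN stages.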
Given its growing scale and vital role, decentralization of cloud computing is becoming a necessity. Fog computing aims to bring applications closer to the data source, typically at the network's edge, by leveraging local resources to provide faster data processing and decision-making. Fog computing must therefore place applications strategically, using its limited fog resources to improve application performance metrics. This problem, known as the Fog Application Placement Problem (FAPP), has previously been approached with methods that rely on rules and prior knowledge; such methods may not be adaptive, being instead overly specialized to specific problems. Deep learning, with its learning mechanism, can offer more adaptable and dynamic solutions for a wide range of scenarios, especially in fog networks that continuously evolve. This research investigates the inherent limitations of a seq2seq placement model, notably the impracticality of generating every possible pattern from all potential request configurations. We aim to address the following critical questions: 1) How does the model's performance vary when confronted with unseen requests or an augmented number of modules, especially considering the limitations in training data? 2) Can the seq2seq model, even with its training limitations, adhere to the heuristic rules of the dataset when dealing with unfamiliar problems? This research shows that, at similar availability, the model nearly halves response time relative to the hop3 algorithm, achieving a response time of 183.03 ms, 2.93 hops, and 0.87 megabytes of transmitted messages. Moreover, we highlight the ability of the seq2seq model to follow heuristic rules in unseen scenarios.
This study presents a comprehensive comparative analysis of the effectiveness of word-level and character-level embeddings in the context of machine learning-based detection of malicious URLs and DGA-generated domains. Utilizing distinct datasets comprising DGA-generated domains and Spam URLs, we systematically evaluate various machine learning models coupled with word-level and character-level tokenization techniques. Our findings indicate that character-level tokenization yields superior results in identifying DGA-generated domains, particularly due to the random character composition of these URLs. Conversely, both word-level and character-level embeddings exhibit comparable success rates in classifying Spam URLs, owing to the non-random nature of their URL structures. The study sheds light on the importance of tailoring tokenization strategies based on the unique characteristics of the data. We recommend character-level embeddings for detecting DGA-generated domains characterized by random characters. In contrast, the choice between word-level and character-level embeddings is less critical when dealing with Spam URLs, as both approaches yield effective results.
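The tokenization contrast the study draws can be made concrete; the example domains and the delimiter set below are illustrative assumptions:

```python
import re

def char_tokens(url):
    """Character-level tokenization: every character is a token."""
    return list(url)

def word_tokens(url):
    """Word-level tokenization: split on common URL delimiters."""
    return [t for t in re.split(r"[./\-_:?=&]", url) if t]

dga = "xj4kq9vz.biz"                     # made-up DGA-style domain
spam = "cheap-pills-online.example.com"  # made-up spam-style URL

# word-level tokens of a DGA domain are opaque random strings...
print(word_tokens(dga))   # ['xj4kq9vz', 'biz']
# ...while a spam URL yields meaningful words under either scheme
print(word_tokens(spam))  # ['cheap', 'pills', 'online', 'example', 'com']
print(char_tokens(dga)[:4])
```

Character-level tokens expose the character distribution that makes DGA domains detectable, which is consistent with the study's finding that character-level embeddings work better for that task.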
Cloud data storage is a service that allows users to save data on an offsite storage system that is managed by a third party and made accessible through a web service. It has become one of the most convenient and efficient methods of storing data online. However, like other technologies and services, cloud storage has its advantages and disadvantages, and security is the biggest challenge. This project investigates the use of blockchain technology to improve the security of cloud data storage. The problem addressed is the vulnerability of cloud data storage systems to cyber-attacks and data breaches. The design and implementation of a proof-of-concept blockchain-based data storage system, together with an evaluation of the system's security and performance, are presented. The proposed blockchain-based system for securing cloud storage could be an alternative to traditional methods.
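The integrity mechanism blockchain brings to stored data can be sketched as a bare hash chain (omitting consensus and distribution; the block fields and payloads below are assumptions, not the project's design):

```python
import hashlib
import json

def make_block(data, prev_hash):
    """A minimal block: a payload plus the hash of the previous block."""
    body = {"data": data, "prev": prev_hash}
    block = dict(body)
    block["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain):
    """Recompute every hash and check each link to its predecessor."""
    for i, b in enumerate(chain):
        expected = hashlib.sha256(json.dumps(
            {"data": b["data"], "prev": b["prev"]},
            sort_keys=True).encode()).hexdigest()
        if b["hash"] != expected:
            return False
        if i > 0 and b["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block("file-A metadata", "0" * 64)]
chain.append(make_block("file-B metadata", chain[-1]["hash"]))
print(verify(chain))           # True
chain[0]["data"] = "tampered"  # any edit breaks the chain
print(verify(chain))           # False
```

Because each block commits to its predecessor's hash, modifying any stored record invalidates every later block, which is the tamper-evidence property the project relies on.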
Typing errors occur frequently in communication via short messages or posts on social media platforms. When communicating on social media, many individuals make typing errors without realizing it, which can change the meaning of sentences. This research applies models that check typing errors using a Bi-Directional LSTM and transfer learning. In the first experiment, the Bi-Directional LSTM model achieved 78% accuracy after training for 20 epochs with a batch size of 64, while the transfer learning model achieved 80% with the same epoch and batch size settings. In the second experiment, with 20 epochs and a batch size of 64, the Bi-Directional LSTM model's accuracy increased to 83% and the transfer learning model's to 87%. Finally, with 20 epochs and a batch size of 64, the Bi-Directional LSTM model achieved 87% accuracy, while the transfer learning model achieved an impressive 93%.
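Typo detection commonly builds on edit distance; as background, a classic building block (this is standard Levenshtein distance, not the paper's Bi-Directional LSTM or transfer learning model):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# a transposed-letter typo is two edits away from the correct word
print(levenshtein("recieve", "receive"))  # 2
```

Neural models like those in the paper go further by using sentence context to decide which nearby word was intended, rather than just measuring edit distance.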
The field of dermatology faces considerable challenges in the early detection of skin cancer. Our study used different datasets, including original data, augmented data, and SMOTE-oversampled data, to identify skin cancer. The dataset consisted of images of skin lesions from the Skin Cancer MNIST dataset (HAM10000), including samples of both cancerous and benign cases. We employed data augmentation to expand the dataset's size and increase the diversity of skin lesion features. Furthermore, to tackle class imbalance, we applied the SMOTE oversampling technique to generate synthetic samples for the under-represented class. With the original, augmented, and SMOTE-oversampled datasets, we trained a Convolutional Neural Network (CNN) model. The model's performance was evaluated using accuracy, recall, precision, and F1-score. The comparison between the results obtained from the three datasets clearly revealed distinctions in performance. Our findings demonstrate that employing data augmentation and SMOTE oversampling can significantly enhance the efficacy of skin cancer detection.
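The SMOTE step can be sketched by hand as interpolation between minority-class neighbors (a minimal version of the idea; real pipelines normally use the imbalanced-learn implementation, and the toy data here is an assumption):

```python
import numpy as np

def smote_minority(X_min, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic point is interpolated
    between a random minority sample and one of its k nearest
    minority-class neighbors."""
    rng = np.random.default_rng(seed)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))       # random minority sample
        j = neighbors[i, rng.integers(k)]  # one of its k neighbors
        lam = rng.random()                 # interpolation factor
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_min = rng.normal(0, 1, (10, 2))  # 10 minority samples (toy features)
synth = smote_minority(X_min, n_new=40)
print(synth.shape)  # (40, 2)
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE enlarges the minority class without simply duplicating existing samples.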