ISBN: (Print) 9781728138077
With the advent of the information era, the amount of data produced has grown enormously, primarily as a result of the Internet and its billions of users worldwide. The Internet is a storehouse of all kinds of data: text, videos, and images. Most of this data, however, is not directly suitable for learning algorithms and must be processed before it is applied to them. Deep learning algorithms using neural networks require efficient and proper datasets to yield better results for predictive analysis. Because we need to deal with big data over the Internet, efficient storage is a challenge, so encoding the dataset in a convenient and efficient form before storing it is of immense importance. Here, we compare various encoding algorithms for storing pre-processed text data. In simulations on a random sample of 8000 English words, Huffman encoding reduced the storage space (memory) requirement to just 0.1% of that required by the more traditional one-hot encoding technique.
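The storage comparison described in the abstract can be illustrated with a minimal sketch. It assumes word-level Huffman codes built from word frequencies and a one-hot vector of one bit per vocabulary entry; the abstract does not state the paper's exact setup, so the function names `huffman_code_lengths` and `storage_bits` and the bit-counting convention are illustrative assumptions, not the authors' implementation.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: Huffman code length in bits} for a frequency table."""
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written down.
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside
        # them moves one level deeper, so its code grows by one bit.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: length + 1 for s, length in d1.items()}
        merged.update({s: length + 1 for s, length in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def storage_bits(words):
    """Return (huffman_bits, one_hot_bits) for a list of words.

    Assumption: one-hot storage costs one bit per vocabulary entry
    for every word occurrence; code tables are not counted.
    """
    freqs = Counter(words)
    lengths = huffman_code_lengths(freqs)
    huffman = sum(freqs[w] * lengths[w] for w in freqs)
    one_hot = len(words) * len(freqs)
    return huffman, one_hot


if __name__ == "__main__":
    sample = ["data", "data", "data", "data", "text", "text", "image", "video"]
    huffman, one_hot = storage_bits(sample)
    print(f"Huffman: {huffman} bits, one-hot: {one_hot} bits")
```

Under these assumptions the gap widens with vocabulary size, because a one-hot vector grows linearly with the number of distinct words while a Huffman code length grows roughly logarithmically.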
ISBN: (Print) 9781467384377
The excessive consumption of network bandwidth for transmitting unwanted emails has always been a major problem on the web, since existing classification approaches still fall short of a complete solution. This paper presents an enhanced vocabulary-based dictionary algorithm for protecting web users from receiving unwanted spam mails. The proposed algorithm identifies legitimate incoming mails and separates them from unsolicited email attacks. We apply the Porter stemmer algorithm as part of the normalization process to remove common morphological and inflectional endings from English words. A comparative study and evaluation of these classification approaches is carried out using machine-learning techniques. The performance of the proposed algorithm is visualized using a confusion matrix. The experimental results show that our method produces fewer false negatives than existing techniques.
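As a rough illustration of how a vocabulary-based classifier can be scored with a confusion matrix, the sketch below counts hits against a small term list and tallies the four outcome types. Everything in it is hypothetical: the term list `SPAM_TERMS`, the `threshold` parameter, and the function names are assumptions for illustration, and the Porter stemming step mentioned in the abstract is omitted (tokens are only lowercased), so this is not the paper's algorithm.

```python
import re

# Hypothetical spam vocabulary; the paper's dictionary is not given.
SPAM_TERMS = {"free", "winner", "prize", "offer"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_spam(text, threshold=2):
    """Flag a mail as spam when enough tokens appear in the vocabulary."""
    hits = sum(1 for token in tokenize(text) if token in SPAM_TERMS)
    return hits >= threshold


def confusion_matrix(labels, predictions):
    """Count outcomes, treating spam as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(labels, predictions):
        if actual and predicted:
            counts["tp"] += 1  # spam correctly flagged
        elif not actual and predicted:
            counts["fp"] += 1  # legitimate mail wrongly flagged
        elif actual and not predicted:
            counts["fn"] += 1  # spam that slipped through
        else:
            counts["tn"] += 1  # legitimate mail correctly passed
    return counts


if __name__ == "__main__":
    mails = [
        ("Free prize for every winner", True),
        ("Meeting moved to Monday", False),
        ("Special offer inside", True),
        ("Free lunch offer at the office", False),
    ]
    labels = [label for _, label in mails]
    predictions = [is_spam(text) for text, _ in mails]
    print(confusion_matrix(labels, predictions))
```

The `fn` count corresponds to the false negatives the abstract reports on: spam mails that the classifier lets through as legitimate.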