To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based multi-label classification (PDFMLC) ***, open-source cybersecurity analysis reports are collected and converted into a standardized text ***, five tactics category labels are annotated, creating a multi-label dataset for tactics *** the limitations of low execution efficiency and scalability in the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ***, our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input *** captures latent label associations, significantly improving classification ***, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) *** results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution ***, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat *** a result, successfully formatted STIX 2.1 threat intelligence is established.
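The abstract does not say how label mutual information is computed or fed into the forest. As a rough illustration only, the sketch below computes the standard pairwise mutual information between two columns of a binary label matrix. The function name `label_mutual_information` and the toy matrix `Y` are hypothetical, and this is not the PDFMLC implementation.

```python
import math
from collections import Counter

def label_mutual_information(Y, i, j):
    """Mutual information (in nats) between label columns i and j of a label matrix Y."""
    n = len(Y)
    joint = Counter((row[i], row[j]) for row in Y)   # joint counts of (label_i, label_j)
    count_i = Counter(row[i] for row in Y)           # marginal counts for label i
    count_j = Counter(row[j] for row in Y)           # marginal counts for label j
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Toy label matrix (assumption): columns 0 and 1 always co-occur,
# column 2 is independent of column 0.
Y = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
mi_dependent = label_mutual_information(Y, 0, 1)
mi_independent = label_mutual_information(Y, 0, 2)
```

Labels that always co-occur score high and independent labels score zero, which is the kind of latent association the abstract says the algorithm exploits.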
ISBN: 9781538691250 (print)
With the rapid development of social networks, networked multi-label classification algorithms have gained wide attention. Most existing networked multi-label classification algorithms consider only the homogeneity or heterogeneity of the network and ignore its imbalance, even though imbalance is common in real network environments and deserves more attention. Moreover, the strategy for selecting the training set is critical for a multi-label classification algorithm, because it directly affects both the parameter updates inside the classifier and the classifier's precision. Applying active learning to training set selection can effectively improve the precision of the classifier. Similarly, applying imbalanced-data processing strategies to training set selection makes classifiers better suited to imbalanced networks. We therefore propose BSHD (Block Sampling with selecting the Highest Degree nodes), an active-learning-based algorithm for imbalanced networked multi-label classification. The algorithm divides the network into blocks according to edge density and applies oversampling and undersampling to each block. It then selects the nodes with the highest degree from each block to form the training set. Experimental results show that the proposed BSHD outperforms other state-of-the-art approaches.
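The final selection step described above can be sketched as follows. This is a minimal sketch that assumes the blocks have already been produced. The edge-density partitioning and the oversampling and undersampling steps are not specified in the abstract and are omitted here. The function name `highest_degree_per_block`, the parameter `k`, and the toy graph are hypothetical.

```python
from collections import defaultdict

def highest_degree_per_block(edges, blocks, k=1):
    """Pick the k highest-degree nodes from each block of an undirected graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    selected = []
    for block in blocks:
        # Sort by descending degree; break ties by node id for determinism.
        ranked = sorted(block, key=lambda node: (-degree[node], node))
        selected.extend(ranked[:k])
    return selected

# Toy graph (assumption): two blocks joined by the single edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (5, 6), (5, 7), (3, 4)]
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
training_nodes = highest_degree_per_block(edges, blocks, k=1)
```

Degree is counted over the whole graph rather than within each block; whether BSHD uses global or within-block degree is not stated in the abstract, so this is a design assumption of the sketch.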
ML-kNN cannot be used for real-time classification because of its high computational cost in both time and space. This paper therefore proposes LDA-ML-kNN, a multi-label classification model based on feature dimensionality reduction. First, LDA is used to extract features from the text data and to reduce the dimension of the extracted data matrix. The reduced data are then used to train the classifiers, with the correlation between labels added. Finally, the multi-label classification results are obtained. The experimental results show that the improved model significantly reduces computational complexity: average accuracy improves to some extent, while time and space complexity are greatly reduced. The model is therefore of practical value for real-time classification and processing of massive data.
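For orientation, the classification stage can be sketched with the standard ML-kNN rule (k nearest neighbours plus a smoothed maximum a posteriori decision per label). The sketch assumes the feature vectors have already been reduced. The LDA step and the label-correlation extension mentioned in the abstract are not shown, and the names `nearest`, `fit_mlknn`, `predict_mlknn` and the toy data are hypothetical.

```python
import math

def nearest(X, x, k, skip=None):
    """Indices of the k training points closest to x (Euclidean distance)."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: math.dist(X[i], x))
    return order[:k]

def fit_mlknn(X, Y, k=2, s=1.0):
    """Estimate per-label priors and neighbour-count likelihoods with smoothing s."""
    n, q = len(Y), len(Y[0])
    neigh = [nearest(X, X[i], k, skip=i) for i in range(n)]
    model = {"X": X, "Y": Y, "k": k, "prior": [], "like": []}
    for l in range(q):
        prior1 = (s + sum(row[l] for row in Y)) / (2 * s + n)
        c1 = [0] * (k + 1)  # c1[d]: instances WITH label l having d neighbours with l
        c0 = [0] * (k + 1)  # c0[d]: instances WITHOUT label l having d neighbours with l
        for i in range(n):
            delta = sum(Y[j][l] for j in neigh[i])
            (c1 if Y[i][l] == 1 else c0)[delta] += 1
        like1 = [(s + c) / (s * (k + 1) + sum(c1)) for c in c1]
        like0 = [(s + c) / (s * (k + 1) + sum(c0)) for c in c0]
        model["prior"].append(prior1)
        model["like"].append((like1, like0))
    return model

def predict_mlknn(model, x):
    """Assign label l when the posterior for 'has l' beats 'does not have l'."""
    idx = nearest(model["X"], x, model["k"])
    labels = []
    for l, prior1 in enumerate(model["prior"]):
        count = sum(model["Y"][j][l] for j in idx)
        like1, like0 = model["like"][l]
        labels.append(int(prior1 * like1[count] > (1 - prior1) * like0[count]))
    return labels

# Toy data (assumption): 1-D reduced features, two well-separated clusters.
X = [[0.0], [0.1], [0.2], [0.3], [10.0], [10.1], [10.2], [10.3]]
Y = [[1, 0]] * 4 + [[0, 1]] * 4
model = fit_mlknn(X, Y, k=2)
pred_low = predict_mlknn(model, [0.15])
pred_high = predict_mlknn(model, [10.15])
```

Every prediction scans all training points, so the cost grows with both the number of instances and the feature dimension, which is why reducing the dimension before this stage lowers the time and space cost.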