Search Results - Inner Mongolia University Library

Unstructured Big Data Threat Intelligence Parallel Mining Algorithm

Big Data Mining and Analytics, 2024, Vol. 7, No. 2, pp. 531-546

Authors: Zhihua Li, Xinye Yu, Tao Wei, Junhao Qian. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; School of IoT Engineering, Jiangnan University, Wuxi 214122, China

To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based multi-label classification (PDFMLC) ***, open-source cybersecurity analysis reports are collected and converted into a standardized text ***, five tactics category labels are annotated, creating a multi-label dataset for tactics *** the limitations of low execution efficiency and scalability in the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ***, our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input *** captures latent label associations, significantly improving classification ***, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) *** results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution ***, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat *** a result, successfully formatted STIX 2.1 threat intelligence is established.

Keywords: unstructured big data mining; parallel deep forest; multi-label classification algorithm; threat intelligence
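
The abstract above mentions using label mutual information from the dataset to capture latent label associations. As a rough illustration only, and not the authors' implementation, the sketch below computes pairwise mutual information between binary label columns. The function name `label_mutual_information` and the toy label matrix are assumptions introduced here for the example.

```python
import math
from collections import Counter


def label_mutual_information(Y, i, j):
    """Mutual information (in nats) between label columns i and j.

    Y is a list of rows, each row a list of 0/1 label indicators.
    This is a hypothetical helper, not part of PDFMLC itself.
    """
    n = len(Y)
    joint = Counter((row[i], row[j]) for row in Y)   # joint counts of (label_i, label_j)
    marg_i = Counter(row[i] for row in Y)            # marginal counts of label_i
    marg_j = Counter(row[j] for row in Y)            # marginal counts of label_j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = marg_i[a] / n
        p_b = marg_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy multi-label matrix (made up): columns 0 and 1 always agree,
# column 2 is independent of column 0.
Y = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]

mi_dependent = label_mutual_information(Y, 0, 1)    # equals ln 2 for this data
mi_independent = label_mutual_information(Y, 0, 2)  # equals 0 for this data
```

A high value for a label pair suggests the two labels tend to co-occur (or co-exclude), which is the kind of association the abstract describes feeding into the classifier as additional input.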


Imbalanced Networked Multi-Label Classification with Active Learning


9th IEEE International Conference on Big Knowledge (ICBK)

Authors: Zhang, Ruilong; Li, Lei; Zhang, Yuhong; Bu, Chenyang. City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China; Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China

ISBN: (print) 9781538691250

With the rapid development of social networks, networked multi-label classification algorithms have gained wide attention. Most existing networked multi-label classification algorithms consider only the homogeneity or heterogeneity of the network and ignore its imbalance, even though imbalance is common in real network environments and deserves more attention. Moreover, the strategy for selecting the training set is critical for a multi-label classification algorithm, because it directly affects both the parameter updates inside the classifier and the classifier's precision. Applying active learning to training-set selection can effectively improve the precision of the classifier. Similarly, applying imbalanced data processing strategies to training-set selection makes classifiers more suitable for imbalanced data networks. We therefore propose BSHD (Block Sampling with selecting the Highest Degree nodes), an active learning based imbalanced networked multi-label classification algorithm. In this algorithm, we divide the network according to edge density and use oversampling and undersampling to process each block. Then we select the nodes with the highest degree from each block to form the training set. Experimental results show that the proposed BSHD outperforms other state-of-the-art approaches.

Keywords: imbalanced data; active learning; multi-label classification algorithm; oversampling; undersampling
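
The abstract above describes selecting the highest-degree nodes from each block to form the training set. The sketch below illustrates only that selection step under simplifying assumptions: the blocks are taken as given (the edge-density partition and the oversampling/undersampling stages are not shown), and the function name `highest_degree_per_block`, the parameter `k`, and the toy graph are hypothetical.

```python
def highest_degree_per_block(adjacency, blocks, k=1):
    """Pick the k highest-degree nodes from each block.

    adjacency: dict mapping node -> set of neighbouring nodes
    blocks:    list of node lists (assumed to come from some earlier partition)
    Ties are broken by node name so the result is deterministic.
    """
    selected = []
    for block in blocks:
        ranked = sorted(
            block,
            key=lambda node: (-len(adjacency.get(node, ())), node),
        )
        selected.extend(ranked[:k])
    return selected


# Toy undirected graph with two blocks (made-up data).
adjacency = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
    "d": {"e"},
    "e": {"d", "f"},
    "f": {"e"},
}
blocks = [["a", "b", "c"], ["d", "e", "f"]]

training_nodes = highest_degree_per_block(adjacency, blocks, k=1)
# training_nodes is ["a", "e"]: the hub of each block
```

High-degree nodes touch many neighbours, so labeling them first is one plausible way to spread label information through each block with few queries.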


An Improved ML-kNN Multi-label Classification Model Based on Feature Dimensionality Reduction


2016 International Conference on Computer, Mechatronics and Electronic Engineering (CMEE 2016)

Authors: Zhi-qiang Li, Shuai-yi Cao, Hong-chen Guo. School of Software, Beijing Institute of Technology; Network Information Technology Center, Beijing Institute of Technology

ML-kNN cannot be used for real-time classification because of its huge computational cost in time and space. Therefore, this paper proposes an LDA-ML-kNN multi-label classification model based on feature dimensionality reduction. First, we use LDA to extract features from the text data and reduce the dimension of the extracted data matrix; we then use the reduced data to train the classifiers and add the correlation between the labels. Finally, we obtain the multi-label classification results. The experimental results show that the improved model significantly reduces computational complexity. While average accuracy improves somewhat, time and space complexity are greatly reduced. The model therefore has practical significance for real-time classification and processing of massive data.

Keywords: multi-label classification algorithm; feature reduction; LDA; label relevance
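
To give a feel for why neighbour-based multi-label classification benefits from low-dimensional features, the sketch below shows a plain k-nearest-neighbour majority vote per label on already-reduced feature vectors. It is a deliberately simplified stand-in: it omits the LDA reduction step, the label-correlation step, and the Bayesian estimation that ML-kNN proper uses. The function name `knn_label_vote` and the toy data are assumptions.

```python
import math


def knn_label_vote(train_X, train_Y, query, k=3):
    """Predict a 0/1 label vector by majority vote among the k nearest neighbours.

    train_X: list of feature tuples (assumed already dimension-reduced)
    train_Y: list of 0/1 label lists, one per training point
    Each distance costs time proportional to the feature dimension,
    which is why reducing the dimension speeds up prediction.
    """
    order = sorted(
        range(len(train_X)),
        key=lambda idx: math.dist(train_X[idx], query),
    )
    neighbours = order[:k]
    n_labels = len(train_Y[0])
    return [
        1 if 2 * sum(train_Y[i][label] for i in neighbours) > k else 0
        for label in range(n_labels)
    ]


# Toy 2-D features and two labels (made-up data).
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
train_Y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]

near_origin = knn_label_vote(train_X, train_Y, (0.05, 0.05), k=3)
near_far_cluster = knn_label_vote(train_X, train_Y, (5.0, 5.0), k=3)
```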

