Search Results - Inner Mongolia University Library

31st ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Lai, Kaiting; Long, Yinong; Wu, Bowen; Li, Ying; Wang, Baoxun. Affiliations: Peking Univ, Beijing, Peoples R China; Tencent Platform & Content Grp, Beijing, Peoples R China; Peking Univ, Natl Res Ctr Software Engn, Beijing, Peoples R China

ISBN: (print) 9781450392365

Chinese spam text detection is essential for social media, since these texts affect the user experience of Chinese speakers and pollute the community. The underlying text classification method is employed to explore the unique combinations of characters that represent clues of spam information from annotated or further augmented data. However, exploiting the diversity of Chinese character glyphs, spammers frequently wrap the spam content in visually similar text to fool the model while keeping it understandable to people. This paper proposes to bring the essence of human cognition of these adversarial texts into spam text detection models, by designing a pre-trained model that learns the morphological semantics of Chinese characters and represents their contextual meanings from scratch. The model is pre-trained on a Chinese corpus in a self-supervised manner and fine-tuned on spam-annotated community texts. In addition, to cooperate with the pre-trained model that captures the morphological features of Chinese, a new data perturbation method is introduced to guide the optimization towards recognizing the actual meaning of a text after spammers replace some characters with visually similar ones. The experimental results show that the proposed methodology notably improves the performance of spam text detection while maintaining robustness against adversarial samples.

Keywords: Spam Text Detection; Chinese Character Representation; Pre-trained Language Model; classification algorithm
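The idea of perturbing a text with visually similar characters, as mentioned in the abstract above, can be illustrated with a minimal sketch. The abstract does not describe the paper's actual perturbation method, so the `SIMILAR` table, the `perturb` function and its `rate` parameter are hypothetical and only show the general shape of such an augmentation step.

```python
import random

# Hypothetical lookup table of visually similar character pairs.
# A real system would need a much larger, curated glyph-similarity resource.
SIMILAR = {"微": "徽", "未": "末", "己": "已", "土": "士"}


def perturb(text, rate=0.5, seed=0):
    """Replace characters that have a look-alike with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in SIMILAR and rng.random() < rate:
            out.append(SIMILAR[ch])  # swap in the visually close glyph
        else:
            out.append(ch)           # keep the original character
    return "".join(out)


# With rate=1.0 every character found in the table is replaced.
augmented = perturb("加微信", rate=1.0)
```

Training on both the original and the perturbed text with the same label is one plausible way to use such samples. Whether the paper does exactly this is not stated in the abstract.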


An Extraction Method of Hyponymy based on Multiple Data Sources Fusion


7th IEEE International Conference on Software Engineering and Service Science (ICSESS)

Authors: Wang, Shuqi; Feng, Xiao; Zhang, Shuwu; Yin, Chengcheng. Affiliations: Commun Univ China, Department Signal & Informat Syst, Beijing, Peoples R China; Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

ISBN: (print) 9781467399043

Hyponymy is one of the most critical semantic relations and contributes significantly to semantic dictionaries, information retrieval, etc. In this paper, a method of extracting hyponymy is proposed based on multiple data sources fusion, which converts the extraction of hyponymy into the extraction of hypernyms for target words. First, candidate hypernyms for the target words are mined from search engines, encyclopedia resources and core suffix words. Second, the candidates from these data sources are fused. Finally, a classification algorithm, a mature machine learning technique, is used to filter the noise and extract the hypernyms. Hyponymy holds between the target words and their correctly extracted hypernyms. The experimental results show that the highest accuracy of hyponymy extraction reaches 0.832 using the proposed method.

Keywords: hyponymy; multiple data sources fusion; mining candidate hypernyms; classification algorithm; hypernym extraction


Interest Mining Algorithm based on Blog Information


3rd International Conference on Applied Mechanics, Materials and Manufacturing (ICAMMM 2013)

Authors: Ou, Guohua; Xu, Changjian; Zhan, Haoxun; Qin, Yong; Huang, Han. Affiliations: South China Univ Technol, Sch Software Engn, Guang Zhou, Guang Dong, Peoples R China; Dongguan Univ Technol, Sch Comp, Dong Guan, Guangdong, Peoples R China

ISBN: (print) 9783037858882

Recently, the number of blogs on the Internet has risen sharply. Hence, mining valuable information from blogs has practical significance for improving user experience, network services, etc. This paper proposes an algorithm for mining blog authors' interests based on classification techniques, which introduces a non-empty intersection evaluation standard. The algorithm can also improve the hit ratio of recommendation services based on blog authors' interests by means of an interest collection obtained from expanded prediction; therefore, it can reach a higher degree of satisfaction. In addition, this paper performs experiments with data sets from Sina Blog and NetEase Blog, whose results illustrate the higher accuracy of our algorithm.

Keywords: Blog mining; Interest mining; classification algorithm; Non-empty intersection


Efficient Hardware Implementation of Decision Tree Training Accelerator


6th IEEE International Symposium on Smart Electronic Systems (IEEE-iSES)

Authors: Choudhury, Rituparna; Ahamed, S. R.; Guha, Prithwijit. Affiliation: Indian Inst Technol, Dept Elect & Elect Engn, Gauhati, India

ISBN: (print) 9781665404785

This paper proposes a serial architecture as an implementation of the Two Means Decision Tree. This DT algorithm exhibits lower complexity. The architecture is implemented on a Field Programmable Gate Array (FPGA) running at 62 MHz. Simulation results show that the proposed hardware architecture exhibits a 10x speed-up compared to its software implementation and runs 28x faster than the C4.5 algorithm. It also consumes less power than complex algorithms implemented on Graphics Processing Units (GPUs). Hence the architecture is suitable for simple, low-power, high-speed applications.

Keywords: Machine learning; Decision Tree; FPGA; Serial Architecture; classification algorithm


Rumor Detection of Sina Weibo Based on SDSMOTE and Feature Selection


4th IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA)

Authors: Geng, Yixin; Sui, Jie; Zhu, Qian. Affiliation: Univ Chinese Acad Sci, Sch Engn Sci, Beijing, Peoples R China

ISBN: (print) 9781728114101

The Internet provides great convenience for our lives and has become an important channel for getting information. However, a large amount of false information, called rumors, comes with it. In terms of automatically detecting rumors, the two main contributions of this paper are as follows: (1) To reduce the impact of unbalanced data on classification, we propose an improved SMOTE algorithm to resample data. (2) We propose six new features based on Sina microblogs, including Words with Guidance (WG), Words with Menace (WM), Suspected Topic (ST), Recognition of Information (RI), Degree of Attention to Users (DAU) and Credit Rating (CR), which are related to user-based features, content-based features, propagation-based features and microblog-based features. By building subsets with the new features and using machine learning algorithms such as Xgboost, we tested the effect of rumor detection on a real data set. Experiments showed that our rumor detection method was significantly improved compared with the most advanced method of the same type, with precision, recall and F1 at 0.827, 0.837 and 0.825 respectively, and AUC at 0.895.

Keywords: rumor detection; SDSMOTE; feature selection; social network; microblog; machine learning; classification algorithm
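For orientation, the resampling idea behind contribution (1) can be sketched with plain SMOTE, which interpolates between a minority-class sample and one of its nearest minority-class neighbours. This is not the paper's improved SDSMOTE variant, whose details the abstract does not give. The function name, parameters and toy data are assumptions made for illustration.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch: create n_new synthetic minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between base and neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for the minority (rumor) class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_oversample(minority, n_new=5)
```

The synthetic points are added to the minority class before training, so that the classifier sees a more balanced data set.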


Predicting IT Employability Using Data Mining Techniques


3rd International Conference on Digital Information Processing, Data Mining, and Wireless Communications (DIPDMWC)

Authors: Piad, Keno C.; Dumlao, Menchita; Ballera, Melvin A.; Ambat, Shaneth C. Affiliations: Bulacan State Univ, Sch Comp Studies, Malolos, Philippines; AMA Univ, Sch Grad Studies, Quezon City, Philippines; AMA Univ, Sch Comp Studies, Quezon City, Philippines

ISBN: (print) 9781467393799

Researchers in higher education are beginning to explore the potential of data mining in analyzing data for the purpose of providing quality service and meeting the needs of their graduates. Thus, educational data mining emerges as one of the tools for studying academic data to identify patterns and support decision making affecting education. This paper predicts the employability of IT graduates using nine variables. First, different classification algorithms in data mining were tested, and logistic regression, with an accuracy of 78.4, was implemented. Based on the logistic regression analysis, three variables, IT_Core, IT_Professional and Gender, were identified as significant predictors of employability. The data were collected from the five-year profiles of 515 students randomly selected from the placement office tracer study.

Keywords: decision tree; classification algorithm; employability; prediction analytics; data accuracy


Financial customer classification by combined model


APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2022, Vol. 8, No. 2, pp. 431-446

Authors: Lin, Cong; Zheng, Jinju. Affiliation: Ningbo Univ Technol, Ningbo 315000, Zhejiang, Peoples R China

This paper explores the pros and cons of different algorithm models on the same selection problem, and then uses the combined prediction theory to obtain a new combined prediction model to explore its prediction accuracy. The actual problem to be solved is to help financial institutions to scientifically classify customers who choose financial products. We select the bank data set in the UCI database, which is derived from the survey data of a customer conducted by a financial institution in Portugal for a wealth management product. Decision tree C5.0 algorithm, naive Bayes classification algorithm and binary logit model are individually used to carry out a single model of empirical research on financial product customer classification. Through the empirical analysis of the five combination models, it is concluded that in the model that uses the least squares weighting method to determine the weight, the weight appears negative, which does not conform to the actual situation. The model that is based on the least squares weighting method and the model that is based on the simple weighting method are excluded. In contrast, the arithmetic mean weighted model is better than the reciprocal variance weighted model and the reciprocal mean square model. The accuracy reaches 89.91%, which is 0.43% higher than the accuracy of a single model. It can be concluded that the model that is based on the arithmetic average weighting is a better combination forecasting model.

Keywords: combined forecasting; classification algorithm; financial products
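Three of the weighting schemes named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions from the combined-forecasting literature: equal weights for the arithmetic mean, weights proportional to the reciprocal of each model's sum of squared errors for the reciprocal variance method, and weights proportional to the reciprocal of its square root for the reciprocal mean square method. The abstract does not state the exact formulas the paper uses, and the function names, error values and probabilities below are hypothetical.

```python
def arithmetic_mean_weights(sq_errors):
    """Equal weight for every single model."""
    n = len(sq_errors)
    return [1.0 / n] * n


def reciprocal_variance_weights(sq_errors):
    """Weights proportional to 1 / (sum of squared errors)."""
    inverse = [1.0 / e for e in sq_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def reciprocal_mean_square_weights(sq_errors):
    """Weights proportional to 1 / sqrt(sum of squared errors)."""
    inverse = [e ** -0.5 for e in sq_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(predictions, weights):
    """Weighted combination of the single-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical squared-error sums for three single models
# (e.g. decision tree, naive Bayes, logit) and their predicted
# probabilities for one customer.
sq_errors = [4.0, 1.0, 1.0]
probabilities = [0.9, 0.6, 0.3]

p_mean = combine(probabilities, arithmetic_mean_weights(sq_errors))
p_var = combine(probabilities, reciprocal_variance_weights(sq_errors))
```

All three schemes produce non-negative weights that sum to one. The least squares weighting mentioned in the abstract lacks this property, which is why it can yield negative weights.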


Method of Physiological State Abnormality Discrimination Based on Improved KNN algorithm


10th International Conference on Big Data and Information Analytics

Authors: Miao Yongfei; Li Xiaolong; Wu Shikai; Zhao Xiaoqing; Yang Yufeng; Yu Yue. Affiliation: Air Force Commun NCO Acad, Dept Ground Air Nav, Dalian, Peoples R China

ISBN: (digital) 9798350354621

ISBN: (print) 9798350354638; 9798350354621

Physiological state abnormality due to genetic diseases, excessive exercise, etc. is becoming a fatal threat to people's life and safety because of its hidden nature. The K-Nearest Neighbor (KNN) algorithm is widely used in various fields due to its simple implementation, but when the sample size is too large or there are too many feature attributes, the classification efficiency decreases significantly. This paper proposes an improved KNN (IKNN) algorithm that hierarchically clusters the data during pre-processing, which reduces the search space of the algorithm and effectively improves the search efficiency. Applied to physiological state abnormality discrimination, the improved KNN algorithm improves both the efficiency and the accuracy of discrimination. Results show that this could provide an effective guarantee for the early discovery of abnormal physiological parameters, the timely adoption of countermeasures, and the protection of people's life and safety.

Keywords: Physiological Abnormality; classification algorithm; Hierarchical Clustering; KNN algorithm
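The general idea of reducing the KNN search space by clustering the training data first can be sketched as follows. This is a simplified illustration and not the paper's IKNN algorithm. It assumes a naive centroid-linkage agglomerative clustering and a query step that searches only the cluster with the nearest centroid. All names, labels and data points are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Mean point of a cluster of (point, label) samples."""
    points = [p for p, _ in cluster]
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerate(samples, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two
    clusters whose centroids are closest, until n_clusters remain."""
    clusters = [[s] for s in samples]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def knn_predict(clusters, query, k=3):
    """Majority vote among the k nearest samples, searching only the
    cluster whose centroid is closest to the query."""
    nearest = min(clusters, key=lambda c: sq_dist(centroid(c), query))
    neighbours = sorted(nearest, key=lambda s: sq_dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature physiological readings.
samples = [
    ((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
    ((5.0, 5.0), "abnormal"), ((5.1, 4.9), "abnormal"), ((4.9, 5.2), "abnormal"),
]
clusters = agglomerate(samples, n_clusters=2)
label = knn_predict(clusters, (0.1, 0.1))
```

Clustering happens once during pre-processing. Each query then compares against one cluster instead of the full training set, which is where the efficiency gain comes from. The trade-off is that true neighbours lying in an adjacent cluster are missed.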


Applied Analysis of Social Network Data in Personal Credit Evaluation


7th International Conference on Artificial Intelligence and Mobile Services (AIMS) Held as Part of the Services Conference Federation (SCF)

Authors: Wang, Yanyong; Yang, Jian; Saifuding, Daniyaer; Fan, Jiejie; Li, Ranran; Zhao, Chongchong; Xu, Jie; Xing, Chunxiao. Affiliations: Univ Sci & Technol, Sch Comp & Commun Engn, Beijing 10083, Peoples R China; Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

ISBN: (print) 9783319943619; 9783319943602

With the development of the times, traditional personal credit is facing a severe test. This paper makes an exploratory study of the practical application and development of personal credit evaluation using MicroBlog data. Credit-related indicators are drawn from the previous literature on personal credit evaluation. We summarize three major attribute groups: "Attributes of Demographic", "Tweets Content", and "User Relationship Structure". We use support vector machine (SVM), naive Bayesian (NB), logistic regression (LR) and AdaBoost classification algorithms, modeled according to the actual problem, to analyze the effect of social network data on personal credit. Compared with the other algorithms, AdaBoost achieves the best AUC value, 0.564, under the equalization setting.

Keywords: Social data; classification algorithm; AdaBoost; Personal credit assessment


Study on Earthquake Prediction Model Based on Traffic Disaster Data


11th IEEE International Conference on Software Engineering and Service Science (IEEE ICSESS)

Authors: Han, Wanjiang; Gan, Yuanlin; Chen, Shuwen; Wang, Xiaoxiang. Affiliations: Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engn Sch, Beijing, Peoples R China; Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing, Peoples R China

ISBN: (print) 9781728165790

This paper collects data on the damage to the traffic system caused by earthquakes in China in the past two decades, and uses the KNN, SVM, logistic regression, naive Bayes and decision tree algorithms to train on the data and establish earthquake prediction models. The paper introduces the process of preprocessing, modelling, evaluation, and visualization of disaster data. An earthquake disaster inversion model based on traffic data has been established, which can predict the earthquake intensity from the relevant data provided by the traffic department. The prediction accuracy is relatively high, which is very helpful for earthquake prediction and rescue operations.

Keywords: Traffic loss; earthquake intensity; earthquake prediction model; classification algorithm; machine learning
