Search Results - Inner Mongolia University Library

arXiv, 2018

Authors: Lei, Kai; Zhang, Bing; Deng, Yang; Zhang, Dongyu; Shen, Ying. Institute of Big Data Technologies, Shenzhen Key Lab for Cloud Computing Technology & Applications, Peking University, Shenzhen 518055, P.R. China; C.S. Department, Harbin Institute of Technology

Named entity discovery and linking is a fundamental, core component of question answering. In the Question Entity Discovery and Linking (QEDL) problem, traditional methods struggle because multiple entities in one short question are difficult to discover completely, and the incomplete information in short text makes entity linking hard to implement. To overcome these difficulties, we propose a knowledge-graph-based solution for QEDL and develop a system consisting of a Question Entity Discovery (QED) module and an Entity Linking (EL) module. The QED module is a tradeoff between, and an ensemble of, two methods: one based on knowledge graph retrieval, which extracts more entities from questions and safeguards recall, and one based on Conditional Random Fields (CRF), which improves precision. The EL module treats linking as a ranking problem and uses a Learning to Rank (LTR) method with features such as semantic similarity, text similarity, and entity popularity to extract and make full use of the information in short texts. On the official dataset of a shared QEDL evaluation task, our approach obtains a 64.44% F1 score for QED and 64.86% accuracy for EL, which ranks 2nd and indicates its practical value for the QEDL problem. Copyright © 2018, The Authors. All rights reserved.

Keywords: Knowledge graph
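
The ranking view of entity linking in the abstract above can be illustrated with a minimal pointwise sketch. It assumes each candidate entity already carries the three features the abstract names; the `Candidate` class, the `WEIGHTS` values, and the entity identifiers are hypothetical, and a real Learning to Rank model would learn its scoring function from labeled data rather than use fixed weights.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate entity for one mention (hypothetical structure)."""
    entity_id: str
    semantic_similarity: float  # assumed to be precomputed, in [0, 1]
    text_similarity: float      # assumed to be precomputed, in [0, 1]
    popularity: float           # assumed to be precomputed, in [0, 1]


# Hypothetical hand-set weights; an LTR model would learn these from data.
WEIGHTS = (0.5, 0.3, 0.2)


def score(candidate, weights=WEIGHTS):
    """Linear combination of the three features."""
    w_sem, w_text, w_pop = weights
    return (w_sem * candidate.semantic_similarity
            + w_text * candidate.text_similarity
            + w_pop * candidate.popularity)


def rank_candidates(candidates, weights=WEIGHTS):
    """Return candidates sorted from best to worst score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("apple_company", 0.4, 0.8, 0.9),
        Candidate("apple_fruit", 0.9, 0.8, 0.3),
    ]
    for c in rank_candidates(candidates):
        print(c.entity_id, round(score(c), 3))
```

The top-ranked candidate is taken as the linked entity; swapping the linear scorer for a learned model leaves `rank_candidates` unchanged.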

Internet Traffic Analysis Using Community Detection and Apache Spark

2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (9th edition)

Authors: Jiake Ni, Weitao Weng, Jiayu Chen, Kai Lei. Institute of Big Data Technologies, Shenzhen Key Lab for Cloud Computing Technology & Applications, School of Electronic and Computer Engineering (SECE), Peking University, Shenzhen 518055, P.R. China

With the rapid development of the Internet, Internet traffic and end hosts continue to grow in *** behavior analysis for a large-scale network is becoming more and more *** address these challenges, this paper proposes an Internet traffic analysis approach based on community detection to discover communities consisting of end hosts with similar traffic behavior in a large campus ***, we use only the IP-to-IP information without packet payloads to model the similarity of end hosts in campus *** the similarity graph, which represents the social behavior similarity of all end hosts, is ***, we leverage the Label Propagation algorithm to discover end-host communities on the similarity *** satisfy demands for the scalable analysis of ever-growing Internet traffic data, a Spark-based Internet traffic analysis system is developed, including implementing the above *** experimental results based on real campus network traffic show the benefits of the proposed approach in analyzing the traffic behavior of a large-scale network at the host community level and detecting potential anomalous traffic *** proposed approach reduces the complexity of analyzing the traffic behavior of a large network compared with analyzing individual *** addition, the experimental results also demonstrate that the Spark-based Internet traffic analysis system can analyze Internet traffic efficiently.

Keywords: Internet traffic analysis; community detection; Apache Spark; data mining
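
The community-detection step named in the abstract above can be sketched with a small, single-machine label propagation routine. This is an illustration under stated assumptions, not the paper's Spark implementation: the graph format (a dict of weighted neighbor dicts), the tie-breaking rule, and the host names are all hypothetical.

```python
import random
from collections import Counter


def label_propagation(graph, max_iters=100, seed=0):
    """Assign a community label to every node of a weighted similarity graph.

    graph: dict mapping node -> dict of {neighbor: similarity weight}.
    Returns a dict mapping node -> community label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # each host starts alone
    nodes = list(graph)
    for _ in range(max_iters):
        rng.shuffle(nodes)  # asynchronous updates in random order
        changed = False
        for node in nodes:
            if not graph[node]:
                continue  # isolated host keeps its own label
            # Sum the similarity weight carried by each neighboring label.
            weight_by_label = Counter()
            for neighbor, weight in graph[node].items():
                weight_by_label[labels[neighbor]] += weight
            best = max(weight_by_label.values())
            # Assumed tie-break: the smallest label among the heaviest ones.
            new_label = min(l for l, w in weight_by_label.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels


if __name__ == "__main__":
    # Two groups of hosts with no similarity edges between the groups.
    graph = {
        "h1": {"h2": 1.0, "h3": 1.0},
        "h2": {"h1": 1.0, "h3": 1.0},
        "h3": {"h1": 1.0, "h2": 1.0},
        "h4": {"h5": 1.0, "h6": 1.0},
        "h5": {"h4": 1.0, "h6": 1.0},
        "h6": {"h4": 1.0, "h5": 1.0},
    }
    print(label_propagation(graph))
```

Hosts that end with the same label form one community, so traffic behavior can then be analyzed per community rather than per host.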

An Entropy-based Probabilistic Forwarding Strategy in Named Data Networking

IEEE International Conference on Communications

Authors: Kai Lei, Jiawei Wang, Jie Yuan. Institute of Big Data Technologies, Shenzhen Key Lab for Cloud Computing Technology & Applications, School of Electronics and Computer Engineering (SECE), Peking University

ISBN: (print) 9781467364300

The forwarding strategy is the key to the resiliency and efficiency of Named Data Networking (NDN), which is a new and fundamental research area. For a forwarding strategy, dynamically selecting an optimal interface from multiple alternative interfaces to forward an Interest packet is a multiple attribute decision making (MADM) problem. In this paper, an entropy-based probabilistic forwarding (EPF) strategy is proposed that makes a stochastic interface selection based on the combination of interfaces' dynamic availabilities and static routing information, which achieves better load balance than deterministic interface selection. By objectively assigning weights to attributes and considering multiple real-time network condition metrics, EPF can assess the availabilities of interfaces more accurately and comprehensively. Since additional network metrics can be easily added to and integrated into the interface assessment model, EPF provides good extensibility. In addition, we define two parameters (γ, δ) that trade off the influence of static routing information against the dynamic running status of interfaces, so that the EPF strategy can be customized for different network and application scenarios. Experiments show that EPF achieves better load balance and higher throughput than the representative BestRoute forwarding strategy.

Keywords: Entropy; Throughput; Routing; Probabilistic logic; Load modeling; Packet loss
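
The idea of objectively weighting attributes and then selecting an interface stochastically can be sketched with the standard entropy weight method from MADM. This is a hedged illustration, not the paper's EPF formulas: it assumes every attribute is positive and "larger is better", it omits the (γ, δ) trade-off with static routing information, and the function names and metric values are hypothetical.

```python
import math
import random


def entropy_weights(matrix):
    """Entropy weight method: matrix[i][j] is attribute j of interface i.

    Assumes at least two interfaces and positive, benefit-type attributes.
    Attributes whose values differ more across interfaces get more weight.
    """
    m, n = len(matrix), len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(m)  # normalize to [0, 1]
        diversities.append(max(0.0, 1.0 - entropy))
    total_diversity = sum(diversities)
    if total_diversity < 1e-12:
        return [1.0 / n] * n  # all attributes uninformative: equal weights
    return [d / total_diversity for d in diversities]


def selection_probabilities(matrix):
    """Weighted sum of column-normalized attributes; the results sum to 1."""
    weights = entropy_weights(matrix)
    column_totals = [sum(row[j] for row in matrix) for j in range(len(weights))]
    return [
        sum(w * row[j] / column_totals[j] for j, w in enumerate(weights))
        for row in matrix
    ]


def choose_interface(probabilities, rng=random):
    """Pick an interface index at random according to the probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    # Rows: interfaces; columns: two hypothetical availability metrics.
    metrics = [[0.9, 10.0], [0.5, 10.0], [0.1, 10.0]]
    probs = selection_probabilities(metrics)
    print(probs, choose_interface(probs))
```

Because the second metric is identical for all interfaces in this usage example, it receives almost no weight, and the selection probabilities follow the first metric. Choosing at random in proportion to these values, instead of always taking the best interface, is what spreads load across interfaces.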

Profiling the Followers of the Most Influential and Verified Users on Sina Weibo

IEEE International Conference on Communications

Authors: Huiyu Wang, Kai Lei, Kuai Xu. Institute of Big Data Technologies, Shenzhen Key Lab for Cloud Computing Technology & Applications, School of Electronics and Computer Engineering (SECE), Peking University; School of Mathematical and Natural Sciences, Arizona State University

ISBN: (print) 9781467364300

New social media such as Twitter and Sina Weibo have become an increasingly popular channel for spreading influence, challenging traditional media such as TV and newspapers. The most influential and verified users, also called big-V accounts on Sina Weibo, often attract millions of followers and fans, creating massive "celebrity-centric" social networks that play a key role in disseminating breaking news, the latest events, and controversial opinions on social issues. Given the importance of these accounts, it is crucial to understand their social networks and user influence and to profile their followers' behaviors. To this end, this paper monitors a selected group of influential users on Sina Weibo and collects their tweet streams as well as the retweeting and commenting activities of their followers on these tweets. Our analysis of tweet data streams from Sina Weibo reveals when and what the followers comment on the tweets of these influential users, and discovers different temporal patterns and word diversity in the comments. Based on the insight gained from follower characteristics, we further develop simple and intuitive algorithms for classifying the followers into spammers and normal fans. Our experimental results demonstrate that the proposed algorithms achieve an average accuracy of 95.20% in detecting spammers among the followers who have commented on the tweets of these influential accounts.

Keywords: Classification algorithms; Fans; Social network services; Media; Feature extraction; Standards; Entropy

Extracting Unknown Words from Sina Weibo via Data Clustering

IEEE International Conference on Communications

Authors: Kai Lei, WeiYang Zhang, Kai Zhang, Kuai Xu. Institute of Big Data Technologies, Shenzhen Key Lab for Cloud Computing Technology & Applications, School of Electronics and Computer Engineering (SECE), Peking University; School of Mathematical and Natural Sciences, Arizona State University

ISBN: (print) 9781467364300

Sina Weibo, a Twitter-like microblogging site that attracts over 240 million monthly active users to tweet, retweet, and comment, has rapidly become one of the most popular social media sites in China. As many users coin new and innovative words in their tweets and comments, it is necessary to extract these emerging words, which do not exist in today's Chinese vocabulary or dictionaries. To this end, this paper proposes a novel method based on data clustering of Weibo users and tweets for extracting unknown words from Weibo tweets and comments. Specifically, relying on the similarity of the users who post the tweets, we apply hierarchical clustering to divide Weibo data into distinct groups, e.g., sports, news stories, and movies, before extraction. Compared with extraction from unclustered Weibo data, our experimental results demonstrate the benefits of the proposed data clustering scheme for improving the recall and accuracy of extracting unknown Chinese words from tweets and comments.

Keywords: Entropy; Context; Data mining; Clustering algorithms; Social network services; Clustering methods; Couplings
