Search Results - Inner Mongolia University Library

12th International Conference on Web-Age Information Management

Authors: Rong, Chuitian; Lu, Wei; Du, Xiaoyong; Zhang, Xiao

Affiliations: Key Labs. of Data Engineering and Knowledge Engineering, MOE, China; School of Information, Renmin University of China, China; Shanghai Key Laboratory of Intelligent Information Processing, China

ISBN: (print) 9783642235344; 9783642235351

Duplicate detection has been well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches that run on a single machine. However, as data volumes increase, the performance of duplicate identification is still far from satisfactory. Hence, we handle the problem of duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.

Keywords: duplicate detection; MapReduce; Cloud

Tracer: Enforcing Mandatory Access Control in commodity OS with the support of light-weight intrusion detection and tracing

6th International Symposium on Information, Computer and Communications Security, ASIACCS 2011

Authors: Shan, Zhiyong; Wang, Xin; Chiueh, Tzi-Cker

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University, MOE, China; Stony Brook University, United States; Industrial Technology Research Institute, Taiwan

ISBN: (print) 9781450305648

Enforcing a practical Mandatory Access Control (MAC) in a commercial operating system to tackle the malware problem is a grand challenge but also a promising approach. The main barriers to applying MAC to defeat malware are the incompatibility and poor usability of existing MAC systems. To address these issues, we start by analyzing the technical details of 2,600 malware samples one by one and performing experiments on two types of MAC-enforced operating systems. Based on these preliminary studies, we design a novel MAC model for a commercial operating system, named Tracer, that incorporates intrusion detection and tracing in order to disable malware on hosts while offering good compatibility with existing software and good usability for common users who are not system experts. The model conceptually consists of three actions: detecting, tracing, and restricting suspected intruders. One novelty is that it leverages light-weight intrusion detection and tracing techniques to automate security label configuration, which is widely acknowledged as a tough issue when applying a MAC system in practice. The other is that, rather than restricting information flow as a traditional MAC does, it traces intruders and restricts only their critical malware behaviors, where intruders are processes and executables that are potential agents of a remote attacker. Our prototyping and experiments on Windows show that Tracer can effectively defeat all malware samples tested by blocking malware behaviors without causing a significant compatibility problem. Copyright 2011 ACM.

Keywords: Malware

Detecting stealthy malware with inter-structure and imported signatures

6th International Symposium on Information, Computer and Communications Security, ASIACCS 2011

Authors: Liang, Bin; You, Wei; Shi, Wenchang; Liang, Zhaohui

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

ISBN: (print) 9781450305648

Recent years have witnessed an increasing threat from kernel rootkits. A common feature of such attacks is hiding malicious objects, including processes, sockets, and kernel modules, to conceal their presence. Scanning memory with object signatures to detect stealthy rootkits has proven to be a powerful approach, but only when the signatures are hard for adversaries to evade. However, it is difficult, if not impossible, to select fields from a single data structure as robust signatures with traditional techniques. In this paper, we propose the concepts of inter-structure signature and imported signature, and present techniques to detect stealthy malware based on these concepts. The key idea is to use cross-reference relationships of multiple data structures as signatures to detect stealthy malware, and to import some extra information into regions attached to target data structures as signatures. We have inferred four invariants as signatures to detect hidden processes, sockets, and kernel modules in Linux, and implemented a prototype detection system called DeepScanner. We have also developed a hypervisor-based monitor to protect imported signatures. Our experimental results show that DeepScanner can effectively and efficiently detect stealthy objects hidden by seven real-world rootkits without any false positives or false negatives, and that an adversary can hardly evade DeepScanner without breaking the normal functions of target objects and the system. Copyright 2011 ACM.

Keywords: Malware

Safe side effects commitment for OS-level virtualization

Proceedings of the 8th ACM international conference on Autonomic computing

Authors: Shan, Zhiyong; Wang, Xin; Chiueh, Tzi-Cker; Meng, Xiaofeng

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing, China; Stony Brook University, Stony Brook, NY, United States; Industrial Technology Research Institute, Taiwan

ISBN: (print) 9781450306072

A common use of virtual machines (VMs) is to use them and then throw them away, basically treating a VM as a completely isolated and disposable entity. The disadvantage of this approach is that if there is no malicious activity, the user has to redo all of the work in her actual workspace, since there is no easy way to commit (i.e., merge) only the benign updates within the VM back to the host environment. In this work, we develop a VM commitment system called Secom to automatically eliminate malicious state changes when merging the contents of an OS-level VM to the host. Secom consists of three steps: grouping state changes into clusters, distinguishing between benign and malicious clusters, and committing benign clusters. Secom has three novel features. First, instead of relying on a huge volume of log data, it leverages OS-level information flow and malware behavior information to recognize malicious changes. As a result, the approach imposes a smaller performance overhead. Second, unlike existing intrusion detection and recovery systems that detect compromised OS objects one by one, Secom classifies objects into clusters and then identifies malicious objects on a cluster-by-cluster basis. Third, to reduce the false positive rate when identifying malicious clusters, it simultaneously considers two malware behaviors of different types and the origin of the processes that exhibit these behaviors, rather than considering a single behavior alone, as existing malware detection methods do. We have successfully implemented Secom on the Feather-weight Virtual Machine (FVM) system, a Windows-based OS-level virtualization system. Experiments show that the prototype can effectively eliminate malicious state changes while committing a VM with small performance degradation. Moreover, compared with commercial anti-malware tools, the Secom prototype has a smaller number of false negatives and thus can more thoroughly clean up malware side effects. In addit

Keywords: Virtual machine

Effect factors on secondary structure of protein sequence pattern

International Workshop on Intelligent Systems and Applications

Authors: Liu, Tao; Li, Minghui

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

ISBN: (print) 9781424498574

Discovering the relationship between protein sequence patterns and protein secondary structure is important for accurately predicting the secondary structure of a protein sequence. In this paper, a protein secondary structure pattern dictionary is constructed from protein sequence patterns and their corresponding secondary structures. Based on the constructed dictionary, we propose four effect factors on the secondary structure of a protein sequence pattern: 1) the core pattern itself; 2) patterns or amino acid residues that neighbor the core pattern; 3) patterns or amino acid residues that are far away from the core pattern; and 4) amino acid sequence segments that match the core pattern. Statistical measures are adopted to analyze these factors. The experimental results show the reliability of these factors. The recognition of these effect factors presents new directions for predicting protein secondary structure based on a protein pattern dictionary. © 2011 IEEE.

Keywords: Amino acids

Large scale report generation in data consolidation environments of banks

Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2012, Vol. 40, Suppl. 1, pp. 5-8

Authors: Qin, Xiongpai; Zhou, Xiaoyun; Wu, Zhongxin; Yang, Hongzhi; Wang, Wei

Affiliations: Ministry of Education Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China; Information School, Renmin University of China, Beijing 100872, China; Computer Science Department, Jiangsu Normal University, Xuzhou 221116, Jiangsu, China; Beijing Nantian Software Co. Ltd., Beijing 100085, China

To generate a large number of reports within a limited time window, four techniques were proposed: ROLAP&SQL, Shared Scanning, a Hadoop-based solution, and MOLAP&Cube Sharding. An algorithm that performs in-memory aggregation was designed for the second technique. The experimental results show that all techniques except ROLAP&SQL can meet the time-window constraint, and that the Hadoop-based solution is promising owing to its high scalability. Considering the maturity of the techniques and their performance, we put MOLAP&Cube Sharding into practice while keeping an eye on Hadoop for future adoption.

Keywords: Custom report; data consolidation; Hadoop; Large bank; MOLAP; ROLAP

Probabilistic range queries for Uncertain Trajectories on road networks

14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011

Authors: Zheng, Kai; Trajcevski, Goce; Zhou, Xiaofang; Scheuermann, Peter

Affiliations: School of ITEE, University of Queensland, Australia; Department of EECS, Northwestern University, United States; School of Information, Renmin University of China; Key Lab. of Data Engineering and Knowledge Engineering, Ministry of Education, China

ISBN: (print) 9781450305280

Trajectories representing the motion of moving objects are typically obtained via location sampling, e.g., using GPS or road-side sensors, at discrete time instants. Between consecutive samples, nothing is known about the whereabouts of a given moving object. Various models have been proposed (e.g., sheared cylinders; spacetime prisms) to represent the uncertainty of moving objects both in unconstrained Euclidean space and on road networks. In this paper, we focus on representing the uncertainty of objects moving along road networks as time-dependent probability distribution functions, assuming that a maximal speed is available for each road segment. For these settings, we introduce a novel indexing mechanism, UTH (Uncertain Trajectories Hierarchy), based upon which efficient algorithms for processing spatio-temporal range queries are proposed. We also present experimental results that demonstrate the benefits of our proposed methodologies.

Keywords: Trajectories

Sentiment classification via L2-norm deep belief network

20th ACM Conference on Information and Knowledge Management, CIKM'11

Authors: Liu, Tao; Li, Minghui; Zhou, Shusen; Du, Xiaoyong

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, MOE, 100872 Beijing, China; School of Information, Renmin University of China, 100872 Beijing, China; Microsoft Asian Research and Development Group, 100080 Beijing, China; Shenzhen Graduate School, Harbin Institute of Technology, 518055 Shenzhen, China

ISBN: (print) 9781450307178

Automatic analysis of sentiments expressed in large-scale online reviews is very important for intelligent business applications. Sentiment classification is the most popular task in sentiment analysis and is more challenging than traditional topic-based text classification. Basic features, such as vocabulary words, are not enough to classify sentiments well. A Deep Belief Network (DBN) is introduced to discover more abstract features of sentiments. To capture the full information of the features, a large network can be constructed, but a large network tends to overfit the training data and even its noise, which reduces the generalization ability of the network. In this paper, the L2-norm Deep Belief Network (L2DBN) is proposed, which uses L2-norm regularization to optimize the network parameters of a DBN. L2DBN is first initialized by an unsupervised layer-wise training algorithm and then fine-tuned by a supervised procedure. Network parameters are optimized using both classification loss and network complexity. Experimental results show that the proposed L2DBN outperforms the state-of-the-art method and the basic DBN on golden, noisy, and heterogeneous datasets. © 2011 ACM.

Keywords: Sentiment analysis

Lawyer information integration and recommendation by multi-source information validation

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Tao Liu; Biao Fan; He Hu; Xiao-Yong Du

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Renmin University of China, China; School of Information, Renmin University of China, Beijing, China

On the internet, comprehensive lawyer information is scattered across separate information sources, which prevents web users from acquiring information effectively. To build a unified view of separated, heterogeneous, and often redundant lawyer information, we propose a new information integration method using multi-source information cross-validation. Based on the unified integrated data, a lawyer recommendation system is built. Several key technologies are presented and evaluated, including multi-source information acquisition and validation. Experimental results indicate that the key techniques used in the system are effective for lawyer information integration and recommendation.

Keywords: Organizations; Crawlers; Search engines; data mining; Web pages; databases; Reliability

A property of reductions in Fuzzy Variable Precision Rough Set model

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Eric C. C. Tsang; Su-Yun Zhao; Cai-Li Zhou

Affiliations: Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macao, China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Renmin University of China, China; School of Mathematics and Computer Sciences, Hebei University, China

In this paper, we use rigorous mathematical reasoning to discover the relation between the threshold and reductions in Fuzzy Variable Precision Rough Sets (FVPRS), i.e., the reductions form a nested structure as the threshold increases monotonically. Using this nested structure, we can design algorithms that quickly find different reductions when a reduction is required. Here, "different" means that the reductions are obtained using different thresholds.

Keywords: Rough sets; Approximation methods; Fuzzy sets; Educational institutions; Computational modeling; Machine learning; Cybernetics
