ISBN: 9783642235344; 9783642235351 (print)
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches for a single machine. However, as data volumes grow, the performance of duplicate identification is still far from satisfactory. Hence, we address duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
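To make the role of candidate record pairs concrete, here is a minimal single-process sketch of signature-based candidate generation in a map/reduce style. It is not the signature scheme or pruning strategy of the paper: as an assumption for illustration, every whitespace token of a record serves as a signature, and the function names (`map_phase`, `reduce_phase`, `find_duplicates`), the Jaccard measure, and the 0.5 threshold are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Split a record into a set of lowercase tokens."""
    return set(text.lower().split())


def map_phase(records):
    """Emit (signature, record_id) pairs; here every token is a signature (assumption)."""
    for rid, text in records.items():
        for tok in tokens(text):
            yield tok, rid


def reduce_phase(pairs):
    """Group record ids by signature and collect each co-occurring pair once."""
    groups = defaultdict(set)
    for sig, rid in pairs:
        groups[sig].add(rid)
    candidates = set()
    for rids in groups.values():
        candidates.update(combinations(sorted(rids), 2))
    return candidates


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def find_duplicates(records, threshold=0.5):
    """Verify only the candidate pairs instead of all pairs of records."""
    candidates = reduce_phase(map_phase(records))
    tok = {rid: tokens(text) for rid, text in records.items()}
    return sorted(p for p in candidates if jaccard(tok[p[0]], tok[p[1]]) >= threshold)


records = {
    1: "john smith new york",
    2: "john smith new york city",
    3: "alice jones boston",
}
duplicates = find_duplicates(records)
```

Only pairs that share at least one signature are ever compared, so record 3 is never paired with the other two. A stronger signature scheme shrinks the candidate set further, which is the quantity the abstract identifies as the main cost driver.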
ISBN: 9781450305648 (print)
Enforcing practical Mandatory Access Control (MAC) in a commercial operating system to tackle the malware problem is a grand challenge but also a promising approach. The main barriers to applying MAC against malware are the incompatibility and poor usability of existing MAC systems. To address these issues, we start by analyzing the technical details of 2,600 malware samples one by one and performing experiments on two types of MAC-enforced operating systems. Based on these preliminary studies, we design a novel MAC model for a commercial operating system, named Tracer, that incorporates intrusion detection and tracing in order to disable malware on hosts while offering good compatibility with existing software and good usability for common users who are not system experts. The model conceptually consists of three actions: detecting, tracing, and restricting suspected intruders. One novelty is that it leverages lightweight intrusion detection and tracing techniques to automate security label configuration, which is widely acknowledged as a tough issue when applying a MAC system in practice. The other is that, rather than restricting information flow as a traditional MAC does, it traces intruders and restricts only their critical malware behaviors, where intruders are processes and executables that are potential agents of a remote attacker. Our prototype and experiments on Windows show that Tracer can effectively defeat all malware samples tested by blocking malware behaviors without causing a significant compatibility problem. Copyright 2011 ACM.
ISBN: 9781450305648 (print)
Recent years have witnessed an increasing threat from kernel rootkits. A common feature of such attacks is hiding malicious objects, including processes, sockets, and kernel modules, to conceal their presence. Scanning memory with object signatures to detect stealthy rootkits has proven to be a powerful approach, but only when the signatures are hard for adversaries to evade. However, it is difficult, if not impossible, to select fields from a single data structure as robust signatures with traditional techniques. In this paper, we propose the concepts of inter-structure signature and imported signature, and present techniques to detect stealthy malware based on these concepts. The key idea is to use cross-reference relationships among multiple data structures as signatures, and to import extra information into regions attached to target data structures as signatures. We have inferred four invariants as signatures to detect hidden processes, sockets, and kernel modules in Linux, and implemented a prototype detection system called DeepScanner. We have also developed a hypervisor-based monitor to protect imported signatures. Our experimental results show that DeepScanner can effectively and efficiently detect stealthy objects hidden by seven real-world rootkits without any false positives or false negatives, and that an adversary can hardly evade DeepScanner without breaking the normal functions of the target objects and the system. Copyright 2011 ACM.
ISBN: 9781450306072 (print)
A common way of using a virtual machine (VM) is to use it and then throw it away, treating the VM as a completely isolated and disposable entity. The disadvantage of this approach is that, if there is no malicious activity, the user has to redo all of the work in her actual workspace, since there is no easy way to commit (i.e., merge) only the benign updates within the VM back to the host environment. In this work, we develop a VM commitment system called Secom that automatically eliminates malicious state changes when merging the contents of an OS-level VM into the host. Secom consists of three steps: grouping state changes into clusters, distinguishing between benign and malicious clusters, and committing benign clusters. Secom has three novel features. First, instead of relying on a huge volume of log data, it leverages OS-level information flow and malware behavior information to recognize malicious changes. As a result, the approach imposes a smaller performance overhead. Second, unlike existing intrusion detection and recovery systems that detect compromised OS objects one by one, Secom classifies objects into clusters and then identifies malicious objects on a cluster-by-cluster basis. Third, to reduce the false positive rate when identifying malicious clusters, it simultaneously considers two malware behaviors of different types and the origin of the processes that exhibit these behaviors, rather than considering a single behavior alone as existing malware detection methods do. We have successfully implemented Secom on the Feather-weight Virtual Machine (FVM) system, a Windows-based OS-level virtualization system. Experiments show that the prototype can effectively eliminate malicious state changes while committing a VM with small performance degradation. Moreover, compared with commercial anti-malware tools, the Secom prototype has fewer false negatives and thus can more thoroughly clean up malware side effects. In addit
Discovering the relationship between protein sequence pattern and protein secondary structure is important for accurately predicting secondary structure of protein sequence. A protein secondary structure pattern dicti...
To generate a large number of reports within a limited time window, four techniques were proposed: ROLAP&SQL, Shared Scanning, a Hadoop-based solution, and MOLAP&Cube Sharding. An algorithm that performs in-memory aggregation was designed for the second solution. The experimental results show that all techniques except ROLAP&SQL can meet the time window constraint, and that the Hadoop-based solution is a promising technique owing to its high scalability. Considering the maturity of the techniques and their performance, we put MOLAP&Cube Sharding into practice while keeping an eye on Hadoop for future adoption.
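The abstract does not describe the in-memory aggregation algorithm itself, so the following is only a generic sketch of the shared-scanning idea: the fact rows are read once, and every report's aggregate is updated during that single pass. The row layout, the report definitions, and the name `shared_scan` are assumptions made for illustration.

```python
from collections import defaultdict


def shared_scan(rows, reports):
    """Compute SUM aggregates for many reports in one pass over the rows.

    reports maps a report name to (group_by_columns, measure_column).
    """
    totals = {name: defaultdict(float) for name in reports}
    for row in rows:  # single scan shared by all reports
        for name, (group_cols, measure) in reports.items():
            key = tuple(row[c] for c in group_cols)
            totals[name][key] += row[measure]
    return {name: dict(agg) for name, agg in totals.items()}


rows = [
    {"region": "north", "product": "a", "sales": 10.0},
    {"region": "north", "product": "b", "sales": 5.0},
    {"region": "south", "product": "a", "sales": 7.0},
]
reports = {
    "by_region": (("region",), "sales"),
    "by_product": (("product",), "sales"),
}
result = shared_scan(rows, reports)
```

Running each report as its own query would scan the rows once per report. Sharing the scan trades that repeated I/O for memory that holds all partial aggregates at the same time.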
ISBN: 9781450305280 (print)
Trajectories representing the motion of moving objects are typically obtained via location sampling, e.g., using GPS or roadside sensors, at discrete time instants. Between consecutive samples, nothing is known about the whereabouts of a given moving object. Various models have been proposed (e.g., sheared cylinders; space-time prisms) to represent the uncertainty of moving objects both in unconstrained Euclidean space and on road networks. In this paper, we focus on representing the uncertainty of objects moving along road networks as time-dependent probability distribution functions, assuming that a maximal speed is available for each road segment. For this setting, we introduce a novel indexing mechanism, UTH (Uncertain Trajectories Hierarchy), based on which efficient algorithms for processing spatio-temporal range queries are proposed. We also present experimental results that demonstrate the benefits of our proposed methodologies.
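As a rough illustration of the uncertainty between two samples, the sketch below computes the interval of positions an object can occupy at time t on a single straight road, given only a maximal speed. This is the one-dimensional space-time prism, not the UTH index or the probability distribution functions of the paper. Positions are distances along the road, and the name `possible_interval` and the unbounded-road assumption are hypothetical.

```python
def possible_interval(sample_a, sample_b, t, v_max):
    """Return (lo, hi): positions reachable at time t between two samples.

    Each sample is (time, position along the road). The object must be able
    to leave sample_a and still arrive at sample_b without exceeding v_max.
    """
    (t1, x1), (t2, x2) = sample_a, sample_b
    if not t1 <= t <= t2:
        raise ValueError("t must lie between the two sample times")
    if abs(x2 - x1) > v_max * (t2 - t1):
        raise ValueError("samples are inconsistent with the speed bound")
    # Reachable from the first sample, and able to reach the second in time.
    lo = max(x1 - v_max * (t - t1), x2 - v_max * (t2 - t))
    hi = min(x1 + v_max * (t - t1), x2 + v_max * (t2 - t))
    return lo, hi


midway = possible_interval((0.0, 0.0), (10.0, 50.0), 5.0, 10.0)
at_start = possible_interval((0.0, 0.0), (10.0, 50.0), 0.0, 10.0)
```

The interval collapses to a single point at each sample time and is widest in between. A range query for time t only needs to consider objects whose interval overlaps the query region.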
Automatic analysis of sentiments expressed in large scale online reviews is very important for intelligent business applications. Sentiment classification is the most popular task of sentiment analysis, which is more ...
On the internet, comprehensive lawyer information is scattered across separate information sources, which prevents web users from acquiring information effectively. In order to build a unified view of separated, heterogeneous, and often redundant lawyer information, we propose a new information integration method using multi-source information cross-validation. Based on the unified integrated data, a lawyer recommendation system is built. Several key technologies are presented and evaluated, including multi-source information acquisition and validation. Experimental results indicate that the key techniques used in the system are effective for lawyer information integration and recommendation.
In this paper, we use rigorous mathematical reasoning to discover the relation between the threshold and reduction in Fuzzy Variable Precision Rough Sets (FVPRS), i.e., the reductions form a nested structure as the threshold increases monotonically. By using the nested structure of reductions, we can design algorithms to quickly find different reductions when a reduction is required. Here 'different' refers to reductions obtained using different thresholds.