Open Source Forge (OSF) websites provide information on a massive number of open source software projects, and extracting this web data is important for open source research. Traditional extraction methods use string matching among pages to detect the page template, which is time-consuming. A recent work published in VLDB exploits entities that are redundant across websites to detect the web page coordinates of these entities. Its experiments give good results when these coordinates are used to extract other entities from the target site. However, OSF websites have few redundant project entities. This paper proposes a modified version of that redundancy-based method tailored to OSF websites, which relies on a similar yet weaker presumption: that entity attributes, rather than whole entities, are redundant. Like the previous work, we construct a seed database to detect the web page coordinates of the redundancies, but entirely at the attribute level. In addition, we apply attribute name verification to reduce false positives during extraction. The experimental results indicate that our approach is competent at extracting data from OSF websites, a scenario in which the previous method cannot be applied.
A context situation, meaning a snapshot of the status of the real world, is formed by integrating a large number of contexts collected from various sources. How to obtain the context situation and use it to provide better services is a challenging issue. In this paper, we address this challenge on the basis of the mobile cloud computing architecture. We propose an abstract model to uniformly collect contexts and send them to the cloud. We also propose a rule-based large-scale context aggregation algorithm that uses the MapReduce computing paradigm. Finally, we propose a large-scale context management framework based on the abstract model and the context aggregation algorithm, and implement a real-time traffic demo to verify the validity of the framework.
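As a rough illustration of rule-based aggregation in the MapReduce style, the sketch below runs the map, shuffle, and reduce steps in plain Python on hypothetical traffic contexts. The record layout `(road_id, speed_kmh)`, the 20 km/h threshold, and all function names are assumptions made for this example; they are not taken from the paper's model or algorithm.

```python
from collections import defaultdict
from statistics import mean

def map_phase(records):
    """Emit (key, value) pairs: one speed reading per road segment."""
    for road_id, speed_kmh in records:
        yield road_id, speed_kmh

def shuffle(pairs):
    """Group values by key, as a MapReduce runtime would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(road_id, speeds, congested_below=20.0):
    """Apply a simple rule: a road is congested if its mean speed is low.

    The threshold is a hypothetical rule parameter.
    """
    status = "congested" if mean(speeds) < congested_below else "free"
    return road_id, status

def aggregate(records):
    """Run map, shuffle, and reduce in-process and return the situation."""
    groups = shuffle(map_phase(records))
    return dict(reduce_phase(k, v) for k, v in groups.items())

records = [("A", 10.0), ("A", 20.0), ("B", 60.0)]
situation = aggregate(records)
```

Each reduce call only sees the readings for one key, which is what would let such a rule be evaluated in parallel across many road segments.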
This paper evaluates the performance and efficiency of the Imagine stream processor for scientific programs. It classifies scientific programs into three classes based on their computation-to-memory-access ratios. Typical...
In recent years, many static analyzers for C code, with differing bug detection abilities, have appeared and been applied in various domains. There are so many choices that it has become hard for programmers to know in detail the strengths and limitations of all these analyzers and to find the most suitable ones for their code. In this paper, we propose a benchmark for C code static analyzers, named UCBench, that provides quantitative and qualitative measurements for evaluating analyzers. Unlike other benchmarks, UCBench concentrates on users' requirements rather than on improvements to the bug detection techniques themselves. The major components of UCBench are a test case database, evaluation metrics, and a harness. We classify test cases into several groups according to their attributes and design various user-centric evaluation metrics. In addition, we develop a harness to automate the evaluation process. Finally, we demonstrate our benchmark suite on four C code static analyzers: Flawfinder, Cppcheck, Uno, and Splint.
In this paper, we apply a tree-structured conditional random field (TCRF) to all-words word sense disambiguation (WSD), where the graphical structure of the TCRF is the dependency syntax tree produced by Minipar. The extrem...
Fully precise pointer analysis has long been a challenging problem, especially when dealing with dynamically allocated memory. Separation logic can describe pointer aliasing formally, but cannot describe the quantitative reachability between pointers. In this paper, we present a symbolic framework for analyzing the reachability between pointers in list-manipulating programs. The precise points-to relations of pointers in lists are described by formulae of quantitative separation logic (QSL), and the analysis framework is based on operational and rearrangement rules for pointer assignments. Fixpoint calculus and counter symbolic abstraction are used to find loop invariants. This yields precise relations between pointers at each point of a list-manipulating program. Finally, several initial examples of list-manipulating programs are given to show that the approach achieves precise pointer analysis for such programs.
Invalid pointer dereferences, such as null pointer dereferences, dangling pointer dereferences, and double frees, are a prevalent source of bugs in CPS software, because pointers can be flexibly dereferenced along various pointer fields. Existing tools have high overhead or are incomplete, which limits their efficiency in checking CPS software with shared and mutable memory. In this paper, we present a novel extended pointer structure for detecting all invalid pointer dereferences in this kind of CPS software. We propose an algorithm for detecting invalid pointer dereferences based on the uniform transformation of abstract heap states. Experimental evaluation on a set of large C benchmark programs shows that the proposed approach is sufficiently efficient in detecting invalid pointer dereferences in CPS software with shared and mutable memory.
Many recent applications involve processing and analyzing uncertain data. Recently, several research efforts have addressed answering skyline queries efficiently on massive uncertain datasets. However, the research lacks methods to compute these queries on uncertain data in which each dimension of an uncertain object is represented as an interval or an exact value. In this paper, we extensively study the problem of skyline queries on these interval-based uncertain objects, which has not been studied before. We first model the problem of querying skylines on interval datasets. In particular, we present two efficient, I/O-optimal algorithms for conventional interval skyline queries and constrained interval skyline queries, respectively. Extensive experiments demonstrate the efficiency of all our proposed algorithms.
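To make the notion of a skyline over interval-valued objects concrete, the sketch below uses a conservative "certain dominance" test and a naive quadratic scan. It assumes that smaller values are better and that an exact value `v` is stored as the degenerate interval `(v, v)`. This dominance definition and all names are assumptions for illustration only; they are not the paper's problem model or its I/O-optimal algorithms.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (low, high); an exact value v is (v, v)
Obj = List[Interval]            # one interval per dimension

def certainly_dominates(a: Obj, b: Obj) -> bool:
    """True if a is no worse than b in every dimension whatever the true
    values are (a's upper bound <= b's lower bound), and strictly better
    in at least one dimension. Smaller is assumed to be better."""
    no_worse = all(a_hi <= b_lo for (_, a_hi), (b_lo, _) in zip(a, b))
    strictly = any(a_hi < b_lo for (_, a_hi), (b_lo, _) in zip(a, b))
    return no_worse and strictly

def interval_skyline(objs: List[Obj]) -> List[Obj]:
    """Naive O(n^2) skyline: keep objects no other object certainly dominates."""
    return [
        o for o in objs
        if not any(certainly_dominates(p, o) for p in objs if p is not o)
    ]

objs = [
    [(1, 2), (1, 2)],      # a
    [(3, 4), (3, 4)],      # b: certainly dominated by a
    [(1.5, 3.5), (0, 1)],  # c: overlaps a in dimension 0, so incomparable
]
skyline = interval_skyline(objs)
```

Overlapping intervals leave two objects incomparable under this definition, so the result is a superset of the skyline that any concrete assignment of true values would produce.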
Botnets pose a serious threat to the Internet, and more and more botnets use P2P technology to build their C&C (Command and Control) mechanisms. Some research has compared the resilience of structured and unstructured P2P botnets against node elimination, but the question of which elimination strategy is best has rarely been studied. In this paper, we propose a new metric, called the half point, to measure the effectiveness of different strategies. We also select seven elimination strategies and compare them. Through extensive simulations, we find that RBC is the best elimination strategy. Further analysis shows that, for RBC, the average degree of nodes in the botnet has the most significant influence. The larger the average degree, the larger the half point of RBC, which implies that node elimination may not be a reasonable choice for mitigating botnets with a large average degree. The results of this paper can provide guidance for restraining structured P2P botnets.
Performance prediction for high performance computer systems is of great importance for designing, implementing, and optimizing such systems. As a widely used technique for predicting performance, simulation is attracting more and more attention from the research community. Based on an analysis of the problems in current performance simulation techniques, we present the key idea of an event-driven performance simulator for SMP systems. We propose the framework of SMP-SIM and implement it on top of MPICH2. The simulation results show that our simulation technique offers high accuracy and good simulation performance.