Extracting information from tables is an important and rather complex part of information retrieval. For the task of extracting objects from HTML tables, we introduce the following methods: determining table orientation, and processing aggregating objects (such as Total) and scattered headers (super row labels, subheaders).
This paper presents an experimental evaluation of state-of-the-art approaches to automatic term recognition based on multiple features: a machine learning method and a voting algorithm. We show that in most cases the machine learning approach obtains the best results and needs little training data; we also identify the best subsets of all popular features.
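A voting algorithm that combines several term features can be illustrated with a minimal Python sketch. It assumes a reciprocal-rank scheme: each feature ranks all candidate terms, and a term's final score is the sum of 1/rank over all features. The voting variant actually evaluated in the paper may differ, and the feature names (`tf`, `c_value`) and scores in the example are hypothetical.

```python
from typing import Dict, List


def vote(feature_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Combine per-feature term scores into one ranking.

    Assumed scheme (reciprocal-rank voting, not necessarily the paper's exact variant):
    each feature ranks the candidates (rank 1 = highest score), and a term's
    combined score is the sum of 1/rank over all features.
    """
    combined: Dict[str, float] = {}
    for scores in feature_scores.values():
        # Sort by descending score; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        for rank, term in enumerate(ranked, start=1):
            combined[term] = combined.get(term, 0.0) + 1.0 / rank
    return sorted(combined, key=lambda term: (-combined[term], term))


if __name__ == "__main__":
    # Hypothetical feature values for three candidate terms.
    features = {
        "tf": {"neural network": 10, "data": 30, "gradient descent": 8},
        "c_value": {"neural network": 5.2, "data": 0.5, "gradient descent": 4.1},
    }
    print(vote(features))  # → ['neural network', 'data', 'gradient descent']
```

Voting on ranks rather than raw scores avoids having to normalize features measured on very different scales; a frequent but generic term such as "data" wins on one feature yet is outvoted by a term that ranks well on both.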
There is a growing number of XML database systems of different kinds on the market. XML DBMS vendors have rushed to enrich their products with more flexible and advanced features so that they satisfy the requirements of modern applications. The time is ripe for the database research community to study the issues involved in extending XML DBMSs with capabilities analogous to those that are popular in traditional DBMSs, keeping in mind that XML databases have become a widespread means of storing and exchanging information on the Web and are increasingly used in dynamic applications such as e-commerce. Addressing these issues, in this paper we provide a definition of triggers for XML based on XQuery and a previously defined update language, as well as methods to support triggers in XML database systems.
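The paper defines triggers in terms of XQuery and an update language. Purely as an illustration of the general event-condition-action idea, here is a minimal Python sketch of node-level INSERT triggers with BEFORE/AFTER timing over an in-memory ElementTree document. The `Trigger` and `TriggeredStore` classes, the path-based matching, and the rule that a BEFORE trigger returning `False` vetoes the update are all hypothetical simplifications, not the paper's syntax or semantics.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical trigger body: receives the parent being modified and the new node.
# For BEFORE triggers, returning False vetoes the insertion under that parent.
Action = Callable[[ET.Element, ET.Element], bool]


@dataclass
class Trigger:
    name: str
    event: str   # only "INSERT" is modelled in this sketch
    timing: str  # "BEFORE" or "AFTER"
    path: str    # ElementTree path (relative to the root) selecting watched parents
    action: Action


class TriggeredStore:
    """Toy in-memory XML store whose insert operation fires node-level triggers."""

    def __init__(self, xml_text: str) -> None:
        self.root = ET.fromstring(xml_text)
        self.triggers: List[Trigger] = []

    def create_trigger(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def _fire(self, timing: str, event: str, parent: ET.Element, node: ET.Element) -> bool:
        for trig in self.triggers:
            if trig.timing != timing or trig.event != event:
                continue
            # Fire only if the modified parent is selected by the trigger's path.
            if any(p is parent for p in self.root.findall(trig.path)):
                if not trig.action(parent, node) and timing == "BEFORE":
                    return False
        return True

    def insert(self, path: str, node: ET.Element) -> int:
        """Append a copy of `node` under every element matching `path`; return the count."""
        inserted = 0
        for parent in self.root.findall(path):
            new_node = copy.deepcopy(node)
            if not self._fire("BEFORE", "INSERT", parent, new_node):
                continue  # vetoed by a BEFORE trigger
            parent.append(new_node)
            inserted += 1
            self._fire("AFTER", "INSERT", parent, new_node)
        return inserted
```

A typical use registers a BEFORE trigger that rejects invalid orders and an AFTER trigger that writes an audit entry into a `<log>` element; both run automatically on every `insert`, which is the behavior a declarative XML trigger definition would give without any application code.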
ISBN: 9781605584874 (print)
We present a novel method for key term extraction from text documents. In our method, a document is modeled as a graph of semantic relationships between its terms. We exploit the following notable property of this graph: terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs, or communities, while unimportant terms fall into weakly interconnected communities or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms, and we introduce a criterion function to select the groups that contain key terms while discarding groups of unimportant terms. To weight terms and determine the semantic relatedness between them, we exploit information extracted from Wikipedia. This approach gives two advantages. First, it processes multi-theme documents effectively. Second, it is good at filtering out noise in the document, such as navigation bars or headers in web pages. Evaluations show that the method outperforms existing methods, producing key terms with higher precision and recall. Additional experiments on web pages show that our method is substantially more effective than existing methods on noisy and multi-theme documents. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
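The pipeline described in this abstract (build a term graph, partition it into communities, keep only the communities selected by a criterion function) can be sketched in Python. This is a minimal illustration under stated assumptions: relatedness weights are supplied directly instead of being computed from Wikipedia, communities are found with simple weighted label propagation as a stand-in for whatever community detection algorithm the authors use, and the criterion (keep communities whose total internal edge weight is at least half that of the densest community) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # term -> {neighbour term: relatedness weight}


def label_propagation(graph: Graph, max_iter: int = 100) -> List[Set[str]]:
    """Partition a symmetric weighted term graph with label propagation (stand-in algorithm)."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            weight_per_label: Dict[str, float] = defaultdict(float)
            for neighbour, weight in graph[node].items():
                weight_per_label[labels[neighbour]] += weight
            if not weight_per_label:
                continue  # an isolated vertex keeps its own label
            # The heaviest label wins; ties go to the lexicographically smallest label.
            best = min(weight_per_label, key=lambda lab: (-weight_per_label[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def internal_weight(graph: Graph, community: Set[str]) -> float:
    """Sum of edge weights with both endpoints inside the community (each edge once)."""
    return sum(
        w for u in community for v, w in graph[u].items() if v in community and u < v
    )


def key_terms(graph: Graph, keep_ratio: float = 0.5) -> List[str]:
    """Hypothetical criterion: keep communities nearly as dense as the densest one."""
    if not graph:
        return []
    communities = label_propagation(graph)
    scores = [internal_weight(graph, c) for c in communities]
    best = max(scores)
    selected: Set[str] = set()
    for community, score in zip(communities, scores):
        if score >= keep_ratio * best:
            selected |= community
    return sorted(selected)
```

With a graph in which "graph", "community" and "vertex" are strongly related, "menu" and "login" are weakly related, and "copyright" is isolated, only the first, dense community survives, which mirrors how navigation-bar noise in a web page would be filtered out.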
Fuzzing is currently one of the most successful techniques for exposing bugs in software. For testing large programs or large codebases with many features and entry points, the creation of fuzz targets remains a big chall...
Efficient interactive rendering of large datasets still poses a problem. The widely used frustum culling algorithm is too conservative and leaves many hidden objects in view. Occlusion culling with hardware occlusion ...
Distribution is a well-known solution for increasing performance and providing load balancing when optimal resource utilization is needed. Together with replication, it also improves reliability, accessibility and fault tolerance. However, since the amount of data is large, maintaining meta-information about the distribution and finding the needed data fragments during query execution become problems. These problems are well understood, but they have received little attention in the context of XML data management. This paper presents research in progress that examines how meta-information about XML data distribution can be managed by extending an auxiliary index structure called DataGuide.
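The idea of attaching distribution meta-information to a DataGuide can be illustrated with a minimal Python sketch. It assumes tree-shaped XML fragments, each stored at a named site, and builds a path summary (for tree data, the set of distinct root-to-node label paths, which is what a DataGuide records) in which every path also lists the sites holding matching nodes. The function names and site identifiers are hypothetical, and the paper's actual extension may store different or richer meta-information.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_distributed_dataguide(fragments: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every distinct root-to-node label path to the sites whose fragment contains it.

    `fragments` is a sequence of (site_id, xml_text) pairs (hypothetical input format).
    """
    guide: Dict[str, Set[str]] = defaultdict(set)
    for site, xml_text in fragments:
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]
        while stack:  # depth-first walk over the fragment
            elem, path = stack.pop()
            guide[path].add(site)
            for child in elem:
                stack.append((child, f"{path}/{child.tag}"))
    return dict(guide)


def sites_for(guide: Dict[str, Set[str]], path: str) -> Set[str]:
    """Sites that must be contacted to evaluate a simple label path; empty if none."""
    return guide.get(path, set())
```

For example, with `<library><book>…</book></library>` stored at `site-A` and `<library><journal>…</journal></library>` at `site-B`, a query over `/library/book/title` only needs to be routed to `site-A`, while the shared root path `/library` maps to both sites.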
ISBN: 9781457706066 (print)
There are thousands of software libraries being developed today: completely new libraries emerge, and new versions of existing ones appear regularly. Unfortunately, developers of many libraries focus on developing the functionality of the library itself but neglect ensuring the high quality and backward compatibility of the application programming interfaces (APIs) their libraries provide. The best practice for addressing these aspects is an automated regression test suite that can be run regularly (e.g., nightly) against the current development version of the library. Such a test suite ensures early detection of any regressions in the quality or compatibility of the library. But developing a good test suite can require significant effort, which becomes an inhibiting factor for library developers when deciding on a QA policy. That is why many libraries have no test suite at all. This paper discusses an approach for low-cost automatic generation of basic tests for shared libraries based on information automatically extracted from the library header files and additional information about the semantics of some library data types. Such tests call the APIs of target libraries with correct parameters and can detect typical problems, such as crashes, "out of the box". Using this method significantly lowers the barrier to developing an initial version of library tests, which can then be gradually improved with a more powerful test development framework as resources become available. The method is based on analyzing API signatures and type definitions obtained from the library header files and creating parameter initialization sequences by comparing the parameter types of the target function with the return values or out-parameters of other functions (usually, some function must be called to obtain a correct parameter value for another function, and the initialization sequence of the necessary function calls can be quite long). The paper also descri...
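Building a parameter initialization sequence by matching parameter types against other functions' return types can be sketched in Python as a recursive search. This is a minimal sketch under several assumptions: signatures arrive as already-parsed `Signature` records (header parsing is omitted), the `SIMPLE_VALUES` table stands in for the "additional information about semantics" of data types, out-parameters are not modelled, and every API name in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """A function prototype as it might be extracted from a header file."""
    name: str
    params: Tuple[str, ...]
    returns: str


# Hypothetical semantic knowledge: literal values that are valid for simple types.
SIMPLE_VALUES: Dict[str, str] = {"int": "0", "size_t": "16", "const char*": '"test"'}


def init_sequence(target: str, api: Dict[str, Signature]) -> Optional[List[str]]:
    """Return C-like statements that obtain valid arguments and then call `target`.

    A parameter of a non-simple type is produced by calling another API function
    whose return type matches it; this is applied recursively. Returns None if
    some parameter cannot be produced.
    """
    lines: List[str] = []
    next_id = 0

    def emit_call(sig: Signature, visiting: FrozenSet[str]) -> Optional[str]:
        nonlocal next_id
        mark_lines, mark_id = len(lines), next_id  # snapshot for rollback on failure
        args = []
        for param_type in sig.params:
            arg = produce(param_type, visiting)
            if arg is None:
                del lines[mark_lines:]
                next_id = mark_id
                return None
            args.append(arg)
        call = f"{sig.name}({', '.join(args)})"
        if sig.returns == "void":
            lines.append(call + ";")
            return call
        next_id += 1
        var = f"v{next_id}"
        lines.append(f"{sig.returns} {var} = {call};")
        return var

    def produce(type_name: str, visiting: FrozenSet[str]) -> Optional[str]:
        if type_name in SIMPLE_VALUES:
            return SIMPLE_VALUES[type_name]
        # Compare the needed type with the return types of other functions.
        for sig in sorted(api.values(), key=lambda s: s.name):
            if sig.returns == type_name and sig.name not in visiting:
                var = emit_call(sig, visiting | {sig.name})
                if var is not None:
                    return var
        return None

    if target not in api:
        return None
    result = emit_call(api[target], frozenset({target}))
    return lines if result is not None else None
```

For a hypothetical API with `ctx_new()`, `session_open(Context*, const char*)` and `session_send(Session*, int)`, calling `init_sequence("session_send", api)` yields the chain `ctx_new`, then `session_open`, then `session_send`. The `visiting` set prevents infinite recursion when functions produce each other's parameter types, and the rollback discards half-built chains so that an alternative producer can be tried.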
The reputation and competitiveness of both mobile applications and mobile operating systems depend on their quality. Developers use various techniques to ensure high quality. Recently, exploratory testing approaches...