This paper proposes a new method for clustering law texts based on the referential relations among laws. We extract law entities (each entity represents a law) and their referential relations from law texts. The SimRank algorithm is then applied to compute the similarity between law entities from these referential relations, and the laws are clustered according to their SimRank similarity. This is the first time the SimRank algorithm has been applied in the legal domain and used to carry out text clustering. A prototype and experiments show that our solution is feasible. We also publish the extracted data as Linked Law data using the RDF data model, which forms the first open Semantic Web database in the legal domain. Linked Law data lets users access law data through rich data links and query web data through Semantic Web application interfaces.
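As a rough illustration of how SimRank scores two laws from their references, here is a minimal sketch of the standard iterative formulation. It assumes that similarity flows through incoming references (two laws are similar if similar laws cite them), and the decay factor, iteration count, function name and toy law identifiers are all hypothetical; the abstract does not say which variant or parameters the authors used.

```python
from itertools import product


def simrank(edges, decay=0.8, iterations=10):
    """Iterative SimRank over (citing_law, cited_law) pairs.

    Assumption: similarity is propagated through in-neighbours,
    i.e. the laws that reference a given law.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    in_nbrs = {node: [] for node in nodes}
    for citing, cited in edges:
        in_nbrs[cited].append(citing)

    # Start from the identity: a law is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                # A law that nothing references has no evidence of similarity.
                updated[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                updated[(a, b)] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = updated
    return sim


# Toy data (hypothetical): laws A and B both reference laws C and D.
toy_edges = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]
scores = simrank(toy_edges)
print(scores[("C", "D")])
```

The resulting pairwise scores could then be passed to any similarity-based clustering routine; which clustering method the paper uses is not stated in the abstract.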
The requirements placed on OLAP applications are growing rapidly as data volume, number of users, query volume and query complexity increase dramatically. The need to shorten the update period of the data warehouse is another crucial factor for a scalable OLAP application. In this paper, we propose a scalable OLAP prototype that supports query processing over growing data volumes by distributing the fact tuples across multiple servers, constructing a set of sibling cubes that can be merged to obtain the whole cube. We employ a lightweight distribution policy in which the dimension tables are fully duplicated on each sibling server, based on the observation that dimension tables account for a very low proportion of the space cost. An OLAP query with distributive aggregate functions can be transformed into queries that run in parallel on the sibling servers. For aggregate functions that cannot be computed in a distributed fashion, such as median, we propose an optimized median algorithm that reduces the transmission volume between servers while computing the global median values. We also present a three-level data warehouse framework to meet the requirement for shorter update periods in "operational business intelligence". An asynchronous tunnel model is proposed to reduce update latency by pre-fetching updated tuples to the OLAP processing server. Finally, we build a prototype system, ParaCube, to evaluate performance on shared-nothing (SN) systems and multi-core platforms.
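The idea of merging sibling cubes can be sketched for the simple case of SUM, COUNT and AVG, where each server aggregates its own share of the fact tuples and a coordinator combines the partial results. This is only an illustration under assumed names (`local_aggregate`, `merge_partials`, the region keys and measures are all hypothetical), not the ParaCube implementation.

```python
from collections import defaultdict


def local_aggregate(fact_rows):
    """Run on one sibling server: partial (sum, count) per group key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, measure in fact_rows:
        partial[key][0] += measure
        partial[key][1] += 1
    return dict(partial)


def merge_partials(partials):
    """Run on the coordinator: combine the partial results of all siblings."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (subtotal, count) in partial.items():
            merged[key][0] += subtotal
            merged[key][1] += count
    # AVG is derived from the merged SUM and COUNT, never from local averages.
    return {
        key: {"sum": subtotal, "count": count, "avg": subtotal / count}
        for key, (subtotal, count) in merged.items()
    }


# Hypothetical fact tuples (group key, measure) held by two sibling servers.
server_1 = [("east", 10.0), ("west", 4.0)]
server_2 = [("east", 20.0)]

cube = merge_partials([local_aggregate(server_1), local_aggregate(server_2)])
print(cube["east"])
```

A median cannot be merged this way, because the median of the local medians is in general not the global median; this is why such functions need a separate algorithm like the one the abstract mentions.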
In this paper, we analyse the data access characteristics of a typical XML information retrieval system and propose a new query-aware buffer replacement algorithm based on the prediction of Minimum Reuse Distance (MRD for short). The algorithm predicts the distance to an object's next reference from the running status of the retrieval system and replaces the objects with the largest reuse distances. The factors considered by the replacement algorithm include the access frequency, creation cost and size of objects, as well as the queries being executed. By taking into account the queries currently running or queued in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
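The query-aware part of this idea can be sketched as follows. The sketch is a strong simplification: it assumes each pending query is known as the set of index objects it will read, measures reuse distance as the position of the first queued query that needs the object, and ignores the access frequency, creation cost and object size that the abstract also lists. All names are hypothetical and this is not the authors' MRD algorithm.

```python
import math


def reuse_distance(obj, pending_queries):
    """Position of the first queued query that needs obj, or infinity if none does."""
    for position, needed_objects in enumerate(pending_queries):
        if obj in needed_objects:
            return position
    return math.inf


def choose_victim(buffer, pending_queries):
    """Pick the buffered object whose predicted reuse is furthest away."""
    return max(buffer, key=lambda obj: reuse_distance(obj, pending_queries))


# Hypothetical buffer contents and query queue (each query = objects it will read).
buffered = ["idx_a", "idx_b", "idx_c"]
queue = [{"idx_b"}, {"idx_a", "idx_b"}]

victim = choose_victim(buffered, queue)
print(victim)
```

Here `idx_c` is evicted because no running or queued query needs it, whereas a policy that looks only at past accesses, such as LRU, has no access to that information.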
XML retrieval is becoming a research focus in the fields of information retrieval and databases. Summarizing the results returned by XML search engines would ease users' reading burden. However, the construction of a query-oriented XML text summarization corpus, which is the basis of such work, has not yet received enough attention. In this paper, we describe our work on constructing such a corpus, including the selection of topics and XML elements/documents, the construction process, and the features of the resulting corpus. So far, the corpus contains 25 English query topics with 422 elements for summarization and 32 Chinese topics with 402 elements. For each topic, 4 extracted summaries and 4 generated summaries were written manually by 4 experts.
A Top-k aggregate query, which is a powerful technique when dealing with large quantity of data, ranks groups of tuples by their aggregate values and returns k groups with the highest aggregate values. However, compar...
Developing an integration management system for business continuity, records and knowledge (IMS of BRK) benefits the collaboration, optimization and innovation of an organization's business continuity management system (BCMS), records management system (RMS) and knowledge management system (KMS). However, comprehensive research and development requirements and a cogent framework have not yet been proposed for integrating the three, which were proposed independently. Based on a situational analysis of the feasibility of cross-boundary integration in terms of common understandings, general principles and best-practice frameworks from relevant national and international standards, this paper proposes integration thinking that combines the advantages of the three paradigms into sustainable competitive advantages. Supported by international best practices, the authors propose an integration route covering five levels of integration framework, two integration approaches and three integration controls for the dynamic accumulation, sharing and exchange of evidence, memory and knowledge in the digital world and in global competition.
Much research has been done on the integrated use of ISO management system standards. The integrated use of management systems has been found to offer shared value, with varied impacts on building resource efficiency and on the sustainable development of business processes. However, little research has been done on the integrated use of business continuity management systems (BCMS), records management systems (RMS) and knowledge management systems (KMS). This paper proposes a holistic integration management approach for the collaboration, optimization and innovation of the three management systems through a mapping/building/operationalizing cycle, supplying an efficiency-building strategy for the dynamic accumulation, sharing and exchange of an organization's memory, evidence and knowledge.