Video has become pervasive in daily life, in both professional and consumer applications. Both low-level video processing and high-level semantic video analysis are computationally intensive tasks in these application domains. Most current video computing tools are developed for specific analytic tasks. They lack higher-level interoperability with databases and treat the database merely as a relational storage engine rather than as an analytic platform, which causes inefficient data access and massive data movement. In this paper, we study how to support video data management over a relational database and present our initial solutions for video data storage, video data access, and efficient video analytics. We also describe HybVideo, our ongoing prototype system built on a novel architecture, which integrates these solutions to address the major challenges of providing a single platform for both storage and analysis of video data.
Author:
Qing Zhu, Department of Computer Science
Information School; Key Laboratory of Data Engineering and Knowledge Engineering; Renmin University of China, Beijing, China
ISBN: (print) 9781424465972; 9780769540115
The Internet has become an excellent e-commerce platform for bringing together large numbers of buyers and sellers across wide geographic regions. Trust and reputation systems represent a significant trend in decision support for Internet-mediated service provision. However, most existing work assumes that all users share the same trust metrics, whereas in real life different users often have different preferences for product attributes. This paper proposes a trusted query navigation model based on analyzing online customer reviews of website trustworthiness. The first step counts and analyzes historical feedback in the system offline and generates a dynamic trust model through opinion classification. The second step produces a trust evaluation ranking that helps users easily discover trusted services matching their needs. The experimental evaluation shows that the trusted query evaluation ranking achieves high trading efficiency, quick learning ability, and satisfactory performance.
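The two-step idea (an offline trust model, then a per-user ranking) can be illustrated with a minimal sketch. It assumes, hypothetically, that opinion classification yields a positive/negative label per (service, attribute) review, that trust per attribute is simply the fraction of positive opinions, and that a user's preferences are attribute weights. The names `Review`, `build_trust_model` and `rank_services` are illustrative only and are not the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical review record after opinion classification."""
    service: str    # the website/service being reviewed
    attribute: str  # product attribute the opinion is about
    positive: bool  # polarity assigned by an opinion classifier


def build_trust_model(reviews):
    """Offline step: fraction of positive opinions per (service, attribute)."""
    counts = {}
    for r in reviews:
        pos, total = counts.get((r.service, r.attribute), (0, 0))
        counts[(r.service, r.attribute)] = (pos + int(r.positive), total + 1)
    return {key: pos / total for key, (pos, total) in counts.items()}


def rank_services(model, preferences):
    """Online step: rank services by a trust score weighted by one user's preferences.

    preferences maps attribute -> weight; an attribute with no reviews for a
    service contributes 0 to that service's score.
    """
    services = {service for service, _ in model}
    total_weight = sum(preferences.values())
    scores = {}
    for s in services:
        weighted = sum(w * model.get((s, a), 0.0) for a, w in preferences.items())
        scores[s] = weighted / total_weight if total_weight else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With the same trust model, a user weighting delivery at 0.8 and price at 0.2 can receive the opposite ranking from a user with the reverse weights. That is the motivation for user-specific trust metrics.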
This paper proposes a new method for clustering law texts based on the referential relations between laws. We extract law entities (each entity represents one law) and their referential relations from law texts. The SimRank algorithm is then applied to compute the similarity between law entities through their referential relations, and laws are clustered based on this SimRank similarity. This is the first time SimRank has been applied in the legal domain and used to cluster texts. A prototype and experiments show that our solution is feasible. We also publish the extracted data as Linked Law data using the RDF data model, which forms the first open Semantic Web database in the legal domain. Linked Law data enables users to access law data through rich data links and to query it via Semantic Web application interfaces.
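A minimal sketch of the SimRank step follows. It uses the standard iterative SimRank recurrence over in-neighbours: s(a,a) = 1, and s(a,b) = C / (|I(a)||I(b)|) times the sum of s(i,j) over in-neighbours i of a and j of b. It assumes a citation edge runs from the citing law to the cited law. The paper does not say which clustering algorithm follows, so the single-link threshold clustering here (`threshold_clusters`) is a hypothetical stand-in. The decay `c=0.8` and the iteration count are common defaults, not values from the paper.

```python
def simrank(nodes, edges, c=0.8, iterations=10):
    """Iterative SimRank over a directed citation graph.

    edges are (citing, cited) pairs; two laws are similar when they are cited
    by similar laws (SimRank's in-neighbour formulation).
    """
    in_nbrs = {n: [] for n in nodes}
    for src, dst in edges:
        in_nbrs[dst].append(src)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0  # no citing laws, so no evidence of similarity
                else:
                    total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


def threshold_clusters(nodes, sim, threshold):
    """Single-link clustering: merge every pair whose SimRank score >= threshold."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in nodes:
        for b in nodes:
            if a < b and sim[(a, b)] >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For example, if two hypothetical regulations both cite `law_a` and `law_b`, those two laws get a SimRank score of 0.8 × 2 / 4 = 0.4 and fall into one cluster at a threshold of 0.3. This plain version costs O(k·n²·d²), so it only suits small graphs.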
The demands on OLAP applications are growing rapidly with dramatic increases in data volume, number of users, query volume, and query complexity. The need to shorten the update period of the data warehouse is another crucial factor for a scalable OLAP application. In this paper, we propose a scalable OLAP prototype that supports query processing over growing data volumes by distributing all fact tuples across multiple servers to construct a set of sibling cubes, which can be merged to obtain the whole cube. Having observed that dimension tables account for a very small proportion of the total space cost, we employ a lightweight distribution policy that fully duplicates the dimension tables on every sibling server. An OLAP query with distributive aggregate functions can be transformed into queries executed in parallel on the sibling servers. For non-distributive aggregate functions such as median, we propose an optimized median aggregation algorithm that reduces the transmission volume between servers while computing global median values. We also present a three-level data warehouse framework to meet the shorter update periods required by "operational business intelligence". An asynchronous tunnel model is proposed to reduce update latency by pre-fetching updated tuples to the OLAP processing server. Finally, we set up the prototype system ParaCube to evaluate performance on SN (shared-nothing) systems and multi-core platforms.
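The contrast between distributive and non-distributive aggregates can be sketched as follows. SUM, COUNT, MIN and MAX (and AVG, derived from SUM and COUNT) merge directly from per-server partial states. A median cannot be merged that way. The median routine below is a generic distributed-selection technique, not ParaCube's optimized algorithm. It binary-searches the value range, and each server sends back only one count per probe instead of shipping its values. It assumes integer-valued measures and non-empty partitions. `SiblingServer` and the other names are hypothetical.

```python
from bisect import bisect_right


class SiblingServer:
    """Hypothetical sibling server holding one horizontal partition of fact tuples."""

    def __init__(self, values):
        self.values = sorted(values)  # integer measure values of this partition

    def partial_aggregate(self):
        # Distributive partial state; assumes the partition is non-empty.
        v = self.values
        return (sum(v), len(v), v[0], v[-1])

    def count_le(self, x):
        # Number of local values <= x: a single integer crosses the network.
        return bisect_right(self.values, x)


def merge_partials(partials):
    """Merge sibling partial states into global SUM, COUNT, MIN, MAX and AVG."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return {"sum": total, "count": count,
            "min": min(p[2] for p in partials), "max": max(p[3] for p in partials),
            "avg": total / count}


def kth_smallest(servers, k, lo, hi):
    """Smallest x in [lo, hi] with at least k values <= x, i.e. the k-th smallest value."""
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(s.count_le(mid) for s in servers) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo


def distributed_median(servers):
    g = merge_partials([s.partial_aggregate() for s in servers])
    n, lo, hi = g["count"], g["min"], g["max"]
    if n % 2 == 1:
        return kth_smallest(servers, n // 2 + 1, lo, hi)
    # Even count: average the two middle values.
    return (kth_smallest(servers, n // 2, lo, hi)
            + kth_smallest(servers, n // 2 + 1, lo, hi)) / 2
```

Each selection takes about log2(max - min) rounds, and every round transfers one integer per server. That is the kind of saving in transmission volume the paragraph refers to, although the paper's own optimization may differ.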
In this paper, we analyse the data access characteristics of a typical XML information retrieval system and propose a new query-aware buffer replacement algorithm based on prediction of the Minimum Reuse Distance (MRD for short). The algorithm predicts each object's next reference distance from the retrieval system's running status and replaces the objects with the largest predicted reuse distances. The factors considered in the replacement algorithm include the access frequency, creation cost, and size of objects, as well as the queries being executed. By taking into account the queries currently running or queued in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
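A minimal sketch of query-aware eviction follows. It assumes, hypothetically, that the objects accessed by running and queued queries are known in execution order. If an object appears there, its predicted reuse distance is its first position. Otherwise the prediction extrapolates beyond that horizon, using access frequency. The score multiplies distance by size and divides by creation cost. This combination, the `unseen_gap` constant and all names are illustrative assumptions, not the MRD formula from the paper. Creation costs are assumed positive.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    size: int             # space occupied in the buffer
    creation_cost: float  # cost of rebuilding/reloading on a miss (assumed > 0)
    frequency: int = 1    # accesses observed so far


class QueryAwareBuffer:
    """Evicts the object with the largest cost-adjusted predicted reuse distance."""

    def __init__(self, capacity, unseen_gap=1000.0):
        self.capacity = capacity
        self.unseen_gap = unseen_gap  # hypothetical extrapolation constant
        self.used = 0
        self.objects = {}

    def predicted_reuse_distance(self, obj_id, future):
        if obj_id in future:
            # A running or queued query will touch it: use its first position.
            return future.index(obj_id) + 1
        # Otherwise extrapolate past the known horizon; frequently accessed
        # objects are assumed to return sooner (hypothetical heuristic).
        return len(future) + self.unseen_gap / self.objects[obj_id].frequency

    def eviction_score(self, obj_id, future):
        obj = self.objects[obj_id]
        # Large, cheap-to-rebuild objects that are reused late are the best victims.
        return self.predicted_reuse_distance(obj_id, future) * obj.size / obj.creation_cost

    def access(self, obj_id, size, creation_cost, pending_queries):
        """pending_queries: per-query lists of object ids, in execution order.

        Returns True on a hit, False on a miss.
        """
        if obj_id in self.objects:
            self.objects[obj_id].frequency += 1
            return True
        if size > self.capacity:
            return False  # never cacheable
        future = [o for q in pending_queries for o in q]
        while self.used + size > self.capacity:
            victim = max(self.objects, key=lambda o: self.eviction_score(o, future))
            self.used -= self.objects.pop(victim).size
        self.objects[obj_id] = CachedObject(size, creation_cost)
        self.used += size
        return False
```

Suppose objects `a` and `b` are cached and the next queued query needs `a`. The sketch then evicts `b` to admit `c`, whereas plain LRU would evict `a` because it is the least recently used.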
ISBN: (print) 9781424458509
Traffic congestion is a very serious problem in large cities. With the number of vehicles increasing rapidly, especially in cities with booming economies, the situation is getting even worse. In this paper, we leverage Vehicular Ad hoc Network (VANET) techniques to present a real-time abnormal traffic data dissemination protocol. Specifically, all vehicles traveling on the same road segment are regarded as a cluster that generates traffic messages about that segment. To reduce communication cost, only abnormal traffic data is issued and spread to nearby road segments. By combining event-driven and periodic mechanisms, abnormal traffic messages are disseminated in time to the vehicles that are likely to need them. We propose a distance-dependent forwarder selection method to disseminate traffic messages. Within a cluster, messages are forwarded along the segment from one end to the other following the minimum-hop principle. Between clusters, messages are transmitted in epidemic routing mode, which ensures fast and reliable dissemination. To evaluate the performance of our protocol, we use real traffic data from Beijing at peak hour. The simulation results demonstrate that our protocol is feasible and efficient for metropolitan-scale cities.
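Distance-dependent forwarding within a cluster can be sketched as a greedy rule. Among the vehicles within radio range, the one farthest ahead in the propagation direction relays next, which keeps the hop count low. The sketch models vehicle positions as distances in metres along the segment. The `is_abnormal` trigger (average speed below half the free-flow speed) is a hypothetical stand-in for the paper's abnormality criterion. Inter-cluster epidemic routing is not shown.

```python
def is_abnormal(avg_speed_kmh, free_flow_kmh, ratio=0.5):
    """Hypothetical trigger: report a segment only when traffic is much slower than free flow."""
    return avg_speed_kmh < ratio * free_flow_kmh


def select_forwarder(sender, positions, direction, radio_range):
    """Pick the vehicle within range that lies farthest ahead (direction is +1 or -1)."""
    best, best_progress = None, 0.0
    for pos in positions:
        progress = (pos - sender) * direction
        if 0 < progress <= radio_range and progress > best_progress:
            best, best_progress = pos, progress
    return best


def forward_along_segment(positions, source, direction, radio_range):
    """Relay a message hop by hop toward one end of the segment; return the relay chain."""
    chain, current = [source], source
    while True:
        nxt = select_forwarder(current, positions, direction, radio_range)
        if nxt is None:  # nobody further ahead is reachable: the segment end is reached
            return chain
        chain.append(nxt)
        current = nxt
```

With vehicles at 0, 120, 180, 250, 390 and 500 m and a 200 m range, a message from the vehicle at 0 m travels 0 → 180 → 250 → 390 → 500, so it takes four hops instead of five. Every hop makes strictly positive progress, which guarantees that the loop terminates.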
XML retrieval is becoming a focus of study in the fields of information retrieval and databases. Summarizing the results returned by XML search engines can ease users' reading burden. However, the construction of a query-oriented XML text summarization corpus, which is the basis of this line of study, has not yet received enough attention. In this paper, we introduce our work on constructing such a corpus, including the selection of topics and XML elements/documents, the construction process, and the features of the resulting corpus. To date, the corpus contains 25 English query topics covering 422 elements for summarization, and 32 Chinese topics covering 402 elements. For each topic, 4 extracted summaries and 4 generated summaries were produced manually by 4 experts.
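The corpus layout described above can be captured in a small record type, and the reported counts can be totalled. The `Topic` fields are a hypothetical schema, not the corpus's actual file format. The only figures used are those stated in the abstract: the topic and element counts, and 4 extracted plus 4 generated summaries per topic.

```python
from dataclasses import dataclass, field

SUMMARIES_PER_KIND = 4  # per topic: 4 extracted and 4 generated summaries


@dataclass
class Topic:
    """Hypothetical record for one query topic in the corpus."""
    language: str       # e.g. "en" or "zh"
    query: str
    element_ids: list   # XML elements selected for summarization
    extracted: list = field(default_factory=list)  # manual extractive summaries
    generated: list = field(default_factory=list)  # manual generated summaries

    def is_complete(self):
        return (len(self.extracted) == SUMMARIES_PER_KIND
                and len(self.generated) == SUMMARIES_PER_KIND)


def corpus_totals(topics_per_lang, elements_per_lang):
    """Total topics, elements and manual summaries across languages."""
    topics = sum(topics_per_lang.values())
    elements = sum(elements_per_lang.values())
    return topics, elements, topics * 2 * SUMMARIES_PER_KIND
```

From the stated figures, `corpus_totals({"en": 25, "zh": 32}, {"en": 422, "zh": 402})` gives 57 topics, 824 elements and 57 × 8 = 456 manual summaries.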
In recent years, the spread of spam comments has become a major obstacle to the development of commercial social networks. This paper analyzes the differences in behavioral characteristics between normal users and malicious users. Based on these characteristics, we propose several heuristic methods to detect spam comments. These methods evaluate comments from three perspectives: the time-frequency characteristics of comments, the textual similarity of comments, and the number of target domains that each user's comments refer to. On our collected dataset, the experimental results show that detection strategy I (tuned for high accuracy) and strategy II (tuned for wide coverage) achieve accuracies of 100% and 92.6%, respectively. This preliminary evaluation shows promising results for the proposed detection methods.
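The three per-user heuristics can be sketched as follows. Time-frequency is modelled as the largest number of comments within a sliding time window. Text similarity is modelled as the mean pairwise Jaccard similarity with URLs stripped, and target domains as the distinct host names linked from the user's comments. Several choices here are assumptions, not taken from the paper. All thresholds are illustrative, and so is the direction of the domain test (flagging many distinct domains). Strategy I is modelled as "all heuristics fire" and strategy II as "any heuristic fires".

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Comment:
    user: str
    timestamp: float  # seconds
    text: str


def max_burst(timestamps, window):
    """Largest number of comments posted within any `window`-second span."""
    ts = sorted(timestamps)
    best, start = 0, 0
    for end in range(len(ts)):
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_similarity(texts):
    """Mean pairwise Jaccard similarity of the texts with URLs stripped."""
    stripped = [URL_RE.sub("", t) for t in texts]
    pairs = list(combinations(stripped, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def target_domains(texts):
    return {urlparse(u).netloc for t in texts for u in URL_RE.findall(t)}


def detect(comments, window=60.0, burst_limit=5, sim_limit=0.7, domain_limit=3):
    """Return (strategy_I_users, strategy_II_users); all thresholds are hypothetical."""
    by_user = defaultdict(list)
    for c in comments:
        by_user[c.user].append(c)
    strict, broad = set(), set()
    for user, cs in by_user.items():
        texts = [c.text for c in cs]
        flags = [
            max_burst([c.timestamp for c in cs], window) >= burst_limit,
            mean_similarity(texts) >= sim_limit,
            len(target_domains(texts)) >= domain_limit,
        ]
        if all(flags):
            strict.add(user)  # all heuristics agree: favours accuracy
        if any(flags):
            broad.add(user)   # any heuristic fires: favours coverage
    return strict, broad
```

Requiring agreement from every heuristic trades recall for precision, while accepting any single signal does the opposite. This mirrors the accuracy/coverage split between the two strategies.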