Authors:
王珊, 杜小勇, 孟小峰, 陈红
School of Information, Renmin University of China; MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing 100872, P.R. China
Database systems are the infrastructure of modern information systems, and research and development in database systems and technologies is one of the important research topics in the field. Database R&D in China started later than elsewhere but has advanced by giant steps. This report presents the achievements Renmin University of China (RUC) has made in the past 25 years and describes some of the research projects RUC is currently working on. The National Natural Science Foundation of China has initiated and supported most of these research projects, and the completed projects have produced fruitful results.
Monitoring data streams is an efficient way of acquiring the characteristics of a data stream. However, the resources available for each data stream are limited, so how to process an infinite data stream with limited resources remains an open and challenging problem. In this paper, we adopt wavelet and sliding window methods to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST), which can be updated incrementally as data arrive and supports point queries, range queries, and multi-point queries while preserving query precision. We evaluate our algorithm on both synthetic and real-world data. The experimental results indicate that MRST exceeds existing algorithms in query efficiency and adaptability, and that it is simpler to implement.
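To make the idea of a multi-resolution summary concrete, here is a minimal sketch. It is not the MRST from the abstract, whose structure and update rules are not given here. It assumes a fixed window whose length is a power of two and builds a pyramid of pairwise averages (the Haar approximation coefficients); the function names `haar_pyramid`, `point_query`, and `range_sum` are hypothetical.

```python
from typing import List


def haar_pyramid(window: List[float]) -> List[List[float]]:
    """Return the window summarized at every resolution, finest first.

    Level 0 is the raw window. Each further level halves the length by
    averaging adjacent pairs (Haar approximation coefficients).
    Assumption: the window length is a power of two.
    """
    n = len(window)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    levels = [list(window)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev), 2)])
    return levels


def point_query(levels: List[List[float]], index: int, level: int) -> float:
    """Approximate the value at `index` using only the summary at `level`."""
    return levels[level][index >> level]


def range_sum(levels: List[List[float]], lo: int, hi: int, level: int) -> float:
    """Approximate the sum over window[lo:hi] using the summary at `level`."""
    return sum(point_query(levels, i, level) for i in range(lo, hi))
```

Coarser levels need less memory but answer queries less precisely. A streaming structure would additionally have to update these levels incrementally as the window slides, which this sketch leaves out.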
Dear editor, Frequent itemset mining (FIM) is important in many data mining applications [1], such as web log mining and trend analysis. However, if the data are sensitive (e.g., web browsing history), directly releasing frequent itemsets and their support may breach user privacy. The protection of user privacy while obtaining statistical information is im-
Advances in wireless sensor networks and positioning technologies enable new applications that monitor moving objects. Some of these applications, such as traffic management, need to query the future trajectories of the objects. In this paper, we propose an original data access method, the ANR-tree, which supports predictive queries. We focus on real-life environments, where the objects move within constrained networks, such as vehicles on roads. We introduce a simulation-based prediction model based on graphs of cellular automata, which makes full use of the network constraints and the stochastic traffic behavior. Our technique differs strongly from the linear prediction model, which has low prediction accuracy and requires frequent updates when applied to real traffic, where velocity changes frequently. The data structure extends the R-tree with adaptive units that group neighboring objects with similar movement patterns. The predicted movement of an adaptive unit is not given by a single trajectory, but instead by two trajectory bounds that are based on different assumptions about the traffic conditions and obtained from the simulation. Our experiments, carried out on two different datasets, show that the ANR-tree is essentially one order of magnitude more efficient than the TPR-tree, and is much more scalable.
In search engines, different users may search for different information by issuing the same query. To satisfy more users with limited search results, search result diversification re-ranks the results to cover as many user intents as possible. Most existing intent-aware diversification algorithms recognize user intents as subtopics, each of which is usually a word, a phrase, or a piece of description. In this paper, we leverage query facets to understand user intents in diversification, where each facet contains a group of words or phrases that explain an underlying intent of a query. We generate subtopics based on query facets and propose faceted diversification approaches. Experimental results on the public TREC 2009 dataset show that our faceted approaches outperform state-of-the-art diversification models.
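As an illustration of intent-aware re-ranking in general, the sketch below uses an xQuAD-style greedy objective. This is a well-known baseline swapped in for illustration, not the faceted approach the abstract proposes. The inputs (`relevance`, `coverage`, `intent_weight`) and the function name `diversify` are hypothetical; in a faceted setting the intents would be subtopics generated from query facets.

```python
from typing import Dict, List


def diversify(relevance: Dict[str, float],
              coverage: Dict[str, Dict[str, float]],
              intent_weight: Dict[str, float],
              k: int,
              lam: float = 0.5) -> List[str]:
    """Greedily pick k documents, trading relevance against intent coverage.

    relevance[d]     : relevance of document d to the query
    coverage[d][i]   : degree to which d satisfies intent i (0..1)
    intent_weight[i] : importance of intent i for the query
    lam              : 0 = pure relevance ranking, 1 = pure diversity
    """
    selected: List[str] = []
    # Probability that intent i is still unsatisfied by the selected documents.
    not_covered = {i: 1.0 for i in intent_weight}
    candidates = set(relevance)

    def score(d: str) -> float:
        doc_cov = coverage.get(d, {})
        diversity = sum(w * doc_cov.get(i, 0.0) * not_covered[i]
                        for i, w in intent_weight.items())
        return (1 - lam) * relevance[d] + lam * diversity

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for i in not_covered:
            not_covered[i] *= 1.0 - coverage.get(best, {}).get(i, 0.0)
    return selected
```

With `lam=0` the function reduces to ranking by relevance alone. Raising `lam` lets a less relevant document that covers a not-yet-served intent overtake a more relevant but redundant one.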
ISBN: (print) 9783642120251
Efficient management of RDF data is an important factor in realizing the Semantic Web vision. The existing approaches store RDF data based on triples instead of a relation model. In this paper, we propose a system called FlexTable, where all triples of an instance are coalesced into one tuple and all tuples are stored in relation schemas. The main technical challenge is how to partition all the triples into several tables, that is, how to design an effective and dynamic schema structure for storing RDF triples. To deal with this challenge, we first propose a schema evolution method called LBA, which is based on a lattice structure and automatically evolves schemas as new triples are inserted. Second, we propose a novel page layout with an interpreted storage format to reduce the physical adjustment cost during schema evolution. Finally, we perform comprehensive experiments on two practical RDF data sets to demonstrate that FlexTable is superior to the state-of-the-art approaches.
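The coalescing step can be sketched as follows. This is only an illustration of turning triples into per-instance tuples and grouping them by attribute set; it does not implement LBA, the lattice-based schema evolution, or the page layout. It assumes each predicate is single-valued per subject (a repeated predicate overwrites the earlier value), and the names `coalesce` and `partition_by_schema` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def coalesce(triples: Iterable[Triple]) -> Dict[str, Dict[str, str]]:
    """Merge all triples that share a subject into one attribute dictionary."""
    rows: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in triples:
        rows[subject][predicate] = obj  # assumption: single-valued predicates
    return dict(rows)


def partition_by_schema(rows: Dict[str, Dict[str, str]]
                        ) -> Dict[Tuple[str, ...], List[str]]:
    """Group subjects whose tuples have exactly the same set of predicates.

    Each key is a sorted predicate tuple, i.e. the column list of one table.
    """
    tables: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for subject, attrs in rows.items():
        tables[tuple(sorted(attrs))].append(subject)
    return dict(tables)
```

Grouping by exact predicate set tends to produce many small tables on real data, which is the kind of partitioning problem the abstract describes a schema evolution method for.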
Beam tracking is crucial for maintaining stable data transmission in unmanned aerial vehicle (UAV) communications. However, a communication link can be disrupted by frequent switching of narrow beams between a base st...
Schema summarization on large-scale databases is a challenge. In a typical large database schema, a great proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, as a schema can be very big, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social networks. Our approach contains three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely. The generated abstract schema layers are very helpful for users exploring the database.
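The difficulty caused by high-degree tables can be illustrated with a deliberately simple stand-in for community detection: treat the schema as an undirected graph of foreign-key links, set aside tables whose degree reaches a threshold, and take the connected components of what remains as subject groups. This is not the algorithm used in the paper. The threshold `hub_degree` and the function name `subject_groups` are assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def subject_groups(edges: Iterable[Tuple[str, str]],
                   hub_degree: int) -> Tuple[Set[str], List[Set[str]]]:
    """Split a schema graph into groups after setting aside hub tables.

    edges      : undirected links between tables (e.g. foreign keys)
    hub_degree : tables with at least this many neighbors count as hubs
    Returns (hubs, groups), with groups ordered by their smallest table name.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    hubs = {t for t, nbrs in adj.items() if len(nbrs) >= hub_degree}

    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(adj):
        if start in hubs or start in seen:
            continue
        # Depth-first search that never walks through a hub table.
        stack, group = [start], set()
        seen.add(start)
        while stack:
            table = stack.pop()
            group.add(table)
            for nbr in adj[table]:
                if nbr not in hubs and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return hubs, groups
```

Without the hub filter, every table in the small example below would land in one component, which mirrors the clustering difficulty the abstract describes.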
The low-altitude economy (LAE), as a new economic paradigm, plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post-disaster communication. Specifically, unmanned aerial vehicles (UAVs), as one of the core technologies of the LAE, can be deployed to provide communication coverage, facilitate data collection, and relay data for trapped users, thereby significantly enhancing the efficiency of post-disaster response efforts. However, conventional UAV self-organizing networks exhibit low reliability in long-range cases due to their limited onboard energy and transmission capability. Therefore, in this paper, we design an efficient and robust UAV-swarm enabled collaborative self-organizing network to facilitate post-disaster communications. Specifically, a ground device transmits data to UAV swarms, which then use the collaborative beamforming (CB) technique to form virtual antenna arrays and relay the data to a remote access point (AP) efficiently. Then, we formulate a rescue-oriented post-disaster transmission rate maximization optimization problem (RPTRMOP), aimed at maximizing the transmission rate of the whole network. Given the challenges of solving the formulated RPTRMOP with traditional algorithms, we propose a two-stage optimization approach to address it. In the first stage, the optimal traffic routing and the theoretical upper bound on the transmission rate of the network are derived. In the second stage, we transform the formulated RPTRMOP into a variant named V-RPTRMOP based on the obtained optimal traffic routing, so that the actual transmission rate closely approaches its theoretical upper bound by optimizing the excitation current weight and the placement of each participating UAV via a diffusion model-enabled particle swarm optimization (DM-PSO) algorithm. Simulation results show the effectiveness of the proposed two-stage optimization approach in improving the transmission rate of the construct
Partial label learning is a weakly supervised learning framework in which each instance is associated with multiple candidate labels, among which only one is the ground-truth *** paper proposes a unified formulation that employs proper label constraints for training models while simultaneously performing *** existing partial label learning approaches that only leverage similarities in the feature space without utilizing label constraints, our pseudo-labeling process leverages similarities and differences in the feature space using the same candidate label constraints and then disambiguates noise *** experiments on artificial and real-world partial label datasets show that our approach significantly outperforms state-of-the-art counterparts on classification prediction.
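The role of a candidate label constraint can be shown with a very small sketch: when a pseudo-label is chosen for an instance, only labels in its candidate set are eligible, even if the model scores another label higher. This illustrates the constraint in isolation and is not the unified formulation proposed in the paper. The function name `pseudo_label` and the score dictionaries are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def pseudo_label(scores: Dict[str, float], candidates: Set[str]) -> str:
    """Return the highest-scoring label among the candidate labels only."""
    if not candidates:
        raise ValueError("each instance needs at least one candidate label")
    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=lambda label: scores.get(label, 0.0))


def pseudo_label_all(all_scores: Iterable[Dict[str, float]],
                     all_candidates: Iterable[Set[str]]) -> List[str]:
    """Apply the candidate-constrained choice to every instance."""
    return [pseudo_label(s, c) for s, c in zip(all_scores, all_candidates)]
```

For example, an instance scored highest for `"cat"` but annotated with candidates `{"dog", "fox"}` receives whichever of `"dog"` and `"fox"` scores higher.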