Recently there has been a lot of interest in graph-based analysis, with examples including social network analysis, recommendation systems, document classification and clustering, and so on. A graph is an abstraction that naturally captures data objects as well as relationships among those objects. Objects are represented as nodes and relationships are represented as edges in the graph. In many cases, similarities among nodes must be computed. SimRank is a simple and intuitive algorithm for this purpose, and it is rigorously grounded in random walk theory. Existing methods for SimRank computation suffer from one limitation: the computing cost can be very high in practice. A few techniques have been proposed to optimize the computation of SimRank. However, the performance of these methods is still limited by the processing ability of a single computer. Ideally, we would like to develop new parallel solutions that offer improved processing power to compute SimRank on large data sets. In this paper, we propose parallel algorithms for SimRank computation on the Map-Reduce framework, and more specifically on its open source implementation, Hadoop. Two different parallel methods are proposed, and their performance is evaluated and compared. Furthermore, we employ the proposed methods for similarity computation in order to recommend appropriate products to users in social recommender systems.
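For orientation, the node similarity that SimRank computes can be sketched on a single machine as follows. This is a minimal illustration of the standard iterative SimRank definition (two nodes are similar when their in-neighbors are similar), not the parallel Map-Reduce methods proposed in the paper, which the abstract does not detail. The function name `simrank`, the decay factor of 0.8, the iteration count, and the toy user-to-product graph are all assumptions made for the example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors maps every node to the list of nodes that point to it.
    Every node that appears in a list must also be a key of the mapping.
    decay and iterations are illustrative defaults, not values from the paper.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-neighbors has similarity 0 to other nodes.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: users u1, u2 point to the products they bought.
graph = {
    "u1": [],
    "u2": [],
    "p1": ["u1"],
    "p2": ["u1", "u2"],
    "p3": ["u2"],
}
scores = simrank(graph)
print(scores[("p1", "p2")])  # p1 and p2 share the buyer u1
print(scores[("p1", "p3")])  # p1 and p3 share no buyer
```

Each iteration visits every pair of nodes and every pair of their in-neighbors, which is the cost that becomes prohibitive on large graphs and motivates distributing the computation.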
Mobile phone data record people's calling logs in everyday life, which reflect their customs, patterns, and lifestyles. In this paper, we present approaches to urban activity analysis from real mobile phone locatio...
With the rapid development of location sensing technology such as GPS, huge amounts of GPS location data are produced every day. The flood of taxi GPS data makes it possible to predict the plentitude of traffic ...
The reconfigurable manufacturing system is a cost-effective system that can accommodate the variety of equipment required by customers. However, because of the surprisingly increasing volume and semantically fuzzy natu...
Very recently, the study of social networks has received a great deal of attention, since it lets us learn and understand many hidden properties of our society. This paper investigates the potential of social network analysis to se...
Sensor fusion is the combining of sensory data from disparate sources such that the resulting information is in some sense better than would be possible when these sources were used individually. The natural uncertain...
A greenhouse gas remote sensing monitoring system is an implementation of applied greenhouse gas remote sensing technologies. This paper discusses the business application mode, operation scheme and application technol...
Machinery and equipment descriptions play an important role in the design and manufacturing of industrial machinery devices and help reduce the design time and manufacturing costs of machinery devices. However, one ...
Recent years have witnessed the explosive growth of online social networks (OSNs), which provide a perfect platform for observing information propagation. Based on the theory of complex network analysis, consideri...
Schema summarization on large-scale databases is a challenge. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, as a schema can be very big, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social networks. Our approach contains three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely. The generated abstract schema layers are very helpful for users exploring the database.