ISBN (print): 9783540724834
The HITS algorithm proposed by Kleinberg is one of the representative methods for scoring web pages using hyperlinks. When the algorithm was proposed, most of the pages to which it gave high scores were genuinely related to the given topic, so it could be used to find related pages. On today's web, however, the increase in spam links means that neither the algorithm nor its variants, including BHITS proposed by Bharat and Henzinger, can be used to find related pages any more. In this paper, we first propose three methods for finding "linkfarms", that is, sets of spam links forming a densely connected subgraph of a web graph. We then present an algorithm, called the trust-score algorithm, that gives high scores to pages that are, with high probability, not spam pages. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. Our experiments confirm that one of them, named TaN+BHITS, which uses the trust-score algorithm and the method of finding linkfarms by employing name servers, is the most suitable for finding related pages on today's web. Our algorithms require no more time or memory than the original HITS algorithm and can be executed on a PC with a small amount of main memory.
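To make the linkfarm problem concrete, here is a minimal sketch of the basic HITS hub/authority iteration. It is the textbook update only, not BHITS or TaN+BHITS: the trust-score algorithm and the three linkfarm-detection methods are not reproduced, and the `hits` function name and the toy graph are hypothetical. It shows how a small, densely interlinked cluster can capture the authority scores.

```python
import math


def hits(graph, iterations=50):
    """Basic HITS iteration (Kleinberg).

    graph maps each page to the list of pages it links to.
    Returns (hub, authority) dicts, each L2-normalised.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page = sum of hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub of a page = sum of authority scores of the pages it links to.
        new_hub = {n: sum(new_auth[dst] for dst in graph.get(n, ())) for n in nodes}
        # Normalise so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: pages "a" and "b" point to a relevant page "c",
# while spam pages f1..f3 all link to one another (a tiny linkfarm).
web = {
    "a": ["c"],
    "b": ["c"],
    "f1": ["f2", "f3"],
    "f2": ["f1", "f3"],
    "f3": ["f1", "f2"],
}
hub, auth = hits(web)
top = sorted(auth, key=auth.get, reverse=True)[:3]
print(top)  # the three linkfarm pages outrank "c"
```

Because the linkfarm's mutual links form the denser subgraph, the iteration converges onto it and the relevant page `c` ends up with an authority score near zero; this is the effect that linkfarm detection and trust scores aim to counter.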
ISBN (print): 9783540724834
In this paper, we investigate keyword search over data-centric XML documents. We first present a novel method for dividing an XML document into self-integrated subtrees, which are connected subtrees that capture different structural information of the document. We then propose meaningful self-integrated trees, which contain all the keywords and describe how the keywords are interrelated, to answer keyword queries over XML documents. In addition, we introduce a B+-tree index to accelerate the retrieval of these meaningful self-integrated trees. To further improve performance, we use a Bloom filter to make the generation of meaningful self-integrated trees more efficient. Finally, we conduct extensive experiments to evaluate our method; the results demonstrate that it achieves high efficiency and significantly outperforms existing approaches.
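The abstract does not say how the Bloom filter is wired into tree generation. As a hedged illustration only, the sketch below shows a standard Bloom filter and one plausible use: cheaply discarding subtrees that cannot contain every query keyword before doing exact work. The `candidate_subtrees` helper, the filter size, and the SHA-256 double-hashing scheme are assumptions, not the paper's design.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def candidate_subtrees(subtree_keywords, query):
    """Keep only subtrees whose filter does not rule out any query keyword.

    subtree_keywords: hypothetical mapping of subtree id -> set of keywords it contains.
    """
    filters = {}
    for sid, words in subtree_keywords.items():
        bf = BloomFilter()
        for w in words:
            bf.add(w)
        filters[sid] = bf
    return [sid for sid, bf in filters.items() if all(bf.might_contain(k) for k in query)]


subtrees = {"t1": {"xml", "keyword", "search"}, "t2": {"bloom", "filter"}}
# "t1" is always kept; "t2" is dropped unless a false positive occurs.
print(candidate_subtrees(subtrees, ["xml", "search"]))
```

The filter only prunes: a subtree that survives still has to be checked exactly, but a subtree that fails is guaranteed not to contain all the keywords.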
Modern large distributed applications, such as telecommunication and banking services, need to respond instantly to a huge number of queries within a short period of time. The data-intensive, query-intensive nature ma...
ISBN (print): 9783540729082
Providing useful information and location-based services (LBS) through wireless pervasive devices at the right place and the right time, based on the end user's exact location, could benefit both businesses and their customers. However, consumer adoption rates of these location-aware pervasive services are still low, implying that some factors are keeping potential users away from LBS. This research attempts to identify those factors by investigating what negatively influences users' adoption of LBS. A hybrid approach, integrating a qualitative method, ZMET, with quantitative analysis of samples collected in a subsequent questionnaire survey, was designed and implemented to elicit and validate potential LBS users' in-depth feelings. Our results show that cost, worry about security and privacy issues, worry about the quality of LBS information, and lack of understanding of LBS are the barriers impeding mobile service users' adoption of LBS applications. Service providers can refer to our findings when designing and developing successful business applications that capture the revolutionary opportunities and benefits of LBS.
ISBN (print): 9783642039959
OLAP is widely used in data analysis. Existing design models, such as the star schema and the snowflake schema, are not flexible when the data model changes. For example, inserting a dimension may involve complex operations on both the model and the application implementation. To deal with this problem, a new cube model called Meta Galaxy is proposed. The main contributions of this work are: (1) an analysis of the shortcomings of the traditional design method; (2) a new cube model that is flexible with respect to dimension changes; and (3) an index structure and an algorithm to accelerate cube queries. The time complexity of the query algorithm is linear. Extensive experiments on a real application and a synthetic dataset show that Meta Galaxy is effective and efficient for cube queries. Specifically, compared with SQL Server 2005, our method reduces storage size by 95.12% and query time by 89.89% on average, and it scales well with data size.
ISBN (print): 9783642039959
Main memory databases (MMDBs) deliver much higher performance than disk-resident databases (DRDBs), but the hardware architecture limits how far memory capacity can scale. In OLAP applications, main memory capacity is small compared with the data volume and is hard to extend. In this paper, the ScaMMDB prototype is proposed to address the scalability of MMDBs. A multi-node structure is established so that the system can adjust its total main memory capacity dynamically as nodes join or leave. ScaMMDB is based on the open-source MonetDB, a typical column-store MMDB; a column data transmission module, a column data distribution module and a query execution plan rewriting module are developed directly in MonetDB. Any node in ScaMMDB can respond to user requests, and SQL statements are automatically transformed into extended column operation commands, comprising local commands and remote call commands. An operation on a given column is pushed to the node where that column is stored; the current node acts as a temporary mediator that calls the remote commands and assembles the results of each column operation. ScaMMDB is a test bed for MMDB scalability and can be extended to an MMDB cluster, an MMDB replication server, or even a peer-to-peer OLAP server for further applications.
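The push-down-and-assemble idea can be illustrated in a few lines. This is a hypothetical, in-process sketch: `Node`, `Mediator`, and the plan format are invented for illustration; there is no networking, MonetDB, or SQL rewriting, and in ScaMMDB any node can play the mediator role rather than a separate object.

```python
from typing import Callable, Dict, List, Tuple

ColumnOp = Callable[[List[float]], float]


class Node:
    """A hypothetical node holding some columns in main memory."""

    def __init__(self, name: str, columns: Dict[str, List[float]]):
        self.name = name
        self.columns = columns

    def run(self, column: str, op: ColumnOp) -> float:
        # The operation runs where the column is stored; only the result is returned.
        return op(self.columns[column])


class Mediator:
    """Role played by whichever node receives the query."""

    def __init__(self, nodes: List[Node]):
        # Column catalogue: which node stores which column.
        self.location: Dict[str, Node] = {}
        for node in nodes:
            for col in node.columns:
                self.location[col] = node

    def execute(self, plan: List[Tuple[str, ColumnOp]]) -> Dict[str, float]:
        # Push each column operation to its owning node, then assemble the results.
        return {col: self.location[col].run(col, op) for col, op in plan}


n1 = Node("n1", {"price": [1.0, 2.0, 3.0]})
n2 = Node("n2", {"qty": [4.0, 5.0]})
mediator = Mediator([n1, n2])
print(mediator.execute([("price", sum), ("qty", max)]))  # {'price': 6.0, 'qty': 5.0}
```

Shipping the operation rather than the column keeps network traffic proportional to result size, which is the motivation for pushing column operations to their owning nodes.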
ISBN (print): 9783642142451
Complex event processing has been extensively applied in areas such as RFID tracking for supply chain management, fluctuation detection in stock trading, and real-time intrusion detection in network monitoring. Most existing research focuses on the specification, formalization and evaluation of single-object-oriented complex event processing. In this paper, we investigate complex event processing over multiple correlated RFID objects, and in particular the problem of detecting events that span multiple correlated RFID objects. We present two kinds of evaluation algorithms: the SEquence Join Algorithm (SEJA) and the Stream Join Algorithm (SJA). Experimental studies demonstrate that the proposed algorithms are effective and scalable.
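As a rough illustration of a sequence pattern over correlated RFID objects, the sketch below joins a first-kind event on one tag with a later second-kind event on a correlated tag within a time window. It is a generic, hypothetical sketch, not SEJA or SJA as defined in the paper; the `Event` type, the `correlated` mapping (for example, item-to-case containment), and the event names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str   # event type, e.g. "packed" or "shipped" (hypothetical)
    obj: str    # RFID tag ID
    ts: float   # timestamp in seconds


def sequence_join(events, first_kind, second_kind, correlated, window):
    """Return (e1, e2) pairs where e1 has first_kind, e2 has second_kind,
    e2.obj is correlated with e1.obj, and 0 < e2.ts - e1.ts <= window.

    correlated: dict mapping an object ID to the set of object IDs related to it.
    """
    pending = []  # first-kind events that may still match
    matches = []
    for e in sorted(events, key=lambda ev: ev.ts):
        # Drop pending events that have fallen out of the time window.
        pending = [p for p in pending if e.ts - p.ts <= window]
        if e.kind == second_kind:
            for p in pending:
                if e.ts > p.ts and e.obj in correlated.get(p.obj, set()):
                    matches.append((p, e))
        if e.kind == first_kind:
            pending.append(e)
    return matches


events = [
    Event("packed", "item1", 0), Event("shipped", "case1", 5),
    Event("shipped", "case2", 6), Event("packed", "item2", 20),
    Event("shipped", "case1", 40),
]
correlated = {"item1": {"case1"}, "item2": {"case1"}}
matches = sequence_join(events, "packed", "shipped", correlated, window=10)
print(len(matches))  # 1: item1 packed at t=0, case1 shipped at t=5
```

Pruning expired events on every arrival keeps the pending list bounded by the window, which is the usual way such joins stay scalable on unbounded streams.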
StreetTiVo is a project that aims at bringing research results into the living room; in particular, a mix of current results in the areas of Peer-to-Peer XML database management system (P2P XDBMS), advanced multimedia ...
ISBN (electronic): 9783642049477
ISBN (print): 9783642049460
This book constitutes the refereed joint proceedings of eight international workshops held in conjunction with the 28th International Conference on Conceptual Modeling, ER 2009, in Gramado, Brazil, in November 2009. The 33 revised full papers presented were carefully reviewed and selected from 100 submissions. Topics addressed by the workshops are active conceptual modeling of learning (ACM-L), conceptual modeling in the large (CoMoL), evolving theories of conceptual modeling (ETheCoM), the workshop on foundations and practices of UML (FP-UML), the joint international workshop on metamodels, ontologies, semantic technologies, and information systems for the semantic web (MOST-ONISW), quality of information systems (QoIS), requirements, intentions and goals in conceptual modeling (RIGiM), and semantic and conceptual issues in geographic information systems (SeCoGIS).