ISBN (digital): 9783540481072
ISBN (print): 3540481052
The World Wide Web has become a major information resource for both individuals and institutions. The freedom of presenting data on the Web in HTML means that information from the same domain, such as book sales or scientific publications, appears on many Web sites in diverse formats. Collecting the data for a particular domain from the Web is therefore not a trivial task, and how to solve this problem is becoming an active research area. This talk first gives an overview of this new area by categorizing the information on the Web and indicating the difficulties in collecting domain-specific data from the Web. As a solution, the talk then presents a stepwise methodology for collecting domain-specific data from the Web and introduces its supporting system, SESQ, a domain-independent tool for building topic-specific search engines for applications. The talk shows the full features of SESQ through two application examples. In conclusion, the talk outlines further research directions in this new Web data processing area.
ISBN (print): 1595934731
VERTIPH is a visual language designed to aid in the development of image processing algorithms on FPGAs (Field Programmable Gate Arrays). We justify the use of a visual language for this purpose and describe the key parts of VERTIPH. One important aspect is how to represent a pipeline of processors clearly and efficiently, and in particular how to distinguish a pipeline from the simpler serial or parallel structures. This paper develops a number of pipeline representations, discussing the rationale behind each representation and its limitations. The culmination of this development is the Sequential Pipeline with Detailed Bars, a visually efficient and unambiguous representation. Copyright 2006 ACM.
Among the approaches to XML query processing, the structural join is one of the most popular and efficient. The evaluation of the structural relationship in the join, specifically the parent-child or ancestor-descendant relatio...
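As background for the structural relationships named above, here is a minimal sketch of one common way to test them, region (interval) encoding. The truncated abstract does not say which scheme the paper uses, so everything here is an assumption for illustration: the `Node` type, its `start`/`end`/`level` fields, and the naive nested-loop `structural_join` are hypothetical names, not the paper's method.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """An XML element labeled with a region encoding (illustrative only)."""
    tag: str
    start: int   # position where the element opens in document order
    end: int     # position where the element closes
    level: int   # depth in the tree, root = 0


def is_ancestor(a: Node, d: Node) -> bool:
    # a is an ancestor of d when d's region lies strictly inside a's region.
    return a.start < d.start and d.end < a.end


def is_parent(a: Node, d: Node) -> bool:
    # A parent is an ancestor exactly one level above the child.
    return is_ancestor(a, d) and a.level + 1 == d.level


def structural_join(ancestors: List[Node], descendants: List[Node],
                    parent_child: bool = False) -> List[Tuple[Node, Node]]:
    """Naive nested-loop join; real systems typically use merge- or
    stack-based algorithms over sorted inputs instead."""
    check = is_parent if parent_child else is_ancestor
    return [(a, d) for a in ancestors for d in descendants if check(a, d)]


# Regions for: <book><chapter><title/></chapter><title/></book>
book = Node("book", 1, 10, 0)
chapter = Node("chapter", 2, 7, 1)
chapter_title = Node("title", 3, 4, 2)
book_title = Node("title", 8, 9, 1)

all_titles = structural_join([book], [chapter_title, book_title])
direct_titles = structural_join([book], [chapter_title, book_title],
                                parent_child=True)
```

With this encoding, each relationship check is a constant-time comparison of integers, which is why the cost of a structural join is dominated by how the two input lists are scanned.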
Large-volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and storage costs but is rarely a serious problem. A more serious concern is that form letter customizations can include substantive issues that agencies are likely to overlook. The identification of exact- and near-duplicate texts, and the recognition of unique text within near-duplicate documents, is an important component of data cleaning and integration processes for eRulemaking. This paper presents DURIAN (DUplicate Removal In lArge collectioN), a refinement of a prior near-duplicate detection algorithm. DURIAN uses a traditional bag-of-words document representation, document attributes ("metadata"), and document content structure to identify form letters and their edited copies in public comment collections. Experimental results demonstrate that DURIAN is about as effective as human assessors. The paper concludes by discussing challenges to moving near-duplicate detection into operational rulemaking environments.
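To make the bag-of-words idea concrete, the sketch below flags comments that are near-duplicates of a known form letter and lists the terms each commenter added. It is not DURIAN: the abstract does not give the algorithm's details, so cosine similarity, the 0.8 default threshold, the tokenizer, and all function names are assumptions chosen for illustration, and metadata and content structure are ignored.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(form_letter: str, comments: List[str],
                         threshold: float = 0.8
                         ) -> List[Tuple[int, float, List[str]]]:
    """Return (index, similarity, added_terms) for each comment whose
    similarity to the form letter meets the (assumed) threshold."""
    form_bag = bag_of_words(form_letter)
    matches = []
    for index, comment in enumerate(comments):
        bag = bag_of_words(comment)
        similarity = cosine_similarity(form_bag, bag)
        if similarity >= threshold:
            # Terms absent from the form letter hint at customized text.
            added = sorted(term for term in bag if term not in form_bag)
            matches.append((index, similarity, added))
    return matches


form = "Please protect our national forests from new road construction"
comments = [
    form,
    form + " because my family hikes there",
    "I support the proposed rule on fuel standards",
]
matches = find_near_duplicates(form, comments, threshold=0.75)
```

Reporting the added terms alongside the similarity score reflects the concern raised in the abstract: the edited copies matter less as duplicates than for the unique text a commenter inserted into them.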
With the development of information systems on the Internet, user authentication and authorization management have gradually become one of the biggest concerns. To solve the problem, this paper presents a role-based PMI authen...
ISBN (print): 354049023X
The proceedings contain 23 papers. The special focus in this conference is on Performance I, Composition, Management I, Publish/subscribe technology, Databases, Mobile and ubiquitous computing, Security, Datamining techniques, Performance II, and Management II. The topics include: caching dynamic web content: designing and analysing an aspect-oriented solution; non-intrusive performance management for computer services; true and transparent distributed composition of aspect-components; policy-driven middleware for self-adaptation of web services compositions; living with nondeterminism in replicated middleware applications; trading off resources between overlapping overlays; efficient probabilistic subsumption checking for content-based publish/subscribe systems; dynamic load balancing in distributed content-based publish/subscribe; decentralized message ordering for publish/subscribe systems; dbfarm: a scalable cluster for multiple databases; queryll: Java database queries through bytecode rewriting; contory: a middleware for the provisioning of context information on smart phones; efficient semantic service discovery in pervasive computing environments; a middleware system for protecting against application level denial of service attacks; generalized access control of synchronous communication; fmware: middleware for efficient filtering and matching of XML messages with local data; synergy: sharing-aware component composition for distributed stream processing systems; enforcing performance isolation across virtual machines in Xen; low-overhead message tracking for distributed messaging; utility-driven proactive management of availability in enterprise-scale information flows; and model driven provisioning: bridging the gap between declarative object models and procedural provisioning tools.
This paper describes the design and implementation of a system for efficient storage, indexing and search in collections of spoken documents that takes advantage of automatic speech recognition. As the quality of current spee...
With the development of web services related technologies, more and more enterprises adopt web services to encapsulate their business systems and publish them on the Internet. Due to the different semantic of the web servi...
Word frequencies play important roles in a variety of NLP-related applications. Word frequency estimation for Chinese is a big challenge due to characteristics of Chinese, in particular word-formation and word segment...
Peer-to-peer computing has emerged as a very popular application due to its strong retrieval performance and the ease of sharing resources and information. Nonetheless, in reality P2P users are demanding more pri...