Many databases have become web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries against the underlying databases. In general, such a web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many web applications, such as web database integration and deep web crawling, require the construction of such schemas. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression-based approach to automatically extract the logical attributes from search interfaces. We also recast the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from the extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema of any web search interface. Our experimental results on real search interfaces indicate that this system is highly effective.
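The Bayesian-classifier step in the abstract above can be illustrated with a minimal naive Bayes sketch. The attribute labels, semantic types, and training pairs below are invented examples, not WISE-iExtractor's actual features or categories; this only shows the general shape of classifying an interface attribute from its label tokens.

```python
from collections import Counter, defaultdict
import math

# Hypothetical training data: interface attribute labels paired with a
# semantic type. The paper trains several Bayesian classifiers; these
# labels and classes are illustrative stand-ins only.
TRAINING = [
    ("departure date", "date"),
    ("return date", "date"),
    ("arrival date", "date"),
    ("author name", "text"),
    ("book title", "text"),
    ("number of passengers", "number"),
    ("max price", "number"),
]

def train(samples):
    """Estimate class priors and per-class token counts."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label_text, cls in samples:
        class_counts[cls] += 1
        for tok in label_text.split():
            token_counts[cls][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab

def classify(label_text, class_counts, token_counts, vocab):
    """Pick the most probable semantic type, with add-one smoothing."""
    total = sum(class_counts.values())
    best_cls, best_lp = None, float("-inf")
    for cls, c in class_counts.items():
        lp = math.log(c / total)
        denom = sum(token_counts[cls].values()) + len(vocab)
        for tok in label_text.split():
            lp += math.log((token_counts[cls][tok] + 1) / denom)
        if lp > best_lp:
            best_cls, best_lp = cls, lp
    return best_cls

counts, tokens, vocab = train(TRAINING)
print(classify("departure date", counts, tokens, vocab))  # → date
```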
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible sources.
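The parallel-probing idea can be sketched in miniature. The sources, their simulated latencies and concurrency caps, the deterministic stand-in scores, and the simple summing scoring function below are all illustrative assumptions, not the article's actual algorithm; the sketch only shows probing every (source, object) pair concurrently while honoring per-source access limits.

```python
import concurrent.futures
import threading
import time

# Hypothetical attribute sources: (simulated probe latency in seconds,
# maximum number of concurrent probes the source accepts).
SOURCES = {
    "distance": (0.05, 2),
    "price":    (0.05, 2),
    "rating":   (0.05, 1),
}
LIMITS = {name: threading.Semaphore(slots)
          for name, (_, slots) in SOURCES.items()}

def probe(source, obj):
    """Simulate one web-accessible probe; returns a score in [0, 1)."""
    latency, _ = SOURCES[source]
    with LIMITS[source]:  # honor this source's concurrency constraint
        time.sleep(latency)
        # Deterministic stand-in for the remote attribute score.
        return (sum(map(ord, source + obj)) % 100) / 100.0

def topk_parallel(objects, k=2):
    """Probe all (source, object) pairs in parallel, then rank objects
    by the sum of their attribute scores (a simple linear combination)."""
    scores = {obj: 0.0 for obj in objects}
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
        futs = {pool.submit(probe, s, o): o
                for s in SOURCES for o in objects}
        for fut in concurrent.futures.as_completed(futs):
            scores[futs[fut]] += fut.result()
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(topk_parallel(["r1", "r2", "r3", "r4"]))  # → ['r4', 'r3']
```

With four candidate restaurants and three sources, all twelve probes are in flight at once (subject to each semaphore), so the wall-clock time is governed by the most constrained source rather than by the sum of all probe latencies.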
The author presents a critical response to Ed Folsom's article "Database as Genre: The Epic Transformation of Archives," which discussed the "Walt Whitman Archive," an online archive of the poet Walt Whitman's work. The author argues that digital databases remain dependent on print conventions such as the book, and discusses the connection between Whitman's writing, the archive, and other digital databases.
Urban scenic forests are open to the outside, fragmented, and fragile, and are therefore at high risk of invasion by alien species. An important measure for minimizing the damage caused by invasive species is to prevent potential invaders from entering suitable habitats. Zijin Mountain in Nanjing City is selected as the case study area in this paper. Research materials concerning biological invasion are reviewed, and three web databases in China and abroad are used to identify potential invasive alien species for the study area. First, nine invasive species that threaten the safety of the forest ecosystem are picked out from the web databases. Three invasive alien species, Bursaphelenchus xylophilus, Matsucoccus matsumura and Hyphantria cunea, are then selected from the nine by means of agricultural climate similarity analysis. Next, a DEM and a high-resolution QuickBird satellite image of the study area are collected to study the spatial distribution of the potential invasive species on the desktop GIS platform ArcGIS. The biological and geographical factors affecting the spatial distribution of each alien species are determined and digitized as separate map layers. Finally, the layers are overlaid and a spatial suitability map is produced to specify the locations of the potential invasive species. The methods used in this paper overcome the weaknesses of traditional suitability research, which can analyze the suitability of only a single species and cannot specify the locations of potential invaders. Besides supplying a theoretical basis for decision-making on controlling invasive species, these methods are of great practical significance for environmental protection in regions of high historic and cultural value.
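The overlay step described above can be sketched in miniature: each factor is rasterized to a 0/1 suitability layer on a common grid, and cells suitable in every layer mark potential invasion sites. The three layers and the toy 4x4 grid below are invented illustrations, not data from the study.

```python
# Hypothetical 0/1 suitability layers on a shared 4x4 grid
# (1 = suitable for the invader in that factor, 0 = not).
host    = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]  # host plants present
elev_ok = [[1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 1]]  # elevation band from DEM
roads   = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]  # near dispersal corridors

def overlay(*layers):
    """Cell-by-cell AND of all layers; returns the coordinates of
    cells that are suitable in every factor layer."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [(r, c) for r in range(rows) for c in range(cols)
            if all(layer[r][c] for layer in layers)]

print(overlay(host, elev_ok, roads))  # → [(1, 0), (1, 1), (2, 2), (3, 2)]
```

In a real GIS workflow the same intersection is done with raster algebra over the digitized factor layers; the surviving cells form the spatial suitability map.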
The incompatibilities among the complex data formats and the various schemas used by the biological databases that house these data are becoming a bottleneck in biological research. For example, biological data formats vary from simple words (e.g. gene names) and numbers (e.g. molecular weight) to sequence strings (e.g. nucleic acid sequences), and to even more complex formats such as taxonomy trees. Some information is embedded in narrative text, such as expert comments and publications. Other information is expressed as graphs or images (e.g. pathway networks). The confederation of heterogeneous web databases has become a crucial issue in today's biological research. In other words, interoperability has to be achieved among biological web databases and their heterogeneity has to be resolved. This paper presents a biological ontology, BAO, and discusses its advantages in supporting the semantic integration of biological web databases.
The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web "crawlers." Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
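The probing idea (classifying a database from match counts alone, without retrieving any documents) can be sketched as follows. The categories, probe queries, and hit counts below are fabricated stand-ins for classifier-generated probes and a real search interface, not QProber's actual probes or decision rule.

```python
# Hypothetical category probes, as a document classifier might derive them.
PROBES = {
    "Health":    ["cancer treatment", "immune system"],
    "Sports":    ["baseball playoffs", "marathon training"],
    "Computers": ["query optimizer", "operating system"],
}

# Pretend match counts that a database's search interface would report.
FAKE_MATCH_COUNTS = {
    "cancer treatment": 4120, "immune system": 2870,
    "baseball playoffs": 12, "marathon training": 55,
    "query optimizer": 3, "operating system": 140,
}

def probe_database(query):
    """Stand-in for submitting one probe and reading only the hit count."""
    return FAKE_MATCH_COUNTS.get(query, 0)

def classify_database(probes):
    """Sum match counts per category and pick the best-supported one."""
    coverage = {cat: sum(probe_database(q) for q in queries)
                for cat, queries in probes.items()}
    return max(coverage, key=coverage.get)

print(classify_database(PROBES))  # → Health
```

The key property the sketch preserves is that classification needs only the number of matches per probe, so the cost per database is a handful of cheap queries rather than a crawl.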