Search Results - Inner Mongolia University Library

VR-Tree: A novel tree-based approach for modeling web query interfaces

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2017, vol. 49, no. 3, pp. 367-390

Authors: Marin-Castro, Heidy M.; Sosa Sosa, Victor J. (Univ Politecn Victoria Ciudad Victoria Tamaulipas Mexico; Cinvestav Unidad Tamaulipas Ciudad Victoria Tamaulipas Mexico)

Web query interfaces (WQIs) play a very important role in retrieving Deep web content. WQIs allow users to query domain-specific databases to obtain information of interest from diverse domains such as car rentals, hotels, and airfare. As the number of WQIs on the web is increasing drastically, some research efforts focus on building a single (unified) WQI that allows users to query and integrate information available in different web databases related to a specific domain. A very important task in this WQI integration process is the extraction, modeling and understanding of the WQIs' semantic content. However, this task is challenging because of the great heterogeneity in the design of WQIs. This paper presents a novel tree-based approach for the modeling and understanding of WQIs. A tree schema called the Visual Reduced Tree (VR-Tree) is built from the tree produced by a web browser's render engine, applying a set of well-defined functions and guided by a set of heuristic rules to identify the WQI's main components and their relationships. The proposed strategy was evaluated by running a collection of experiments over the Tel-8 and ICQ datasets from the UIUC repository. The results show that the automatic modeling of WQIs is possible with a high degree of precision compared with previous approaches, and that the modeling task is simplified by considering only the visual and spatial properties of WQI components using the VR-Tree schema proposed in this work.

Keywords: web query interfaces; Modeling; Schema tree; Render tree; Heuristic rules

Automatic discovery of web query interfaces using machine learning techniques

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2013, vol. 40, no. 1, pp. 85-108

Authors: Marin-Castro, Heidy M.; Sosa-Sosa, Victor J.; Martinez-Trinidad, Jose F.; Lopez-Arevalo, Ivan (Natl Polytech Inst Ctr Res & Adv Studies Informat Technol Lab Victoria City Tamaulipas Mexico; Natl Inst Astrophys Opt & Elect Tonantzintla Puebla San Andres Chol Mexico)

The amount of information contained in databases available on the web has grown explosively in recent years. This information, known as the Deep web, is heterogeneous and dynamically generated by querying these back-end (relational) databases through web query interfaces (WQIs), which are a special type of HTML form. Accessing Deep web information is a great challenge because this information is usually not indexed by general-purpose search engines. Therefore, it is necessary to create efficient mechanisms to access, extract and integrate information contained in the Deep web. Since WQIs are the only means of accessing the Deep web, the automatic identification of WQIs plays an important role. It helps traditional search engines increase their coverage and gives access to interesting information not available on the indexable web. The accurate identification of Deep web data sources is a key issue in the information retrieval process. In this paper we propose a new strategy for the automatic discovery of WQIs. This proposal makes an adequate selection of HTML elements extracted from HTML forms, which are used in a set of heuristic rules that help to identify WQIs. The proposed strategy uses machine learning algorithms to classify HTML forms as searchable (WQIs) or non-searchable (non-WQIs), together with a prototype selection algorithm that removes irrelevant or redundant data from the training set. The internal content of web query interfaces was analyzed with the objective of identifying only those HTML elements that appear frequently and provide relevant information for WQI identification. For testing, we use three groups of datasets: two available at the UIUC repository and a new dataset, created using a generic crawler supported by human experts, that includes advanced and simple query interfaces. The experimental results show that the proposed strategy outperforms other previously reported works.

Keywords: Deep web; Hidden-web databases; web query interfaces; supervised classification
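
The abstract above mentions selecting HTML elements from forms and applying heuristic rules to separate searchable from non-searchable forms. A minimal sketch of that general idea follows, using only Python's standard `html.parser`. The feature names, the `looks_searchable` rule and its conditions are illustrative assumptions, not the features or rules used in the paper, and the machine learning classification and prototype selection steps are not covered.

```python
from html.parser import HTMLParser


class FormFeatureExtractor(HTMLParser):
    """Collects simple element counts for each <form> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one feature dict per form, in document order
        self._current = None  # feature dict of the form currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"text": 0, "password": 0, "select": 0, "submit": 0}
            return
        if self._current is None:
            return  # ignore controls that are outside any form
        if tag == "input":
            # An <input> without a type attribute behaves as a text field.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search"):
                self._current["text"] += 1
            elif kind in ("password", "submit"):
                self._current[kind] += 1
        elif tag == "select":
            self._current["select"] += 1
        elif tag == "button":
            # A <button> without a type attribute submits its form.
            if (attrs.get("type") or "submit").lower() == "submit":
                self._current["submit"] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_form_features(html):
    """Returns a list of feature dicts, one per form found in the page."""
    parser = FormFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms


def looks_searchable(features):
    """Hypothetical rule: a query field and a submit control, no password."""
    has_query_field = features["text"] + features["select"] > 0
    return has_query_field and features["submit"] > 0 and features["password"] == 0
```

In a pipeline like the one the abstract describes, feature dictionaries of this kind would more plausibly feed a trained classifier than a single hand-written rule; the rule here only shows how a login form (password field present) can be told apart from a simple search box.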

Automatic construction of vertical search tools for the Deep web

IEEE LATIN AMERICA TRANSACTIONS, 2018, vol. 16, no. 2, pp. 574-584

Authors: Marin, H. M.; Sosa, V. J.; Nuno, M. A. (Univ Politecn Victoria Victoria Tamaulipas Mexico; CINVESTAV LTI Victoria Tamaulipas Mexico; Univ Politecn Victoria Ingn Victoria Tamaulipas Mexico)

With the constant increase in the volume of information available on the web, it is more difficult to find the specific information related to a given domain. Users face the problem of information overload: a query about a specialized subject (local information; e-commerce: hotels, airlines, car rental; science: biology, mathematics, medicine, etc.) on a web search engine returns many web pages or results that in most cases are outside the domain of interest. This is one reason why vertical search tools have become a necessity for users who seek domain-specific information from different databases available on the web through input sources called web query interfaces (ICWs). This paper describes an approach for the automatic integration of ICWs, a crucial task in constructing vertical search tools. The proposed methodology is validated by implementing a vertical search prototype called VSearch that allows users to transparently query multiple web databases in a specific domain through a unified ICW. The proposed approach for automatic ICW integration is based on: i) a hierarchical model called AEV for modeling the visual content of an ICW; ii) semantic clustering for the identification of relationships between fields in ICWs; and iii) a field homogenization and unification process of AEV schemes for the construction of a unified ICW. The VSearch prototype was implemented and evaluated using a case study. The experimental results demonstrate high precision in the integration phase and an effective methodology for creating a functional vertical search tool for a given domain.

Keywords: Vertical Search Tool; web Databases; web query interfaces; Automatic Integration; VSearch

Automatic Identification of web query interfaces

10th Mexican International Conference on Artificial Intelligence (MICAI 2011)

Authors: Marin-Castro, Heidy M.; Sosa-Sosa, Victor J.; Lopez-Arevalo, Ivan (Center of Research and Advanced Studies, National Polytechnic Institute, Scientific and Technological Park of Tamaulipas TECNOTAM, Mexico)

ISBN: (print) 9783642253294

The amount of information contained in databases on the web has grown explosively in recent years. This information, known as the Deep web, is dynamically obtained from specific queries to these databases through web query interfaces (WQIs). Finding and accessing databases on the web is a great challenge because web sites are very dynamic and the existing information is heterogeneous. Therefore, it is necessary to create efficient mechanisms to access, extract and integrate information contained in databases on the web. Since WQIs are the only means of accessing databases on the web, the automatic identification of WQIs plays an important role in helping traditional search engines increase their coverage and access interesting information not available on the indexable web. In this paper we present a strategy for the automatic identification of WQIs that uses supervised learning and makes an adequate selection and extraction of HTML elements in the WQIs to form the training set. We present two experimental tests over a corpus of HTML forms considering positive and negative examples. Our proposed strategy achieves better accuracy than previous works reported in the literature.

Keywords: Deep web; Databases; web query interfaces; classification; information extraction

Analysis of navigation behaviour in web sites integrating multiple information systems

VLDB JOURNAL, 2000, vol. 9, no. 1, pp. 56-75

Authors: Berendt, B; Spiliopoulou, M (Humboldt Univ Fac Philosophy 4 Inst Petag & Informat D-10117 Berlin Germany; Humboldt Univ Fac Econ Inst Informat Syst D-10178 Berlin Germany)

The analysis of web usage has mostly focused on sites composed of conventional static pages. However, huge amounts of information available on the web come from databases or other data collections and are presented to the users in the form of dynamically generated pages. The query interfaces of such sites allow the specification of many search criteria. Their generated results support navigation to pages of results combining cross-linked data from many sources. For the analysis of visitor navigation behaviour in such web sites, we propose the web usage miner (WUM), which discovers navigation patterns subject to advanced statistical and structural constraints. Since our objective is the discovery of interesting navigation patterns, we do not focus on accesses to individual pages. Instead, we construct conceptual hierarchies that reflect the query capabilities used in the production of these pages. Our experiments with a real web site that integrates data from multiple databases, the German Schulweb, demonstrate the appropriateness of WUM in discovering navigation patterns and show how those discoveries can help in assessing and improving the quality of the site.

Keywords: web usage mining; data mining; web query interfaces; web databases; query capabilities; conceptual hierarchies
