Search Results - Inner Mongolia University Library

27th Annual CHI Conference on Human Factors in Computing Systems

Authors: Toomim, Michael; Drucker, Steven M.; Dontcheva, Mira; Rahimi, Ali; Thomson, Blake; Landay, James A. Affiliation: Univ Washington, DUB Grp, Seattle, WA 98195, USA

ISBN: (print) 9781605582467

We present reform, a step toward write-once apply-anywhere user interface enhancements. The reform system envisions roles for both programmers and end users in enhancing existing websites to support new goals. First, a programmer authors a traditional mashup or browser extension, but they do not write a web scraper. Instead they use reform, which allows novice end users to attach the enhancement to their favorite sites with a scraping-by-example interface. reform makes enhancements easier to program while also carrying the benefit that end users can apply the enhancements to any number of new websites. We present reform's architecture, user interface, interactive by-example extraction algorithm for novices, and evaluation, along with five example reform-enabled enhancements.

Keywords: web data extraction; mashups; programming by example; end-user programming

Online Price Extraction and Decision Support for Agricultural Products

2nd International Conference on Information Management, Innovation Management and Industrial Engineering

Authors: Yu Chun-Yan; Ma Jun; Zhao Yu-Yan. Affiliation: Chuzhou Univ, Dept Comp Sci & Technol, Chuzhou, Peoples R China

ISBN: (print) 9780769538761

It is a significant task to extract market data from different web pages for prediction and analysis. A prototype decision support system of an agricultural product market is designed and developed in this paper. It can extract online price information of a certain agricultural product from agricultural wholesale websites, predict the product price in future months, and provide further decision support on such issues as which cities the product should be sent to for sale and which cities should be on the transport route. To achieve these goals, an algorithm named MDT-E (Market Data Table Extraction) is proposed to extract the maximum data table in a web page. Based on the common practice that "the price data are usually displayed in the largest table on a web page with the structure of "<td>" and "" tags", our market data extraction algorithm first detects the largest table on a web page, then transforms the table into a DOM tree, and further obtains the node values of the "<td>" tags. This algorithm can automatically detect market data without an assigned data extraction region. The designed system uses a quadratic forecasting model of linear time series to predict the price, and compares the prediction results by using different time series and different sample data to find the best forecasting model to forecast the price in cities. In addition, it provides the decision support to determine the transport route based on the transport costs and product prices.

Keywords: web data extraction; prediction; decision support
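
The largest-table step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' MDT-E implementation: the names TableCollector and largest_table_cells are hypothetical, "largest" is assumed here to mean "the table with the most <td> cells", and a cell that itself contains a nested table is simply dropped.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by enclosing <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of cell strings
        self._stack = []   # tables that are currently open (handles nesting)
        self._cell = None  # text fragments of the <td> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([])
        elif tag == "td" and self._stack:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._stack:
            # Store the finished cell in the innermost open table.
            self._stack[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._stack:
            self.tables.append(self._stack.pop())
            self._cell = None


def largest_table_cells(html):
    """Return the <td> values of the table with the most cells (assumed criterion)."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return max(parser.tables, key=len, default=[])


if __name__ == "__main__":
    page = (
        "<table><tr><td>Menu</td></tr></table>"
        "<table><tr><td>Beijing</td><td>3.20</td></tr>"
        "<tr><td>Shanghai</td><td>3.45</td></tr></table>"
    )
    print(largest_table_cells(page))
```

Because the table is chosen by size alone, no extraction region has to be assigned by hand, which matches the property claimed in the abstract; a real page would likely need a sturdier size measure than a raw cell count.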

The personal publication reader: Illustrating web data extraction, personalization and reasoning for the Semantic Web

2nd European Semantic Web Conference

Authors: Baumgartner, R; Henze, N; Herzog, M. Affiliations: Vienna Tech Univ, Inst Informat Syst, DBAI, A-1040 Vienna, Austria; Univ Hannover, ISI Semant Web Grp, D-30167 Hannover, Germany

ISBN: (print) 3540261249

This paper shows how Semantic Web technologies enable the design and implementation of advanced, personalized information systems. We demonstrate by means of an example application how personalized content syndication can be realized in the Semantic Web. Our approach consists of two main parts: the web data extraction part, which provides the information system with real-time, dynamic data, and the personalization part, which deduces, with the aid of ontological domain knowledge, personalized views on the data. The prototype of the system has been realized using the Personal Reader Framework for designing, implementing, and maintaining web content Readers.

Keywords: Semantic Web; personalization; reasoning on the Semantic Web; web data extraction

Extracting web data using instance-based learning

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2007, Vol. 10, No. 2, pp. 113-132

Authors: Zhai, Yanhong; Liu, Bing. Affiliation: Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA

This paper studies structured data extraction from web pages. Existing approaches to data extraction include wrapper induction and automated methods. In this paper, we propose an instance-based learning method, which performs extraction by comparing each new instance to be extracted with labeled instances. The key advantage of our method is that it does not require an initial set of labeled pages to learn extraction rules as in wrapper induction. Instead, the algorithm is able to start extraction from a single labeled instance. Only when a new instance cannot be extracted does it need labeling. This avoids unnecessary page labeling, which solves a major problem with inductive learning (or wrapper induction), i.e., the set of labeled instances may not be representative of all other instances. The instance-based approach is very natural because structured data on the web usually follow some fixed templates. Pages of the same template usually can be extracted based on a single page instance of the template. A novel technique is proposed to match a new instance with a manually labeled instance and in the process to extract the required data items from the new instance. The technique is also very efficient. Experimental results based on 1,200 pages from 24 diverse web sites demonstrate the effectiveness of the method. It also outperforms the state-of-the-art existing systems significantly.

Keywords: web content mining; web data extraction; instance-based learning

Extracting web data using instance-based learning

6th International Workshop on Web Information Systems Engineering (WISE 2005)

Authors: Zhai, Yanhong; Liu, Bing. Affiliation: Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA

ISBN: (print) 3540300171

Keywords: web content mining; web data extraction; instance-based learning

Structured data extraction from the web based on partial tree alignment

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, Vol. 18, No. 12, pp. 1614-1628

Authors: Zhai, Yanhong; Liu, Bing. Affiliation: Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA

This paper studies the problem of structured data extraction from arbitrary web pages. The objective of the proposed research is to automatically segment data records in a page, extract data items/fields from these records, and store the extracted data in a database. Existing methods addressing the problem can be classified into three categories. Methods in the first category provide some languages to facilitate the construction of data extraction systems. Methods in the second category use machine learning techniques to learn wrappers (which are data extraction programs) from human labeled examples. Manual labeling is time-consuming and is hard to scale to a large number of sites on the web. Methods in the third category are based on the idea of automatic pattern discovery. However, multiple pages that conform to a common schema are usually needed as the input. In this paper, we propose a novel and effective technique (called DEPTA) to perform the task of web data extraction automatically. The method consists of two steps: 1) identifying individual records in a page and 2) aligning and extracting data items from the identified records. For step 1, a method based on visual information and tree matching is used to segment data records. For step 2, a novel partial alignment technique is proposed. This method aligns only those data items in a pair of records that can be aligned with certainty, making no commitment on the rest of the items. Experimental results obtained using a large number of web pages from diverse domains show that the proposed two-step technique is highly effective.

Keywords: web data extraction; wrapper generation; partial tree alignment; web mining
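
The idea behind step 2 of the abstract above (align only what can be aligned with certainty, and make no commitment on the rest) can be illustrated with a simplified sketch. This is not the DEPTA algorithm itself: it works on flat lists of item labels rather than tag trees, it borrows Python's difflib.SequenceMatcher to find the matching items, and the function name partial_align is hypothetical. An unmatched item is merged into the seed only when the seed has nothing in the same gap, so its position is unambiguous; otherwise it is deferred.

```python
from difflib import SequenceMatcher


def partial_align(seed, record):
    """Merge `record` into `seed`; return (new_seed, deferred_items).

    Both arguments are lists of hashable item labels (an assumed flat
    stand-in for the children of a record's tag tree).
    """
    merged, deferred = [], []
    matcher = SequenceMatcher(a=seed, b=record, autojunk=False)
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op == "equal":
            # Items present in both: aligned with certainty.
            merged.extend(seed[a0:a1])
        elif op == "insert":
            # Record has extra items where the seed has none, so their
            # position between the matched neighbours is unique.
            merged.extend(record[b0:b1])
        elif op == "delete":
            # Seed items missing from this record are kept as optional.
            merged.extend(seed[a0:a1])
        else:
            # "replace": both sides have unmatched items in the same gap,
            # so the relative order is ambiguous; defer the record's items.
            merged.extend(seed[a0:a1])
            deferred.extend(record[b0:b1])
    return merged, deferred


if __name__ == "__main__":
    print(partial_align(["img", "title", "price"],
                        ["img", "title", "rating", "price"]))
    print(partial_align(["a", "b", "e"], ["a", "x", "e"]))
```

Deferred items could be retried after other records have extended the seed, since a later record may settle where they belong.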

Olera: Semisupervised web-data extraction with visual support

IEEE INTELLIGENT SYSTEMS, 2004, Vol. 19, No. 6, pp. 56-64

Authors: Chang, CH; Kuo, SC. Affiliation: Natl Cent Univ, Dept Comp Sci & Informat Engn, Chungli, Taiwan

OLERA is a semisupervised information-extraction system that produces extraction rules from semistructured web documents without requiring detailed annotation of the training documents. It performs well for program-ge...

Keywords: semistructured data; web data extraction; multiple string alignment; rule generalization

Automatic Extraction of Complex Web Data

10th Pacific Asia Conference on Information Systems

Authors: Zhang, Ming; Zhou, Ying; Patrick, Jon. Affiliation: Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia

A new wrapper induction algorithm, WTM, is presented for generating rules that describe the general layout template of a web page. WTM is mainly designed for use in weblog crawling and indexing systems. Most weblogs are maintained by content management systems and have similar layout structures in all pages. In addition, they provide RSS feeds to describe the latest entries. These entries appear in the weblog homepage in HTML format as well. WTM is built upon these two observations. It uses RSS feed data to automatically label the corresponding HTML file (the weblog homepage) and induces general template rules from the labeled page. The rules can then be used to extract data from other pages with a similar layout template. WTM is tested on some selected weblogs and the results are satisfactory.

Keywords: weblog; RSS feed; wrapper induction; web data extraction
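
The labeling idea in the abstract above (use RSS entries to mark the matching parts of the homepage, then generalize to a rule) can be sketched roughly as follows. This is an illustration under strong assumptions, not the WTM algorithm: a "rule" is reduced to a plain tag path such as html/body/div/h2, only entry titles are used, titles must match the page text exactly, and all names (PathRecorder, induce_title_path, extract_by_path) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; they are kept out of the path.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathRecorder(HTMLParser):
    """Record every non-empty text node together with its tag path."""

    def __init__(self):
        super().__init__()
        self._path = []
        self.texts = []  # list of (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._path.append(tag)

    def handle_endtag(self, tag):
        if tag in self._path:
            # Pop up to and including the matching open tag, which
            # tolerates unclosed inner tags.
            while self._path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(("/".join(self._path), text))


def text_paths(html):
    recorder = PathRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.texts


def induce_title_path(homepage_html, rss_titles):
    """Label homepage text with RSS titles and return the most common path."""
    wanted = set(rss_titles)
    votes = Counter(path for path, text in text_paths(homepage_html)
                    if text in wanted)
    return votes.most_common(1)[0][0] if votes else None


def extract_by_path(html, path):
    """Apply the induced path to another page with a similar layout."""
    return [text for p, text in text_paths(html) if p == path]


if __name__ == "__main__":
    homepage = ("<html><body><div><h2>Post A</h2><p>Body A</p></div>"
                "<div><h2>Post B</h2><p>Body B</p></div></body></html>")
    archive = ("<html><body><div><h2>Old post</h2><p>text</p></div>"
               "</body></html>")
    rule = induce_title_path(homepage, ["Post A", "Post B"])
    print(rule, extract_by_path(archive, rule))
```

The RSS feed plays the role of a free source of labels, so no manual annotation of the homepage is needed; a practical version would need fuzzy text matching and richer rules than a bare tag path.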

A Framework of Web Data Integrated LBS Middleware

Wuhan University Journal of Natural Sciences, 2006, Vol. 11, No. 5, pp. 1187-1191

Authors: MENG Xiaofeng; YIN Shaoyi; XIAO Zhen. Affiliation: School of Information, Renmin University of China, Beijing 100872, China

In this paper, we propose a flexible location-based service (LBS) middleware framework to make the development and deployment of new location-based applications much easier. Considering the World Wide Web as a huge data source of location-related information, we integrate commonly used web data extraction techniques into the middleware framework, exposing a unified web data interface for the upper applications to make them more attractive. Besides, the framework also emphasizes some common LBS issues, including positioning, location modeling, location-dependent query processing, privacy and security management.

Keywords: location-based service (LBS); middleware; web data extraction

STAVIES: A system for information extraction from unknown web data sources through automatic web wrapper generation using clustering techniques

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, Vol. 17, No. 12, pp. 1638-1652

Authors: Papadakis, NK; Skoutas, D; Raftopoulos, K; Varvarigou, TA. Affiliation: Natl Tech Univ Athens, Dept Elect & Comp Engn, GR-15773 Athens, Greece

A fully automated wrapper for information extraction from web pages is presented. The motivation behind such systems lies in the emerging need for going beyond the concept of "human browsing." The World Wide Web is today the main "all kind of information" repository and has been so far very successful in disseminating information to humans. By automating the process of information retrieval, further utilization by targeted applications is enabled. The key idea in our novel system is to exploit the format of the web pages to discover the underlying structure in order to finally infer and extract pieces of information from the web page. Our system first identifies the section of the web page that contains the information to be extracted and then extracts it by using clustering techniques and other tools of statistical origin. STAVIES can operate without human intervention and does not require any training. The main innovation and contribution of the proposed system consists of introducing a signal-wise treatment of the tag structural hierarchy and using hierarchical clustering techniques to segment the web pages. The importance of such a treatment is significant since it permits abstracting away from the raw tag-manipulating approach. Experimental results and comparisons with other state-of-the-art systems are presented and discussed in the paper, indicating the high performance of the proposed algorithm.

Keywords: automatic wrappers; generic wrappers; data source wrappers; web mining; web data extraction; web structure mining; intelligent agents on the web; resource discovery; information retrieval
