Search results - Inner Mongolia University Library

MASCOT html and XML parser: An implementation of a novel object model for protein identification data

PROTEOMICS, 2006, Vol. 6, No. 21, pp. 5688-5693

Authors: Yang, Chunguang G.; Granite, Stephen J.; Van Eyk, Jennifer E.; Winslow, Raimond L.
Affiliations: Johns Hopkins Univ, Ctr Cardiovasc Bioinformat & Modeling, Inst Computat Med, Baltimore, MD 21218, USA; Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218, USA; Johns Hopkins Univ, Sch Med, Div Cardiol, Dept Med, Baltimore, MD, USA

Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from a MASCOT MS/MS ions search in peptide summary report or a MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://*** and the program uses only free and open-source Java libraries.

Keywords: HTML parser; Java; MASCOT parser; protein identification data object model; XML parser
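
As a rough illustration of the workflow this abstract describes (build objects from a result file, drop redundant entries, write them to a relational database), the following Python sketch may help. It is not the authors' Java parser and does not use PDOM. The XML layout, the `PeptideHit` class, the sample values, and the `peptide_hit` table are all hypothetical and do not reflect the real MASCOT export format.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical input: this is NOT the real MASCOT export schema.
SAMPLE_XML = """
<results>
  <hit protein="P12345" peptide="LVNELTEFAK" score="57.0"/>
  <hit protein="P12345" peptide="LVNELTEFAK" score="57.0"/>
  <hit protein="P12345" peptide="YLYEIAR" score="41.5"/>
</results>
"""


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical stand-in for one object of an identification data model."""
    protein: str
    peptide: str
    score: float


def parse_hits(xml_text):
    """Build one object per <hit>, skipping exact duplicates."""
    root = ET.fromstring(xml_text.strip())
    seen = set()
    hits = []
    for el in root.iter("hit"):
        hit = PeptideHit(el.get("protein"), el.get("peptide"), float(el.get("score")))
        if hit not in seen:  # frozen dataclasses are hashable
            seen.add(hit)
            hits.append(hit)
    return hits


def store_hits(conn, hits):
    """Write the objects to a relational table (hypothetical layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peptide_hit (protein TEXT, peptide TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO peptide_hit VALUES (?, ?, ?)",
        [(h.protein, h.peptide, h.score) for h in hits],
    )
    conn.commit()


hits = parse_hits(SAMPLE_XML)
conn = sqlite3.connect(":memory:")
store_hits(conn, hits)
```

Keeping parsing and storage in separate functions mirrors the abstract's point that the parser can run stand-alone or be driven by another program.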

A Clustering Framework to Build Focused Web Crawlers for Automatic Extraction of Cultural Information

5th Hellenic Conference on Artificial Intelligence

Authors: Tsekouras, George E.; Gavalas, Damianos; Filios, Stefanos; Niros, Antonios D.; Bafaloukas, George
Affiliation: Univ Aegean, Dept Cultural Technol & Commun, Mitilini 81100, Lesvos, Greece

ISBN: (print) 9783540878803

We present a novel focused crawling method for extracting and processing cultural data from the web in a fully automated fashion. After downloading the pages, we extract from each document a number of words for each thematic cultural area. We then create multidimensional document vectors comprising the most frequent word occurrences. The dissimilarity between these vectors is measured by the Hamming distance. In the last stage, we employ cluster analysis to partition the document vectors into a number of clusters. Finally, our approach is illustrated via a proof-of-concept application which scrutinizes hundreds of web pages spanning different cultural thematic areas.

Keywords: web crawling; HTML parser; document vector; cluster analysis; Hamming distance; similarity measure; filtering
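
The vector and distance steps in this abstract can be sketched in Python as follows. The abstract does not specify how the vectors are encoded or which clustering algorithm is used, so this sketch assumes binary presence vectors over each document's most frequent words and substitutes a simple single-pass threshold grouping for the paper's cluster analysis. The sample documents and all function names are hypothetical.

```python
from collections import Counter


def top_terms(text, k):
    """Return the k most frequent words of a document."""
    words = text.lower().split()
    return {word for word, _ in Counter(words).most_common(k)}


def binary_vector(terms, vocabulary):
    """Encode a term set as a 0/1 vector over a shared vocabulary."""
    return [1 if v in terms else 0 for v in vocabulary]


def hamming(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(vectors, max_dist):
    """Stand-in clustering: join the first cluster whose first member is close enough."""
    clusters = []
    for doc_id, vec in vectors.items():
        for cluster in clusters:
            if hamming(vec, vectors[cluster[0]]) <= max_dist:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


# Hypothetical toy documents.
docs = {
    "d1": "museum museum museum painting painting gallery",
    "d2": "art art art painting painting museum",
    "d3": "music music music dance dance festival",
}
terms = {doc_id: top_terms(text, 2) for doc_id, text in docs.items()}
vocabulary = sorted(set().union(*terms.values()))
vectors = {doc_id: binary_vector(t, vocabulary) for doc_id, t in terms.items()}
clusters = threshold_clusters(vectors, max_dist=2)
```

With these toy inputs the two art-related documents differ in two positions and fall into one group, while the music document differs from each of them in four positions and forms its own group.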

Lucene-based image search

Author: Huang Junle, South-Central University for Nationalities

Degree level: Master's

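As computing has developed, the volume of information has grown rapidly, and finding the information one wants within it has become increasingly difficult, especially when searching for images on the local machine and on the web. To address this problem, the Lucene-based image search system builds on the well-regarded search engine Lucene as its development platform. Before images can be searched, the system must index them; what is indexed is information extracted from the images. Depending on their source, the system divides images into local images and web images. Information about local images is extracted with code built on the Java platform, and information about web images is extracted with code built on html parser. The extracted information is then segmented into Chinese words with the JE segmenter and indexed. The user interface is built with SWT/JFace. The main work of this thesis comprises: building the UI with SWT/JFace; extracting image information (for local images, the name, size, width, and height; for web images, the URL, name, format, and surrounding context); segmenting the extracted image information with the JE segmenter; building on Lucene to index the segmented information and optimize the index; and defining the scope of an image search and building on Lucene to search for images. The system's features are: it builds on the well-regarded open-source search engine Lucene; it uses html parser to extract web image information; it searches both local and web images; its search efficiency is high; and it is customizable and extensible, so the image search scope can be tailored to the actual search scenario. The system is implemented on the Java platform. Its performance was tested and analyzed experimentally; the approach is feasible both in theory and in practice, and it has some value for research on customized image search applications.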

Keywords: SWT; Lucene; Chinese word segmentation; html parser
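
The web-image extraction and indexing steps described in this abstract can be sketched roughly as follows. The thesis uses the Java html parser library, the JE segmenter, and Lucene; this sketch swaps in Python's standard `html.parser` module, plain whitespace splitting, and a dictionary-based inverted index, so it shows only the general idea. The sample page, the `ImageExtractor` class, the `build_index` function, and the use of the `alt` attribute as context are assumptions made for illustration.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageExtractor(HTMLParser):
    """Collect URL, name, format, and alt text for each <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)  # resolve relative links
        name = posixpath.basename(urlparse(url).path)
        fmt = posixpath.splitext(name)[1].lstrip(".").lower()
        self.images.append(
            {"url": url, "name": name, "format": fmt, "context": attr.get("alt") or ""}
        )


def build_index(images):
    """Toy inverted index: token -> set of image positions (stand-in for Lucene)."""
    index = {}
    for i, img in enumerate(images):
        # Whitespace splitting stands in for Chinese word segmentation.
        for token in (img["name"] + " " + img["context"]).lower().split():
            index.setdefault(token, set()).add(i)
    return index


# Hypothetical page content and URL.
PAGE = '<p>Grassland at dawn <img src="/photos/grassland.JPG" alt="grassland sunrise"></p>'
extractor = ImageExtractor("http://example.com/gallery/index.html")
extractor.feed(PAGE)
index = build_index(extractor.images)
```

Looking up `index["sunrise"]` then returns the positions of the matching images in `extractor.images`.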
