As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the underlying e-Science infrastructure. Such an architecture is generally job-driven: a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure therefore requires a parallelized application, and parallelism is achieved foremost through data decomposition. In common parallel-programming practice, data decomposition depends on the programmer's experience and knowledge of the data and of the algorithm or application. Data mining scientists, by contrast, have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns are defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decomposition used to gain parallelism fits, to some degree, a subgroup of the data mining decomposition patterns, namely object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment, two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
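
The idea of object set decomposition can be illustrated with a minimal sketch: a set of independent input objects is split into chunks, and each chunk is handed to one job that runs the unchanged executable. The function name, the file names, and the job count below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decompose_object_set(objects: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Split a set of independent objects into at most n_jobs contiguous chunks.

    Chunk sizes differ by at most one, so the jobs receive a balanced share.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # Never create more chunks than there are objects (but always at least one).
    n_jobs = min(n_jobs, len(objects)) or 1
    base, extra = divmod(len(objects), n_jobs)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunks.append(list(objects[start:start + size]))
        start += size
    return chunks


# Hypothetical input: ten image files to be processed by four grid jobs.
scans = [f"scan_{i:03d}.img" for i in range(10)]
jobs = decompose_object_set(scans, 4)
for job_id, chunk in enumerate(jobs):
    print(job_id, chunk)
```

Each chunk would then be staged to a worker node together with the executable; how the staging is done depends on the infrastructure and is outside this sketch.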
This report summarizes the M3 Workshop held at the January 2010 Pacific Symposium on Biocomputing. The workshop, organized by Genomic Standards Consortium members, included five contributed talks, a series of short presentations from stakeholders in the genomics standards community, a poster session, and, in the evening, an open discussion session to review current projects and examine future directions for the GSC and its stakeholders.
ISBN: 9781601321091 (print)
The characteristic framework types of zeolite crystals are routinely determined by calculating coordination sequences and vertex symbols of the 3D crystal structures. This method has limitations and tends to fail when the synthesized crystals are not close to perfect and exhibit some type of crystallographic disorder. A machine-learning-based Zeolite-Structure-Predictor (ZSP) model is developed to predict framework types for both near-perfect and moderately disordered zeolite crystals. The ZSP uses various attributes, including topological descriptors based on a computational geometry approach and relevant physical and chemical properties of the crystals. Trained with 41 framework types, the ZSP can correctly classify zeolite crystals with over 98% accuracy. Additionally, it is shown that the ZSP model is able to predict the framework types of strongly disordered zeolite crystals with a reliable success rate.
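
For readers unfamiliar with coordination sequences, the following is a minimal sketch of the conventional calculation: the k-th term counts the vertices at graph distance k from a chosen vertex of the framework net. The example uses a simple cubic lattice as a stand-in net, which is an assumption for illustration only and is not a zeolite framework; the function names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List


def coordination_sequence(
    start: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    shells: int,
) -> List[int]:
    """Count vertices at graph distance 1..shells from `start` by breadth-first search."""
    seen = {start}
    frontier = [start]
    counts: List[int] = []
    for _ in range(shells):
        next_frontier = []
        for v in frontier:
            for w in neighbors(v):
                if w not in seen:
                    seen.add(w)
                    next_frontier.append(w)
        counts.append(len(next_frontier))  # size of the next shell
        frontier = next_frontier
    return counts


def cubic_neighbors(v):
    """Six nearest neighbours on an infinite simple cubic lattice (toy net)."""
    x, y, z = v
    return [
        (x + 1, y, z), (x - 1, y, z),
        (x, y + 1, z), (x, y - 1, z),
        (x, y, z + 1), (x, y, z - 1),
    ]


seq = coordination_sequence((0, 0, 0), cubic_neighbors, 4)
print(seq)  # [6, 18, 38, 66], i.e. 4k^2 + 2 for this lattice
```

Because the sequence is derived purely from connectivity, a missing or misplaced vertex changes the counts, which is consistent with the abstract's remark that the method tends to fail for disordered crystals.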
A new semi-blind ICA algorithm, named supervised ICA, is proposed to improve the performance of ICA in FECG extraction. A supervision signal is created to help the usual ICA algorithm find the desired FECG unambiguously. It differs from other semi-blind ICA algorithms in that the a priori information about the temporal structure of the desired signal is encoded in the supervision signal instead of in the objective function. The performance of the new algorithm is demonstrated on real data: high-quality FECG can be extracted from multi-channel abdominal signals.
ISBN: 1601320620 (print)
A machine learning approach is applied to classify zeolite crystals according to their framework type. The Zeolite-Structure-Predictor (ZSP), based on the Random Forest algorithm, is introduced. Zeolite structural data from the Inorganic Crystal Structure Database (ICSD) are used to train the model. The ZSP uses sixteen attributes, including topological descriptors obtained with statistical geometry and physical and chemical properties of individual zeolites. Trained with 40 framework types containing at least 5 instances per class, the ZSP can correctly classify zeolites with over 95% accuracy. The performance is shown to improve when more zeolite instances per class are available.
A letter to the editor is presented in response to an editorial in the March 2008 issue, containing the findings of a survey of papers from 20 journals from 2007.
Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). For a genome sequencing center to provide multiple-genome comparison capabilities, it must keep pace with an exponentially growing collection of sequence data, both from its own genomes and from public genomes. We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis, enabling, for example, all vs. all BLAST runs across 2 million protein sequences within a day using thousands of processors, as opposed to conventional comparison methods that would take years to complete.
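
One generic way to spread an all vs. all comparison over many processors is to cut the sequence collection into index ranges and treat every (query range, database range) pair as an independent task. The sketch below illustrates only that general idea; it is an assumption for illustration and does not describe ScalaBLAST's actual scheduling or data distribution. All function names and sizes are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Range = Tuple[int, int]          # half-open index range [start, stop)
Task = Tuple[Range, Range]       # (query range, database range)


def chunk_ranges(n_items: int, chunk_size: int) -> List[Range]:
    """Cut indices 0..n_items-1 into consecutive half-open ranges."""
    return [(s, min(s + chunk_size, n_items)) for s in range(0, n_items, chunk_size)]


def all_vs_all_tasks(n_sequences: int, chunk_size: int) -> List[Task]:
    """Every pair of ranges is one independent comparison task."""
    ranges = chunk_ranges(n_sequences, chunk_size)
    return list(product(ranges, ranges))


def assign_round_robin(tasks: List[Task], n_workers: int) -> List[List[Task]]:
    """Deal tasks to workers in turn; worker w gets tasks w, w + n_workers, ..."""
    return [tasks[w::n_workers] for w in range(n_workers)]


# Toy sizes: 10 sequences, chunks of 4, 4 workers.
tasks = all_vs_all_tasks(n_sequences=10, chunk_size=4)
plan = assign_round_robin(tasks, n_workers=4)
for worker, assigned in enumerate(plan):
    print(worker, assigned)
```

The number of tasks grows with the square of the number of chunks, which is why the comparison workload outpaces the growth of the sequence collection itself.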
Recent research has demonstrated the utility of using supervised classification systems for automatic identification of low quality microarray data. However, this approach requires annotation of a large training set b...