Scientific workflows have emerged as an important tool for combining computational power with data analysis across all scientific domains in e-science. They help scientists design and execute complex in silico experiments. However, as complexity increases, optimizing scientific workflows by trial and error becomes increasingly infeasible. To address this issue, this paper describes the design of a new optimization phase integrated into the established scientific workflow life cycle. We have also developed a flexible optimization application programming interface (API) and integrated it into a scientific workflow management system. A sample plugin for parameter optimization based on genetic algorithms illustrates how the API enables rapid implementation of concrete workflow optimization methods. Finally, a use case from structural bioinformatics validates how the optimization approach facilitates setup, execution, and monitoring of workflow parameter optimization in high-performance computing e-science environments.
ISBN: 9781622764907 (print)
Relation extraction is frequently and successfully addressed by machine learning methods. The downside of this approach is the need for annotated training data, typically generated through tedious, cost-intensive manual work. Distantly supervised approaches make use of weakly annotated data, such as automatically annotated corpora. Recent work in the biomedical domain has applied distant supervision to protein-protein interaction (PPI) extraction with reasonable results, making use of the IntAct database. Such data is typically noisy, and heuristics to filter it are commonly applied. We propose a constraint to increase the quality of the training data, based on the assumption that sentences do not describe self-interactions of real-world objects. In addition, we make use of the University of Kansas Proteomics Service (KUPS) database. Together, these two steps yield an increase of 7 percentage points (pp) on the PPI corpus AIMed. We demonstrate the broad applicability of our approach by using the same workflow for the analysis of drug-drug interactions, utilizing relationships available from the drug database DrugBank. We achieve an F1 measure of 37.31% on an independent test set without manually annotated training data.
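The self-interaction constraint described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the function names, and the choice to discard self-pairs (rather than, say, label them as negatives) are assumptions made for illustration, and the entity identifiers are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Candidate(NamedTuple):
    """A sentence with two entity mentions (hypothetical structure)."""
    sentence: str
    entity_a: str  # normalized identifier of the first mention
    entity_b: str  # normalized identifier of the second mention


def distant_label(c: Candidate, known_pairs: Set[FrozenSet[str]]) -> bool:
    """Weak label: positive if the unordered pair is listed in the database."""
    return frozenset((c.entity_a, c.entity_b)) in known_pairs


def build_training_set(
    candidates: List[Candidate],
    known_pairs: Set[FrozenSet[str]],
) -> List[Tuple[Candidate, bool]]:
    """Label candidates by distant supervision, skipping self-pairs.

    Assumption: a pair whose two mentions resolve to the same entity is
    dropped, since the sentence is assumed not to describe a self-interaction.
    """
    examples = []
    for c in candidates:
        if c.entity_a == c.entity_b:
            continue  # self-interaction constraint
        examples.append((c, distant_label(c, known_pairs)))
    return examples
```

In this sketch, a database entry that lists an entity as interacting with itself never produces a training example, which removes one source of noisy positive labels.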
In many numerical simulation codes the backbone of the application covers the solution of linear systems of equations. Often, being created via a discretization of differential equations, the corresponding matrices ar...
Incorporating distant information via manually selected skip-chain templates has been shown to improve the performance of conditional random field models compared with a simple linear-chain structure (Sutton and McCallum, 2007; Galley, 2006; Liu et al., 2010). The set of properties to be captured by a template is typically chosen manually with respect to the application domain. In this paper, a search strategy for finding meaningful skip chains independent of the application domain is proposed. From a huge set of potentially beneficial templates, some can be shown to have a positive impact on performance. The search for a meaningful graphical structure demonstrates the usefulness of the approach with an increase of nearly 2% in F1 measure on a publicly available data set (Klinger et al., 2008).
ISBN: 9789898425782 (print)
We give an overview of methods for nonlinear metamodeling of a simulation database, featuring continuous exploration of simulation results, tolerance prediction, sensitivity analysis, robust multiobjective optimization, and rapid interpolation of bulky FEM data. Large scatter in simulation results, caused in crash-test simulations by buckling, for example, remains a challenge for improving the predictability of simulations and the accuracy of optimization results. For industrially relevant simulations with large scatter, novel stochastic methods are introduced and their efficiency is demonstrated on benchmark cases.
Hydraulic axial pumps equipped with cam-driven commutation unit (PWK pumps) proved their high efficiency up to 55 MPa and ability to work self-sucking, even at high speed. Displacement of PWK pump may easily be change...
As scientific workflows become more complex and apply compute-intensive methods to increasingly large data volumes, access to HPC resources is becoming mandatory. We describe the development of a novel plug-in for the Taverna workflow system, which provides transparent and secure access to HPC/grid resources via the UNICORE grid middleware while maintaining the ease of use that has been the main reason for the success of scientific workflow systems. A use case from the bioinformatics domain demonstrates the potential of the UNICORE plug-in for Taverna by creating a scientific workflow that executes its central parts in parallel on a cluster resource.
Aircraft Environmental Control Systems (ECS) are designed to optimize passenger comfort by providing satisfactory cabin pressurization, and temperature and humidity control whilst minimising the risks to passenger hea...
In the last decade, life science applications have become increasingly integrated into e-Science environments; they are typically very demanding, in terms of both computational capabilities and data capacities. In particular, accessing life science applications embedded in such environments via Grid clients still constitutes a major hurdle for scientists who do not have an IT background. Life science applications often comprise a whole set of small programs instead of a single executable. Many graphical Grid clients are not well suited to these types of applications, as they often assume that Grid jobs will run a single executable instead of a set of chained executions (i.e., sequences). This means that in order to execute a sequence of multiple programs on a single Grid resource, piping data from one program to the next, the user would have to run a hand-written shell script. Otherwise, each program is independently scheduled as a Grid job, which causes unnecessary file transfers between the jobs, even if they are scheduled on the same resource. We present a generic solution to this problem and provide a reference implementation that integrates seamlessly with the Grid middleware UNICORE. Our approach focuses on a comfortable user interface for the creation of such program sequences, validated in UNICORE-driven HPC-based Grids. We applied our approach to support the use of the AMBER package (a widely used collection of programs for molecular dynamics simulations) within Grid workflows. Finally, we provide a scientific use case of our approach that leverages the interoperability of two different scientific infrastructures and represents an instance of the infrastructure interoperability reference model.