We present the BioDepot-workflow-builder (Bwb), a software tool that allows users to create and execute reproducible bioinformatics workflows using a drag-and-drop interface. Each graphical widget represents a Docker container that executes a modular task. Widgets are linked graphically to build bioinformatics workflows that can be reproducibly deployed across different local and cloud platforms. Each widget contains a form-based user interface to facilitate parameter entry and a console to display intermediate results. Bwb provides tools for rapid customization of widgets, containers, and workflows. Saved workflows can be shared using Bwb's native format or exported as shell scripts.
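To make the widget-to-container mapping and the shell-script export more concrete, here is a minimal Python sketch. It assumes a hypothetical workflow model in which each widget names a Docker image and a command, and each link records which widgets must finish first. The image names, commands and the `export_shell_script` helper are illustrative only; they do not reflect Bwb's native format or its actual exporter.

```python
import shlex
from graphlib import TopologicalSorter

# Hypothetical widgets: each one names a Docker image and the command it runs.
# Illustration only; this is not Bwb's native workflow format.
WIDGETS = {
    "download": {"image": "example/download", "cmd": ["fetch", "--out", "/data/raw"]},
    "normalize": {"image": "example/normalize", "cmd": ["normalize", "/data/raw", "/data/norm"]},
    "diff_expr": {"image": "example/diff-expr", "cmd": ["run_de", "/data/norm", "/data/results"]},
}

# Graphical links: each widget maps to the set of upstream widgets it depends on.
LINKS = {
    "download": set(),
    "normalize": {"download"},
    "diff_expr": {"normalize"},
}


def export_shell_script(widgets, links, data_dir="/tmp/bwb-data"):
    """Emit a bash script that runs each widget's container in dependency order."""
    # Topological order guarantees upstream widgets run before their consumers.
    order = list(TopologicalSorter(links).static_order())
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for name in order:
        widget = widgets[name]
        cmd = " ".join(shlex.quote(arg) for arg in widget["cmd"])
        lines.append(f"# widget: {name}")
        # All containers share one host directory mounted at /data.
        lines.append(
            f"docker run --rm -v {shlex.quote(data_dir)}:/data "
            f"{shlex.quote(widget['image'])} {cmd}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(export_shell_script(WIDGETS, LINKS), end="")
```

Ordering widgets topologically ensures that each container starts only after its upstream widgets have written their outputs to the shared `/data` mount; a real exporter would also need to carry over the parameters entered in each widget's form.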
ISBN: 9781728118673 (print)
Modern-day analyses of biomedical data typically involve a series of computational tasks, called a workflow. Each task, or module, may be written by a different laboratory and may therefore require a different computing environment. Reproducible analysis of biomedical data is a growing field of interest [1], which makes it important to develop tools that facilitate reproducible analyses. Software containers have been used to increase reproducibility in bioinformatics analyses [2]. However, these containers are often driven by command-line tools, requiring expertise that makes them inaccessible to many biomedical scientists. The BioDepot-workflow-builder (Bwb) supports bioinformatics research by providing a user-friendly interface to create, share and reproducibly execute workflows using graphically linked, user-created widgets [3]. Each widget represents a Docker container and can be linked to other widgets to produce a workflow, while simultaneously creating a graphical representation of the pipeline. Because widgets run in software containers, they can be shared between users while maintaining a uniform platform for recreating research results. Bwb therefore provides tools that eliminate problems in reproducing research caused by differences in computing environments. This will facilitate collaboration between research teams and allow readers to recreate the results of data analytics with confidence. Bioinformatics workflows often involve the use of public datasets. Incorporating the data download step into the workflow inside Bwb ensures a common environment for analysis and removes variability in the downloaded data when reproducing results. In this work, we demonstrate the utility of accessing external data sources from a containerized environment by building a widget to download datasets from the Gene Expression Omnibus (GEO) database. We also illustrate the use of this widget in a workflow designed to identify differentially expressed genes from gene expression…
ISBN: 9781728118680 (print)
Modern-day analyses of biomedical data typically involve a series of computational tasks, called a workflow. Each task, or module, may be written by a different laboratory and may therefore require a different computing environment. Reproducible analysis of biomedical data is a growing field of interest, which makes it important to develop tools that facilitate reproducible analyses. Software containers have been used to increase reproducibility in bioinformatics analyses. However, these containers are often driven by command-line tools, requiring expertise that makes them inaccessible to many biomedical scientists. The BioDepot-workflow-builder (Bwb) supports bioinformatics research by providing a user-friendly interface to create, share and reproducibly execute workflows using graphically linked, user-created widgets. Each widget represents a Docker container and can be linked to other widgets to produce a workflow, while simultaneously creating a graphical representation of the pipeline. Because widgets run in software containers, they can be shared between users while maintaining a uniform platform for recreating research results. Bwb therefore provides tools that eliminate problems in reproducing research caused by differences in computing environments. This will facilitate collaboration between research teams and allow readers to recreate the results of data analytics with confidence. Bioinformatics workflows often involve the use of public datasets. Incorporating the data download step into the workflow inside Bwb ensures a common environment for analysis and removes variability in the downloaded data when reproducing results. In this work, we demonstrate the utility of accessing external data sources from a containerized environment by building a widget to download datasets from the Gene Expression Omnibus (GEO) database. We also illustrate the use of this widget in a workflow designed to identify differentially expressed genes from gene expression…
ISBN: 9781467367752 (print)
Computational bioinformatics workflows are extensively used to analyse genomics data. With the unprecedented advances in genomic sequencing technology and the opportunities for personalized medicine, it is essential that analysis results can be repeated by others, especially when moving into clinical environments. To cope with the complex computational demands of huge biological datasets, a shift to distributed compute resources is unavoidable. A case study was conducted in which three well-established bioinformatics analysis groups across Australia were assigned to analyse exome sequence data from a range of patients with a rare condition: disorder of sex development. Initially, these groups used their own in-house data processing pipelines; subsequently, they used a common bioinformatics workbench based on Galaxy and offered through the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud. This paper describes the experiences gained in this work and the variability of the results. We put forward principles that should be used to ensure the reproducibility of scientific results moving forward.
ISBN: 9781479955480 (print)
Cloud computing establishes a new computing model in which a wide range of computing resources is provided to several types of users. For bioinformatics experiments modeled as scientific workflows in particular, clouds provide several types of resources, such as virtual machines (VMs), storage, databases and computing power, that can be combined to support scientific workflow execution. These workflows usually require high-performance environments and parallelism techniques, since their activities are data- and compute-intensive and can run for a long time. Several Scientific Workflow Management Systems (SWfMS) already manage the parallel execution of scientific workflows in clouds. Most of them instantiate a virtual cluster for the execution; however, they rely on the user to estimate the number of VMs to instantiate for this virtual cluster. Estimating this number is therefore a crucial task, because underestimation or overestimation can harm workflow performance. This dimensioning is also non-trivial in clouds because of the large number of VM types offered by a cloud provider. A previously proposed approach, GraspCC, already provides a near-optimal estimate of the number of VMs for general applications, but not for scientific workflows. In this paper, we coupled GraspCC to the SciCumulus (Cloud-based Parallel Engine for Scientific Workflows) engine to estimate the number of VMs needed for bioinformatics workflows. We evaluated GraspCC by comparing its estimates with real executions of a set of large-scale comparative genomics workflows. The results show that GraspCC is suitable for estimating the number of VMs in real bioinformatics cloud workflows.
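The dimensioning problem described above can be illustrated with a toy model. The Python sketch below is not GraspCC. It assumes a hypothetical workload of identical, independent tasks, a per-started-hour billing model and made-up VM types, and it exhaustively searches every (VM type, VM count) pair for the cheapest cluster that meets a deadline. Too few VMs miss the deadline, while too many raise the cost, mirroring the under- and overestimation risks mentioned in the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    """A hypothetical VM type: name, vCPU count and hourly price."""
    name: str
    cores: int
    price_per_hour: float


def makespan_hours(n_tasks: int, task_hours: float, n_vms: int, cores: int) -> float:
    """Runtime when identical, independent tasks run in waves of n_vms * cores slots."""
    slots = n_vms * cores
    waves = math.ceil(n_tasks / slots)
    return waves * task_hours


def cheapest_cluster(n_tasks, task_hours, deadline_hours, vm_types, max_vms=64):
    """Return (cost, vm_name, n_vms, makespan) for the cheapest cluster meeting the deadline.

    Exhaustive search over every (VM type, VM count) pair; returns None if no
    configuration with at most max_vms VMs meets the deadline.
    """
    best = None
    for vm in vm_types:
        for k in range(1, max_vms + 1):
            t = makespan_hours(n_tasks, task_hours, k, vm.cores)
            if t > deadline_hours:
                continue  # under-provisioned: the deadline is missed
            # Assumed billing model: every VM is charged for each started hour.
            cost = k * vm.price_per_hour * math.ceil(t)
            if best is None or cost < best[0]:
                best = (cost, vm.name, k, t)
    return best


if __name__ == "__main__":
    types = [VMType("small", 2, 0.10), VMType("large", 8, 0.38)]
    print(cheapest_cluster(n_tasks=200, task_hours=0.5, deadline_hours=4, vm_types=types))
```

Because cost is not monotonic in the VM count under per-hour billing, the sketch checks every count rather than stopping at the first feasible one. Exhaustive search is adequate for this tiny space; with many VM types, heterogeneous workflow activities and larger clusters the space grows quickly, which motivates heuristics that return near-optimal estimates.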
ISBN: 9783642151194 (print)
Seahawk is a browser for Moby Web services, which are online tools that use a shared semantic registry and shared data formats. To make a wider array of tools available within Seahawk, the Daggoo system helps users adapt forms on existing Web sites to Moby's specifications. Biologists were interviewed and given workflow design tasks, which revealed the types of tools present in their conceptual analysis workflows and the types of control flow they understood. These observations were used to enhance Seahawk so that Moby and external Web tools can be browsed to create workflows "by demonstration". A follow-up user study measured how effectively biologists could 1) demonstrate a workflow for a realistic task, 2) understand the automatically generated workflow, and 3) use the workflow in the Taverna workflow editor/enactor. The results are promising: they suggest that biologists without programming experience can become self-sufficient in analysis automation, using workflow-by-demonstration as a first step.
The field of bioinformatics involves the analysis of large datasets, which may entail leveraging tools scattered over many Web sites. To provide experimental biologists with a common platform capable of such analysis, this thesis focuses on extending a bioinformatics framework with Web service invocation support. Galaxy, which is popular for its analysis tools and workflow management capabilities, is an ideal candidate to extend. This thesis proposes adding REST Web service support to Galaxy in a way that can easily be extended to SOAP Web services in the future. It also introduces an approach for adding dynamic tools to Galaxy. To simplify repetitive analyses on different datasets, we discuss enabling Web service invocation in the workflow portion of Galaxy. Finally, this thesis shows how semantic annotations in Web services can be leveraged to improve the user's experience when interacting with Web services.
ISBN: 9783540794493 (print)
This paper shows how the agility provided by the Bio-jETI platform helps users interactively design bioinformatics analysis processes. Bio-jETI is a platform for the integration, orchestration and provision of services. Its agility in design and execution is demonstrated by developing seven variations of a multiple sequence alignment workflow.