Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as those with nested loops, dynamic scheduling, and parametrization, which are common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics pipeline, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
ISBN: (print) 9789897582837
Due to the significant growth in demand for data-intensive computing, in addition to the emergence of new parallel and distributed computing technologies, scientists and domain experts are leveraging languages specialized for their problem domain, i.e., domain-specific languages, to help them describe their problems and solutions, instead of using general-purpose programming languages. The goal of these languages is to improve the productivity and efficiency of the development and simulation of concurrent scientific models and systems. Moreover, they help to expose parallelism and to specify the concurrency within a component or across different independent components. In this paper, we introduce the concept of domain-specific flow-based languages, which allows domain experts to use flow-based languages adapted to a particular problem domain. Flow-based programming is used to support concurrency, while the domain-specific part of these languages is used to define atomic processes and domain-specific validation rules for composite processes. We propose a modeling language that can be used to develop such domain-specific languages. Since this language allows one to define other languages, we often refer to it as a meta-modeling language.
Clever and efficient management of transport and logistics is fundamental for manufacturing companies, which are starting to adopt new methodologies inspired by the emerging Industry 4.0 principles. This behavior is influenced by the spread of the Internet of Things (IoT) paradigm, which helps to automate many, if not all, aspects of product management, from the purchase order for raw materials to the final delivery to customers. Small and medium-sized industries must face design issues, and non-customized solutions may not fit their habitual data flow. Hence the need emerges for a tool able to support designers and developers in defining the network architecture and message exchange. To this end, the use of Node-RED, a flow-based programming tool for the IoT, is proposed, and a comprehensive case study targeted at smart transport and logistics is provided.
The pharmaceutical industry is facing a research and development productivity crisis. At the same time, we have access to more biological data than ever from recent advancements in high-throughput experimental methods. One suggested explanation for this apparent paradox has been that a crisis in reproducibility has also affected the reliability of the datasets providing the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.
aFlux is a graphical flow-based programming tool designed to support the modelling of data analytics applications. It supports high-level programming of Big Data applications with early-stage flow validation and automatic code generation for frameworks like Spark, Flink, Pig and Hive. The graphical programming concepts used in aFlux constitute the first approach towards supporting high-level Big Data application development by making it independent of the target Big Data frameworks. Programming at this higher level of abstraction helps to lower the complexity, and the ensuing learning curve, involved in the development of Big Data applications.
Purpose - The purpose of this paper is to develop a model of telemetry data processing flow (TDPF) for TDPF development, together with the TDPF run-time infrastructure, to improve spacecraft health monitoring capability. Design/methodology/approach - This research develops the TDPF using the flow-based programming (FBP) method and component-based telemetry data processing software. Findings - The result from the case study is positive, reflecting the appropriateness of the suggested method. Practical implications - Application of the proposed TDPF model and the component-based telemetry data processing software may result in improved development efficiency and lower development costs. Originality/value - This paper provides an effective way to develop TDPF without recompiling the software. It greatly facilitates TDPF development, which will hopefully reduce TDPF development cost.
Power efficiency has been a key issue for today's application and system design, ranging from embedded systems to data centers. While application-specific designs and optimizations may improve power efficiency, they require significant effort to co-design the hardware and software, which are difficult to reuse. On the hardware front, the trend of heterogeneous computing enables custom designs for specific applications by integrating different types of processors and reconfigurable hardware to handle compute-intensive tasks. However, what is still missing is an elegant application framework, i.e., a programming environment and a runtime system, to develop portable applications which can offload tasks or be reconfigured dynamically to run on a variety of systems efficiently. Our ongoing work, MobileFBP, provides an application framework which aims to support heterogeneous and reconfigurable systems. Using the framework, developers build portable applications with a dataflow programming paradigm, and the MobileFBP runtime system dynamically schedules the task components to run on available computing resources locally or remotely based on the application profiles. We hope that this ability produces high-level portable applications and reduces the effort and skills needed for developers to optimize their applications on a range of systems. This paper describes this work and presents our preliminary results. (C) 2013 Elsevier B.V. All rights reserved.
Current architectural Design-Space Exploration (DSE) tools specify the exploration problem through annotations or pragmas. However, this approach is inherently language-dependent and limits the applicability to one specific target language and synthesis toolchain. Additionally, the rapid development of new hardware Domain-Specific Languages, programming models, and different exploration heuristics calls for a language-agnostic and modular approach. To address this need, we present a DSE formalization to facilitate the integration of new components and customized flows and leverage it to implement Dflows, a flow-based-programming DSE tool that decouples problem definition, code generation, exploration, and evaluation strategies. Dflows’s compiler-based frontend provides language-agnostic generation of design points through Abstract Syntax Tree manipulation. We show how Dflows can integrate custom performance models from complex state-of-the-art accelerators for Verilog, VHDL, Chisel, and HLS designs. We compare the runtimes of our DSE process against a state-of-the-art Chisel-based DSE tool, achieving up to 3.74× speedup while identifying the same set of optimal solutions. Additionally, we integrate in Dflows a custom exploration heuristic leveraging genetic algorithms and a novel online learning fitness function approximation methodology. This approximation yields a negligible hypervolume difference with the exhaustive search Pareto-front while improving DSE runtime by up to 2.67×.