ISBN (print): 9798350301137
Component-based development is one of the core principles behind modern software engineering practices. Understanding of causal relationships between components of a software system can yield significant benefits to developers. Yet modern software design approaches make it difficult to track and discover such relationships at system scale, which leads to growing intellectual debt. In this paper we consider an alternative approach to software design, flow-based programming (FBP), and draw the attention of the community to the connection between dataflow graphs produced by FBP and structural causal models. With expository examples we show how this connection can be leveraged to improve day-to-day tasks in software projects, including fault localisation, business analysis and experimentation.
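The correspondence the abstract draws between FBP dataflow graphs and structural causal models can be made concrete with a toy sketch (ours, not the paper's): each dataflow node is a structural equation over its parents' outputs, so the wiring graph doubles as a causal graph that supports do()-style interventions. All node names and functions below are invented for illustration.

```go
package main

import "fmt"

// Node is one dataflow component; its wiring (Parents) is also its
// causal parent set, and Fn is its structural equation.
type Node struct {
	Name    string
	Parents []*Node
	Fn      func(inputs []float64) float64
}

// Eval computes a node's value by recursively evaluating its parents,
// honouring interventions that clamp a node to a fixed value.
func Eval(n *Node, do map[string]float64) float64 {
	if v, ok := do[n.Name]; ok { // do(X = v): cut incoming edges
		return v
	}
	ins := make([]float64, len(n.Parents))
	for i, p := range n.Parents {
		ins[i] = Eval(p, do)
	}
	return n.Fn(ins)
}

func main() {
	// price -> demand -> revenue: a toy three-component pipeline
	price := &Node{Name: "price", Fn: func([]float64) float64 { return 10 }}
	demand := &Node{Name: "demand", Parents: []*Node{price},
		Fn: func(in []float64) float64 { return 100 - 5*in[0] }}
	revenue := &Node{Name: "revenue", Parents: []*Node{price, demand},
		Fn: func(in []float64) float64 { return in[0] * in[1] }}

	fmt.Println(Eval(revenue, nil))                            // observational run: 500
	fmt.Println(Eval(revenue, map[string]float64{"price": 8})) // do(price = 8): 480
}
```

For fault localisation or experimentation, the same graph answers "what changes downstream if this component is forced to a given output" without touching the other components.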
The pharmaceutical industry is facing a research and development productivity crisis. At the same time we have access to more biological data than ever from recent advancements in high-throughput experimental methods. One suggested explanation for this apparent paradox has been that a crisis in reproducibility has also affected the reliability of the datasets providing the basis for drug development. Advanced computing infrastructures can to some extent aid in this situation but also come with their own challenges, including increased technical debt and opaqueness from the many layers of technology required to perform computations and manage data. In this thesis, a number of approaches and methods for dealing with data and computations in early drug discovery in a reproducible way are developed. This has been done while striving for a high level of simplicity in their implementations, to improve understandability of the research done using them. Based on identified problems with existing tools, two workflow tools have been developed with the aim of making the writing of complex workflows, particularly in predictive modelling, more agile and flexible. One of the tools is based on the Luigi workflow framework, while the other is written from scratch in the Go language. We have applied these tools to predictive modelling problems in early drug discovery to create reproducible workflows for building predictive models, including for prediction of off-target binding in drug discovery. We have also developed a set of practical tools for working with linked data in a collaborative way, and for publishing large-scale datasets in a semantic, machine-readable format on the web. These tools were applied to demonstrator use cases and used for publishing large-scale chemical data. It is our hope that the developed tools and approaches will contribute towards practical, reproducible and understandable handling of data and computations in early drug discovery.
The thermal shift assay (TSA)-also known as differential scanning fluorimetry (DSF), thermofluor, and Tm shift-is one of the most popular biophysical screening techniques used in fragment-based ligand discovery (FBLD) to detect protein-ligand interactions. By comparing the thermal stability of a target protein in the presence and absence of a ligand, potential binders can be identified. The technique is easy to set up, has low protein consumption, and can be run on most real-time polymerase chain reaction (PCR) instruments. While data analysis is straightforward in principle, it becomes cumbersome and time-consuming when the screens involve multiple 96- or 384-well plates. There are several approaches that aim to streamline this process, but most involve proprietary software, programming knowledge, or are designed for specific instrument output files. We therefore developed an analysis workflow implemented in the Konstanz Information Miner (KNIME), a free and open-source data analytics platform, which greatly streamlined our data processing timeline for 384-well plates. The implementation is code-free and freely available to the community for improvement and customization to accommodate a wide range of instrument input files and workflows.
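The per-well calculation at the heart of a TSA screen can be sketched as follows (our illustration, not the article's KNIME workflow): estimate Tm as the temperature of the steepest rise in the fluorescence melt curve, then report the shift dTm between a ligand-containing well and the ligand-free reference. The example curves are invented.

```go
package main

import "fmt"

// tm returns the midpoint temperature of the steepest fluorescence
// increase, a simple first-derivative estimate of the melting point.
func tm(temps, fluor []float64) float64 {
	best, bestSlope := temps[0], 0.0
	for i := 1; i < len(temps); i++ {
		slope := (fluor[i] - fluor[i-1]) / (temps[i] - temps[i-1])
		if slope > bestSlope {
			bestSlope, best = slope, (temps[i]+temps[i-1])/2
		}
	}
	return best
}

func main() {
	// invented melt curves for one two-well comparison
	temps := []float64{40, 42, 44, 46, 48, 50, 52, 54}
	apo := []float64{1, 1.1, 1.3, 2.5, 6.0, 7.0, 7.2, 7.3}   // no ligand
	holo := []float64{1, 1.05, 1.2, 1.5, 2.8, 6.5, 7.4, 7.5} // with ligand
	fmt.Printf("dTm = %+.1f C\n", tm(temps, holo)-tm(temps, apo)) // dTm = +2.0 C
}
```

A positive dTm (thermal stabilisation) flags the fragment as a potential binder; a screening workflow simply repeats this comparison for every well on the plate.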
Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning, a genomics, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
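The flow-based principle SciPipe builds on can be illustrated with a minimal stdlib sketch (ours, not SciPipe's actual API): each process is a goroutine, each connection a channel, and a parameter sweep is just an upstream process that emits one task parameter per value.

```go
package main

import "fmt"

// params feeds each parameter value downstream, then closes the channel
// to signal that the sweep is complete.
func params(out chan<- float64, vals ...float64) {
	for _, v := range vals {
		out <- v
	}
	close(out)
}

// train is a stand-in for a modelling step parametrised from upstream;
// a real workflow component would run a training command instead.
func train(in <-chan float64, out chan<- string) {
	for c := range in {
		out <- fmt.Sprintf("model(C=%g)", c)
	}
	close(out)
}

func main() {
	ps, models := make(chan float64), make(chan string)
	go params(ps, 0.1, 1, 10) // the parameter sweep
	go train(ps, models)
	for m := range models {
		fmt.Println(m) // model(C=0.1), model(C=1), model(C=10)
	}
}
```

Because downstream tasks are parametrised by values arriving on channels at runtime, scheduling is naturally dynamic: adding a branch or another sweep value requires no change to the downstream component.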
This article presents the DEMOS prototype platform for creating and exploring multimodal extended-reality smart environments. Modular distributed event-driven applications are created with the help of visual codeless design tools for configuring and linking processing nodes in an oriented dataflow graph. We tested the conceptual logical templates by building two applications that tackle driver arousal state for safety and enhanced museum experiences for cultural purposes, and later by evaluating programmer and nonprogrammer students' ability to use the design logic. The applications involve formula-based and decision-based processing of data coming from smart sensors, web services, and libraries. Interaction patterns within the distributed event-driven applications use elements of mixed reality and the Internet of Things, creating an intelligent environment based on near-field communication-triggering points. We discuss the platform as a solution to bridging the digital divide, analyzing novel technologies that support the development of a sustainable digital ecosystem.
ISBN (print): 9781450394451
In this paper we introduce the Hybrid Data-flow Visual Programming Language (HDVPL), an extended C/C++ language with a visual frontend and a dataflow runtime library. While most popular dataflow visual programming languages are designed for specialized purposes, HDVPL targets general-purpose programming, and unlike the others its dataflow node behavior can be customized by the programmer. The intuitive visual interface makes it easy to build a general-purpose dataflow program: a visual editor is provided to create nodes and connect them into a DAG of dataflow tasks, enabling beginners in computer programming to build parallel programs easily. With the subgraph feature, complex hierarchical graphs can be built using container nodes. Once the whole program is complete, HDVPL translates it into text-based source code and compiles it into an object file, which is linked with the HDVPL dataflow runtime library. To visualize dataflow programs at runtime, we integrated the dataflow library with the frontend visual editor, which shows detailed information about the running program in a console window.
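The scheduling idea that lets a dataflow DAG yield parallel programs can be sketched as follows (our illustration, not HDVPL's actual runtime): a node becomes runnable once all of its predecessors have fired, so independent nodes execute in concurrent "waves". Node names are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// waves groups the nodes of a DAG (node -> prerequisites) into
// successive sets that could run concurrently.
func waves(deps map[string][]string) [][]string {
	indeg := map[string]int{}
	for n, ps := range deps {
		indeg[n] = len(ps)
	}
	var out [][]string
	for len(indeg) > 0 {
		var ready []string
		for n, d := range indeg {
			if d == 0 {
				ready = append(ready, n)
			}
		}
		if len(ready) == 0 {
			break // cycle: remaining nodes can never fire
		}
		sort.Strings(ready) // deterministic output only
		for _, n := range ready {
			delete(indeg, n)
		}
		for n := range indeg { // retire fired prerequisites
			for _, p := range deps[n] {
				for _, r := range ready {
					if p == r {
						indeg[n]--
					}
				}
			}
		}
		out = append(out, ready)
	}
	return out
}

func main() {
	// a toy analysis graph: stats and plot are independent
	deps := map[string][]string{
		"load": nil, "clean": {"load"},
		"stats": {"clean"}, "plot": {"clean"},
		"report": {"stats", "plot"},
	}
	fmt.Println(waves(deps)) // [[load] [clean] [plot stats] [report]]
}
```

A generated program only needs to run each wave's nodes on separate threads and join between waves, which is why the visual DAG alone is enough to obtain parallelism.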
Clever and efficient management of transport and logistics is fundamental in manufacturing companies, which are starting to adopt new methodologies inspired by emerging Industry 4.0 principles. This shift is driven by the spread of the Internet of Things (IoT) paradigm, which helps automate many features, if not all, of product management, from the purchase order for raw materials to the final delivery to customers. Small and medium industries must face design issues, and non-customized solutions may not fit their habitual data flow. Hence the need emerges for a tool able to support designers and developers in defining the network architecture and message exchange. To this end, the use of Node-RED, a flow-based programming tool for the IoT, is proposed through a comprehensive case study targeted at smart transport and logistics.
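For readers unfamiliar with Node-RED, a flow is authored visually and exported as a JSON array of wired nodes. The fragment below is an invented sketch in that export format (node ids, topic, and URL are ours; non-essential node properties are omitted): sensor readings arrive over MQTT, a function node reshapes the message, and an HTTP request node forwards it to a back-office system.

```json
[
  { "id": "in1",  "type": "mqtt in",  "topic": "warehouse/pallet/+/position",
    "broker": "b1", "wires": [["fn1"]] },
  { "id": "fn1",  "type": "function",
    "func": "msg.payload = { pallet: msg.topic.split('/')[2], pos: msg.payload }; return msg;",
    "wires": [["out1"]] },
  { "id": "out1", "type": "http request", "method": "POST",
    "url": "http://erp.example.local/api/positions", "wires": [[]] },
  { "id": "b1",   "type": "mqtt-broker", "broker": "localhost", "port": "1883" }
]
```

The appeal for small and medium industries is that the message exchange is specified entirely by this wiring, so the flow can be adapted to an existing data flow without writing custom integration code.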
aFlux is a graphical flow-based programming tool designed to support the modelling of data analytics applications. It supports high-level programming of Big Data applications with early-stage flow validation and automatic code generation for frameworks like Spark, Flink, Pig and Hive. The graphical programming concepts used in aFlux constitute a first approach towards supporting high-level Big Data application development by making it independent of the target Big Data frameworks. Programming at this higher level of abstraction helps lower the complexity, and the ensuing learning curve, involved in the development of Big Data applications.