Block and data-flow programming are well-known concepts for enabling end users to visually create their own customized solutions. They both offer comprehensive visual interfaces and are becoming popular within the sma...
详细信息
ISBN:
(纸本)9789897585722
Block and data-flow programming are well-known concepts for enabling end users to visually create their own customized solutions. They both offer comprehensive visual interfaces and are becoming popular within the smart spaces domain. However, there is currently no systematic, comparative evaluation of the two concepts in the domain. In this user study, we implemented two prototypes for block and data-flow programming and compared their performance on typical usage scenarios in common smart spaces. We used a mixed methods approach of quantitative and qualitative analysis to gain an in-depth understanding of the user experience. Our results show that data-flow programming is overall better received by users than block programming and is considered being state-of-the art and visually more appealing. For block programming, our results reveal that participants appreciate the playful character and the feedback provided. Our study contributes to the improvement of block and data-flow solutions in place and discusses aspects relevant to the future advancement of end-user development in smart spaces.
Visual programming languages are a common way for enabling end users to create simple, customized programs in their smart homes. Although visual programming aims at lowering the syntactic and cognitive barriers existi...
详细信息
ISBN:
(纸本)9781728169019
Visual programming languages are a common way for enabling end users to create simple, customized programs in their smart homes. Although visual programming aims at lowering the syntactic and cognitive barriers existing languages lack a simple method for accessing services of smart spaces surrounding the home and therefore restrain the innovative capabilities of users. For example, accessing information about energy consumption or smart parking is often cumbersome and limited to users with advanced technical skills. Based on expert interviews, we extended the existing visual programming language Node-RED with functionalities that enable users to easily access data and services within and beyond their smart homes. Using this prototype, we performed a first evaluation of our solution with some inhabitants of smart homes.
The formulation of quantum programs in terms of the fewest number of gate operations is crucial to retrieve meaningful results from the noisy quantum processors accessible these days. In this work, we demonstrate a us...
详细信息
The formulation of quantum programs in terms of the fewest number of gate operations is crucial to retrieve meaningful results from the noisy quantum processors accessible these days. In this work, we demonstrate a use-case for Field Programmable Gate Array (FPGA) based data-flow engines (DFEs) to scale up variational quantum compilers to synthesize circuits up to 9-qubit programs. This gate decomposer utilizes a newly developed DFE quantum computer simulator that is designed to simulate arbitrary quantum circuit consisting of single qubit rotations and controlled two-qubit gates on FPGA chips. In our benchmark with the QISKIT package, the depth of the circuits produced by the SQUANDER package (with the DFE accelerator support) were less by 97% on average, while the fidelity of the circuits was still close to unity up to an error of similar to 10-4.
data-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parall...
详细信息
data-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these programming models suffer from overheads related to task creation, scheduling and dependency management, limiting performance and scalability when tasks become too small. At the same time, many HPC applications implement iterative methods or multi-step simulations that create the same directed acyclic graphs of tasks on each iteration. By giving application programmers a way to express that a specific loop is creating the same task pattern on each iteration, we can create a single task directed acyclic graph (DAG) once and transform it into a cyclic graph. This cyclic graph is then reused for successive iterations, minimizing task creation and dependency management overhead. This paper presents the taskiter, a new construct we propose for the OmpSs-2 and OpenMP programming models, allowing the use of directed cyclic task graphs (DCTG) to minimize runtime overheads. Moreover, we present a simple immediate successor locality-aware heuristic that minimizes task scheduling overhead by bypassing the runtime task scheduler. We evaluate the implementation of the taskiter and the immediate successor heuristic in 8 iterative benchmarks. Using small task granularities, we obtain a geometric mean speedup of 2.56x over the reference OmpSs-2 implementation, and a 3.77x and 5.2x speedup over the LLVM and GCC OpenMP runtimes, respectively.
In recent years, the areas of High-Performance Computing (HPC) and massive data processing (also know as Big data) have been in a convergence course, since they tend to be deployed on similar hardware. HPC systems hav...
详细信息
In recent years, the areas of High-Performance Computing (HPC) and massive data processing (also know as Big data) have been in a convergence course, since they tend to be deployed on similar hardware. HPC systems have historically performed well in regular, matrix-based computations;on the other hand, Big data problems have often excelled in fine-grained, data parallel workloads. While HPC programming is mostly task-based, like COMPSs, popular Big data environments, like Spark, adopt the functional programming paradigm. A careful analysis shows that there are pros and cons to both approaches, and integrating them may yield interesting results. With that reasoning in mind, we have developed DDF, an API and library for COMPSs that allows developers to use Big data techniques while using that HPC environment. DDF has a functional-based interface, similar to many data Science tools, that allows us to use dynamic evaluation to adapt the task execution in run time. It brings some of the qualities of Big dataprogramming, making it easier for application domain experts to write data Analysis jobs. In this article we discuss the API and evaluate the impact of the techniques used in its implementation that allow a more efficient COMPSs execution. In addition, we present a performance comparison with Spark in several application patterns. The results show that each technique significantly impacts the performance, allowing COMPSs to outperform Spark in many use cases. (C) 2021 Elsevier Inc. All rights reserved.
Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We sho...
详细信息
ISBN:
(纸本)9781450341219
Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We show that using NUMA-aware task and data placement, it is possible to preserve the uniform abstraction of both computing and memory resources for task-parallel programming models while achieving high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of task input data on a best effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system and placement information from the operating system. We achieve 94% of local memory accesses on a 192-core system with 24 NUMA nodes, up to 5x higher performance than NUMA-aware hierarchical work stealing, and even 5.6x compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications.
Bounded single-producer single-consumer FIFO queues are one of the simplest concurrent data-structure, and they do not require more than sequential consistency for correct operation. Still, sequential consistency is a...
详细信息
ISBN:
(纸本)9781479929276
Bounded single-producer single-consumer FIFO queues are one of the simplest concurrent data-structure, and they do not require more than sequential consistency for correct operation. Still, sequential consistency is an unrealistic hypothesis on shared-memory multiprocessors, and enforcing it through memory barriers induces significant performance and energy overhead. This paper revisits the optimization and correctness proof of bounded FIFO queues in the context of weak memory consistency, building upon the recent axiomatic formalization of the C11 memory model. We validate the portability and performance of our proven implementation over 3 processor architectures with diverse hardware memory models, including ARM and PowerPC. Comparison with state-of-the-art implementations demonstrate consistent improvements for a wide range of buffer and batch sizes.
Recent FPGAs allow the design of efficient and complex Heterogeneous Systems-on-Chip (HSoC). Namely, these systems are composed of several processors, hardware accelerators as well as communication media between all t...
详细信息
Recent FPGAs allow the design of efficient and complex Heterogeneous Systems-on-Chip (HSoC). Namely, these systems are composed of several processors, hardware accelerators as well as communication media between all these components. Performances provided by HSoCs make them really interesting for data-flow applications, especially image processing applications. The use of this kind of architecture provides good performances but the drawback is an increase of the programming complexity. This complexity is due to the heterogeneous deployment of the application on the platform. Some functions are implemented in software to run on a processor, whereas other functions are implemented in hardware to run in a reconfigurable partition of the FPGA. This article aims to define a programming model based on the Synchronous data-flow model, in order to abstract the heterogeneity of the implementation and to leverage the communication issue between software and hardware actors.
Recent FPGAs allow the design of efficient and complex Heterogeneous Systems-on-Chip (HSoC). Namely, these systems are composed of several processors, hardware accelerators as well as communication media between all t...
详细信息
Recent FPGAs allow the design of efficient and complex Heterogeneous Systems-on-Chip (HSoC). Namely, these systems are composed of several processors, hardware accelerators as well as communication media between all these components. Performances provided by HSoCs make them really interesting for data-flow applications, especially image processing applications. The use of this kind of architecture provides good performances but the drawback is an increase of the programming complexity. This complexity is due to the heterogeneous deployment of the application on the platform. Some functions are implemented in software to run on a processor, whereas other functions are implemented in hardware to run in a reconfigurable partition of the FPGA. This article aims to define a programming model based on the Synchronous data-flow model, in order to abstract the heterogeneity of the implementation and to leverage the communication issue between software and hardware actors.
data-flow has proven to be an attractive computation model for programming digital signal processing (DSP) applications. A restricted version of data-flow, termed synchronous data-flow (SDF), offers strong compile-tim...
详细信息
ISBN:
(纸本)9781424469673
data-flow has proven to be an attractive computation model for programming digital signal processing (DSP) applications. A restricted version of data-flow, termed synchronous data-flow (SDF), offers strong compile-time predictability properties, but has limited expressive power. A new type of hierarchy (Interface-based SDF) has been proposed allowing more expressivity while maintaining its predictability. One of the main problems with this hierarchical SDF model is the lack of trade-off between parallelism and network clustering. This paper presents a systematic method for applying an important class of loop transformation techniques in the context of interface-based SDF semantics. The resulting approach provides novel capabilities for integrating parallelism extraction properties of the targeted loop transformations with the useful modeling, analysis, and code reuse properties provided by SDF.
暂无评论