With Internet of Things (IoT) technology, the home environment is becoming smarter than ever. Not only smart devices such as smartphones and smart TVs, but also various IoT devices, including sensors, smart thermostats, and smart scales, have now become very common on the market. These devices are connected to the Internet, so users can read data from a device or control it using Internet technology. However, due to the diversity of smart home requirements, device collaboration in the smart home remains a challenging task. A smart home is usually built with various technologies, each serving its own purpose, and these purposes cover a very wide range, from controlling low-power sensor devices to controlling high-performance devices such as smart TVs and smartphones. This variety of requirements makes the smart home very complicated because of its mixed network architectures, protocols, and technologies. In this paper, a framework for managing heterogeneous IoT devices and enabling their collaboration in a smart home environment is proposed. Several programming models are defined in the proposed framework to make application development for heterogeneous devices more intuitive. The proposed framework has been implemented as a web service, and a case study with real-world smart home IoT devices is presented.
Computational grids have enormous potential to provide compute power. However, this power remains largely unexploited today for most applications, except trivially parallel programs. Developing parallel grid applications is simply too difficult. Grids introduce several problems not encountered before, mainly due to the highly heterogeneous and dynamic computing and networking environment. Furthermore, failures occur frequently, and resources may be claimed by higher-priority jobs at any time. In this article, we solve these problems for an important class of applications: divide-and-conquer. We introduce a system called Satin that simplifies the development of parallel grid applications by providing a rich high-level programming model that completely hides communication. All grid issues are handled transparently by the runtime system, not by the programmer. Satin's programming model is based on Java, features spawn-sync primitives and shared objects, and uses asynchronous exceptions and an abort mechanism to support speculative parallelism. To allow an efficient implementation, Satin consistently exploits the idea that grids are hierarchically structured. Dynamic load balancing is done with a novel cluster-aware scheduling algorithm that hides the long wide-area latencies by overlapping them with useful local work. Satin's shared object model lets the application define the consistency model it needs. If an application needs only loose consistency, it does not have to pay high performance penalties for wide-area communication and synchronization. We demonstrate how grid problems such as resource changes and failures can be handled transparently and efficiently. Finally, we show that adaptivity is important in grids. Satin can increase performance considerably by adding and removing compute resources automatically, based on the application's requirements and the utilization of the machines and networks in the grid. Using an extensive evaluation on real grids with …
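The spawn-sync style of divide-and-conquer that Satin offers can be illustrated with a minimal Python sketch. This is not Satin's Java API: the function names `fib_seq` and `fib_spawn_sync`, the `max_depth` cutoff, and the use of `concurrent.futures.ThreadPoolExecutor` as a stand-in for Satin's runtime are assumptions for illustration only. Here `pool.submit` plays the role of spawn (the subproblem may run on another worker) and `Future.result()` plays the role of sync. Only the top levels spawn, and the pool is sized to 2**max_depth - 1 workers so every spawned task gets a thread even while its parent waits in sync.

```python
from concurrent.futures import ThreadPoolExecutor


def fib_seq(n: int) -> int:
    """Sequential Fibonacci, used once spawning stops."""
    if n < 2:
        return n
    return fib_seq(n - 1) + fib_seq(n - 2)


def fib_spawn_sync(n: int, pool: ThreadPoolExecutor,
                   depth: int = 0, max_depth: int = 2) -> int:
    """Divide-and-conquer Fibonacci written in a spawn/sync style.

    Levels above max_depth spawn one subproblem into the pool and compute the
    other locally; deeper levels run sequentially. At most 2**max_depth - 1
    tasks are ever submitted, so a pool of that size cannot deadlock.
    """
    if n < 2 or depth >= max_depth:
        return fib_seq(n)
    # "spawn": the n-1 subproblem may run on another worker thread
    left = pool.submit(fib_spawn_sync, n - 1, pool, depth + 1, max_depth)
    # useful local work while the spawned task runs
    right = fib_spawn_sync(n - 2, pool, depth + 1, max_depth)
    # "sync": wait for the spawned result before combining
    return left.result() + right


MAX_DEPTH = 2
with ThreadPoolExecutor(max_workers=2 ** MAX_DEPTH - 1) as pool:
    print(fib_spawn_sync(25, pool, max_depth=MAX_DEPTH))  # 75025
```

Because of CPython's global interpreter lock, this sketch shows the control structure only and should not be expected to outrun the sequential version; Satin's wide-area distribution, cluster-aware scheduling, and fault tolerance are not modeled.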
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper, we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP, and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics, we use the Integrated Performance Monitoring tool and other methods to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that, in general, the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to rewrite the UPC versions and achieve performance equal to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors, respectively. We show that at scale, performance is almost always lower on the hex-core system because of increased contention for network resources.
In a multiple-criteria decision analysis (MCDA) problem, people often provide qualitative information involving ambiguous subjective judgments, together with quantitative data that may also be imprecise or incomplete. Several kinds of uncertainty may need to be considered in an MCDA problem, such as fuzziness and ambiguity. The evidential reasoning (ER) approach is well suited for dealing with such MCDA problems and can generate comprehensive distributed assessments for different alternatives. Much research on handling imprecise or uncertain belief structures in the ER approach has been conducted. In this paper, both triangular fuzzy weights of criteria and fuzzy utilities assigned to evaluation grades, which may arise in circumstances such as group decision-making, are introduced into the ER approach. The Hadamard multiplicative combination of judgment matrices is extended to aggregate triangular fuzzy judgment matrices, and the result is used to obtain the fuzzy weights in the fuzzy ER approach. The consistency of the aggregated triangular fuzzy judgment matrix is also proved. Several pairs of ER-based programming models are designed to generate the total fuzzy belief degrees and the overall expected fuzzy utilities for comparing alternatives. A numerical example is presented to show the effectiveness of the proposed approach. (C) 2009 Wiley Periodicals, Inc.
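The matrix aggregation step can be sketched in Python, assuming the common weighted-geometric-mean form of the Hadamard multiplicative combination: A = A1^λ1 ∘ A2^λ2 ∘ … ∘ AK^λK, where ∘ is the element-wise (Hadamard) product, powers are taken element-wise, the decision-maker weights λk sum to 1, and for triangular fuzzy numbers (l, m, u) each component is combined separately. The paper's exact extension may differ; the function name `aggregate_hadamard` and the example matrices are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

TFN = Tuple[float, float, float]   # triangular fuzzy number (l, m, u), l <= m <= u
FuzzyMatrix = List[List[TFN]]


def aggregate_hadamard(matrices: Sequence[FuzzyMatrix],
                       weights: Sequence[float]) -> FuzzyMatrix:
    """Weighted Hadamard multiplicative combination of triangular fuzzy
    judgment matrices: every entry of the result is the weighted geometric
    mean of the corresponding entries, taken separately for l, m and u."""
    if len(matrices) != len(weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("need one weight per matrix, summing to 1")
    n = len(matrices[0])
    result: FuzzyMatrix = []
    for i in range(n):
        row: List[TFN] = []
        for j in range(n):
            l = m = u = 1.0
            for a, w in zip(matrices, weights):
                a_l, a_m, a_u = a[i][j]
                l *= a_l ** w
                m *= a_m ** w
                u *= a_u ** w
            row.append((l, m, u))
        result.append(row)
    return result


# Two decision-makers' reciprocal 2x2 judgment matrices, equally weighted.
A1 = [[(1, 1, 1), (1, 2, 3)], [(1 / 3, 1 / 2, 1), (1, 1, 1)]]
A2 = [[(1, 1, 1), (4, 8, 12)], [(1 / 12, 1 / 8, 1 / 4), (1, 1, 1)]]
agg = aggregate_hadamard([A1, A2], [0.5, 0.5])
print([round(x, 6) for x in agg[0][1]])  # [2.0, 4.0, 6.0]
```

Because each component is a weighted geometric mean, the ordering l ≤ m ≤ u is preserved, and if every input matrix is reciprocal (a_ji = (1/u, 1/m, 1/l) whenever a_ij = (l, m, u)), the aggregated matrix is reciprocal as well.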
ISBN (print): 9780878492459
After illustrating the importance of coordination between the forest and the socio-economy, a dynamic programming model aimed at this coordination is constructed, and the basic characteristics of the functions in the model are discussed. A solution for a simple, idealized situation is studied, yielding an equation for the economic yield when the forest is coordinated with the socio-economy, along with several other conclusions.
High-performance computing on low-cost machines has become a reality with GPUs. Unfortunately, high performance is achieved only when the programmer exploits the architectural specificities of the GPU: the programmer has to focus on inter-GPU communications, task allocation among the GPUs, task scheduling, external memory prefetching, and synchronization. In this paper, we propose and evaluate a compilation flow. It automates the transformation of a program expressed in the high-level system design language SystemC into its implementation on a cluster of multi-GPU nodes. SystemC constructs and the SystemC scheduler are mapped directly onto the GPU API, preserving their semantics. Inter-GPU communications are abstracted by means of SystemC channels. (C) 2010 Published by Elsevier Ltd.
ISBN (print): 9781479986484
Intel Xeon Phi (MIC architecture) is a relatively new accelerator chip that combines large-scale shared-memory parallelism with wide SIMD lanes. Mapping applications onto a node with such an architecture to achieve high parallel efficiency is a major challenge. In this paper, we focus on developing a system for heterogeneous graph processing that is able to utilize both a many-core Xeon Phi and a multi-core CPU on one node. We propose a simple programming API with an intuitive interface for expressing SIMD parallelism. We develop efficient techniques for supporting our high-level API, focusing on exploiting wide SIMD lanes, the massive number of cores, and the partitioning of work across the CPU and the accelerator, while handling the irregularity of graph applications. The components of our runtime system include a condensed static memory buffer, which supports efficient message insertion and SIMD message reduction while keeping memory requirements low, and, specifically for MIC, a pipelining scheme that generates messages efficiently by avoiding frequent locking operations. In addition, a hybrid graph partitioning module effectively partitions the workload between the CPU and the MIC, ensuring a balanced workload and low communication overhead. The main observations from our experimental evaluation using five popular applications are as follows: for MIC executions, the pipelining scheme is up to 3.36x faster than a naive approach using locking-based message generation, and the speedup over OpenMP ranges from 1.17 to 4.15. Heterogeneous CPU-MIC execution achieves a speedup of up to 1.41 over the better of the CPU-only and MIC-only executions.
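The benefit of reducing messages at insertion time, rather than storing every message, can be sketched in Python. This is a hypothetical simplification, not the paper's condensed static buffer: the class `MinMessageBuffer` allocates one slot per vertex and combines each incoming message with a min operator (as in single-source shortest paths), so memory is bounded by the number of vertices rather than the number of messages. The plain loop in `insert_batch` stands in for what a SIMD implementation would do on vectors of messages.

```python
import math
from typing import Dict, Iterable, List, Tuple


class MinMessageBuffer:
    """Hypothetical static message buffer for one vertex-centric step.

    One slot per vertex is allocated up front; each inserted message is
    immediately min-combined with that slot, so memory stays O(#vertices)
    no matter how many messages are generated.
    """

    def __init__(self, num_vertices: int) -> None:
        self.values: List[float] = [math.inf] * num_vertices
        self.has_msg: List[bool] = [False] * num_vertices

    def insert(self, dst: int, value: float) -> None:
        if value < self.values[dst]:
            self.values[dst] = value
        self.has_msg[dst] = True

    def insert_batch(self, msgs: Iterable[Tuple[int, float]]) -> None:
        # A SIMD implementation would reduce a vector of messages per step;
        # this scalar loop only stands in for that.
        for dst, value in msgs:
            self.insert(dst, value)

    def drain(self) -> Dict[int, float]:
        """Return the reduced message for each destination and reset."""
        out = {v: self.values[v] for v, flag in enumerate(self.has_msg) if flag}
        n = len(self.values)
        self.values = [math.inf] * n
        self.has_msg = [False] * n
        return out


# One SSSP-style relaxation: edges are (src, dst, weight).
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0)]
dist = [0.0, math.inf, 1.0]
buf = MinMessageBuffer(3)
buf.insert_batch((d, dist[s] + w) for s, d, w in edges if dist[s] < math.inf)
print(buf.drain())  # {1: 3.0, 2: 1.0}
```

Vertex 1 receives two messages (4.0 and 3.0), but only their minimum is ever stored.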
"Explicit concurrency should be abolished from all higher-level programming languages (i.e. everything except - perhaps-plain machine code.)." Dijkstra [1] (paraphrased). A promising class of concurrency abs...
详细信息
ISBN (print): 9781479919345
"Explicit concurrency should be abolished from all higher-level programming languages (i.e. everything except - perhaps-plain machine code.)." Dijkstra [1] (paraphrased). A promising class of concurrency abstractions replaces explicit concurrency mechanisms with a single linguistic mechanism that combines state and control and uses asynchronous messages for communications, e.g. active objects or actors, but that doesn't remove the hurdle of understanding non-local control transfer. What if the programming model enabled programmers to simply do what they do best, that is, to describe a system in terms of its modular structure and write sequential code to implement the operations of those modules and handles details of concurrency? In a recently sponsored NSF project we are developing such a model that we call capsule-oriented programming and its realization in the Panini project. This model favors modularity over explicit concurrency, encourages concurrency correctness by construction, and exploits modular structure of programs to expose implicit concurrency.
ISBN (print): 9781467382991
Embedded system toolchains are highly customized for a specific System-on-Chip (SoC). When the application needs more performance, the designer is typically forced to adopt a new SoC and possibly another toolchain. The rationale for not scaling performance by using, e.g., two SoCs is that keeping most of the operations on-chip may allow for higher energy efficiency. We are exploring the feasibility and trade-offs of designing and manufacturing a new Single Board Computer (SBC) that could serve flexibly for a number of current and future applications by allowing scalability through clusters of SBCs while keeping the same programming model for the SBC. This board is based on FPGAs and embedded processors, and its key points are: i) a fast custom interconnect for board-to-board communication and ii) an easily programmable environment that would allow both the off-loading of code into accelerators (either soft-IP blocks or hard-IP blocks) and, at the same time, the distribution of computation across boards. A key challenge to successfully deploying this paradigm is to properly distribute the threads across several boards without explicit intervention by the programmer. In this paper, we describe how to dynamically and efficiently distribute the computational threads, in symbiosis with an appropriate memory model, to allow system scalability, so that we can double the performance by simply connecting two boards without i) changing the basic hardware components (e.g., moving to a different SoC) or ii) changing the programming model to follow a vendor-specific toolchain. Our approach is to reduce data movement across boards. Our initial experiments have confirmed the feasibility of our approach.
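One way to read "reduce data movement across boards" is locality-aware thread placement. The Python sketch below is a hypothetical greedy heuristic, not the paper's algorithm: each thread is placed on the board that already holds most of the data blocks it accesses, subject to a per-board thread limit, and `remote_accesses` counts the block accesses that would still cross the board-to-board interconnect. The function names, the block-ownership map `data_home`, and the capacity model are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def place_threads(thread_data: Dict[str, List[str]],
                  data_home: Dict[str, int],
                  num_boards: int,
                  max_threads_per_board: int) -> Dict[str, int]:
    """Greedy locality-aware placement (hypothetical heuristic).

    Each thread goes to the board holding most of the data blocks it
    accesses; ties go to the less loaded board, and a full board is skipped
    in favour of the next-best one.
    """
    load = [0] * num_boards
    placement: Dict[str, int] = {}
    for thread, blocks in thread_data.items():
        local = Counter(data_home[b] for b in blocks)
        ranked = sorted(range(num_boards), key=lambda b: (-local[b], load[b]))
        for board in ranked:
            if load[board] < max_threads_per_board:
                placement[thread] = board
                load[board] += 1
                break
        else:
            raise ValueError("not enough thread slots on the boards")
    return placement


def remote_accesses(placement: Dict[str, int],
                    thread_data: Dict[str, List[str]],
                    data_home: Dict[str, int]) -> int:
    """Count block accesses that would cross the board-to-board interconnect."""
    return sum(1 for t, blocks in thread_data.items()
               for b in blocks if data_home[b] != placement[t])


data_home = {"a": 0, "b": 0, "c": 1, "d": 1}
thread_data = {"t0": ["a", "b"], "t1": ["c", "d"], "t2": ["a", "c", "d"]}
p = place_threads(thread_data, data_home, num_boards=2, max_threads_per_board=2)
print(p, remote_accesses(p, thread_data, data_home))  # {'t0': 0, 't1': 1, 't2': 1} 1
```

A greedy pass is used only to keep the sketch short; a real system would also have to account for migration cost and changing access patterns at run time.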
ISBN (print): 9781467376846
Coarray Fortran (CAF) is a parallel programming paradigm that extends Fortran for the partitioned global address space (PGAS) programming model at the language level. Current runtime implementations of CAF mainly use MPI or GASNet as the underlying communication components. MVAPICH2-X is a hybrid MPI+PGAS programming library with a Unified Communication Runtime (UCR) design. In this paper, the classic implementation of the CAF runtime in OpenUH is redesigned and rebuilt on top of MVAPICH2-X. The proposed design not only enables support for the hybrid MPI+CAF programming model but also provides superior performance for most CAF one-sided operations and for the collective operations newly proposed in the Fortran 2015 specification. A comprehensive evaluation with different benchmarks and applications has been performed. Compared with current GASNet-based solutions, the CAF runtime with MVAPICH2-X can improve the bandwidth of put and bidirectional operations by up to 3.5X for inter-node communication, and the bandwidth of collective communication operations, represented by broadcast, by up to 3.0X on 64 processes. It also reduces the execution time of the NPB CAF benchmarks by up to 18% on 256 processes.