ISBN: 9781538637906 (print)
Given the performance gap between CPU and memory, employing novel memory techniques in embedded systems is a feasible way to narrow that gap. On an MPSoC equipped with SRAM and STT-RAM based hybrid SPMs, data can be accessed effectively in parallel. This paper explores data allocation, task assignment, and scheduling on MPSoCs with SRAM and STT-RAM based hybrid SPMs. We propose a mixed integer quadratically constrained programming (MIQCP) formulation and a heuristic method (HA) to generate optimal and near-optimal data allocation, task assignment, and scheduling solutions. Experimental results show that MIQCP and HA reduce the schedule length by 32.6% and 20.1% on average, respectively.
ISBN: 9780769549392; 9781467353212 (print)
Future many-cores will accommodate a high number of cores, but their tera-scale transistor counts increase the failure rates in the cores and interconnection networks of such chips. Message-based fault detection techniques have been developed to mitigate the influence of faults on the system. In this paper, we investigate the message overhead of fault detection monitoring with decentralized Fault Detection Units in a unified 2D-mesh and assess the resulting delays of application messages. We investigate routing algorithms for different message types and demonstrate a 19% reduction in the impact of fault detection messages on application messages. We also show the limitations of prioritized fault detection messages for different application message packet injection rates.
ISBN: 1932415262 (print)
This work presents a new implementation based on the Common Object Request Broker Architecture (CORBA) for developing parallel applications. The approach is based on asynchronous messaging, using Asynchronous Method Invocation (AMI) models, along with an extension of the Implementation Repository to manage and distribute remote processes. Two interesting characteristics are supported: first, it allows CORBA applications to be built with parallel resources and suitable performance; and second, it provides interoperability among different Object Request Brokers (ORBs) and applications, so that the structure of the CORBA architecture remains unmodified.
ISBN: 9781665414555 (print)
With the increasing number of cores in modern systems, dynamic concurrency throttling (DCT) and turbo-boosting techniques are becoming a solution to better use the hardware resources. While DCT techniques tune the number of running threads, boosting techniques speed up sequential phases or unbalanced threads. However, as each region of an application may behave differently, optimizing both knobs is not straightforward. Hence, we propose two strategies that apply DCT and turbo-boosting: DBF, which aims to find an ideal configuration for each parallel/sequential region, and DBC, which considers the combination of parallel/sequential regions during the optimization. We show that DBF and DBC improve the energy-delay product (EDP) by up to 19% and 27% compared to a DCT-only strategy and by up to 95% and 96% compared to a Boost-only technique. We also show that DBF is more suitable for applications with high variability in the CPU workload, while DBC is better when there is low workload variability.
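The EDP figures quoted above compare configurations by the product of energy and execution time. As a minimal sketch of that arithmetic only (not the DBF or DBC strategies themselves), the Python below computes EDP and the relative improvement of one configuration over another. The function names and the energy and time values are hypothetical and do not come from the paper.

```python
def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * time_seconds


def edp_improvement(baseline_edp: float, tuned_edp: float) -> float:
    """Fractional EDP reduction of the tuned run relative to the baseline."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - tuned_edp / baseline_edp


# Hypothetical measurements, chosen only to illustrate the calculation.
baseline = edp(energy_joules=120.0, time_seconds=10.0)  # 1200 J*s
tuned = edp(energy_joules=100.0, time_seconds=9.0)      # 900 J*s
improvement = edp_improvement(baseline, tuned)          # 0.25, i.e. 25%
```

Because EDP multiplies the two quantities, a configuration that saves energy but runs longer can still score worse, which is why tuning thread count and boosting together is not straightforward.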
ISBN: 9781665469586 (print)
As Graphics Processing Units (GPUs) have evolved to deliver performance increases for general-purpose computations as well as graphics and multimedia applications, soft error reliability has become an important concern. The soft error vulnerability of applications is evaluated via fault injection experiments. Since fault injection takes an impractical amount of time to cover the fault locations in complex GPU hardware structures, prediction-based techniques have been proposed to evaluate the soft error vulnerability of General-Purpose GPU (GPGPU) programs based on hardware performance characteristics. In this work, we propose ML-based prediction models for the soft error vulnerability evaluation of GPGPU programs. We consider both program characteristics and hardware performance metrics collected from either simulation or profiling tools. While we use regression models to predict the masked fault rates, we build classification models to specify the vulnerability level of the programs based on their silent data corruption (SDC) and crash rates. Our prediction models achieve maximum prediction accuracy rates of 96.6%, 82.6%, and 87% for masked fault rates, SDCs, and crashes, respectively.
ISBN: 1932415262 (print)
In this paper, we present a generic framework by which a computationally intensive job can be divided into a number of tasks and distributed among various heterogeneous machines. The framework provides facilities for interaction between these tasks, web-based user interaction for starting, suspending, and restarting jobs, and continuous status updates from the tasks of a job. We present the design of the framework along with a discussion of its implementation.
ISBN: 1932415262 (print)
A real-time database system (RTDBS) must not only preserve the consistency of concurrent transactions but also meet their deadlines. This means that the RTDBS needs deadline-conscious transaction processing algorithms. Moreover, the growth of real-time transaction processing applications is driving the development of high-performance RTDBSs. Although many studies indicate that a shared disks (SD) cluster is suitable for building a high-performance transaction processing system, this does not guarantee that an SD cluster is applicable to real-time transaction processing. In this paper, we investigate the cross effects of real-time transaction processing algorithms and SD cluster algorithms in order to develop a real-time database system in SD clusters (SD-RTDBS). We analyze the aggregate performance of each algorithm proposed for real-time transaction processing and for the SD cluster through various experiments using an SD-RTDBS simulation model.
ISBN: 1932415262 (print)
Simulators play a very important role in overlay network research. However, popular simulators such as ns and PlanetLab cannot meet the scale and performance requirements of overlay research. Although overlay researchers need to observe the activity of a million nodes, current simulators cannot produce simulation results at that scale. To improve the efficiency of overlay network research, we design and implement ONSP, a novel parallel overlay network simulation platform that provides parallel discrete event simulation of overlay networks on a high-performance cluster. With this tool, we can easily build large-scale overlay network simulators and test them in a short time. The test results show that ONSP addresses the performance and scalability requirements well.
ISBN: 9781665469586 (print)
In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools to help researchers investigate the implications of micro-batching and data stream frequency. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects of two different parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.
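Micro-batching, as mentioned above, groups individual stream items so that each processing step handles several at once. The Python sketch below illustrates only the general idea with a fixed batch size; it is not the benchmarking framework from the paper and does not use the Intel TBB or FastFlow APIs. The name `micro_batches` and the `batch_size` parameter are assumptions made for this illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items from a stream into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so emitted batches are not mutated
    if batch:
        # Flush the final, possibly smaller, batch when the stream ends.
        yield batch


# Seven items with a batch size of three give two full batches and one partial.
batches = list(micro_batches(range(7), 3))
```

Larger batches amortize per-item overhead and tend to raise throughput, but each item waits until its batch fills, which adds latency. That trade-off is the reason a benchmark would want the batch size to be adjustable at run time.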
ISBN: 0818675799 (print)
Integrating parallel functions into the manipulation of persistent objects on a network-based shared memory architecture is a proposal currently under consideration. The cost of manipulating a large number of distributed persistent objects is expected to improve when moving from sequential to parallel processing. However, combining persistence with the capability for parallel and distributed processing is a complex task. This paper puts forth design and implementation methods for doing so. Based on a C++-based language called INADA, in which functions for handling persistent objects are introduced, we present a language construct for accessing distributed persistent objects in parallel, and a new approach for supporting transparent parallel and distributed processing. The transparency ensures that distributed persistent objects are manipulated in parallel on multiple threads of remote computers as if they were manipulated on a local multiprocessor machine. A key point of this proposal is that it combines persistence, multithread primitives, network-based shared memory, and an agent-oriented paradigm.