ISBN (Print): 9781728143286
Thread-level speculation (TLS) is an automatic parallelization technique that partitions sequential programs with data dependencies into multiple threads to be executed in parallel on a multi-core platform; it is used to aggressively exploit the parallelism of sequential programs at runtime. However, existing software TLS implementations always incur high extra overhead and can even cancel out the performance benefits of TLS when data dependencies are frequently violated. Based on the POSIX thread library, which is highly portable, this paper proposes an improved TLS programming model to exploit the potential parallelism of sequential programs. First, the execution model of the proposed programming model adopts an optimized main-assistance-work thread mechanism to improve overall operational efficiency. Then, a new data structure is designed to implement the thread-level speculation mechanism and to handle thread version copies. Experimental results on benchmarks show that the proposed TLS programming model achieves a significant increase in speedup on a multi-core platform and delivers excellent performance benefits. Compared with the traditional TLS programming model, the improved model achieves speedups of 2.87 to 12.22, a 9.70% performance improvement on average.
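The abstract above does not give the model's API, so the following is only a minimal sketch, in C with POSIX threads, of the general idea it describes: an assistance thread speculates on part of a loop using a private version copy of the shared data, and the main thread later validates that copy and either commits or re-executes. The version_copy_t structure, the spawn/validate/commit protocol, and the loop body are assumptions for illustration, not the paper's actual mechanism.

```c
/*
 * Minimal sketch of speculative loop parallelization with POSIX threads.
 * The "version copy" structure, the validate/commit protocol, and all
 * names here are illustrative assumptions, not the paper's API.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define N 8

static int shared[N];          /* committed state */

typedef struct {
    int snapshot[N];           /* values read at spawn time (version copy) */
    int result[N];             /* speculative writes, buffered privately   */
    int lo, hi;                /* iteration range assigned to this thread */
} version_copy_t;

/* Speculative worker: computes into its private buffer only. */
static void *spec_work(void *arg)
{
    version_copy_t *v = (version_copy_t *)arg;
    for (int i = v->lo; i < v->hi; i++)
        v->result[i] = v->snapshot[i] * 2 + 1;   /* stand-in for the loop body */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) shared[i] = i;

    version_copy_t v = { .lo = N / 2, .hi = N };
    memcpy(v.snapshot, shared, sizeof shared);    /* take the version copy */

    pthread_t assist;
    pthread_create(&assist, NULL, spec_work, &v); /* assistance thread speculates */

    /* Main thread executes the first half non-speculatively. */
    for (int i = 0; i < N / 2; i++)
        shared[i] = shared[i] * 2 + 1;

    pthread_join(assist, NULL);

    /* Validate: speculation holds if the locations it read are unchanged. */
    int ok = memcmp(&v.snapshot[v.lo], &shared[v.lo],
                    (v.hi - v.lo) * sizeof(int)) == 0;
    if (ok)
        memcpy(&shared[v.lo], &v.result[v.lo],
               (v.hi - v.lo) * sizeof(int));      /* commit speculative results */
    else
        for (int i = v.lo; i < v.hi; i++)         /* squash and redo sequentially */
            shared[i] = shared[i] * 2 + 1;

    for (int i = 0; i < N; i++) printf("%d ", shared[i]);
    printf("(%s)\n", ok ? "committed" : "re-executed");
    return 0;
}
```

A full TLS runtime would track per-iteration read/write sets rather than comparing whole snapshots; the memcmp here only keeps the sketch small.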
The pattern of agricultural production and land use specified by the linear programming model was greatly different from the actual pattern existing in 1963. I believe that the application of this model or similar mod...
ISBN (Print): 9789897582486
Cloud applications typically consist of multiple components interacting with each other. Service-orientation, standards such as WSDL, and workflow technology provide common means to enable the interaction between these components. Nevertheless, during automated application deployment, endpoints of interacting components, e.g., URLs of deployed services, still need to be exchanged: the components must be wired. However, this exchange mainly depends on the (i) middleware technologies, (ii) programming languages, and (iii) deployment technologies used, which limits the application's portability and increases the complexity of implementing components. In this paper, we present a programming model for easing the implementation of interacting components of automatically deployed applications. The presented programming model is based on the TOSCA standard and enables invoking components by their identifiers and the interface descriptions contained in the application's TOSCA model. The approach can be applied to Cloud and IoT applications, i.e., software hosted on physical devices may also use the approach to call other application components. To validate the practical feasibility of the approach, we present a system architecture and prototype based on OpenTOSCA.
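As a rough illustration of the abstract's key point, invoking a component by its model identifiers rather than a hard-coded endpoint, the sketch below (in C, to keep one language across these examples) looks up a node template, interface, and operation name in a binding table and resolves them to an endpoint. The binding table, the tosca_invoke() helper, and every identifier in it are hypothetical; the actual prototype resolves endpoints through the OpenTOSCA runtime, not through such a table.

```c
/*
 * Illustrative sketch only: calling a component by its TOSCA identifiers
 * instead of a hard-coded endpoint URL. The lookup table, tosca_invoke(),
 * and all identifiers are hypothetical.
 */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *node_template;  /* component identifier from the TOSCA model */
    const char *interface;      /* interface name from the TOSCA model       */
    const char *operation;      /* operation name                            */
    const char *endpoint;       /* wired at deployment time, not in app code */
} binding_t;

/* In a real system this table would be filled by the deployment engine. */
static const binding_t bindings[] = {
    { "OrderService", "http://example.org/order", "placeOrder",
      "http://10.0.0.5:8080/order/place" },
};

/* Resolve identifiers to the concrete endpoint and "call" it. */
static int tosca_invoke(const char *node, const char *iface, const char *op,
                        const char *payload)
{
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++) {
        const binding_t *b = &bindings[i];
        if (!strcmp(b->node_template, node) && !strcmp(b->interface, iface) &&
            !strcmp(b->operation, op)) {
            printf("POST %s  body=%s\n", b->endpoint, payload);
            return 0;   /* a real implementation would issue the HTTP call */
        }
    }
    return -1;          /* unknown component */
}

int main(void)
{
    /* Application code names the target only by its model identifiers. */
    return tosca_invoke("OrderService", "http://example.org/order",
                        "placeOrder", "{\"item\":42}");
}
```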
ISBN (Print): 9781595937711
Multithreading and multi-core processing have been shown to be powerful approaches for boosting system performance by taking advantage of parallelism in applications. This paper presents a processor design that unifies a RISC core and a multithreading DSP for sophisticated multimedia applications with advanced standards such as H.264. The proposed design not only minimizes integration costs for embedded multithreading/multi-core designs through independent coherent threads, but also reduces memory bandwidth requirements through a one-stop streaming buffer and a very fast data exchange mechanism. With the proposed techniques and an appropriate programming model, we achieve a 78% reduction in memory bandwidth and an 89% reduction in processing time for H.264 video encoding, compared to a traditional single-stream microprocessor.
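The paper describes a hardware mechanism, so no code from it is available; purely as a software analogy of the "one-stop streaming buffer" idea, the sketch below models a bounded ring buffer through which a control thread streams work items to a worker thread. The slot count, the macroblock payload, and all names are assumptions.

```c
/*
 * Software analogy only: a streaming buffer between a control (RISC-like)
 * thread and a worker (DSP-like) thread, modeled as a bounded ring buffer.
 */
#include <pthread.h>
#include <stdio.h>

#define SLOTS 4

static int ring[SLOTS];
static int head, tail, count;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void push(int mb)                 /* control thread streams data in */
{
    pthread_mutex_lock(&lock);
    while (count == SLOTS) pthread_cond_wait(&not_full, &lock);
    ring[tail] = mb; tail = (tail + 1) % SLOTS; count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

static int pop(void)                     /* worker thread consumes it */
{
    pthread_mutex_lock(&lock);
    while (count == 0) pthread_cond_wait(&not_empty, &lock);
    int mb = ring[head]; head = (head + 1) % SLOTS; count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return mb;
}

static void *dsp_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 16; i++)
        printf("encode macroblock %d\n", pop());
    return NULL;
}

int main(void)
{
    pthread_t worker;
    pthread_create(&worker, NULL, dsp_worker, NULL);
    for (int i = 0; i < 16; i++) push(i);  /* stream 16 "macroblocks" */
    pthread_join(worker, NULL);
    return 0;
}
```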
ISBN (Print): 9781479955008
Several emerging trends point to increasing heterogeneity among nodes and/or cores in HPC systems. Existing programming models, especially for distributed-memory execution, have typically been designed to facilitate high performance on homogeneous systems. This paper describes a programming model and an associated runtime system we have developed to address this need. The main concepts in the programming model are that of a domain and of interactions between the domain elements. We explain how stencil computations, unstructured grid computations, and molecular dynamics applications can be expressed using these simple concepts. We show how inter-process communication can be handled efficiently at runtime, for different types of applications, purely from the knowledge of domain interactions. Subsequently, we develop techniques for the runtime system to automatically partition and re-partition the work among heterogeneous processors or nodes.
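To make the domain-and-interactions idea concrete, here is a small C sketch (not the paper's API) in which a 3-point stencil is expressed as a domain of elements plus a list of neighbor offsets; the offset list is exactly the "interaction" information a runtime could use to decide which remote elements must be communicated. The 1-D domain, the averaging body, and all names are illustrative assumptions.

```c
/*
 * Sketch of expressing a stencil as a domain plus element interactions.
 * The interaction list (neighbor offsets) is the only information a runtime
 * would need to derive which neighboring elements each element reads.
 */
#include <stdio.h>

#define N 10

/* A 3-point stencil: each element interacts with itself and its neighbors. */
static const int interactions[] = { -1, 0, +1 };
static const int n_inter = 3;

static double domain[N], next[N];

/* One update step: combine each element with the elements it interacts with. */
static void step(void)
{
    for (int i = 0; i < N; i++) {
        double acc = 0.0;
        int used = 0;
        for (int k = 0; k < n_inter; k++) {
            int j = i + interactions[k];
            if (j >= 0 && j < N) { acc += domain[j]; used++; }
        }
        next[i] = acc / used;           /* simple averaging stencil */
    }
    for (int i = 0; i < N; i++) domain[i] = next[i];
}

int main(void)
{
    for (int i = 0; i < N; i++) domain[i] = (i == N / 2) ? 1.0 : 0.0;
    for (int s = 0; s < 3; s++) step();
    for (int i = 0; i < N; i++) printf("%.3f ", domain[i]);
    printf("\n");
    return 0;
}
```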
ISBN (Print): 9780769546766
This paper proposes a flexible programming model (FPM) that addresses the automatic parallel execution of functional tasks on heterogeneous multiprocessors. Guided by simple annotations in the source code, a front-end source-to-source compiler identifies the parallel regions and generates the source code. A runtime middleware analyzes inter-task data dependencies and automatically schedules the tasks using renaming techniques. FPM has been verified with a prototype built on a state-of-the-art FPGA. Examples demonstrate that our model can largely ease the burden on programmers as well as uncover task-level parallelism.
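The abstract mentions scheduling with renaming techniques; the sketch below illustrates, under assumed data structures, how output renaming removes WAR/WAW hazards: each task that writes a logical variable receives a fresh version of its buffer, so it need not wait for readers or writers of earlier versions. The logical_var_t table and the serial "tasks" are hypothetical bookkeeping, not FPM's actual middleware.

```c
/*
 * Sketch of output renaming to remove WAR/WAW hazards between tasks.
 * Tasks are executed serially here just to show the version bookkeeping.
 */
#include <stdio.h>
#include <stdlib.h>

#define MAX_VERSIONS 8

/* A logical variable: each task that writes it gets a fresh version. */
typedef struct {
    double *version[MAX_VERSIONS];
    int     latest;                 /* index of the most recent version */
} logical_var_t;

static double *new_version(logical_var_t *v)
{
    v->latest++;
    v->version[v->latest] = calloc(1, sizeof(double));
    return v->version[v->latest];   /* renamed output buffer */
}

static double *read_version(logical_var_t *v)
{
    return v->version[v->latest];   /* reads bind to the version at submit time */
}

int main(void)
{
    logical_var_t x = { .latest = 0 };
    x.version[0] = calloc(1, sizeof(double));
    *x.version[0] = 1.0;

    /* Task A: reads x (version 0), writes a renamed x (version 1). */
    double *a_in  = read_version(&x);
    double *a_out = new_version(&x);
    *a_out = *a_in * 10.0;

    /* Task B: also "overwrites" x; renaming gives it version 2, so it need
       not wait for consumers of earlier versions (no WAW/WAR stalls). */
    double *b_out = new_version(&x);
    *b_out = 7.0;

    printf("v0=%.1f v1=%.1f v2=%.1f latest=%.1f\n",
           *x.version[0], *x.version[1], *x.version[2], *read_version(&x));
    return 0;
}
```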
ISBN (Print): 9783030206567; 9783030206550
MapReduce brought on the Big Data revolution. However, its impact on scientific data analysis has been limited because of fundamental limitations in its data and programming models. Scientific data is typically stored as multidimensional arrays, while MapReduce is based on key-value (KV) pairs. Applying MapReduce to analyze array-based scientific data requires converting arrays to KV pairs, which incurs a large storage overhead and loses the structural information embedded in the array. For example, analysis operations such as convolution are defined on the neighbors of an array element. Accessing these neighbors is straightforward using array indexes, but requires complex and expensive operations like self-joins in the KV data model. In this work, we introduce a novel structural-locality-aware programming model (SLOPE) to compose data analysis directly on multidimensional arrays. We also develop a parallel execution engine for SLOPE to transparently partition the data, cache intermediate results, support in-place modification, and recover from failures. Our evaluations with real applications show that SLOPE is over ninety thousand times faster than Apache Spark and 38% faster than TensorFlow.
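A small C example of the structural locality the abstract refers to: a 3x3 convolution reads each cell's neighbors directly by array index, which is the access pattern a key-value layout would have to reconstruct with a join. The array size and box-filter kernel are illustrative and not taken from the paper.

```c
/*
 * Neighborhood access by array index: a 3x3 box filter over a 2-D array.
 * Each output cell reads its neighbors directly via indexes, with no join.
 */
#include <stdio.h>

#define ROWS 6
#define COLS 6

static double in[ROWS][COLS], out[ROWS][COLS];

/* 3x3 box filter over the interior cells. */
static void convolve(void)
{
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 1; c < COLS - 1; c++) {
            double acc = 0.0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    acc += in[r + dr][c + dc];   /* neighbors via indexes */
            out[r][c] = acc / 9.0;
        }
}

int main(void)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            in[r][c] = (double)(r * COLS + c);

    convolve();

    for (int r = 1; r < ROWS - 1; r++) {
        for (int c = 1; c < COLS - 1; c++)
            printf("%6.2f ", out[r][c]);
        printf("\n");
    }
    return 0;
}
```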