ISBN: 0769516262 (print)
Thread migration is one approach to remote memory accesses on distributed-memory parallel computers. In thread migration, threads of control migrate between processors to access data local to those processors, while conventional approaches tend to move data to the threads that need them. Migration approaches enhance spatial locality by making large address spaces local, but are less adept at exploiting temporal locality. Data-moving approaches, such as cached remote memory fetches or distributed shared memory, can use both types of locality. We present an experimental evaluation of thread migration's ability to reduce the impact of remote array accesses on distributed-memory computers. Nomadic Threads uses compiler-generated fine-grain threads that either migrate to make data local or fetch cache lines, tolerating latency with multithreading. We compare these alternatives using various array access patterns.
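To make the spatial-versus-temporal locality trade-off concrete, here is a minimal message-counting sketch; it is not the Nomadic Threads runtime and measures no real latency. It assumes a block-distributed 1-D array, one thread walking a sequence of indices, a cache with no eviction, and that each migration or cache-line fetch costs exactly one network message. The functions `block_owner`, `migrate_cost`, and `fetch_cost` and the 8-element line size are hypothetical.

```python
from typing import Iterable

def block_owner(i: int, n: int, procs: int) -> int:
    """Processor owning element i of an n-element block-distributed array."""
    block = -(-n // procs)  # ceiling division
    return i // block

def migrate_cost(accesses: Iterable[int], n: int, procs: int, start: int = 0) -> int:
    """Messages when the thread migrates to the owner of every remote element."""
    here, msgs = start, 0
    for i in accesses:
        owner = block_owner(i, n, procs)
        if owner != here:  # data is remote: move the thread, not the data
            here, msgs = owner, msgs + 1
    return msgs

def fetch_cost(accesses: Iterable[int], n: int, procs: int, line: int = 8,
               home: int = 0) -> int:
    """Messages when the thread stays home and caches remote lines (no eviction).

    Assumes the block size is a multiple of `line`, so no line straddles two owners.
    """
    cache, msgs = set(), 0
    for i in accesses:
        tag = i // line
        if block_owner(i, n, procs) != home and tag not in cache:
            cache.add(tag)  # one round trip brings in a whole line
            msgs += 1
    return msgs
```

For a 64-element array on 4 processors, a single sequential sweep costs 3 migrations versus 6 line fetches, but sweeping twice costs 7 migrations while the fetches stay at 6 because of reuse, and ping-ponging between elements 0 and 20 ten times costs 19 migrations versus 1 fetch.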
ISBN: 0780394615 (print)
As thermal constraints slow the pace of CPU performance improvements, the cost and scalability of future HPC architectures will be increasingly dominated by the interconnect. In this work we perform an in-depth study of the communication requirements across a broad spectrum of important scientific applications, whose computational methods include finite-difference, lattice-Boltzmann, particle-in-cell, sparse linear algebra, particle mesh Ewald, and FFT-based solvers. We use the IPM (Integrated Performance Monitoring) profiling framework to collect detailed statistics on communication topology and message volume with minimal impact on code performance. By characterizing the parallelism and communication requirements of such a diverse set of applications, we hope to guide architectural choices for the design and implementation of interconnects for future HPC systems.
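A minimal sketch of one statistic such a study can derive from profiling data, assuming per-message records of (source rank, destination rank, bytes); this is not IPM's output format or API. It builds a rank-by-rank volume matrix and counts, for each rank, the distinct partners whose two-way traffic exceeds a hypothetical `threshold`, a simple measure of communication topology degree.

```python
from typing import Iterable, List, Tuple

Message = Tuple[int, int, int]  # (source rank, destination rank, bytes)

def volume_matrix(msgs: Iterable[Message], nranks: int) -> List[List[int]]:
    """Total bytes sent from each rank (row) to each rank (column)."""
    vol = [[0] * nranks for _ in range(nranks)]
    for src, dst, nbytes in msgs:
        vol[src][dst] += nbytes
    return vol

def partner_counts(vol: List[List[int]], threshold: int = 0) -> List[int]:
    """Per-rank number of distinct partners with two-way traffic above threshold."""
    n = len(vol)
    return [
        sum(1 for j in range(n) if j != i and vol[i][j] + vol[j][i] > threshold)
        for i in range(n)
    ]
```

On a 4-rank nearest-neighbour ring plus one stray 8-byte message from rank 0 to rank 2, the partner counts are `[3, 2, 3, 2]`; raising the threshold to 100 bytes filters out the small message and recovers the ring's degree of 2 everywhere. Thresholding like this keeps tiny control messages from inflating the apparent connectivity an interconnect must support.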
ISBN: 9781509039364 (print)
The rapid growth in the volume, velocity, and variety of data produced by scientific computing applications, commercial workloads, and the cloud has led to Big Data. Traditional solutions for data storage, management, and processing cannot meet the demands of this distributed data, so new execution models, data models, and software systems have been developed to address the challenges of storing data in heterogeneous forms (e.g., HDFS and NoSQL databases) and of processing data in a parallel and distributed fashion (e.g., MapReduce and the Hadoop and Spark frameworks). This work presents a comparative study of the Apache Spark distributed data processing framework. Our study first discusses the resource management subsystems of Spark and then reviews several of the distributed data storage options available to Spark.
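The MapReduce processing model that Hadoop and Spark build on can be illustrated with a single-process sketch; this is not Spark's API. A map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group. Word count is a hypothetical workload here, and the list of partitions stands in for data blocks that a real framework would process on different nodes.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple

def map_phase(partitions: List[List[str]]) -> List[Tuple[str, int]]:
    # Each partition could be mapped independently on a different worker
    return [(word, 1) for part in partitions for line in part for word in line.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]],
                 combine: Callable[[int, int], int]) -> Dict[str, int]:
    # Combine all values of a key with an associative function
    return {key: reduce(combine, values) for key, values in groups.items()}

partitions = [["to be or", "not to be"], ["to see"]]
counts = reduce_phase(shuffle(map_phase(partitions)), lambda a, b: a + b)
```

In PySpark the same pipeline is typically expressed on an RDD with `flatMap`, `map`, and `reduceByKey`, and the framework performs the shuffle across executors rather than in one dictionary.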
ISBN: 9781479986767 (print)
Testing image processing applications is challenging, especially when evaluating the correctness of the output image. Generally, output images are evaluated manually through visual inspection by an expert tester, which is the main hindrance to automating the testing process. Recently, statistical and metamorphic testing approaches have been presented to automate the output evaluation of image processing applications. The statistical method depends on the availability of a statistical distribution of output images, whereas metamorphic testing requires more research effort before it can be widely used in practice. Metamorphic testing is a well-known technique for alleviating the test oracle problem; it eliminates the required manual effort by using relations between input and output images. Follow-up test cases are generated based on these relations, and their expected output is evaluated. This paper addresses the test oracle problem for image processing applications and demonstrates how properties of the implementation under test can be adopted as metamorphic relations. We have studied general and specific metamorphic relations of morphological image operations such as dilation and erosion. The selection of metamorphic relations and their effectiveness, assessed by mutation analysis, are demonstrated. The results show that metamorphic testing is useful for evaluating output images in the absence of a perfect test oracle.
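The idea of turning properties of morphological operations into metamorphic relations can be sketched in plain Python; this is not the paper's test harness or relation set. It assumes binary images, a 3x3 square structuring element, and background padding outside the image, and checks four relations that hold for any correct implementation: dilation is extensive, erosion is anti-extensive, dilation is increasing, and dilation with a symmetric element commutes with transposition. The mutant that drops the origin from the structuring element is hypothetical and is killed by the extensivity relation, in the spirit of mutation analysis.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # binary image, pixels are 0 or 1

# 3x3 square structuring element as (row, col) offsets; symmetric, contains the origin
SQUARE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def _pixel(img: Image, r: int, c: int) -> int:
    # Pixels outside the image are treated as background (0)
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0

def dilate(img: Image, se: List[Tuple[int, int]] = SQUARE) -> Image:
    return [[int(any(_pixel(img, r + dr, c + dc) for dr, dc in se))
             for c in range(len(img[0]))] for r in range(len(img))]

def erode(img: Image, se: List[Tuple[int, int]] = SQUARE) -> Image:
    return [[int(all(_pixel(img, r + dr, c + dc) for dr, dc in se))
             for c in range(len(img[0]))] for r in range(len(img))]

def subset(a: Image, b: Image) -> bool:
    # True if every foreground pixel of a is also foreground in b
    return all(x <= y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]

def check_relations(src: Image, bigger: Image,
                    dil: Callable[[Image], Image] = dilate) -> Dict[str, bool]:
    """Evaluate metamorphic relations; `bigger` must contain all foreground of `src`."""
    return {
        "extensive": subset(src, dil(src)),                       # A within dilate(A)
        "anti_extensive": subset(erode(src), src),                # erode(A) within A
        "increasing": subset(dil(src), dil(bigger)),              # A within B implies same after dilation
        "transpose": dil(transpose(src)) == transpose(dil(src)),  # symmetric element
    }
```

None of these relations needs the exact expected output image, which is what lets them serve as an oracle substitute: a single foreground pixel is enough to expose the mutant, whose dilation leaves the original pixel unset.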
ISBN: 0818677937 (print)
RISC-based massively parallel processors (MPPs) often show low efficiency in real-world applications because of cache miss penalties, insufficient memory system throughput, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and 600 GFLOPS peak performance, overcomes these problems by introducing three novel features. First, its processor, the 150 MHz HARP-1E, solves the cache miss penalty through "pseudo vector processing" (PVP). In PVP, data is prefetched into a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory system bottleneck. Third, inter-processor communication achieves high performance on the three-dimensional crossbar network by using a "remote DMA transfer" protocol and hardware-based cache coherency. As a result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors in the LINPACK benchmark, which is almost 72% of the peak performance.
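The quoted peak and LINPACK figures can be cross-checked with a short calculation. It takes the processor clock as 150 MHz and assumes, as an illustration not stated in the abstract, two floating-point operations per cycle (300 MFLOPS per processor). Under that assumption, 2048 processors give about 614 GFLOPS, consistent with the quoted 600 GFLOPS, and 220.4 GFLOPS on 1024 processors is about 72% of their peak.

```python
def peak_gflops(procs: int, clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: processors x clock (MHz) x flops per cycle."""
    return procs * clock_mhz * flops_per_cycle / 1000.0

def efficiency(rmax_gflops: float, procs: int, clock_mhz: float,
               flops_per_cycle: int) -> float:
    """Fraction of theoretical peak reached by a measured result (Rmax / Rpeak)."""
    return rmax_gflops / peak_gflops(procs, clock_mhz, flops_per_cycle)

# Assumption: 2 flops per cycle; this figure is not given in the abstract
print(peak_gflops(2048, 150, 2))                   # 614.4 (full system)
print(round(efficiency(220.4, 1024, 150, 2), 3))   # 0.717 (LINPACK on 1024 processors)
```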
ISBN: 9781424403431 (print)
In this paper we propose a systematic approach to performance analysis of workflow applications on the Grid. We introduce an ideal model for the workflow execution time and explain its difference from the actual measured times based on a hierarchy of performance overheads for Grid computing. We describe how to systematically measure and compute the overheads, from individual activities to entire workflow applications. We adapted well-known parallel processing metrics, such as speedup and efficiency, to the scope of Grid computing. We have implemented and largely automated our analysis approach in the context of the ASKALON Grid application development and computing environment. We present experimental results showing a detailed overhead analysis of two real-world workflow applications executed in a national Grid environment.
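A minimal sketch of the accounting idea: the gap between measured and ideal workflow time is split into identified overheads plus an unidentified remainder, and the classic speedup and efficiency metrics are computed from measured times. The overhead names and numbers are hypothetical and do not reproduce ASKALON's overhead hierarchy or its exact metric definitions.

```python
from typing import Dict

def overhead_breakdown(measured: float, ideal: float,
                       known: Dict[str, float]) -> Dict[str, float]:
    """Split (measured - ideal) into identified overheads plus an unidentified rest."""
    total = measured - ideal
    breakdown = dict(known)
    breakdown["unidentified"] = total - sum(known.values())
    return breakdown

def speedup(t_sequential: float, t_parallel: float) -> float:
    return t_sequential / t_parallel

def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    return speedup(t_sequential, t_parallel) / processors
```

For example, a workflow that takes 1200 s sequentially and 250 s on 8 processors, with an ideal time of 180 s and identified overheads of 30 s (hypothetical "middleware") and 25 s (hypothetical "data_transfer"), leaves 15 s unexplained and gives a speedup of 4.8 and an efficiency of 0.6. A large unidentified share signals that the overhead hierarchy needs refining.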
Simulated Evolution (SimE) is an evolutionary metaheuristic that has produced results comparable to those of well-established stochastic heuristics such as simulated annealing (SA), tabu search (TS), and genetic algorithms (GA), with shorter runtimes. However, for problems with a ver...
ISBN: 9781538691540 (print)
We have leveraged STARE indexing to package partitioned data chunks from diverse datasets into netCDF files, distributed them on a cluster of 16 lightweight nodes with their placements spatiotemporally co-aligned, and demonstrated a few integrative analyses using netCDF parallel I/O and Python MPI, with single-user performance and scalability comparable to, or even better than, that of a parallel array database management system (ADBMS) such as SciDB. However, to achieve this performance and scalability, records of the node location and STARE index ranges of each data chunk, similar to SciDB's chunk maps, must be maintained and consulted by the I/O and analysis code to coordinate the analytic operations in parallel.
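A chunk map of the kind described can be sketched as a sorted list of index ranges, each tagged with the node and file that hold the chunk; analysis code then asks which chunks and nodes cover a query range. This is a hypothetical illustration: it treats STARE indices as plain integers with half-open, non-overlapping ranges and does not use the STARE library or SciDB.

```python
import bisect
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Chunk:
    lo: int     # first index covered (inclusive)
    hi: int     # end of covered range (exclusive)
    node: str   # node holding the chunk
    path: str   # e.g. a netCDF file name on that node

class ChunkMap:
    def __init__(self, chunks: List[Chunk]):
        # Sort by range start so lookups can binary-search; ranges assumed disjoint
        self.chunks = sorted(chunks, key=lambda ch: ch.lo)
        self.starts = [ch.lo for ch in self.chunks]

    def overlapping(self, lo: int, hi: int) -> List[Chunk]:
        """Chunks whose range intersects the half-open query range [lo, hi)."""
        # The last chunk starting at or before lo may still overlap the query
        i = max(bisect.bisect_right(self.starts, lo) - 1, 0)
        result = []
        while i < len(self.chunks) and self.chunks[i].lo < hi:
            if self.chunks[i].hi > lo:
                result.append(self.chunks[i])
            i += 1
        return result

    def nodes_for(self, lo: int, hi: int) -> List[str]:
        """Distinct nodes that must take part in an operation on [lo, hi)."""
        return sorted({ch.node for ch in self.overlapping(lo, hi)})
```

In a setup like the one described, each MPI rank could consult such a map to open only the chunk files local to its node, which is the coordination work an ADBMS would otherwise do internally.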
ISBN: 9781538627235 (print)
The processing time of real-time sensing, data analytics, and other applications is an important performance metric. When an application runs on a distributed system, the load balancing scheme among processing nodes significantly affects its total processing time. We consider a load balancing scheme for distributed computing at the edge of the network. In the edge model considered, a group of nodes, either mobile or static, with processing and sensing capabilities are connected to each other over a wireless ad hoc network. Load balancing among edge nodes is formulated as a min-max optimization problem, with the objective of minimizing the overall processing time of the application while satisfying the wireless channel capacity and link contention constraints. We use an aggregate utility method to convert the min-max problem into a convex optimization problem. The resulting constrained convex optimization problem is then relaxed with Lagrangian dual decomposition and solved with gradient descent. This formulation can be implemented in a fully distributed manner among the edge nodes, which is consistent with the decentralized nature of edge networks. However, the convergence of this scheme may be slow. We further propose a heuristic algorithm that achieves fast convergence. Our simulation results show that it gives near-optimal performance in most cases.
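The dual-decomposition idea can be illustrated on a deliberately simplified problem; this is not the paper's formulation. The sketch drops the wireless capacity and link contention constraints and uses a hypothetical aggregate utility: minimize the sum of x_i^2 / (2 s_i), where x_i is the load on node i and s_i its processing speed, subject to the loads summing to L. That convex objective is minimized when every node finishes at the same time x_i / s_i, which is also the min-max optimum for this simplified case. Each node solves its own subproblem given a shared price λ, and only λ is updated, by gradient ascent on the dual.

```python
from typing import List

def balance_by_dual_decomposition(speeds: List[float], total_load: float,
                                  step: float, iters: int = 200) -> List[float]:
    """Assign load to nodes by gradient ascent on the dual price lam.

    Simplified, hypothetical objective: minimize sum(x_i**2 / (2*s_i))
    subject to sum(x_i) == total_load. Converges when 0 < step*sum(speeds) < 2.
    """
    lam = 0.0  # Lagrange multiplier (price) for the total-load constraint
    loads = [0.0] * len(speeds)
    for _ in range(iters):
        # Local step, computable on each node: argmin over x >= 0 of x**2/(2*s) - lam*x
        loads = [max(0.0, s * lam) for s in speeds]
        # Dual gradient is the constraint residual; only lam must be broadcast
        lam += step * (total_load - sum(loads))
    return loads

speeds = [1.0, 2.0, 3.0]
loads = balance_by_dual_decomposition(speeds, 60.0, step=0.1)
finish_times = [x / s for x, s in zip(loads, speeds)]
```

With speeds 1, 2, and 3 and a total load of 60, the loads approach 10, 20, and 30, so every node finishes at time 10. Adding the wireless constraints back would introduce further multipliers, and, as the abstract notes, convergence of such a scheme can be slow.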
The rigorous characterization of the behaviour of a radio base station antenna for wireless communication systems is a hot topic, both for antenna and communication system design and for radiation protection (hazard) reasons. Such a ch...