ISBN: 9783642131356 (print)
A soft-core system lets designers conveniently modify the components of the architecture they have designed. In some systems, a single-core processor cannot provide enough computing power for computation-heavy applications. Improving the performance of a multi-core system depends not only on the hardware architecture but also on parallel programming. Current parallelizing compilers struggle to parallelize programs effectively, so the programmer must decide from the outset how to allot tasks to each processor. In this paper, we present a software framework for designing parallel programs. The proposed framework provides a convenient parallel programming environment for developing the software of a multi-core system. Experiments show that programs can be parallelized effectively by applying the functions the framework provides.
ISBN: 9783540729044 (print)
Separate grid systems are becoming new information islands as more and more grid systems are deployed. Grid interoperation is one approach to solving this problem. This paper introduces the implementation of data interoperation between ChinaGrid and SRB. The data interoperation is divided into two parts: data access from SRB to ChinaGrid and data access from ChinaGrid to SRB. The paper also considers performance optimization, and the optimization measures yield satisfactory experimental results.
ISBN: 9783540695004 (print)
Parallel computers provide an efficient and economical way to solve large-scale and/or time-constrained scientific, engineering, and industrial problems. Consequently, there is a need to predict the performance order of both deterministic and non-deterministic parallel algorithms. Predicting the performance of the traveling salesman problem (TSP) is challenging because similar input data sets may cause significant variability in execution times. The parallel performance of data-dependent algorithms depends on the problem size, the number of processors, and other parameters. Identifying the most important of these other parameters is the key to obtaining a good estimate of the performance order. This paper presents a novel methodology for predicting the performance of a parallel algorithm that solves the TSP. The process explores data in search of patterns and/or relationships, detecting the main parameters that affect performance. It then uses the measured values for this limited number of inputs to produce a multiple-linear-regression model. Finally, the regression equation makes it possible to predict how the algorithm will respond to new input data sets. The preliminary experimental results are quite promising.
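The regression step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of predictors (problem size and processor count), and the synthetic timings are all assumptions made for illustration. The fit uses ordinary least squares solved through the normal equations.

```python
def fit_multiple_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows    -- list of predictor tuples, e.g. (problem_size, processors)
    targets -- measured execution times, one per row
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, row):
    """Evaluate the fitted regression equation for one new input."""
    return coeffs[0] + sum(c * float(v) for c, v in zip(coeffs[1:], row))


# Hypothetical training data: (problem size, processors) -> execution time.
# The times are synthetic, generated from 1.0 + 0.02*size + 0.5*processors.
samples = [(100, 2), (100, 4), (200, 2), (200, 8), (400, 4)]
times = [1.0 + 0.02 * size + 0.5 * procs for size, procs in samples]

coeffs = fit_multiple_linear_regression(samples, times)
print("coefficients:", coeffs)
print("predicted time for (300, 4):", predict(coeffs, (300, 4)))
```

In practice the predictors would be the data-dependent parameters found during the exploration step rather than the two placeholders used here.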
ISBN: 9781467395243 (print)
Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms was developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a first step towards automating the development of communication-avoiding libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global-view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wraparound communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about the communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 supercomputer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.
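The wraparound pattern mentioned above can be sketched as an element-wise copy with a modulo subscript. This is only an illustration of the semantics in Python, not Maunam's input language or its generated code; the function name `cshift` is borrowed from the Fortran intrinsic, whose convention (a positive shift moves elements toward lower indices) is followed here.

```python
def cshift(values, shift):
    """Circular shift expressed as an element-wise copy.

    Element i of the result is read from index (i + shift) mod n, so
    indices that run past the end wrap around to the beginning.
    """
    n = len(values)
    return [values[(i + shift) % n] for i in range(n)]


# Each position could stand for a block owned by one processor in a ring:
# with shift=1, every processor receives the block of its right neighbour.
blocks = ["A0", "A1", "A2", "A3"]
print(cshift(blocks, 1))
print(cshift(blocks, -1))
```

Python's `%` operator returns a non-negative result for a positive modulus, so negative shifts wrap correctly without extra handling.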
ISBN: 9781728140421 (print)
Real-time object detection and tracking are important and challenging tasks in many computer vision applications, such as video surveillance, robot navigation, vehicle navigation, security, military applications, patient monitoring, and traffic monitoring. Object detection involves detecting the object in a sequence of video frames. In this paper, we review methods and algorithms for real-time object detection and tracking in high-resolution video. Nowadays, high-resolution imaging sensors and cameras are used in many application areas, such as security systems and military applications. In high-resolution video, processing a single frame takes more time, and higher frame rates add to the load, so detecting and tracking a target in real time is extremely challenging for researchers. This creates a demand for fast computational algorithms for real-time processing of high-resolution video. Moving object detection and tracking has been one of the most active research areas over the last decade. The paper surveys the real-time object detection and tracking algorithms for high-resolution video available in the literature.
A versatile family of interconnection networks alternative to hypercubes, called Metacubes, has been proposed for building extremely large-scale multiprocessor systems with a small number of links per node. A Metacube $MC(k, m)$ connects $2^{2^k m + k}$ nodes with only $k + m$ links per node. In this paper, we propose a new presentation of the Metacube for algorithmic design. Based on the new presentation, we give efficient algorithms for parallel prefix computation and parallel sorting on Metacubes. The algorithm for prefix computation runs in $2^k m (k + 1) + k$ communication steps and $2^{k+1} m + 2k$ computation steps on $MC(k, m)$. The sorting algorithm runs in $O((2^k m + k)^2)$ computation steps and $O((2^k m (2k + 1) + k)^2)$ communication steps on $MC(k, m)$.
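To get a feel for the sizes involved, the following sketch evaluates the counts quoted above for a few parameter choices. It assumes the flattened exponents in the abstract are read as 2^(2^k·m + k) nodes, 2^k·m·(k + 1) + k communication steps, and 2^(k+1)·m + 2k computation steps; the function names are hypothetical.

```python
def metacube_nodes(k, m):
    """Number of nodes in MC(k, m), assuming 2^(2^k * m + k)."""
    return 2 ** (2 ** k * m + k)


def metacube_links_per_node(k, m):
    """Links per node in MC(k, m)."""
    return k + m


def hypercube_links_per_node(k, m):
    """Links per node of a hypercube with the same node count
    (one link per address bit)."""
    return 2 ** k * m + k


def prefix_communication_steps(k, m):
    """Communication steps of the prefix algorithm, assuming 2^k * m * (k + 1) + k."""
    return 2 ** k * m * (k + 1) + k


def prefix_computation_steps(k, m):
    """Computation steps of the prefix algorithm, assuming 2^(k+1) * m + 2k."""
    return 2 ** (k + 1) * m + 2 * k


for k, m in [(0, 4), (1, 3), (2, 3)]:
    print(f"MC({k},{m}): nodes={metacube_nodes(k, m)}, "
          f"links={metacube_links_per_node(k, m)}, "
          f"hypercube links={hypercube_links_per_node(k, m)}, "
          f"prefix comm={prefix_communication_steps(k, m)}, "
          f"prefix comp={prefix_computation_steps(k, m)}")
```

Under this reading, MC(2, 3) has 16384 nodes with 5 links per node, whereas a hypercube of the same size needs 14 links per node. With k = 0 the formulas reduce to a hypercube of 2^m nodes, with m communication steps and 2m computation steps for prefix computation.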
ProcSimity is a software tool that supports research in processor allocation and scheduling for highly parallel systems. ProcSimity's multicomputer simulator supports experimentation with selected allocation and scheduling algorithms on architectures with a range of network topologies and several current routing and flow control mechanisms. Message passing can be simulated in detail at the flit level or at a higher level of modeling. The tool supports both stochastic job streams and communication patterns from actual parallel applications, including several of the NAS parallel benchmarks. ProcSimity's visualization and performance analysis tool lets the user view a dynamic animation of the selected algorithms as well as a variety of system-level and job-level performance metrics. ProcSimity has been used successfully in experiments investigating the feasibility of non-contiguous processor allocation in meshes and k-ary n-cubes.
ISBN: 0769512607 (print)
In this paper we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of the network's unique capabilities, including a processor and memory on the network interface card and efficient user-level communication libraries. We developed a micro-benchmark to test the scheduler's performance under various aspects of parallel job workloads: memory usage, bandwidth-bound and latency-bound communication, number of processes, timeslice quantum, and multiprogramming level. Our experiments show that the gang scheduler performs relatively well under most workload conditions, is largely insensitive to the number of concurrent jobs in the system, and scales almost linearly with the number of nodes. On the other hand, the scheduler is very sensitive to the timeslice quantum, and values under 30 seconds can incur large overheads and fairness problems.
The Compute Unified Device Architecture (CUDA) is a new parallel processing platform making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. In this paper, we apply th...
ISBN: 9781467387767 (print)
Electronic system-level design plays an important role in the design of multi-processor embedded systems on chip. Two important steps in this process are the evaluation of a single design configuration and design space exploration. In the first part of the design process, simple high-level analytical models for application mapping and evaluation are used and modified to accelerate the evaluation of a single design configuration. Using the analytical model, the design space is pruned and explored at high speed with low accuracy. In the second part, two multi-objective optimization algorithms, based on Particle Swarm Optimization and Simulated Annealing, are proposed to explore the pruned design space with higher accuracy by taking advantage of low-level architectural simulation engines. The proposed algorithms give the designer more accurate solutions within an acceptable time. With the MJPEG application as the case study, each method produces a set of near-optimal points. Simulation results show that the proposed methods lead to near-optimal design configurations with acceptable accuracy in reasonable time.
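A multi-objective search returns a set of trade-off points rather than a single optimum. The sketch below shows one common way to extract such a set, a Pareto (non-dominated) filter over evaluated design configurations. It is an illustration only and not the paper's PSO or simulated annealing algorithm; the two objectives (latency and cost, both minimized) and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (latency, cost) results for five design configurations.
evaluated = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
print(pareto_front(evaluated))
```

Here (8, 7) and (9, 9) are discarded because (8, 6) is at least as good in both objectives, leaving three trade-off points for the designer to choose from.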