ISBN (print): 0769515126
This paper proposes an algorithm for cooperative concurrent computing that uses the idle cycles of high-performance heterogeneous workstations interconnected by a high-speed network. To improve parallel performance, it presents a model and an algorithm for scheduling tasks among heterogeneous workstations that account for the costs of loading data, computing, communicating, and collecting results. With this algorithm, the subset of heterogeneous workstations that yields the shortest parallel execution time for the tasks can be selected.
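The abstract does not give the cost model, so the following is only a minimal sketch under stated assumptions: the workload is taken to be perfectly divisible, each workstation has a hypothetical compute `rate` and a fixed `overhead` (standing in for data loading, communication, and result collection, paid one after another by a coordinator), and the best subset is found by brute force. The names `parallel_time` and `best_subset` are illustrative, not from the paper.

```python
from itertools import combinations

def parallel_time(subset, work):
    """Estimated finish time under the ASSUMED cost model:
    serialized per-workstation overheads plus perfectly shared computation."""
    overhead = sum(w["overhead"] for w in subset)
    rate = sum(w["rate"] for w in subset)
    return overhead + work / rate

def best_subset(workstations, work):
    """Try every non-empty subset (exponential, fine for a handful of machines)
    and return (time, names) for the fastest one."""
    best = None
    for k in range(1, len(workstations) + 1):
        for subset in combinations(workstations, k):
            t = parallel_time(subset, work)
            if best is None or t < best[0]:
                best = (t, [w["name"] for w in subset])
    return best

# Hypothetical machines: C is slow and expensive to feed, so it is left out.
machines = [
    {"name": "A", "rate": 10.0, "overhead": 1.0},
    {"name": "B", "rate": 5.0, "overhead": 1.0},
    {"name": "C", "rate": 1.0, "overhead": 5.0},
]
time_needed, chosen = best_subset(machines, work=100.0)
print(chosen, round(time_needed, 3))
```

The point of the sketch is that adding a workstation is not always a win: once its overhead exceeds the computation time it saves, a smaller subset finishes sooner.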
ISBN (print): 9783642131356
A Micro-Electro-Mechanical System (MEMS) integrates mechanical elements, sensors, actuators, and electronics on a common silicon substrate through microfabrication technology. MEMS technologies make it possible to manufacture micron-scale sensors and other smart products. Because the structure of a MEMS product is at micron scale and nearly invisible, even the designer can hardly tell whether a device is well designed and well produced. A visual 3D MEMS simulation tool named ZProcess [1] was therefore proposed in our previous work to help designers understand and improve their designs. ZProcess renders the 3D model of a MEMS device using a voxel method. It is accurate, but its speed is unacceptable when the voxel data set is large. This paper presents an improved parallel implementation that accelerates ZProcess using a GPU (Graphics Processing Unit). Experimental results show that the parallel implementation achieves a speedup of up to 160 times over the sequential program.
ISBN (print): 9781479904945; 9781479904938
Complex networks are a technique for the modeling and analysis of large data sets in many scientific and engineering disciplines. Because of their excessive size, conventional algorithms and single-core processors struggle to process such networks efficiently. Multi-core graphics processing units (GPUs) could provide sufficient processing power for their analysis. However, commonly designed algorithms cannot exploit this massively parallel processing power. In this paper, we present the Multi Layer Network Decomposition (MLND) approach, a general approach to parallel network analysis on multi-core processors that efficiently partitions networks and maps them onto GPU architectures. Evaluation on a 336-core GPU graphics card demonstrated a 16x speed-up in complex network analysis relative to a CPU-based approach.
ISBN (print): 9783642131189
This work presents an improved genetic algorithm (IGA) for minimizing periodic preventive maintenance costs in series-parallel systems. The proposed IGA considers the intrinsic properties of a repairable system, including the structure of reliability block diagrams and component maintenance priorities. The proposed component importance measure considers these properties, identifies key components, and determines their maintenance priorities. The optimal maintenance periods of these important components are then determined to minimize total maintenance cost given the allowable worst reliability of a repairable system. An adjustment mechanism is established to solve the problem of chromosomes falling into infeasible areas. A response surface methodology is further used to systematically determine the crossover and mutation probabilities in the GA instead of using the conventional trial-and-error process. A case study demonstrates the effectiveness and practicality of the proposed IGA for optimizing the periodic preventive maintenance model in series-parallel systems.
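The reliability constraint mentioned above can be illustrated with the standard reliability-block-diagram formula for a series arrangement of parallel blocks, assuming independent component failures. This is only a sketch of the feasibility check a candidate solution would have to pass; it is not the paper's IGA, importance measure, or adjustment mechanism, and the function names and sample values are hypothetical.

```python
from math import prod

def system_reliability(subsystems):
    """Reliability of blocks connected in series, where each block is a list
    of independent component reliabilities connected in parallel.
    A parallel block fails only if all of its components fail."""
    return prod(1.0 - prod(1.0 - r for r in block) for block in subsystems)

def is_feasible(subsystems, min_reliability):
    """True if the system meets the allowable worst-case reliability."""
    return system_reliability(subsystems) >= min_reliability

# Hypothetical system: two redundant components followed by a single one.
layout = [[0.9, 0.9], [0.8]]
print(system_reliability(layout))   # (1 - 0.1 * 0.1) * 0.8, roughly 0.792
print(is_feasible(layout, 0.75))
```

In a genetic algorithm of this kind, a check like `is_feasible` is what separates feasible chromosomes from those that need repair or penalization.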
ISBN (print): 0769520278
Our research goal is to retarget image processing programs written in sequential languages (e.g., C) to architectures with data-parallel processing capabilities. Image processing algorithms are often inherently data-parallel, but the artifacts imposed by the sequential programming language (e.g., loops, pointer variables, linear address spaces) can obscure the parallelism and prohibit generation of efficient parallel code. This paper proposes a program representation and pattern-recognition approach for generating a data-parallel program specification from sequential source code. The representation is based on an extension of the multidimensional synchronous dataflow (MDSDF) model of computation. Central to extracting this representation from code is understanding the mapping between iterations and array variables in the source code and the operations over array regions (e.g., rows, columns, tiled blocks) that they implement. Examples are presented to illustrate this mapping, and a set of patterns for recognizing these regions is proposed. The correctness of the retargeted MDSDF specifications is validated, and the potential speedup from parallel execution is shown.
ISBN (print): 9783642131356
Network packet processing applications increasingly execute at speeds of 1-40 Gigabits per second, often running on multi-core chips that contain multithreaded network processing units (NPUs) and a general-purpose processor core. Such applications are typically programmed in a language that exposes NPU specifics needed to optimize low-level thread control and resource management. This facilitates optimization at the cost of increased software complexity and reduced portability. In contrast, our approach provides portability by combining coarse-grained, SPMD parallelism with programming in the packetC language's high-level constructs. This paper focuses on searching packet contents for packet protocol headers. We require the host system to locate protocol headers for layers 2, 3 and 4, and to encode their offsets in a packet information block (PIB). packetC provides descriptors, C-style structures superimposed on the packet array at runtime-calculable, user- or PIB-supplied offsets. We deliver state-of-the-practice performance via an FPGA for locating layer offsets and via micro-coded interpretation that treats PIB layer offsets as a special addressing mode.
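The idea of recording layer offsets once and then overlaying a structure at those offsets can be sketched in Python. This is not packetC and does not model the FPGA or micro-coded path; it assumes an untagged Ethernet II frame carrying IPv4 and TCP, and the names `locate_layers`, `tcp_ports`, and the dictionary used as a stand-in for the PIB are hypothetical.

```python
import struct

ETHERNET_HEADER_LEN = 14  # untagged Ethernet II: 6 + 6 address bytes + 2-byte EtherType

def locate_layers(packet: bytes) -> dict:
    """Play the role of the host system: compute layer 2/3/4 offsets once
    and return them in a dict standing in for the packet information block."""
    l2 = 0
    l3 = l2 + ETHERNET_HEADER_LEN
    ihl_words = packet[l3] & 0x0F          # IPv4 header length in 32-bit words
    l4 = l3 + ihl_words * 4
    return {"l2": l2, "l3": l3, "l4": l4}

def tcp_ports(packet: bytes, pib: dict) -> tuple:
    """Overlay a (source port, destination port) view at the layer 4 offset."""
    return struct.unpack_from("!HH", packet, pib["l4"])

def build_sample_packet() -> bytes:
    """Build a zero-filled test frame: Ethernet + 20-byte IPv4 + 20-byte TCP."""
    eth = bytes(12) + struct.pack("!H", 0x0800)                # EtherType IPv4
    ipv4 = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)   # IHL = 5, protocol = TCP
    tcp = struct.pack("!HH", 12345, 80) + bytes(16)
    return eth + ipv4 + tcp

packet = build_sample_packet()
pib = locate_layers(packet)
print(pib, tcp_ports(packet, pib))
```

Because the offsets are computed once and stored, later field accesses become simple base-plus-offset reads, which is what makes treating them as an addressing mode plausible.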
ISBN (print): 0769515126
This paper investigates the parallel computation of non-ideal 3-D space detonation wave propagation on a high-performance computer based on the CC-NUMA architecture. Analysis and testing of the previous serial program identified the computation of curvature and of the first-order and second-order differences as the main targets for parallelization. Techniques such as the "divide and conquer" strategy and load balancing were applied to convert the serial program into a parallel one. Numerical simulation shows that the parallel program greatly increases the computing speed for non-ideal 3-D space detonation wave propagation.
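The decomposition idea can be shown on a much simpler problem than the paper's: a 1-D second-order central difference split into nearly equal index ranges, one per worker. This sketch assumes nothing about the authors' code; the function names are hypothetical, and because of CPython's global interpreter lock the threads here illustrate the partitioning and load balancing, not an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def second_difference(u, lo, hi):
    """Central second difference u[i-1] - 2*u[i] + u[i+1] for i in [lo, hi)."""
    return [u[i - 1] - 2 * u[i] + u[i + 1] for i in range(lo, hi)]

def parallel_second_difference(u, workers=4):
    """Divide the interior points 1..len(u)-2 into nearly equal chunks,
    process each chunk independently, and concatenate the results in order."""
    interior = len(u) - 2
    bounds = [1 + (interior * k) // workers for k in range(workers + 1)]
    chunks = list(zip(bounds, bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: second_difference(u, c[0], c[1]), chunks))
    return [value for part in parts for value in part]

# For u[i] = i*i the second difference is 2 at every interior point.
samples = [i * i for i in range(10)]
print(parallel_second_difference(samples))
```

Each chunk only reads its neighbours' boundary values and writes nothing shared, which is why this kind of stencil computation partitions cleanly.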
ISBN (print): 9781450347860
The use of a reconfigurable computer vision architecture for image processing tasks is an important and challenging application in real-time systems with limited resources. It is an emerging field as new computing architectures are developed, new algorithms are proposed, and users define new applications in surveillance. In this paper, a computer vision architecture capable of reconfiguring the processing chain of computer vision algorithms is summarised. The processing chain consists of multiple computer vision tasks, which can be distributed over various computing units. One key characteristic of the designed architecture is graceful degradation, which prevents the system from failing. This characteristic is achieved by distributing computer vision tasks to other nodes and parametrizing each task according to the specified quality of service. Experiments using an object detector applied to a public dataset are presented.
ISBN (print): 9781728136134
Work-efficient task-parallel algorithms enforce ordering between tasks using queuing primitives. Such algorithms offer limited parallelism because queuing constraints cause data movement and synchronization bottlenecks. Speculatively relaxing the order of tasks across cores using the Galois framework shows promise, as it mitigates the false dependencies generated by strict queuing constraints and unlocks task parallelism. However, relaxed ordering results in redundant work, for which Galois relies on static measures to improve work-efficiency. This paper proposes a dynamic multi-level parent-child task dependency checking mechanism in Galois that prunes redundant work by exploiting monotonic properties of shared data values. Evaluation on a 40-core Intel Xeon multicore shows an average 2x performance improvement over state-of-the-art ordered and relaxed-order graph algorithms.
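How a monotonic shared value lets redundant tasks be discarded can be seen in a sequential, single-level sketch of shortest paths with an unordered worklist. This is a deliberately simplified stand-in: it does not use Galois, it is not the paper's multi-level parent-child mechanism, and `relaxed_sssp` and the sample graph are hypothetical.

```python
from collections import deque

def relaxed_sssp(graph, source):
    """Shortest paths with a FIFO worklist instead of a priority queue.
    Tentative distances only ever decrease, so a task that carries a distance
    larger than the node's current value is stale and can be pruned."""
    dist = {source: 0}
    work = deque([(source, 0)])
    pruned = 0
    while work:
        node, d = work.popleft()
        if d > dist.get(node, float("inf")):
            pruned += 1                      # a better distance already exists
            continue
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                work.append((neighbour, candidate))
    return dist, pruned

# Hypothetical graph: the direct edge a->b (5) is beaten by a->c->b (2).
edges = {"a": [("c", 1), ("b", 5)], "c": [("b", 1)], "b": [("d", 1)]}
distances, skipped = relaxed_sssp(edges, "a")
print(distances, skipped)
```

Without the staleness check, the task `("b", 5)` would still expand its neighbours and generate work that is later overwritten; with it, that task is dropped as soon as it is dequeued.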