ISBN (print): 3540424954
The proceedings contain 130 papers. The special focus in this conference is on parallel processing. The topics include: software component technology for high performance parallel and grid computing; connecting computational requirements with computing resources; a tool for binding threads to processors; a distributed object infrastructure for interaction and steering; optimal polling for latency-throughput tradeoffs in queue-based network interfaces for clusters; performance prediction of data-dependent task parallel programs; the hardware performance monitor toolkit; VIA communication performance on a Gigabit Ethernet cluster; group-based performance analysis for multithreaded SMP cluster applications; exploiting unused time slots in list scheduling considering communication contention; an evaluation of partitioners for parallel SAMR applications; load balancing on networks with dynamically changing topology; approximation algorithms for scheduling independent malleable tasks; load redundancy elimination on executable code; using a swap instruction to coalesce loads and stores; data-parallel compiler support for multipartitioning; parallel and distributed databases, data mining and knowledge discovery; an experimental performance evaluation of join algorithms for parallel object databases; a classification of skew effects in parallel database systems; experiments in parallel clustering with DBSCAN; analysis of the cycle structure of permutations; scanning biosequence databases on a hybrid parallel architecture; experiences in using MPI-IO on top of GPFS for the IFS weather forecast code; improving conditional branch prediction on speculative multithreading architectures; performance of a dynamic threads scheduler; and self-stabilizing neighborhood unique naming under unfair scheduler.
We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that...
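The pattern-space partitioning scheme can be sketched as one data-parallel k-means iteration. Each worker assigns its own slice of the points to the nearest centroid and accumulates per-cluster sums, and the partial results are then merged. This is a minimal sketch under a few assumptions:

- It uses plain Euclidean k-means, not the tree-based variant the abstract mentions.
- The function names are hypothetical.
- Python threads are used only to show the structure. Because of the GIL, pure-Python loops will not actually speed up; processes or native code would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def _sq_dist(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def _partial_sums(points, centroids):
    """Assign one slice of points; return per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans_step(points, centroids, n_workers=4):
    """One k-means iteration with the points partitioned across workers."""
    k, dim = len(centroids), len(centroids[0])
    chunk = max(1, -(-len(points) // n_workers))  # ceiling division
    slices = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for sums, counts in pool.map(lambda s: _partial_sums(s, centroids), slices):
            # Merge step: reduce the per-worker partial results.
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
    # Recompute centroids; an empty cluster keeps its previous centroid.
    return [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centroids[j])
        for j in range(k)
    ]
```

Only the small per-cluster sums and counts cross worker boundaries, which is what makes partitioning the points attractive on a shared memory machine.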
ISBN (print): 1880843390
A network of workstations is a cost-effective alternative to parallel database systems. This paper studies the performance of hash join operations on a Pentium-based network-of-workstations platform. Specifically, we study the performance of intra-query (pipelined) and intra-operation (partitioned) parallelism in a centralized query processing architecture. We also study the impact of network bandwidth on the performance of the join algorithms; for this purpose, our experiments use both a 10 Mbps Ethernet and a 100 Mbps Fast Ethernet. Our performance results suggest that pipelined hash joins significantly improve response time for complex queries and for queries that involve large input relations.
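The two forms of parallelism can be illustrated with a hash join split into a build phase and a probe phase. The sketch below assumes rows are Python dicts and uses hypothetical function names. It shows two strategies:

- `pipelined_join` chains probe generators, so tuples stream from one join into the next without materializing intermediate results.
- `partitioned_join` hash-partitions both inputs, so each bucket pair could be joined on a separate workstation. Here the buckets are simply processed in a loop.

```python
from collections import defaultdict


def build_table(rows, key):
    """Build phase: hash every row of the build relation on its join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table, rows, key):
    """Probe phase as a generator, so matches stream to the next operator."""
    for row in rows:
        for match in table.get(row[key], ()):
            yield {**match, **row}


def pipelined_join(probe_rows, stages):
    """Intra-query (pipelined) parallelism: chain probes without materializing.

    stages: list of (build_rows, build_key, probe_key); each hash table is built
    up front, then tuples flow through every probe in turn.
    """
    stream = iter(probe_rows)
    for build_rows, build_key, probe_key in stages:
        stream = probe(build_table(build_rows, build_key), stream, probe_key)
    return stream


def partitioned_join(build_rows, probe_rows, build_key, probe_key, n_nodes):
    """Intra-operation (partitioned) parallelism: split both inputs by hash."""
    build_parts = [[] for _ in range(n_nodes)]
    probe_parts = [[] for _ in range(n_nodes)]
    for row in build_rows:
        build_parts[hash(row[build_key]) % n_nodes].append(row)
    for row in probe_rows:
        probe_parts[hash(row[probe_key]) % n_nodes].append(row)
    result = []
    # Each (build, probe) bucket pair is independent and could run on its own node.
    for b, p in zip(build_parts, probe_parts):
        result.extend(probe(build_table(b, build_key), p, probe_key))
    return result
```

In a real network-of-workstations setting, each probe operator in the pipeline would run on a different machine, with tuples shipped over the network. That shipping is where the 10 Mbps versus 100 Mbps comparison in the abstract matters.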
This paper evaluates the possibility of using a digital signal processor (DSP) to implement an image pattern recognition system based on a neural network architecture. The paper presents a brief introduction to neural network architectures and how such architectures can be used in pattern recognition. It then presents the implementation of the neural network on a very powerful DSP microcomputer (e.g., the ADSP-2189 from Analog Devices), illustrates the main results for a character recognition system (execution time and error probability), and presents some conclusions.
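The ADSP-218x family are 16-bit fixed-point processors, so a network evaluated on such a DSP would typically use fixed-point arithmetic. The sketch below makes these assumptions:

- A single-layer network with Q15 inputs and weights.
- A multiply-accumulate into a wider accumulator.
- Classification by the largest output.

The 3×3 templates and helper names are hypothetical toy data. They are not the network or character set used in the paper.

```python
Q = 15            # Q15 format: value = integer / 2**15
ONE = 1 << Q


def to_q15(x):
    """Convert a real number in [-1, 1) to Q15, saturating to the 16-bit range."""
    v = int(round(x * ONE))
    return max(-ONE, min(ONE - 1, v))


def neuron_q15(inputs, weights, bias):
    """Weighted sum in fixed point: Q15 x Q15 products accumulate as Q30."""
    acc = bias << Q                   # align the Q15 bias with the Q30 products
    for x, w in zip(inputs, weights):
        acc += x * w                  # one multiply-accumulate per input
    return acc >> Q                   # arithmetic shift back to Q15


def classify(pixels, weight_rows, biases):
    """Return the index of the output neuron with the largest activation."""
    x = [to_q15(p) for p in pixels]
    scores = [neuron_q15(x, w, b) for w, b in zip(weight_rows, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def template_weights(template, w=0.1):
    """Hypothetical weights: +w where the template is set, -w elsewhere."""
    return [to_q15(w if t else -w) for t in template]


# Toy 3x3 "characters" (hypothetical): a vertical bar and a horizontal bar.
VERT = [0, 1, 0, 0, 1, 0, 0, 1, 0]
HORIZ = [0, 0, 0, 1, 1, 1, 0, 0, 0]
W = [template_weights(VERT), template_weights(HORIZ)]
B = [0, 0]
```

Python integers do not overflow. On the DSP itself, the wide accumulator plays that role, and the final shift and saturation determine how much precision the stored activations keep.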
ISBN (print): 078037228X
This paper evaluates the possibility of using a Digital Signal Processor (DSP) to implement an image pattern recognition system based on a neural network architecture. The paper is organized as follows:

- Section I presents a brief introduction to neural network architectures and how such architectures can be used in pattern recognition.
- Section II presents the implementation of the neural network using a very powerful DSP microcomputer (e.g., the ADSP-2189 from Analog Devices).
- Section III illustrates the main results for a letter recognition system (execution time and error probability).
- Section IV presents the conclusions.
The emerging WWW poses new technological challenges for information processing. The scale of the WWW is expected to keep growing as more devices, such as mobile phones and PDAs, are equipped with the ability to access inte...
ISBN (print): 0780370007
To fulfil real-time signal processing tasks such as clutter rejection, moving target detection (MTD) and constant false alarm rate (CFAR) control in airborne radar, an airborne radar parallel signal processing system (ARPS2) is proposed with DSP chips as its kernel processing nodes. The DSP chips are arranged in a parallel architecture, and each node has its own private input and output memory. The system adopts several parallel techniques, such as parallel storage, parallel processing, parallel code loading and parallel data organization, to achieve high efficiency. It has a simple structure, excellent flexibility, and ease of development. ARPS2 is going to be applied to an airborne radar. It can also be used to run high-speed, real-time signal processing algorithms in other kinds of radar.
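CFAR control is commonly implemented with cell-averaging CFAR (CA-CFAR). This method compares each range cell with a threshold scaled from the average power of nearby training cells, skipping a few guard cells around the cell under test. The abstract does not say which CFAR variant ARPS2 uses, so the following is only a minimal sketch of plain CA-CFAR on a single range profile, with hypothetical function names. `cfar_scale` assumes a square-law detector and exponentially distributed noise, for which the false-alarm probability is Pfa = (1 + alpha/N)^(-N), with N training cells in total.

```python
def ca_cfar(power, num_train, num_guard, scale):
    """Cell-averaging CFAR over one range profile of power samples.

    For each cell under test, the noise level is the mean of num_train cells on
    each side, skipping num_guard guard cells next to the test cell; a detection
    is declared when the cell exceeds scale * noise. Edge cells without a full
    window are not tested.
    """
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        leading = power[i - half:i - num_guard]
        lagging = power[i + num_guard + 1:i + half + 1]
        noise = (sum(leading) + sum(lagging)) / (2 * num_train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


def cfar_scale(pfa, n_train_total):
    """Threshold factor alpha for a target false-alarm probability, assuming a
    square-law detector and exponentially distributed noise power."""
    return n_train_total * (pfa ** (-1.0 / n_train_total) - 1.0)
```

Each cell's test reads only a bounded window of neighbors. A profile can therefore be split among DSP nodes with an overlap of `num_train + num_guard` cells at each boundary, which fits the per-node private memory described above.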
This article describes a 30 frames/second VGA-format image sensor made in a standard CMOS process with an embedded massively parallel processor. The processor is fully programmable, so the sensor IC itself can run a variety of algorithms with data and processing in close vicinity of the sensor. Because of the parallel architecture, which comprises a processor array and parallel memory accesses, computational performance of up to 5 GOPS at 16 MHz is achieved. This high performance allowed us to implement skin tone detection on the camera itself as part of a larger system for face recognition. Doing so relieves the host computer of cumbersome pixel processing tasks and minimizes the data transfer between camera and computer.
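A rough budget check follows from the figures in the abstract:

- A VGA frame has 640 × 480 = 307,200 pixels, so 30 frames/s is 9,216,000 pixels/s.
- At the quoted peak of 5 GOPS, that leaves roughly 542 operations per pixel.
- 5 GOPS at 16 MHz also implies about 312 operations per clock cycle across the array.

The per-pixel classifier below is a commonly cited fixed RGB skin rule, used here purely as an assumption. The abstract does not describe the article's actual skin tone detector, and the function names are hypothetical.

```python
def is_skin(r, g, b):
    """A commonly cited fixed RGB skin rule (uniform daylight); assumed here."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(frame):
    """Per-pixel classification of a frame given as rows of (r, g, b) tuples.

    Every pixel is independent, so the work maps directly onto a processor
    array; here it is simply a nested loop.
    """
    return [[is_skin(r, g, b) for (r, g, b) in row] for row in frame]


def ops_per_pixel(ops_per_second, width=640, height=480, fps=30):
    """Operation budget per pixel at a given throughput (default: VGA, 30 fps)."""
    return ops_per_second / (width * height * fps)
```

A rule of this kind costs only a handful of comparisons per pixel, well within the budget above. Only the resulting mask, or a region summary, would then need to cross the camera-to-host link.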
A recent latency tolerance technique, read-miss clustering, restructures code to send demand-miss references in parallel to the underlying memory system. An alternative, widely used latency tolerance technique is software prefetching, which initiates data fetches a certain distance ahead of the expected demand-miss references. Both techniques seem to target the same types of latencies and use the same system resources, so it is unclear which technique is superior or whether the two can be combined. This paper shows that these two techniques are actually mutually beneficial, each helping to overcome limitations of the other. We perform our study for uniprocessor and multiprocessor configurations, in simulation and on a real machine (the Convex Exemplar). Compared to prefetching alone (the state of the art implemented in systems today), the combination of the two techniques reduces execution time by an average of 21% across all cases studied in simulation, and by an average of 16% for 5 out of 10 cases on the Exemplar. Relative to clustering alone, the combination reduces execution time by an average of 15% for 6 out of 11 cases in simulation and by 20% for 6 out of 10 cases on the Exemplar.
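Software prefetching is usually scheduled with a prefetch distance of about ceil(l / s) iterations, where l is the expected miss latency and s is the time for one loop iteration, so that data arrive just as they are needed. The sketch below, with hypothetical names and cycle counts, computes that distance and lists the resulting order of prefetches and loads for a simple loop over a[0..n). It does not model read-miss clustering, which would additionally restructure the loop (for example by unroll-and-jam) so that several independent misses are issued together.

```python
import math


def prefetch_distance(miss_latency, iteration_time):
    """Iterations to prefetch ahead so a line arrives by the time it is used."""
    return math.ceil(miss_latency / iteration_time)


def prefetched_schedule(n, distance):
    """Order of memory operations for a loop over a[0..n) with prefetching.

    A prologue prefetches the first `distance` elements; iteration i then
    prefetches a[i + distance] (if it exists) before loading a[i].
    """
    ops = [("prefetch", i) for i in range(min(distance, n))]
    for i in range(n):
        if i + distance < n:
            ops.append(("prefetch", i + distance))
        ops.append(("load", i))
    return ops
```

For example, with an assumed 100-cycle miss latency and a 30-cycle loop body, `prefetch_distance(100, 30)` gives 4 iterations. Each prefetch also occupies a miss-handling resource, which is one place where prefetching and clustering compete for the same hardware.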
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm. ITSS is superior to the existing three-step search (TSS) algorithm in all cases, and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture implements the ITSS algorithm on silicon with the minimum number of gates. Because of its flexibility, the architecture can also be extended to implement other three-step search algorithms.
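For reference, the baseline three-step search that ITSS improves on works as follows:

1. Start from the zero motion vector.
2. Evaluate the center and its eight neighbors at a step of 4 pixels.
3. Move to the best match and halve the step.
4. Repeat until the step is 1, which covers a ±7-pixel search range.

The sketch below implements standard TSS with a sum-of-absolute-differences (SAD) cost, assuming frames stored as lists of rows. It is not the ITSS variant or the paper's VLSI architecture, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced
    by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Standard TSS for the block at (bx, by); returns ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best
        # Evaluate the 3 x 3 grid of candidates around the current best vector.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the reference frame.
                if not (0 <= bx + mx and bx + mx + n <= w
                        and 0 <= by + my and by + my + n <= h):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost
```

Every SAD in one step is independent of the others, which suggests why a tree of processors can evaluate the nine candidates in parallel. Because TSS is greedy, it can settle in a local minimum of the SAD surface, and improved variants such as those discussed above target that weakness.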