ISBN (print): 9783540695004
Parallel computers provide an efficient and economical way to solve large-scale and/or time-constrained scientific, engineering, and industrial problems. Consequently, there is a need to predict the performance order of both deterministic and non-deterministic parallel algorithms. Predicting the performance of the traveling salesman problem (TSP) is challenging because similar input data sets may cause significant variability in execution times. The parallel performance of data-dependent algorithms depends on the problem size, the number of processors, and other parameters; discovering the most important of these other parameters is the real key to obtaining a good estimate of the performance order. This paper presents a novel methodology for predicting the performance of a parallel algorithm that solves the TSP. The process explores the data in search of patterns and/or relationships, detecting the main parameters that affect performance. It then uses the values measured for this limited number of inputs to build a multiple linear regression model. Finally, the regression equation predicts how the algorithm will respond to new input data sets. The preliminary experimental results are quite promising.
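The regression step can be illustrated with a minimal pure-Python sketch of ordinary least squares solved through the normal equations. The three features in the usage example (number of cities, number of processors, and a hypothetical data-dependent parameter) and all numbers are assumptions chosen for illustration; they are not the parameters or measurements from the paper.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(features: Sequence[Sequence[float]], times: Sequence[float]) -> List[float]:
    """Fit time ~ b0 + b1*x1 + ... + bk*xk via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *f] for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], feature: Sequence[float]) -> float:
    """Evaluate the fitted regression equation for a new input."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], feature))


if __name__ == "__main__":
    # Hypothetical measurements: (cities, processors, data-dependent parameter).
    samples = [(100, 4, 1.0), (200, 4, 2.0), (100, 8, 0.5),
               (300, 16, 3.0), (250, 8, 1.2), (150, 2, 0.8)]
    times = [2 + 0.5 * n - 3 * p + 1.5 * q for n, p, q in samples]  # synthetic timings
    coeffs = fit_mlr(samples, times)
    print(coeffs, predict(coeffs, (400, 16, 2.5)))
```

With real timings the fit would not be exact; the synthetic, noise-free data here only makes it easy to check that the known coefficients are recovered.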
ISBN (print): 9783540695004
In this paper we present a novel and complete approach to encapsulating parallelism for relational database query execution that strives for maximum utilization of both CPU and disk resources. Its simple and robust design can model intra- and inter-operator parallelism for one or more parallel queries in a natural way. In addition, encapsulation guarantees that the bulk of the relational operators can remain unmodified, as long as their implementation is thread-safe. We show that with this approach the problem of scheduling parallel tasks is generalized, so that it can be safely entrusted to the underlying operating system (OS) without any performance penalty. On the contrary, moving all scheduling decisions from the DBMS to the OS guarantees a centralized and therefore near-optimal resource allocation (depending on the OS's capabilities) for the complete system that hosts the database server as one of its tasks. Moreover, with this proposal, query parallelization is fully transparent at the SQL interface of the database system. The DB administrator can configure the system for effective parallel query execution by setting two descriptive tuning parameters. A prototype implementation has been integrated into the Transbase (R) relational DBMS engine.
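The idea that an unmodified, thread-safe operator can be parallelized by wrapping it, with thread scheduling left to the OS, can be sketched as follows. This is not the Transbase design: the horizontal partitioning, the `degree_of_parallelism` knob, and the example operator are all hypothetical, and the paper's two tuning parameters are not reproduced. In CPython the global interpreter lock limits CPU parallelism for pure-Python operators, so the sketch shows the structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]


def select_adults(rows: List[Row]) -> List[Row]:
    """An ordinary relational selection; thread-safe because it touches no shared state."""
    return [r for r in rows if r["age"] >= 18]


def run_encapsulated(operator: Callable[[List[Row]], List[Row]],
                     rows: List[Row], degree_of_parallelism: int) -> List[Row]:
    """Run an unmodified operator on horizontal partitions in OS-scheduled threads."""
    dop = max(1, degree_of_parallelism)  # hypothetical tuning knob
    chunk = max(1, -(-len(rows) // dop))  # ceiling division
    partitions = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    # Each worker is an OS thread; which thread runs when is up to the OS scheduler.
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partial_results = list(pool.map(operator, partitions))  # keeps partition order
    merged: List[Row] = []
    for part in partial_results:
        merged.extend(part)
    return merged


if __name__ == "__main__":
    table = [{"id": i, "age": (i * 7) % 60} for i in range(20)]
    print(run_encapsulated(select_adults, table, degree_of_parallelism=4))
```

The operator itself never sees threads or partitions, which mirrors the abstract's point that operators stay unmodified as long as they are thread-safe.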
ISBN (print): 9781424414567
Since Schnorr-Euchner sphere decoding (SE-SD) does not guarantee a fixed throughput, the number of search cycles of SE-SD must be limited in a practical implementation. Because SE-SD under a runtime constraint suffers performance degradation due to the variance in the number of search cycles, this paper proposes an enhanced SE-SD architecture with a small search-cycle variance for a multiple-input multiple-output (MIMO) system. The small variance is achieved by adding parallel partial Euclidean distance (PED) calculation units to the one-node-per-cycle architecture. Since the proposed architecture can evaluate more child nodes in a single cycle, the average number of processing cycles and the error performance are significantly improved under a per-block runtime constraint. The proposed parallel architecture roughly doubles the complexity, but it obtains a 2 dB gain in a 4x4 16QAM system when the runtime constraint is 7 cycles.
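A software sketch can make the search concrete: a depth-first Schnorr-Euchner enumeration in which each node expansion costs one "cycle", all children of the expanded node have their PEDs evaluated together (standing in for the parallel PED units), and the search stops when the cycle budget is spent. It assumes QR preprocessing has already produced an upper-triangular `r` and a rotated `y`, and uses a real-valued 4-PAM alphabet (one dimension of 16QAM). The cycle accounting is a simplification, not the paper's hardware architecture.

```python
import math
from typing import List, Optional, Sequence, Tuple

ALPHABET = (-3.0, -1.0, 1.0, 3.0)  # real-valued 4-PAM, i.e. one dimension of 16QAM


def sphere_decode(r: Sequence[Sequence[float]], y: Sequence[float],
                  max_cycles: int) -> Tuple[Optional[List[float]], float, int]:
    """Depth-first SE search; returns (best symbols or None, best distance, cycles used)."""
    n = len(y)
    s = [0.0] * n
    best: Optional[List[float]] = None
    best_dist = math.inf
    cycles = 0

    def search(level: int, ped: float) -> None:
        nonlocal best, best_dist, cycles
        if cycles >= max_cycles:
            return  # runtime constraint reached: keep the best leaf found so far
        cycles += 1  # one node expansion per cycle
        # Remove interference from the symbols already fixed at higher levels.
        residual = y[level] - sum(r[level][j] * s[j] for j in range(level + 1, n))
        # Evaluate every child's PED at once, then visit them in ascending order.
        children = sorted((ped + (residual - r[level][level] * a) ** 2, a)
                          for a in ALPHABET)
        for child_ped, a in children:
            if child_ped >= best_dist:
                break  # PEDs only grow with depth, so the remaining children are pruned
            s[level] = a
            if level == 0:
                best_dist, best = child_ped, s[:]
            else:
                search(level - 1, child_ped)
                if cycles >= max_cycles:
                    return

    search(n - 1, 0.0)
    return best, best_dist, cycles
```

If the budget runs out before the first leaf is reached (fewer cycles than antennas in this accounting), the function returns `None`; a real decoder would need a fallback such as the greedy path.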
ISBN (print): 9780769532875
Although the operations in block ciphers are quite complex, block ciphers share many characteristics, including their arithmetic units, operation width, data parallelism, and sequential implementation. This makes them well suited to an ASIP (Application-Specific Instruction-set Processor) design. In this thesis, a reconfigurable processor architecture is proposed. To improve instruction-level parallelism, the thesis puts forward an instruction bundle structure based on a VLIW architecture that supports word and sub-word parallel processing. The cipher arithmetic units use a specific reconfigurable design, which gives the architecture instruction-level reconfigurability. In addition, to resolve the storage-access bottleneck, the thesis uses clustering to design two separate register files that store data and subkeys; this scheme also reduces energy and clock cycles. A number of algorithms were implemented successfully on the processor. The prototype was realized on an Altera FPGA, and synthesis, placement, and routing of the processor were completed in 0.18 μm CMOS technology using the Design Compiler tool. Compared with other ASIPs targeting block ciphers, the results show that the processor achieves relatively high performance in block cipher processing.
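The difference between word and sub-word processing can be illustrated in software with the SIMD-within-a-register (SWAR) technique: four independent 8-bit additions packed into one 32-bit word, with carries blocked at lane boundaries. The 8-bit lane width and the function names are assumptions for illustration; the thesis's actual instruction set is not described in the abstract.

```python
import random

LOW7 = 0x7F7F7F7F   # low 7 bits of each 8-bit lane
HIGH1 = 0x80808080  # top bit of each 8-bit lane


def add_word(a: int, b: int) -> int:
    """Word mode: one 32-bit addition modulo 2**32 (carries cross byte boundaries)."""
    return (a + b) & 0xFFFFFFFF


def add_bytes_swar(a: int, b: int) -> int:
    """Sub-word mode: four 8-bit additions modulo 256 in one 32-bit word.

    Inputs are assumed to be 32-bit unsigned values.
    """
    low = (a & LOW7) + (b & LOW7)   # each lane sum <= 0xFE, so no carry leaves a lane
    return low ^ ((a ^ b) & HIGH1)  # fold in each lane's top bit, discarding carry-out


def add_bytes_reference(a: int, b: int) -> int:
    """Lane-by-lane reference used to check the packed version."""
    out = 0
    for shift in range(0, 32, 8):
        lane = (((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)) & 0xFF
        out |= lane << shift
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    print(hex(add_word(a, b)), hex(add_bytes_swar(a, b)))
    print(add_bytes_swar(a, b) == add_bytes_reference(a, b))
```

For example, adding `0x01FF7F80` and `0x01018080` gives `0x03010000` in word mode but `0x0200FF00` in sub-word mode, because each byte lane wraps independently.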
ISBN (print): 9780769534923
Ubiquitous and pervasive computing systems are characterized by intelligent sensing and computing. These systems seamlessly understand and respond to their environment with little human intervention. Since such systems must be small and unobtrusive, embedded systems play an important role in their design. Furthermore, these systems need to run sophisticated applications in a resource-constrained environment. In this paper we focus on computer vision applications in such systems. Because these applications require large amounts of memory and are computationally intensive, optimizing these algorithms is imperative. This paper discusses several optimization techniques and their impact on execution time in a complex, real-world face tracking example. In certain scenarios, the requirement may be to propose a hardware architecture that achieves a specific response time. This is especially important for mission-critical applications in the automotive, medical, or defence fields. However, estimating hardware architecture parameters such as core clock frequency, memory requirements, and the optimal number of parallel execution paths for a given application is not straightforward. In this paper, we also present a structured approach to determining the hardware architecture for a driver assistance and safety application with stringent performance constraints.
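As a first-order illustration of trading clock frequency against parallel execution paths, the sketch below uses Amdahl's law. This model is a swapped-in assumption, not the structured approach from the paper: it assumes a measured cycle count per frame, a known parallelizable fraction, and perfect scaling of that fraction, and it ignores memory requirements entirely. All names and numbers are hypothetical.

```python
from typing import Optional


def required_clock_hz(cycles_per_frame: float, parallel_fraction: float,
                      paths: int, deadline_s: float) -> float:
    """Clock needed to finish one frame within deadline_s on `paths` paths (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    effective_cycles = cycles_per_frame * (serial + parallel_fraction / paths)
    return effective_cycles / deadline_s


def min_parallel_paths(cycles_per_frame: float, parallel_fraction: float,
                       clock_hz: float, deadline_s: float,
                       max_paths: int = 64) -> Optional[int]:
    """Smallest number of execution paths that meets the deadline at clock_hz, or None."""
    for paths in range(1, max_paths + 1):
        if required_clock_hz(cycles_per_frame, parallel_fraction,
                             paths, deadline_s) <= clock_hz:
            return paths
    return None  # the serial part alone cannot meet the deadline at this clock


if __name__ == "__main__":
    # Hypothetical profile: 100 M cycles per frame, 90 % parallelizable, 100 ms deadline.
    print(required_clock_hz(100e6, 0.9, paths=1, deadline_s=0.1))
    print(min_parallel_paths(100e6, 0.9, clock_hz=500e6, deadline_s=0.1))
```

With these hypothetical figures a single path would need about 1 GHz, while at 500 MHz three paths suffice; at 50 MHz no number of paths helps, because the 10 % serial part already needs 100 MHz.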
ISBN (print): 9781424423576
To address the 'curse of dimensionality', a cell-based filtering scheme has been proposed, but its performance decreases linearly as the dimensionality increases. In this paper, we propose a parallel high-dimensional index structure for content-based information retrieval that copes with this linear decrease in retrieval performance. In addition, we devise data insertion, range query, and k-NN query processing algorithms suited to a cluster-based parallel architecture. Finally, we show that the retrieval performance of our parallel index structure improves in proportion to the number of servers in the cluster and that it outperforms a parallel version of the VA-File when the dimensionality exceeds 10.
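The cluster-level part of parallel k-NN processing can be sketched as a scatter/gather: each server computes a local top-k over its own objects and a coordinator merges the candidates, which is exact because the global top-k is a subset of the union of the local top-k lists. The round-robin placement, the brute-force local search, and the thread pool standing in for servers are all assumptions; the paper's cell-based filtering inside each server is not modeled. `math.dist` requires Python 3.8 or later.

```python
import heapq
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[int, Sequence[float]]  # (object id, feature vector)
Hit = Tuple[float, int]              # (distance, object id)


def local_knn(partition: List[Point], query: Sequence[float], k: int) -> List[Hit]:
    """k-NN over the objects stored on one server (brute force in this sketch)."""
    return heapq.nsmallest(k, ((math.dist(vec, query), oid) for oid, vec in partition))


def parallel_knn(partitions: List[List[Point]], query: Sequence[float], k: int) -> List[Hit]:
    """Scatter the query to every server, then merge the local top-k lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_results = pool.map(lambda part: local_knn(part, query, k), partitions)
        candidates = [hit for hits in local_results for hit in hits]
    return heapq.nsmallest(k, candidates)


def round_robin(points: List[Point], servers: int) -> List[List[Point]]:
    """Hypothetical placement policy: object i goes to server i mod servers."""
    return [points[i::servers] for i in range(servers)]


if __name__ == "__main__":
    rng = random.Random(7)
    points = [(i, [rng.random() for _ in range(16)]) for i in range(500)]
    query = [rng.random() for _ in range(16)]
    print(parallel_knn(round_robin(points, servers=4), query, k=5))
```

Each server sends back at most k candidates, so the merge cost at the coordinator grows with the number of servers times k, not with the data size.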
ISBN (print): 9780769533520
Clusters built from single-core systems are cost-effective in terms of performance improvement and availability. However, hardware constraints limit the performance of single-core systems, so it is difficult for them to meet the increasing performance requirements of diverse applications at different levels of general-purpose computing. A promising solution is the new multi-core systems, which extend parallelism to the CPU level by integrating multiple processing units on a single die. This paper uses the Finite-Difference Time-Domain (FDTD) algorithm as a case study, designing suitable parallel FDTD algorithms for three architectures: distributed-memory machines with single-core processors, shared-memory machines with dual-core processors, and the Cell Broadband Engine (Cell/B.E.) processor with nine heterogeneous cores. The experimental results show that, at the processor level, the Cell/B.E. processor using 8 SPEs achieves a speedup of 7.05 over the AMD single-core Opteron processor and 3.37 over the AMD dual-core Opteron processor.
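The core of a distributed-memory FDTD code is domain decomposition with a ghost-cell (halo) exchange. The sketch below shows this for a 1D Yee scheme in normalized units with a Courant number of 0.5, split into two subdomains that exchange one ghost value per field per time step; it checks that the result matches the undecomposed run. The 1D setup, boundary handling, and Gaussian initial pulse are assumptions, and the subdomains run sequentially here; in a distributed-memory code each would live on its own process and the two ghost copies would become messages.

```python
import math
from typing import List, Tuple


def serial_fdtd(ez: List[float], hy: List[float], steps: int,
                c: float) -> Tuple[List[float], List[float]]:
    """1D Yee scheme; ez[0] and hy[-1] are never updated and act as fixed boundaries."""
    ez, hy = ez[:], hy[:]
    n = len(ez)
    for _ in range(steps):
        for i in range(n - 1):
            hy[i] += c * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += c * (hy[i] - hy[i - 1])
    return ez, hy


def decomposed_fdtd(ez: List[float], hy: List[float], steps: int, c: float,
                    split: int) -> Tuple[List[float], List[float]]:
    """Same scheme on two subdomains [0, split) and [split, n) with ghost cells."""
    n = len(ez)
    left_ez = ez[:split] + [0.0]   # last entry: ghost copy of ez[split]
    left_hy = hy[:split]
    right_ez = ez[split:]
    right_hy = [0.0] + hy[split:]  # first entry: ghost copy of hy[split - 1]
    for _ in range(steps):
        left_ez[split] = right_ez[0]  # halo exchange right -> left (E field)
        for i in range(split):
            left_hy[i] += c * (left_ez[i + 1] - left_ez[i])
        for j in range(1, n - split):
            right_hy[j] += c * (right_ez[j] - right_ez[j - 1])
        right_hy[0] = left_hy[split - 1]  # halo exchange left -> right (H field)
        for i in range(1, split):
            left_ez[i] += c * (left_hy[i] - left_hy[i - 1])
        for k in range(n - split):
            right_ez[k] += c * (right_hy[k + 1] - right_hy[k])
    return left_ez[:split] + right_ez, left_hy + right_hy[1:]


if __name__ == "__main__":
    n = 100
    ez0 = [math.exp(-((i - 30) / 5.0) ** 2) for i in range(n)]  # Gaussian pulse
    hy0 = [0.0] * n
    s_ez, _ = serial_fdtd(ez0, hy0, steps=80, c=0.5)
    d_ez, _ = decomposed_fdtd(ez0, hy0, steps=80, c=0.5, split=50)
    print(max(abs(a - b) for a, b in zip(s_ez, d_ez)))
```

The exchange order matters: the E ghost must be copied before any H update, and the H ghost after the H update but before the E update, so that both subdomains see exactly the values the serial loop would.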
ISBN (print): 9780769534435
The development of numerical simulation software for solving real-world problems usually calls for domain experts in modeling. The GraPA framework, an abstraction layer on top of hardware characteristics, supports modelers in two respects: built-in support for co-processing of multiple models, and generically delivered high performance achieved by exploiting the concurrency features of multicore and distributed-memory architectures. Technically, GraPA is designed as a C++ template framework that the modeler's data structures and algorithms instantiate. With this approach, we handle parallel processing of lock-free data structures and message passing transparently to the modelers. In this paper, we report on the status of the GraPA implementation and on its performance characteristics.
ISBN (print): 9783540681052
Sequence alignment is one of the most important techniques in Bioinformatics. Although efficient dynamic programming algorithms exist for this problem, the alignment of very long DNA sequences still requires significant time on traditional computer architectures. In this paper, we present a scalable and efficient mapping of DNA sequence alignment onto the Cell BE multi-core architecture. Our mapping uses two types of parallelization techniques: (i) SIMD vectorization within a processor and (ii) wavefront parallelization between processors.
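Wavefront parallelization exploits the fact that all cells on one anti-diagonal of the dynamic programming matrix depend only on the two previous anti-diagonals. The sketch below fills a Smith-Waterman local alignment matrix both row by row and by anti-diagonals to show that the wavefront order computes the same score. The choice of Smith-Waterman with a linear gap penalty and the scoring values are assumptions, and the SIMD vectorization part of the paper's mapping is not shown.

```python
def _score(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def sw_row_major(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score (linear gap), filled row by row."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + _score(a[i - 1], b[j - 1], match, mismatch),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def sw_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Same recurrence, filled one anti-diagonal (cells with i + j == d) at a time."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # Cells on diagonal d read only diagonals d - 1 and d - 2, so the iterations
        # of this inner loop are independent: this is the loop a wavefront scheme
        # distributes across processors.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            h[i][j] = max(0,
                          h[i - 1][j - 1] + _score(a[i - 1], b[j - 1], match, mismatch),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

The amount of available parallelism grows from one cell at the start to min(n, m) cells in the middle of the matrix and shrinks again at the end, which is why wavefront schemes work best for long sequences.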
ISBN (print): 9781424422050
Artificial Neural Networks are highly parallel structures inspired by the human brain. They have been used successfully in many human-like applications, such as pattern recognition. The performance of these networks can be enhanced when they are used properly in conjunction with equally powerful mathematical tools. In this paper, we use the discrete wavelet transform as a pre-processing tool for two well-known neural classifiers: Competitive Layer Networks and Learning Vector Networks. The wavelet transform successfully approximated the input patterns of the two classifiers and thus considerably reduced their input-layer requirements. Such a reduction facilitates cost-effective hardware implementations of Artificial Neural Networks.
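How a wavelet approximation shrinks the input layer can be sketched with the Haar wavelet, the simplest discrete wavelet: each decomposition level keeps only the approximation coefficients and halves the input length. The reduced vector then feeds a winner-take-all competitive layer. The Haar family, the number of levels, and the example patterns are assumptions (the abstract does not specify them), and only the classification step is shown, not competitive or LVQ training.

```python
import math
from typing import List, Sequence


def haar_approximation(x: Sequence[float], levels: int) -> List[float]:
    """Keep only the Haar approximation coefficients; each level halves the length."""
    approx = list(x)
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("input length must be divisible by 2**levels")
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(len(approx) // 2)]
    return approx


def competitive_winner(prototypes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Winner-take-all: index of the prototype (neuron weight vector) closest to x."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))


if __name__ == "__main__":
    ramp = [float(i) for i in range(16)]
    step = [0.0] * 8 + [1.0] * 8
    prototypes = [haar_approximation(ramp, 2), haar_approximation(step, 2)]  # 16 -> 4 inputs
    sample = [v + (0.05 if i % 2 else -0.05) for i, v in enumerate(step)]
    print(competitive_winner(prototypes, haar_approximation(sample, 2)))
```

Two levels reduce 16 inputs to 4, so each neuron in this hypothetical competitive layer needs only a quarter of the weights it would need on the raw signal.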