We show that developing an optimal parallelization of the two-list algorithm is much easier than we once thought. All it takes is to observe that the steps of the search phase of the two-list algorithm are closely rel...
Phase computation based on Wavelet Transform Profilometry (WTP) carries a huge workload and is time-consuming, so it cannot meet the needs of real-time three-dimensional (3D) measurement. Fortunately the pixels which in situ nee...
In this paper, we propose a parallel algorithm to solve a class of nonlinear network optimization problems. The proposed parallel algorithm is a combination of the successive quadratic programming and the dual method,...
ISBN: (print) 9783642334993
The proceedings contain 8 papers. The topics discussed include: Blink: not your father's database!; MemcacheSQL - a scale-out SQL cache engine; a cost-aware strategy for merging differential stores in column-oriented in-memory DBMS; Microsoft SQL Server parallel data warehouse: architecture overview; relax and let the database do the partitioning online; adaptive processing of multi-criteria decision support queries; scalable social graph analytics using the Vertica analytic platform; and a near real-time personalization for ecommerce platform.
ISBN: (digital) 9783642314643
ISBN: (print) 9783642314636; 9783642314643
The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note # 247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold speedup and the standard two-stage PLASMA BRD by up to 20-fold speedup, on an eight-socket hexa-core AMD Opteron multicore shared-memory system.
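The contrast between domino-like (chained) panel processing and tree reduction can be illustrated with a minimal sketch. It is not the authors' kernel: `combine` is a hypothetical stand-in for whatever operation merges two panel blocks (in the paper, a tile factorization step), and the function names are invented for illustration. The sketch only counts how many dependent steps each ordering needs.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def chain_reduce(blocks: List[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Domino-like order: each merge depends on the previous one."""
    acc = blocks[0]
    steps = 0
    for block in blocks[1:]:
        acc = combine(acc, block)
        steps += 1  # every merge sits on the critical path
    return acc, steps


def tree_reduce(blocks: List[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Binary-tree order: merges within one round are independent of each other."""
    level = list(blocks)
    rounds = 0
    while len(level) > 1:
        # Adjacent pairs are merged; these merges could run concurrently.
        merged = [combine(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd block is carried to the next round
        level = merged
        rounds += 1  # only the number of rounds is on the critical path
    return level[0], rounds


if __name__ == "__main__":
    blocks = list(range(8))  # stand-in for 8 horizontal panel blocks
    print(chain_reduce(blocks, lambda a, b: a + b))
    print(tree_reduce(blocks, lambda a, b: a + b))
```

For 8 blocks the chained order needs 7 dependent merges, while the tree order needs 3 rounds, which is where the extra concurrency for a dynamic scheduler comes from, assuming the merge operation is associative.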
ISBN: (print) 9781467308595
This paper describes the design of a unified support vector machine circuit for pedestrian and car detection. By unifying the algorithms and architectures of linear and nonlinear SVM classifications, the proposed circuit can support both linear and nonlinear classifications very efficiently in terms of circuit size and performance. The circuit size is minimized by sharing most of the resources required in the computation for both classification types. A parallel architecture with pipelining is adopted to accelerate the processing speed to handle a large amount of operations for real-time processing. 48x96 and 64x64 sliding windows with 6 window strides are used to detect pedestrians and cars, respectively. The synthesized circuit, using a 65nm standard cell library, consists of 848,349 gates and its maximum operating frequency is 435MHz. The circuit can process 91.9 640x480 image frames per second, assuming three cameras mounted at the front, right and left side positions of the vehicle.
To make parallel programming as widespread as parallel architectures, more structured parallel programming paradigms are necessary. One possible approach is algorithmic skeletons. They can be seen as higher ...
ISBN: (print) 9780769548654; 9781467351461
In this paper we investigate the energy efficiency of processors based on ARM Cortex-A9 cores for scientific numerical applications. We study the performance for a few numerical kernels which appear in a larger set of scientific applications. From power measurements that were performed on different platforms we estimate the energy consumed when executing these kernels.
ISBN: (print) 9781467326209; 9781467326193
Green technology is a new research area in electronics, which meets the needs of society and explores the ability of VLSI circuits and embedded systems to positively impact the environment. In VLSI physical design automation, channel routing is a fundamental problem, and reducing the total wire length for interconnecting the nets of different circuit blocks is one of the most challenging requirements for enhancing the performance of a chip to be designed. Reducing the total wire length for interconnection not only minimizes the cost of the physical wire segments required, but also reduces the amount of occupied area for interconnection, signal propagation delays, electrical hazards, power consumption, heat generation, and, overall, the parasitics present in a circuit. Thus it has a direct impact on daily life and the environment. The channel routing problem for wire length minimization is NP-hard. Hence, as part of developing an alternative, we modify the existing graph theoretic framework Track_Assignment_Heuristic (TAH) to reduce the total (vertical) wire length. In this paper we propose an efficient polynomial time graph based parallel algorithm to reduce the total wire length without radically increasing the required area for interconnection in the reserved two-layer no-dogleg Manhattan channel routing model. The performance and efficiency of our algorithm are highly encouraging for different well-known benchmark channels.
ISBN: (print) 9780769548845; 9781479902767
In this paper, we have proposed a high data throughput AES hardware architecture by partitioning ten rounds into sub-blocks of repeated AES modules. The blocks are separated by intermediate buffers, providing a complete ten-stage AES pipeline structure. In addition, the AES is internally evenly divided into ten pipeline stages, with the additional feature that the shift rows block (Shift Row) is structured to operate before the byte substitute (Byte Substitute) block. This proposed swapping operation has no effect on the AES encryption algorithm; however, it streamlines the processing of four blocks of data in parallel rather than 16 blocks, which is considered the key advantage for area saving. We have evaluated the performance of our implementation in terms of throughput rate and hardware area for Xilinx's SPARTAN-3 FPGA. The simulation results show that the proposed AES has a throughput rate about 4.25% higher than the general AES pipeline structure, with a hardware area saving of 56%.
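The claim that swapping Shift Row and Byte Substitute leaves the cipher unchanged can be checked in software with a short sketch. This is not the authors' hardware design; it only relies on the standard AES definitions: the S-box acts on each byte independently and ShiftRows only permutes byte positions, so the two steps commute. The function names and the column-major state layout (index = row + 4 * column) are choices made for this illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, shift: int) -> int:
    """Rotate a byte left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def sbox_byte(a: int) -> int:
    """AES S-box entry: field inverse followed by the affine transform."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_byte(x) for x in range(256)]


def sub_bytes(state: list) -> list:
    """Apply the S-box to every byte of the 16-byte state."""
    return [SBOX[byte] for byte in state]


def shift_rows(state: list) -> list:
    """Rotate row r of the column-major state left by r positions."""
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            out[row + 4 * col] = state[row + 4 * ((col + row) % 4)]
    return out


if __name__ == "__main__":
    state = [(17 * i + 3) % 256 for i in range(16)]  # arbitrary test state
    # Both orders give the same result, so the swap does not change the cipher.
    print(sub_bytes(shift_rows(state)) == shift_rows(sub_bytes(state)))
```

The check holds for any 16-byte state, since a per-byte substitution cannot depend on where a byte sits. What the reordering buys in the proposed architecture is a hardware scheduling benefit, which this software sketch does not model.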