ISBN: 9781424423538 (print)
In this paper, we describe a processor architecture tailored for radix-4 and mixed-radix FFT algorithms, which have lower arithmetic complexity than radix-2 algorithms. The processor is based on the transport triggered architecture, and several optimizations have been used to improve its energy efficiency. The processor has been synthesized in a 130 nm standard-cell technology, and analysis shows that a programmable solution can achieve energy efficiency comparable to that of a fixed-function ASIC.
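As an illustration of the radix-4 decomposition the abstract refers to, the following is a minimal software sketch of a recursive decimation-in-time radix-4 FFT. It is not the processor's implementation; the function name `fft_radix4` is hypothetical, and the sketch assumes the input length is a power of 4 (a mixed-radix variant would add a radix-2 stage for other power-of-two lengths).

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (illustrative sketch).

    Assumes len(x) is a power of 4; other lengths raise ValueError.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")

    # Four interleaved sub-sequences x[r], x[r+4], x[r+8], ... for r = 0..3.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # Twiddle factors W_n^k, W_n^(2k), W_n^(3k).
        w1 = cmath.exp(-2j * cmath.pi * k / n)
        w2 = w1 * w1
        w3 = w2 * w1
        a = subs[0][k]
        b = w1 * subs[1][k]
        c = w2 * subs[2][k]
        d = w3 * subs[3][k]
        # Radix-4 butterfly: the only extra factors are +-1 and +-j,
        # which need no general complex multiplication in hardware.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out
```

Each butterfly consumes three twiddle multiplications for four outputs, and the remaining factors are trivial, which is where the lower arithmetic complexity relative to radix-2 comes from.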
ISBN: 9783642030949 (print)
In the sequential model of programming, instructions in a program are executed sequentially. Existing programming languages are mainly designed for the sequential model. As the programming paradigm shifts from sequential to distributed computing, existing sequential programming languages show their limitations. Nevertheless, sequential languages are the languages with which most programmers are most familiar. One of the motivations of this research is to implement a framework that supports the implementation of distributed applications using sequential programming languages such as C/C++, COBOL, and Java. In this paper, we present an implementation of a framework for open distributed programming. Allowing programmers to write distributed programs in their favorite sequential programming languages distinguishes this approach from existing programming paradigms.
ISBN: 9781424449231 (print)
This paper studies the loose integration of application accelerators consisting of an array of tightly coupled lightweight reconfigurable processors into a system-on-a-chip. In order to explore a multitude of design variations, a C++ simulation model of the accelerator has been integrated with a system-on-a-chip environment consisting of a general-purpose processor, a DMA controller, an interrupt controller, and a memory module. Depending on the application, different kinds of I/O buffers are designed around the processor array, and the effects of the buffer size on the overall execution time are evaluated. The evaluations are based on new mathematical estimation models derived from the system and application constraints. The estimations are validated against experimental results with an error of less than 1%. Exploring several design points shows that our architecture, with suitable buffer sizes, can improve the system execution time by one to two orders of magnitude for the selected algorithms.
ISBN: 9783642030949 (print)
Generally, Hardware/Software (HW/SW) partitioning can be solved approximately by optimization algorithms. Based on the characteristics of both HW/SW partitioning and the Particle Swarm Optimization (PSO) algorithm, a novel parallel HW/SW partitioning method is proposed in this paper. A model of parallel HW/SW partitioning based on the PSO algorithm is established after analyzing the particularities of HW/SW partitioning. A hybrid strategy of PSO and Tabu Search (TS) is proposed, which uses the intrinsic parallelism of PSO and the memory function of TS to speed up and improve the performance of PSO. To address the problem of premature convergence, the reproduction and crossover operations of the genetic algorithm (GA) are also introduced into the PSO procedure. Experimental results indicate that the parallel PSO algorithm can efficiently reduce the running time even for large task graphs.
The proliferation of RDF data on the web has increased the need for systems that can query these data while scaling with their growing size and number. We present an application of parallel hash-joins for basic graph pattern matching over large amounts of RDF, designed for shared-nothing architectures including high-performance clusters and the Blue Gene/L. Our approach does not require any pre-processing of the RDF data or costly index building. Rather, we rely on a cluster's high bandwidth and fast memory to load and query data in parallel and in near-real time. We present an initial evaluation of our algorithm showing competitive results on clusters of up to 1,024 processors.
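The general idea of a partitioned hash-join over triple patterns can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the authors' system: the function names (`match_pattern`, `partition`, `hash_join`, `parallel_join`) are hypothetical, each list partition stands in for one shared-nothing processor, and the two patterns are assumed to share exactly one variable.

```python
from collections import defaultdict


def match_pattern(triples, pattern):
    """Return variable bindings for one triple pattern; variables start with '?'."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def partition(bindings, var, workers):
    """Route each binding to a worker by hashing the join variable's value."""
    parts = [[] for _ in range(workers)]
    for b in bindings:
        parts[hash(b[var]) % workers].append(b)
    return parts


def hash_join(left, right, var):
    """Local hash-join: build a table on `left`, probe it with `right`."""
    table = defaultdict(list)
    for b in left:
        table[b[var]].append(b)
    # Assumes `var` is the only variable the two sides have in common.
    return [{**l, **r} for r in right for l in table.get(r[var], [])]


def parallel_join(triples, pattern1, pattern2, var, workers=4):
    """Simulate the shared-nothing join: each partition pair is independent."""
    left = partition(match_pattern(triples, pattern1), var, workers)
    right = partition(match_pattern(triples, pattern2), var, workers)
    out = []
    for l_part, r_part in zip(left, right):  # one iteration per simulated processor
        out.extend(hash_join(l_part, r_part, var))
    return out
```

Because equal join values always hash to the same partition, no worker needs data held by another once the redistribution step is done, and no index has to be built in advance.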
ISBN: 9783642030949 (print)
Parallel sorting algorithms on hypercubes have been studied extensively. One practical parallel sorting algorithm is bitonic sort, which runs in $O(n^2)$ time when sorting $N = 2^n$ numbers on an $n$-cube. A versatile family of interconnection networks alternative to the hypercube, called the metacube, was proposed for building extremely large-scale multiprocessor systems with a small number of links per node. A metacube $MC(k, m)$ connects $2^{2^k m + k}$ nodes with only $k + m$ links per node. In this paper, we present an efficient sorting algorithm for metacube multiprocessors. The proposed algorithm is based on Batcher's bitonic sorting algorithm. To perform the parallel sorting efficiently on a metacube, we give a new presentation of the metacube such that the communications required by the algorithm can be done efficiently with gather and scatter operations. The parallel bitonic sort algorithm implemented on metacubes with the new presentation runs in $O((2^k m + k)^2)$ computation steps and $O((2^k m (2k + 1) + k)^2)$ communication steps.
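For reference, Batcher's bitonic sort on a hypercube can be sketched as a sequence of compare-exchange rounds in which element i is paired with the element whose index differs in one bit. The sketch below is a sequential simulation only; the function name `bitonic_sort` is hypothetical, the input length is assumed to be a power of two, and the metacube-specific gather and scatter steps are not modeled.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using Batcher's bitonic network.

    Each (k, j) round corresponds to one parallel compare-exchange step in
    which node i talks to node i XOR j, i.e. its neighbor along one
    hypercube dimension.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # dimension used in this compare-exchange round
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For a list of length 2^n the two outer loops execute n(n + 1)/2 rounds in total, which matches the quadratic step count in the cube dimension quoted for bitonic sort on a hypercube.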
ISBN: 9781424434343 (print)
The current Internet architecture nicely structures functionality into layers of protocols. While this reduces complexity, many tweaks have emerged because of the architecture's limited flexibility. Cross-layer functionality erodes the layer boundaries, intermediate layers had to be introduced for protocols such as MPLS and IPsec, and middleboxes, as in the case of NAT, further complicate the interaction of protocols. To overcome these problems, many publications have proposed modular solutions or protocol composition, allowing software engineering ideas to improve protocol design. Other publications state that instead of choosing a single common network architecture for the Future Internet, it might be advantageous to run multiple different architectures in parallel. We combine both approaches and make it possible to rapidly create and run different network architectures in parallel. While this simplifies Future Internet development, it requires the network architecture to be chosen dynamically. This paper not only presents a node architecture enabling the parallel operation of different network architectures but also introduces algorithms for their selection at runtime.
The multicore revolution is underway. Classical algorithms must be revisited in order to take the hierarchical memory layout into account. In this paper, we aim at minimizing the number of cache misses paid during the...
ISBN: 9780769535449 (print)
Service Oriented Architecture (SOA) is a new form of distributed software architecture. SOA promotes loose coupling, service distribution, dynamicity, and agility. The services involved in an SOA are remote and autonomous; the SOA designer cannot control them, and unpredictable behaviour can occur. This makes SOA different from other architectures in its special architectural elements and its dynamic, evolving structure. How to model this specific architecture and support service-oriented development is an important research topic in the service-oriented software engineering community. This paper proposes a graph-transformation-based approach to model SOA and its evolution at runtime. A graph grammar is used to represent the architectural style, and type and structural constraints are introduced to improve robustness and adaptability when reconfiguring architectures at runtime.
ISBN: 9783642030949 (print)
In this paper, a new parallel Montgomery binary exponentiation algorithm is proposed. The algorithm is based on the Montgomery modular reduction technique, the binary method, the common-multiplicand-multiplication (CMM) algorithm, and the canonical-signed-digit (CSD) recoding technique. The CMM algorithm computes the part common to two modular multiplications once rather than twice, which improves the efficiency of the binary exponentiation algorithm by decreasing the number of modular multiplications. Furthermore, with the proposed parallel CMM-CSD Montgomery binary exponentiation algorithm, the total number of single-precision multiplications can be reduced by about 66.7% and 30% compared with the original Montgomery algorithm and Ha and Moon's improved Montgomery algorithm, respectively.
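Two of the building blocks named in the abstract can be sketched briefly: plain sequential Montgomery binary exponentiation and canonical-signed-digit recoding. This is the baseline technique only, not the proposed parallel CMM-CSD algorithm; the function names are hypothetical, the modulus is assumed to be odd, and the modular inverse via `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n):
    """Return (k, n_prime) with R = 2**k > n and n * n_prime == -1 (mod R)."""
    if n < 3 or n % 2 == 0:
        raise ValueError("modulus must be odd and at least 3")
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # modular inverse, Python 3.8+
    return k, n_prime


def mont_mul(a, b, n, k, n_prime):
    """Montgomery product a * b * R^-1 mod n for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k  # exact division by R
    return u - n if u >= n else u


def mont_pow(base, exp, n):
    """Left-to-right binary exponentiation carried out in Montgomery form."""
    k, n_prime = montgomery_setup(n)
    r = 1 << k
    x = (base % n) * r % n  # base converted to Montgomery form
    acc = r % n             # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)    # square
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)  # multiply
    return mont_mul(acc, 1, n, k, n_prime)         # leave Montgomery form


def csd_digits(e):
    """Canonical-signed-digit recoding of e >= 0, least significant digit first.

    Digits are in {-1, 0, 1} and no two adjacent digits are both nonzero.
    """
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e & 3)  # +1 if e % 4 == 1, -1 if e % 4 == 3
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits
```

In the binary method, every nonzero exponent digit costs one extra modular multiplication, so a recoding with fewer nonzero digits lowers the multiplication count.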