The widespread application of direct-sequence spread-spectrum code division multiple access (DS/SS-CDMA) to wireless communication systems calls for ever faster and more reliable real-time signal processing operations to be performed by highly integrated, low-power digital receivers. One of the most critical signal processing tasks performed by the DS/SS-CDMA receiver is signal presence detection and code epoch estimation. This paper deals with the design and realization of an application-specific integrated circuit (ASIC) for fast signal recognition and code acquisition (SR/CA) in packet DS/SS-CDMA receivers operating in a satellite or terrestrial radio network. In particular, we show how a parallel acquisition circuit can be effectively implemented on a single chip in 1.0-μm CMOS technology according to the specifications of the ARCANET Ku-band CDMA VSAT satellite network sponsored by the European Space Agency (ESA). It is shown that the ASIC performance closely follows analytical predictions.
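To illustrate what signal presence detection and code epoch estimation involve, the following is a minimal software sketch, not the ASIC design described in the abstract. It assumes one sample per chip, a noiseless baseband signal, and a circular search over all code phases. The function name `acquire`, the 7-chip example code, and the threshold value are all hypothetical choices made for illustration.

```python
def acquire(received, code, threshold):
    """Correlate `received` against every circular phase of `code`.

    Returns (detected, best_phase, best_corr). A hardware parallel
    acquisition circuit would evaluate the phases concurrently; this
    sketch simply loops over them.
    """
    n = len(code)
    best_phase, best_corr = 0, 0
    for phase in range(n):
        # Correlation of the received block with the code delayed by `phase` chips.
        corr = sum(received[k] * code[(k - phase) % n] for k in range(n))
        if abs(corr) > abs(best_corr):
            best_phase, best_corr = phase, corr
    # Signal presence is declared when the correlation peak reaches the threshold.
    detected = abs(best_corr) >= threshold
    return detected, best_phase, best_corr


# Hypothetical 7-chip spreading code (values are +1/-1) and a delay of 3 chips.
code = [1, 1, 1, -1, -1, 1, -1]
delay = 3
received = [code[(k - delay) % len(code)] for k in range(len(code))]
result = acquire(received, code, threshold=5)
```

In this noiseless case the correlation peak equals the code length at the true delay, so `result` reports a detection at phase 3. With noise, the threshold trades off false alarms against missed detections.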
Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as a dependence is detected, execution stops, the state is restored, and the code is re-executed serially. This scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. It requires hardware extensions to the cache coherence protocol and memory hierarchy of a DSM, and it has low overhead. In this paper, we present the algorithms and a hardware design of the scheme. Overall, the scheme delivers average loop speedups of 7.3 for 16 processors and is 50% faster than a related software-only method.
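The speculate, detect, restore, and re-execute cycle described above can be emulated in software. The sketch below is only an illustration under stated assumptions: the paper performs detection in cache coherence hardware, whereas here per-element access records are kept in dictionaries. The loop body `a[writes[i]] = a[reads[i]] + 1`, the function names, and the conservative rule that any element touched by two different iterations with at least one write counts as a violation are all hypothetical.

```python
def run_serial(a, reads, writes):
    # Reference execution: iterations in original program order.
    for i in range(len(writes)):
        a[writes[i]] = a[reads[i]] + 1


def run_speculative(a, reads, writes, order):
    """Execute iterations in the given (arbitrary) order, speculatively.

    Returns True if speculation succeeded, False if a cross-iteration
    dependence was detected and the loop was re-executed serially.
    """
    backup = list(a)   # saved state for restoration
    writer = {}        # element index -> iteration that wrote it
    readers = {}       # element index -> set of iterations that read it
    for i in order:
        r, w = reads[i], writes[i]
        read_conflict = writer.get(r, i) != i
        write_conflict = (writer.get(w, i) != i
                          or bool(readers.get(w, set()) - {i}))
        if read_conflict or write_conflict:
            a[:] = backup              # restore the pre-loop state
            run_serial(a, reads, writes)
            return False
        readers.setdefault(r, set()).add(i)
        writer[w] = i
        a[w] = a[r] + 1
    return True


# Independent iterations: speculation succeeds in any order.
a1 = [0, 10, 20, 30]
ok1 = run_speculative(a1, reads=[0, 1], writes=[2, 3], order=[1, 0])

# A dependence chain: speculation fails and the loop is rerun serially.
a2 = [0, 0, 0, 0]
ok2 = run_speculative(a2, reads=[0, 1, 2], writes=[1, 2, 3], order=[2, 1, 0])
```

In both cases the final array matches what serial execution would produce; only the second pays the cost of a rollback.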
Irregular applications based on sparse matrices are at the core of many important scientific computations. Since the importance of such applications is likely to increase in the future, high-performance parallel and distributed systems must provide adequate support for such applications. We characterize a family of irregular scientific applications and derive the demands they will place on the communication systems of future parallel systems. Running time of these applications is dominated by repeated sparse matrix vector product (SMVP) operations. Using simple performance models of the SMVP, we investigate requirements for bisection bandwidth, sustained bandwidth on each processing element (PE), burst bandwidth during block transfers, and block latencies for PEs under different assumptions about sustained computational throughput. Our model indicates that block latencies are likely to be the most problematic engineering challenge for future communication networks.
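For readers unfamiliar with the SMVP kernel that dominates these applications, here is a minimal single-process sketch using the compressed sparse row (CSR) layout. The CSR format and the function name `smvp_csr` are assumptions made for illustration; the abstract does not say which storage format or partitioning the authors model.

```python
def smvp_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[r]..row_ptr[r+1] delimits the nonzeros of row r
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access x[col_idx[k]] is what makes the kernel irregular.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] times x = [1, 2, 3].
y = smvp_csr(values=[2.0, 1.0, 3.0, 4.0, 5.0],
             col_idx=[0, 2, 1, 0, 2],
             row_ptr=[0, 2, 3, 5],
             x=[1.0, 2.0, 3.0])
```

When the rows are partitioned across processing elements, the entries of `x` referenced through `col_idx` but owned by another PE are the data that must be communicated, which is where the bandwidth and block latency requirements come from.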
ISBN: 3540643591 (print)
The proceedings contain 118 papers. The special focus in this conference is on parallel and distributed processing. The topics include: dynamic reconfiguration of a PMMLA for high-throughput applications; a parallel algorithm for minimum cost path computation on polymorphic processor array; a performance modeling and analysis environment for reconfigurable computers; an integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures; temporal partitioning for partially-reconfigurable-field-programmable gate; a Java development and runtime environment for reconfigurable computing; synthesizing reconfigurable sequential machines using tabular models; evaluation of a low-power reconfigurable DSP architecture; a reconfigurable hardware-monitor for communication analysis in distributed real-time systems; a mathematical benefit analysis of context switching reconfigurable computing; a configurable computing approach towards real-time target tracking; hardware reconfigurable neural networks; a simulator for the reconfigurable mesh architecture; processor architectures for circuit emulation; an empirical comparison of runtime systems for conservative parallel simulation; synchronizing operations on multiple objects; migration and rollback transparency for arbitrary distributed applications in workstation clusters; a topology based approach to coordinated multicast operations; a parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet; artificial neural networks on reconfigurable meshes; a molecular quasi-random model of computations applied to evaluate collective intelligence; replicated shared object model for edge detection with spiral architecture; and scheduling tasks of a parallel program in two-processor systems with use of cellular automata.
High throughput and dynamic reconfigurability are required in many tasks, especially real-time applications. A logical structure of parallel computing system for such applications is a pipeline of multiprocessor modul...
ISBN: 0818685794 (print)
Recently, PC clusters have been studied intensively as next-generation large-scale parallel computers. ATM technology is a strong candidate for a de facto standard in high-speed communication networks. Therefore, an ATM-connected PC cluster is a very promising platform from the cost/performance point of view as a future high-performance computing environment. In this paper, an ATM-connected PC cluster consisting of 100 PCs is reported, and the characteristics of a transport layer protocol for the PC cluster are evaluated. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Retransmission caused by cell loss at the ATM switch is analyzed, and retransmission parameters suitable for parallel processing on the large-scale PC cluster are clarified. From the viewpoint of applications, data-intensive applications such as data mining and ad hoc query processing in databases are considered very important for massively parallel processors, in addition to conventional scientific computation. Thus, investigating the feasibility of such applications on an ATM-connected PC cluster is quite meaningful. Parallel data mining is implemented and evaluated on the cluster. The default TCP settings cannot provide good performance, since many collisions occur during the all-to-all multicasting executed on the large-scale PC cluster. Using TCP parameters chosen according to the proposed optimization, sufficient performance improvement is achieved for parallel data mining on 100 PCs.
The paper introduces a mechanism to implement distributed scheduling for the CAN-bus resource in order to meet the requirements of a dynamic distributed real-time system. The key issues considered here are multicasting, ...
A set of synchronization relations between distributed nonatomic events was recently proposed to provide real-time applications with a fine level of discrimination in the specification of causality relations and synch...
Embedded high performance computing is being called upon to provide critical computing resources with increasing frequency. The ability to tolerate faults during operation, both maintaining operational capability and ...
ISBN: 3540649522 (print)
We review the growing power and capability of commodity computing and communication technologies largely driven by commercial distributed information systems. These systems are built from CORBA, Microsoft's COM, JavaBeans, and rapidly advancing Web approaches. One can abstract these to a three-tier model with largely independent clients connected to a distributed network of servers. The latter host various services, including object and relational databases and, of course, parallel and sequential computing. High performance can be obtained by combining concurrency at the middle server tier with optimized parallel back-end services. The resultant system combines the performance needed for large-scale HPCC applications with the rich functionality of commodity systems. Further, the architecture, with distinct interface, server, and specialized service implementation layers, naturally allows advances in each area to be easily incorporated. We illustrate how performance can be obtained within a commodity architecture, and we propose a middleware integration approach based on JWORB (Java Web Object Broker) multi-protocol server technology. Examples are given from collaborative systems, support of multidisciplinary interactions, proposed visual HPCC ComponentWare, quantum Monte Carlo, and distributed interactive simulations.