ISBN: 9781424416936 (print)
Remote Direct Memory Access (RDMA) and point-to-point network fabrics each have their own advantages. MPI middleware implementations typically use one or the other; however, the appearance of the Internet Wide Area RDMA Protocol (iWARP), RDMA over IP, and protocol off-load devices introduces the opportunity to use a hybrid design for MPI middleware that uses both iWARP and a transport protocol directly. We explore the design of a new MPICH2 channel device based on iWARP and the Stream Control Transmission Protocol (SCTP) that uses SCTP for all point-to-point MPI routines and iWARP for all remote memory access routines (i.e., one-sided communication). The design extends the Ohio Supercomputer Center software-based iWARP stack and our MPICH2 SCTP-based channel device. The hybrid channel device aligns the semantics of each MPI routine with the underlying protocol that best supports it, and also allows the MPI API to exploit the potential performance benefits of the underlying hardware more directly. We describe the design and the issues related to the progress engine and connection setup. We demonstrate how to implement iWARP over SCTP rather than TCP and discuss its advantages and disadvantages. We are not aware of any other software implementation of iWARP over SCTP, nor of any MPI middleware that uses both iWARP verbs and the SCTP API.
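The routing rule described above (two-sided MPI routines over SCTP, one-sided routines over iWARP) can be illustrated with a small dispatch sketch. The `Transport` enum, the routine groupings and `select_transport` are hypothetical names chosen for illustration; they are not the channel device's actual interface.

```python
from enum import Enum


class Transport(Enum):
    SCTP = "sctp"    # message-oriented transport for two-sided traffic
    IWARP = "iwarp"  # RDMA verbs for remote memory access


# Hypothetical grouping of MPI routines by communication semantics.
POINT_TO_POINT = {"MPI_Send", "MPI_Recv", "MPI_Isend", "MPI_Irecv", "MPI_Sendrecv"}
ONE_SIDED = {"MPI_Put", "MPI_Get", "MPI_Accumulate"}


def select_transport(routine: str) -> Transport:
    """Route a routine to the protocol whose semantics best match it."""
    if routine in POINT_TO_POINT:
        return Transport.SCTP
    if routine in ONE_SIDED:
        return Transport.IWARP
    raise ValueError(f"no transport mapping for {routine}")


if __name__ == "__main__":
    for name in ("MPI_Send", "MPI_Put", "MPI_Get"):
        print(name, "->", select_transport(name).value)
```

Keeping the mapping in one place mirrors the design goal in the abstract: each routine is matched once to the protocol whose semantics fit it, instead of emulating one-sided operations on top of a message stream.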
ISBN: 9781424416936 (print)
Wireless sensor networks (WSNs) are widely used for various monitoring applications. Users issue queries to sensors and collect sensing data. Due to low-quality sensing devices or random link failures, sensor data are often noisy. To increase the reliability of query results, continuous queries are often employed. In this work we focus on continuous holistic queries such as Median. Existing approaches are mainly designed for non-holistic queries such as Average; answering holistic queries is non-trivial because they are not decomposable. We propose two schemes for answering queries under different data-change conditions. When sensor data change slowly, we exploit the correlation between data in successive rounds to obtain exact answers. When data change quickly, we propose another approach that derives approximate results. We evaluate both designs through extensive simulations. The results demonstrate that our approach significantly reduces traffic cost compared with previous work while maintaining the same accuracy.
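To see why a holistic aggregate such as Median is harder to compute in-network than Average, consider sensors split into two subtrees. Average can be merged exactly from per-subtree (sum, count) partials, whereas the median of the subtree medians is not, in general, the global median. The readings below are made-up values for illustration only; this is not either of the paper's schemes.

```python
from statistics import median

# Hypothetical readings from two subtrees of a sensor network.
left = [1.0, 2.0, 3.0]
right = [10.0, 20.0]

# Average is decomposable: each subtree forwards only (sum, count).
partials = [(sum(g), len(g)) for g in (left, right)]
total, count = map(sum, zip(*partials))
merged_avg = total / count
true_avg = sum(left + right) / len(left + right)

# Median is holistic: combining subtree medians loses information.
median_of_medians = median([median(left), median(right)])
true_median = median(left + right)

print(f"average: merged={merged_avg}, true={true_avg}")
print(f"median:  merged={median_of_medians}, true={true_median}")
```

Here the merged average equals the true average (7.2), while the median of medians (8.5) differs from the true median (3.0), which is why holistic queries generally need more than a fixed-size partial state per subtree.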
This paper presents a distributed file system for present-day medium-sized networks. Existing servers and workstations pool their unused storage resources to form a communal share. Erasure codes provide fault toler...
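The abstract is truncated here and does not say which erasure code the system uses. As a minimal illustration of the general idea only, the sketch below uses the simplest erasure code, a single XOR parity block over k equal-sized data blocks, which tolerates the loss of any one block. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one parity block (k + 1 shares)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(shares, lost_index):
    """Rebuild the single missing share by XOR-ing all surviving shares."""
    survivors = [s for i, s in enumerate(shares) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    shares = encode([b"node", b"disk", b"pool"])
    # Simulate losing the machine that stored share 1.
    assert recover(shares, 1) == b"disk"
    print("recovered:", recover(shares, 1))
```

Each share would live on a different server or workstation; practical systems typically use codes such as Reed-Solomon that tolerate more than one simultaneous loss.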
With the rapid progress of high-performance cluster applications, data transfer between clusters in distant locations becomes more important. However, it is difficult to transfer data using parallel TCP streams on long di...
ISBN: 9781424420025 (print)
We present an efficient algorithm for nonlocal image filtering with applications in electron cryomicroscopy. Our denoising algorithm is a rewriting of the recently proposed nonlocal means filter. It builds on the separable property of neighborhood filtering to offer a fast parallel and vectorized implementation on contemporary shared-memory computer architectures while reducing the theoretical computational complexity of the original filter. In practice, our approach is much faster than a serial, non-vectorized implementation, and it scales linearly with image size. We demonstrate its efficiency on data sets from Caulobacter crescentus tomograms and a cryoimage containing viruses, and provide visual evidence attesting to the remarkable quality of the nonlocal means scheme in the context of cryoimaging. With this development we provide biologists with an attractive filtering tool to facilitate their scientific discoveries.
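For reference, the sketch below is the plain (serial, non-separable) nonlocal means filter on a 1-D signal. Each sample is replaced by a weighted average of nearby samples whose surrounding patches look similar, with weight exp(-d²/h²) for squared patch distance d². It is the baseline the abstract's separable, vectorized rewrite is designed to outperform, not the authors' algorithm; the patch radius, search radius and h are arbitrary assumptions.

```python
import math


def nl_means_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Naive nonlocal means on a 1-D signal (borders handled by clamping)."""
    n = len(signal)

    def sample(k):
        # Clamp indices so patches near the borders stay defined.
        return signal[min(max(k, 0), n - 1)]

    def patch_distance(i, j):
        # Squared Euclidean distance between the patches centred at i and j.
        return sum((sample(i + t) - sample(j + t)) ** 2
                   for t in range(-patch_radius, patch_radius + 1))

    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            w = math.exp(-patch_distance(i, j) / (h * h))
            num += w * signal[j]
            den += w
        out.append(num / den)  # den >= 1 because w(i, i) = 1
    return out


if __name__ == "__main__":
    noisy = [0.1, -0.1, 0.05, 1.1, 0.9, 1.05, 0.95, 0.0]
    print([round(v, 3) for v in nl_means_1d(noisy)])
```

The nested loops make the cost proportional to signal length × search window × patch size; the paper's contribution is precisely to reduce this cost and to vectorize and parallelize the computation.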
ISBN: 9781424442379 (print)
Real-time monitoring is becoming increasingly important in many scenarios of large-scale, multi-site distributed/parallel computing, e.g., understanding system behavior, scheduling resources, and debugging applications. Dedicated networks for inter-site communication are rarely available for monitoring purposes. Therefore, reducing communication cost is important for real-time monitoring systems to handle a large number of nodes with limited network resources. We implemented a real-time Grid monitoring system called VGXP with techniques for low-cost data gathering. It tries to send only diffs against recent data, and adapts to the requested data freshness and tolerable error to minimize the required communication. We evaluate the monitoring overhead of the proposed method on a distributed environment consisting of 8 sites with 500 nodes. In a realistic setting where the sampling interval is 0.5 seconds and the tolerable error is 2%, the server's CPU usage for gathering data from all nodes was 0.2% and the transfer rate was less than 5 kbps. The transfer rate did not exceed 50 kbps even when gathering detailed per-process statistics.
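A minimal sketch of the "send only diffs, within a tolerable error" idea, assuming a simple relative dead band: a node re-sends a metric only when it has drifted more than the tolerated fraction from the last value the server received. The `DeltaReporter` class and its fields are hypothetical and not VGXP's API; the real system also adapts to the requested data freshness, which is not modelled here.

```python
class DeltaReporter:
    """Node-side filter: emit only metrics that moved beyond the tolerance."""

    def __init__(self, tolerance=0.02):
        self.tolerance = tolerance  # e.g. 0.02 for the 2% error in the text
        self.last_sent = {}         # metric name -> value the server holds

    def diff(self, sample):
        """Return the subset of `sample` that must be transmitted."""
        updates = {}
        for name, value in sample.items():
            old = self.last_sent.get(name)
            if old is None or abs(value - old) > self.tolerance * max(abs(old), 1e-9):
                updates[name] = value
                self.last_sent[name] = value
        return updates


if __name__ == "__main__":
    node = DeltaReporter(tolerance=0.02)
    print(node.diff({"cpu": 50.0, "mem": 1000.0}))  # first sample: everything
    print(node.diff({"cpu": 50.5, "mem": 1000.0}))  # 1% drift: nothing sent
    print(node.diff({"cpu": 52.0, "mem": 1100.0}))  # 4% and 10% drift: both sent
```

Comparing against the last *sent* value, rather than the previous sample, keeps the server's copy within the tolerance even when a metric drifts slowly over many rounds.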
Parallelization of operations is of utmost importance for the efficient implementation of Public Key Cryptography algorithms. Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is well suited to implementing different point addition and doubling algorithms over GF(p) on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits nearly no overhead compared to the runtime of the multipliers alone. The advantages of this distributed memory architecture are demonstrated by means of two different point addition and doubling algorithms.
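For readers unfamiliar with the operations being parallelized, the sketch below gives textbook affine point addition and doubling on a short Weierstrass curve y² = x³ + ax + b over GF(p). It shows only the arithmetic being scheduled, not the paper's memory architecture; FPGA designs commonly use projective coordinates so that the several modular multiplications per operation can be spread across multiplier units without a field inversion. The toy curve parameters are arbitrary.

```python
# Toy curve y^2 = x^3 + 2x + 3 over GF(97); parameters chosen only for illustration.
P_MOD, A, B = 97, 2, 3
INFINITY = None  # point at infinity (group identity)


def on_curve(pt):
    if pt is INFINITY:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def point_add(p1, p2):
    """Affine addition; handles doubling, inverses and the identity."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INFINITY  # P + (-P) = O (also covers doubling a point with y = 0)
    if p1 == p2:
        # Tangent slope (3x^2 + a) / (2y); pow(v, -1, m) is a modular inverse.
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope (y2 - y1) / (x2 - x1).
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def point_double(pt):
    return point_add(pt, pt)


if __name__ == "__main__":
    P = (3, 6)  # 6^2 = 36 = 3^3 + 2*3 + 3, so P lies on the curve
    Q = point_double(P)
    print("2P =", Q, "on curve:", on_curve(Q))
    print("3P =", point_add(P, Q))
```

The modular inversion in each affine step is what hardware designs avoid; with projective coordinates the inversion disappears and the remaining multiplications become the independent work that multiple multiplier units can share.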
ISBN: 9781424418756 (print)
This paper presents a scalable control system for a unified micro-cellular network named MM-MAN (Mobile Multimedia Metropolitan Area Network), in which fast-moving terminals are provided with high-bit-rate IP packet transfer. In our previous papers, we proposed two schemes, LMC (Logical Macro Cell) and parallel polling, to guarantee smooth connections for fast movers despite frequent movement. An LMC is a multicast group of adjacent micro-cells; polls are emitted from all BSs of the same LMC, creating a symmetric environment that behaves as a single virtual cell, so the cell-to-cell movement of a mobile terminal within an LMC is masked. This article describes the distributed control for mobility management in detail. An extended LMC is introduced to pre-download packets and to allow distributed processing of the LMC switchover. However, because of the overhead of parallel polling, the active radio channel is operated only at BSs within the LMC, not in the extended LMC, to save radio resources. If the polling response arrives at a BS other than the LMC's central-cell BS, that BS becomes the central cell of a new LMC, and a polling acknowledgement is multicast to the new extended LMC. The neighboring BS in the direction of movement of the target mobile terminal (MT) can recognize the MT's movement and starts joining the new LMC by itself, without help from centralized control. These procedures hide the delay of cell-to-cell movement even when the terminal leaves the LMC, and guarantee scalable, high-performance control over the micro-cellular network. Simulation results show that the handover latency is less than 5 ms, and that the MT throughput for continuous multimedia such as video is 2 Mbps over a 54 Mbps wireless interface.
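The LMC switchover rule can be sketched on a one-dimensional corridor of micro-cells, a simplifying assumption (real deployments are two-dimensional). Here an LMC is taken to be the central cell plus its immediate neighbours and the extended LMC one further ring; both radii, the `LogicalMacroCell` class and its methods are hypothetical. When a polling response arrives at a BS other than the central one, that BS becomes the new centre, the acknowledgement goes to the new extended LMC, and BSs newly inside it join on their own.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalMacroCell:
    center: int            # index of the central cell's BS
    radius: int = 1        # cells operating the active radio channel (LMC)
    ext_radius: int = 2    # cells pre-downloading packets (extended LMC)
    log: list = field(default_factory=list)

    def lmc(self):
        return set(range(self.center - self.radius, self.center + self.radius + 1))

    def extended_lmc(self):
        return set(range(self.center - self.ext_radius, self.center + self.ext_radius + 1))

    def on_polling_response(self, responding_bs):
        """Re-centre the LMC when the MT answers a poll from a non-central BS."""
        if responding_bs == self.center:
            return False  # MT still in the central cell: nothing to do
        if responding_bs not in self.lmc():
            raise ValueError("only BSs in the LMC operate the radio channel")
        old_ext = self.extended_lmc()
        self.center = responding_bs
        new_ext = self.extended_lmc()
        # Polling ack is multicast to the new extended LMC; BSs that were not in
        # the old extended LMC join on their own, with no central controller.
        self.log.append(("ack_multicast", sorted(new_ext)))
        self.log.append(("self_join", sorted(new_ext - old_ext)))
        return True


if __name__ == "__main__":
    cell = LogicalMacroCell(center=5)
    print(sorted(cell.lmc()))
    cell.on_polling_response(6)  # MT moved one cell to the right
    print(sorted(cell.lmc()), cell.log)
```

Because the BS ahead of the terminal is already in the extended LMC and has pre-downloaded packets, re-centring only shifts the window by one cell, which is the intuition behind hiding the handover delay.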
ISBN: 9780769530994 (print)
This paper presents several parallel FFT algorithms with different degrees of communication overhead for multiprocessors in a Network-on-Chip (NoC) environment. Three different methods of parallel FFT are presented: one is a reference parallel FFT used for comparison, and the other two feature well-distributed computation and reduced communication overhead. By evenly distributing parallel computation tasks that exploit data locality, the execution time of each FFT stage can be reduced. Moreover, by optimizing data exchanges we minimize the communication overhead. Depending on the communication regularity, one can select the appropriate parallel FFT algorithm. Simulation results from our cycle-accurate SystemC NoC model with a parameterizable 2-D mesh architecture, together with a performance analysis of time and complexity, show that our proposed algorithms outperform other parallel FFT algorithms and high-speed DSP implementations.
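To make the stage structure concrete, the sketch below is a textbook iterative radix-2 Cooley-Tukey FFT, not one of the paper's three algorithms. Within each of the log₂(n) stages every butterfly is independent, so a parallel version can assign disjoint groups of butterflies to processors; data exchange is needed only between stages, which is where inter-processor communication would arise on a NoC. A direct DFT is included to check the result.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation of the input.
    a = list(map(complex, x))
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages; butterflies inside one stage touch disjoint index pairs,
    # so they could run on different cores between stage-level data exchanges.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a


def dft(x):
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, 0, 0, 0]
    fast, slow = fft_radix2(x), dft(x)
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # tiny rounding error
```

How the butterflies of each stage are mapped onto mesh nodes determines how much data must cross the network between stages, which is the trade-off the three algorithms in the abstract explore.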
ISBN: 9780769531809 (print)
Network Intrusion Detection System (NIDS) demands have been steadily increasing over the past few years. Current software solutions become inefficient on high-speed, high-volume networks and end up dropping packets. Hardware solutions are available and achieve much higher efficiency, but present problems of flexibility and cost. Our proposed system uses a modified version of Snort, a robust, widely deployed open-source NIDS. Snort spends a significant fraction of its processing time on pattern matching. Our proposed system runs Snort in software until it reaches the pattern-matching function and then offloads that processing to a Field Programmable Gate Array (FPGA). The hardware can process data at up to 1.7 GB/s on one Xilinx XC2VP100 FPGA. Our system is more flexible than other FPGA string-matching designs in that the rules are not hard-coded. The design is scalable and allows multiple FPGAs to be used in parallel to increase the processing speed further.
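The abstract does not say which matching algorithm the FPGA implements. For context, the sketch below is a software Aho-Corasick multi-pattern matcher, a classic way to match many rule strings in a single pass over a payload (Snort's software matcher offers Aho-Corasick-based search methods). Like the proposed system, it builds its automaton from a pattern list at run time rather than hard-coding rules. The function names and example patterns are made up for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a set of byte patterns."""
    goto = [{}]    # goto[state][byte] -> next state
    fail = [0]     # failure link per state
    out = [set()]  # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(pat)
    # Breadth-first pass computes failure links, shallowest states first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches that are suffixes
    return goto, fail, out


def search(text, automaton):
    """Yield (end_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    for i, byte in enumerate(text):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            yield i, pat


if __name__ == "__main__":
    rules = [b"he", b"she", b"his", b"hers"]  # made-up signatures
    ac = build_automaton(rules)
    print(sorted(search(b"ushers", ac)))
```

The scan examines each payload byte once regardless of how many rules are loaded, which is the property that makes this family of algorithms attractive for both software and hardware NIDS matchers.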