This paper deals with distributed CA-CFAR detection in the presence of Gaussian and non-Gaussian clutter. In a Gaussian environment, we propose to apply a wavelet transform based on soft-thresholding in multisensor CA-CFAR ...
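For readers unfamiliar with the baseline, the following is a minimal single-sensor cell-averaging CFAR sketch. It is not the paper's distributed or wavelet-based method. It assumes square-law detected samples in exponentially distributed (Gaussian-clutter) noise, for which the textbook scaling factor is alpha = N (Pfa^(-1/N) - 1); the function name `ca_cfar` and its parameters are illustrative choices, not taken from the abstract.

```python
def ca_cfar(power, n_ref=8, n_guard=2, pfa=1e-3):
    """Return indices of cells whose power exceeds the CA-CFAR threshold.

    power   : list of square-law detected samples (assumed input format)
    n_ref   : total number of reference cells (even, split on both sides)
    n_guard : guard cells on each side of the cell under test
    pfa     : desired probability of false alarm
    """
    half = n_ref // 2
    # Textbook scaling factor for exponentially distributed noise power.
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)
    detections = []
    for i in range(half + n_guard, len(power) - half - n_guard):
        lead = power[i - n_guard - half : i - n_guard]
        lag = power[i + n_guard + 1 : i + n_guard + 1 + half]
        # Cell averaging: estimate the noise level from the reference cells.
        noise = (sum(lead) + sum(lag)) / n_ref
        if power[i] > alpha * noise:
            detections.append(i)
    return detections
```

Cells closer to either edge than `n_ref // 2 + n_guard` are skipped in this sketch because they lack a full reference window.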
ISBN: 9781665494663 (print)
Computational storage drives (CSDs) are solid-state drives (SSDs) equipped with general-purpose processors that can perform in-storage processing. They have the potential to significantly improve both performance and energy efficiency for big-data analytics by bringing compute to data, thereby eliminating costly data transfer while offering better privacy. In this work, we introduce Solana, the first-ever high-capacity (12-TB) CSD in E1.S form factor, and present an actual prototype for evaluation. To demonstrate the benefits of in-storage processing on CSDs, we deploy several natural language processing (NLP) applications on datacenter-grade storage servers built from clusters of Solana drives. Experimental results show up to 3.1x speedup in processing while reducing energy consumption and data transfer by 67% and 68%, respectively, compared to regular enterprise SSDs.
ISBN: 9781538638354 (print)
Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, has become a topic of high interest. For many applications across diverse domains, though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real-world data, illustrate that our index creation algorithm works on 1 billion time series in less than 2 hours, while the state-of-the-art centralized algorithms need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism.
ISBN: 9781424456499 (print)
Remotely sensed hyperspectral imaging is a technique that generates hundreds of spectral bands at different wavelength channels for the same area on the surface of the Earth. Computationally effective processing of these image cubes can be greatly beneficial in many application domains, including environmental modeling, risk/hazard prevention and response, or defense/security. With the aim of providing an overview of recent developments and new trends in the design of parallel and distributed systems for hyperspectral image analysis, this paper discusses and inter-compares four different strategies for efficiently implementing a standard hyperspectral image processing chain: 1) commodity Beowulf-type clusters, 2) heterogeneous networks of workstations, 3) field programmable gate arrays (FPGAs), and 4) graphics processing units (GPUs). Combined, these parts deliver a snapshot of the state-of-the-art in those areas, and a thoughtful perspective on the potential and emerging challenges of adapting high performance computing systems to remote sensing problems.
Multiple sequence alignment is a fundamental and very computationally intensive task in molecular biology. MUSCLE, a new algorithm for creating multiple alignments of protein sequences, achieves a highest rank in accu...
ISBN: 9781538612149 (print)
Stochastic computing, which employs random bit streams for computations, has shown low hardware cost and high fault tolerance compared to computations using a conventional binary encoding. Finite state machine (FSM) based stochastic computing elements can compute complex functions, such as the exponentiation and hyperbolic tangent functions, more efficiently than those using combinational logic. However, the FSM, as sequential logic, cannot be directly implemented in parallel like combinational logic, so reducing the long latency of the calculation becomes difficult. Applications in the relatively higher frequency domain would require an extremely fast clock rate using an FSM. This paper proposes a parallel implementation of the FSM, using an estimator and a dispatcher to directly initialize the FSM to the steady state. Experimental results show that the outputs of four typical functions using the parallel implementation are very close to those of the serial version. The parallel FSM scheme further shows equivalent or better image quality than the serial implementation in two image processing applications: Edge Detection and Frame Difference.
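To make the serial baseline concrete, here is a minimal sketch of the classic saturating-counter FSM that approximates the hyperbolic tangent on a bipolar-encoded bit stream. It illustrates only the sequential element whose latency motivates the work; the estimator and dispatcher of the proposed parallel scheme are not described in the abstract and are not modeled. The function name `stanh`, the stream length, and the seeded software random generator are assumptions for illustration.

```python
import math
import random

def stanh(x, n_states=8, length=20000, seed=0):
    """Approximate tanh((n_states / 2) * x) for x in [-1, 1].

    The input is encoded as a bipolar stochastic stream, i.e. each bit
    is 1 with probability (x + 1) / 2. The FSM is a saturating up/down
    counter; the output bit is 1 while the state is in the upper half.
    """
    rng = random.Random(seed)
    p_one = (x + 1.0) / 2.0
    state = n_states // 2
    ones = 0
    for _ in range(length):
        bit = rng.random() < p_one
        # Serial dependency: each state depends on the previous one.
        if bit:
            state = min(state + 1, n_states - 1)
        else:
            state = max(state - 1, 0)
        ones += state >= n_states // 2
    # Decode the bipolar output stream back to a value in [-1, 1].
    return 2.0 * ones / length - 1.0

if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.5):
        print(x, round(stanh(x), 3), round(math.tanh(4 * x), 3))
```

The loop-carried dependency on `state` is what prevents a straightforward split of the stream across parallel units.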
Volume rendering by ray casting is a computationally expensive problem. For interactive volume visualization, rendering has to be done in real time (30 frames/sec). Since the typical 3-D dataset size is at least 128^3, the use of parallel processing is imperative. In this paper, we present an O(log n) EREW algorithm for volume rendering using O(n^3) processors which can be optimized to O(log^3 n) time using O(n^3/log^3 n) processors. We have implemented our algorithm on MasPar MP1200. The implementation results show that a frame from 1233 data size is generated in about 3 seconds using 4096 processors.
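The abstract does not say how the logarithmic time bound is obtained. One standard ingredient of logarithmic-depth ray compositing, shown here purely as an assumed illustration, is that the front-to-back "over" operator on premultiplied (color, opacity) pairs is associative, so the n samples along a ray can be combined by a balanced tree of depth O(log n) instead of a sequential loop. The function names below are hypothetical.

```python
from functools import reduce

def over(front, back):
    """Composite two premultiplied (color, alpha) samples, front over back."""
    c_f, a_f = front
    c_b, a_b = back
    return (c_f + (1.0 - a_f) * c_b, a_f + (1.0 - a_f) * a_b)

def composite_sequential(samples):
    """Reference: walk the ray front to back, one sample at a time."""
    return reduce(over, samples, (0.0, 0.0))

def composite_tree(samples):
    """Pairwise reduction; each pass could run in parallel on real hardware."""
    level = list(samples)
    if not level:
        return (0.0, 0.0)
    while len(level) > 1:
        # Combine neighbours, preserving front-to-back order.
        nxt = [over(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

if __name__ == "__main__":
    ray = [(0.10, 0.2), (0.30, 0.5), (0.05, 0.1), (0.40, 0.8), (0.20, 0.3)]
    print(composite_sequential(ray))
    print(composite_tree(ray))
```

Both functions produce the same result up to floating-point rounding; only the tree version exposes the parallelism.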
ISBN: 9781509041527 (print)
In a proxy re-encryption (PRE) scheme, a semi-trusted proxy can convert a ciphertext under Alice's public key into another ciphertext that Bob can decrypt, without the proxy accessing the underlying plaintext. This property adds flexibility in various applications, such as cloud data sharing. In this paper, we study CCA-secure, single-hop unidirectional PRE schemes without pairings. We achieve high efficiency and public verifiability, which enables anyone to publicly verify the validity of the original ciphertexts and re-encrypted ciphertexts. With public verifiability, we can offload the check of the well-formedness of ciphertexts from power-constrained clients to any honest-but-curious untrusted public cloud for improved efficiency.
ISBN: 0769521320 (print)
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of meshes, but had not previously been captured in benchmarks. The new suite, named NPB (NAS Parallel Benchmarks) Multi-Zone, is extended from the NPB suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the hybrid parallel codes.
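The update-then-exchange pattern described above can be sketched on a toy problem. The example below is not taken from the benchmark suite: it assumes a 1-D mesh whose length is divisible by the number of zones, a simple three-point averaging stencil standing in for the LU, BT and SP solvers, and one ghost cell on each side of every zone. All function names are hypothetical.

```python
def split_into_zones(values, n_zones):
    """Split a 1-D mesh into equal zones, each padded with two ghost cells."""
    size = len(values) // n_zones
    return [[0.0] + values[z * size:(z + 1) * size] + [0.0]
            for z in range(n_zones)]

def exchange_boundaries(zones, left_bc, right_bc):
    """Copy each zone's edge values into its neighbours' ghost cells."""
    for z, zone in enumerate(zones):
        zone[0] = zones[z - 1][-2] if z > 0 else left_bc
        zone[-1] = zones[z + 1][1] if z < len(zones) - 1 else right_bc

def update_zone(zone):
    """Advance one zone by one step using only its own cells and ghosts."""
    interior = [(zone[i - 1] + zone[i] + zone[i + 1]) / 3.0
                for i in range(1, len(zone) - 1)]
    return [zone[0]] + interior + [zone[-1]]

def run(values, n_zones, steps, left_bc=0.0, right_bc=0.0):
    zones = split_into_zones(values, n_zones)
    exchange_boundaries(zones, left_bc, right_bc)
    for _ in range(steps):
        # Coarse-grain level: zones are independent within a time step,
        # so this loop is where inter-zone parallelism would be applied.
        zones = [update_zone(zone) for zone in zones]
        # Zones synchronize only here, after the time step.
        exchange_boundaries(zones, left_bc, right_bc)
    return [v for zone in zones for v in zone[1:-1]]

if __name__ == "__main__":
    mesh = [float(i % 5) for i in range(16)]
    print(run(mesh, n_zones=4, steps=5))
```

Because the ghost cells are refreshed after every step, the four-zone run in this sketch reproduces the single-zone result, which gives a simple correctness check for the exchange logic.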
In this paper we propose a price-based user-optimal job allocation scheme for grid systems whose nodes are connected by a communication network. The job allocation problem is formulated as a noncooperative game among ...