ISBN: 9781479975051 (print)
Sorting is one of the classic problems of data processing, and many practical applications require parallel sorting algorithms. Only a few such algorithms have been implemented using MPI; in this paper, a few additional parallel sorting algorithms are implemented using MPI. A unified performance analysis of all these algorithms is presented on two different architectures. On the basis of the experimental results, some guidelines are suggested for selecting a suitable algorithm.
ISBN: 9781479946129 (print)
This work proposes a novel technique for accelerating sparse recovery algorithms on multi-core shared-memory architectures. All prior works attempt to speed up these algorithms by leveraging the speed-ups in matrix-vector products offered by the GPU. A major limitation of these studies is that in most signal processing applications, the operators are not available as explicit matrices but as implicit fast operators. In such a practical scenario, the prior techniques fail to speed up the sparse recovery algorithms. Our work is based on the principles of stochastic gradient descent. The main sequential bottleneck of sparse recovery methods is a gradient descent step. Instead of computing the full gradient, we compute multiple stochastic gradients in parallel on separate cores; the full gradient is estimated by averaging these stochastic gradients. The other step of sparse recovery algorithms is a shrinkage operation, which is inherently parallel. Our proposed method has been compared with existing sequential algorithms. We find that our method is as accurate as the sequential version but significantly faster, and the larger the problem, the greater the speed-up.
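The idea of averaging per-core stochastic gradients and then applying shrinkage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an explicit matrix `A` (the abstract targets implicit fast operators), the objective 0.5*||Ax - y||^2 + lam*||x||_1, and Python threads standing in for cores. All function names and parameters here are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(v, t):
    """Shrinkage: move each entry toward zero by t, clipping at zero."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def batch_gradient(A, y, x, rows):
    """Gradient of 0.5*||Ax - y||^2 over `rows` only, rescaled to full size."""
    scale = len(A) / len(rows)
    g = [0.0] * len(x)
    for i in rows:
        r = sum(a * xj for a, xj in zip(A[i], x)) - y[i]  # residual of row i
        for j, a in enumerate(A[i]):
            g[j] += scale * a * r
    return g


def averaged_gradient(A, y, x, batches, pool):
    """One batch gradient per worker; the average estimates the full gradient."""
    grads = list(pool.map(lambda rows: batch_gradient(A, y, x, rows), batches))
    return [sum(col) / len(grads) for col in zip(*grads)]


def sample_batches(m, workers, batch_size, rng):
    """Assumed sampling scheme: each worker draws its own random row subset."""
    return [rng.sample(range(m), batch_size) for _ in range(workers)]


def recover(A, y, lam, step, iters, batches=None, workers=2, batch_size=2, seed=0):
    """Gradient step on the averaged estimate, followed by shrinkage."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            b = batches or sample_batches(len(A), workers, batch_size, rng)
            g = averaged_gradient(A, y, x, b, pool)
            x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x
```

When the batches form an equal-size partition of the rows, the average equals the full gradient exactly; with randomly sampled batches it is only an unbiased estimate. CPython threads do not run this arithmetic truly in parallel, so the pool shows the structure rather than the speed-up.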
We discuss a neuroscience-inspired dynamic architecture (NIDA) and an associated design method based on evolutionary optimization. NIDA networks designed to perform anomaly detection and control tasks have been shown to be successful in previous work. In particular, NIDA networks perform well on tasks that have a temporal component. We present methods for using NIDA networks on classification tasks in which there is no temporal component, in particular the handwritten digit classification task. The approach we use for both methods produces useful subnetworks that can be combined into a final network or combined to produce results using an ensemble method. We discuss how a similar approach can be applied to other problem types.
Parallel disk systems are capable of fulfilling rapidly increasing demands on both large storage capacity and high I/O performance. However, it is challenging to significantly increase disk I/O bandwidth for data-intensive workloads because of (1) the requirements for reliability and instant processing of data requests under dynamic workload conditions, and (2) the need for an optimum tradeoff between system scalability and data reliability in data-intensive systems. To increase computing performance and reduce power consumption, graphics processing units (GPUs) will be used. As the architectures and data processing algorithms for GPU-based parallel disk systems are still in their infancy, this research will develop novel hardware and software architectures that include parallel GPUs, flash disks, and disk arrays for data-intensive applications. (c) 2014 Published by Elsevier B.V.
ISBN: 9783319038889 (print)
The proceedings contain 76 papers. The topics discussed include: clustering and change detection in multiple streaming time series; lightweight identification of captured memory for software transactional memory; layer-based scheduling of parallel tasks for heterogeneous cluster platforms; optimistic concurrency control for energy efficiency in the wireless environment; synchronization-reducing variants of the biconjugate gradient and the quasi-minimal residual methods; exploring irregular reduction support in transactional memory; coordinate task and memory management for improving power efficiency; hardware-assisted intrusion detection by preserving reference information integrity; towards automatic generation of hardware classifiers; a practical approach for finding small independent, distance dominating sets in large-scale graphs; and heterogeneous computing vs. big data: the case of cryptanalytical applications.
ISBN: 9783319038582 (print)
The proceedings contain 76 papers. The topics discussed include: clustering and change detection in multiple streaming time series; lightweight identification of captured memory for software transactional memory; layer-based scheduling of parallel tasks for heterogeneous cluster platforms; optimistic concurrency control for energy efficiency in the wireless environment; synchronization-reducing variants of the biconjugate gradient and the quasi-minimal residual methods; exploring irregular reduction support in transactional memory; coordinate task and memory management for improving power efficiency; hardware-assisted intrusion detection by preserving reference information integrity; towards automatic generation of hardware classifiers; a practical approach for finding small independent, distance dominating sets in large-scale graphs; and heterogeneous computing vs. big data: the case of cryptanalytical applications.
ISBN: 9781479944415 (print)
Although physics-based sound synthesis is increasingly effective at producing rich, natural, high-quality sound, its high computational complexity limits its use in portable devices. This constraint motivated research on parallel processing architectures that support the physics-based sound synthesis of musical instruments. Since no general consensus has been reached on which grain sizes of many-core processors and memories provide the most efficient operation for sound synthesis, this paper explores a many-core processor with varying processing element (PE) configurations. To find the optimal PE configuration, each configuration is evaluated in terms of execution time, system power, and area. Experimental results indicate that the most efficient operation for synthesizing 44,100 six-note polyphonic acoustic guitar sounds sampled at 44.1 kHz is achieved when the number of PEs equals 192. Likewise, all PE configurations used in this study satisfy the system requirements for implementing sound synthesis on a portable device.
Future converged fixed-mobile networks need high-speed radio links in deployment scenarios where fibre is not available or too expensive. In this paper, we present a field-programmable gate array (FPGA)-based real-time transmission system using standard 10G Ethernet interfaces. The system comprises two parallel complex-valued data channels in each direction. Standard FPGAs and low-cost multi-channel analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) have been used. For enhanced robustness and optimal usage of the power amplifier, π/4-shift differential quaternary phase-shift keying (DQPSK) modulation is used. All digital signal processing routines for synchronization, equalization, forward error correction, etc. have been fully implemented and tested. Using a protocol analyzer, error-free bidirectional transmission of Ethernet frames at 5 Gbit/s is verified. Error-vector magnitude (EVM) values below -30 dB indicate that even higher speeds could be realized.
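The π/4-shift DQPSK scheme named above can be sketched at baseband as follows. This is only an illustration, not the paper's FPGA design: the Gray mapping of bit pairs to phase increments is an assumption (the abstract does not state it), and pulse shaping, synchronization, equalization, and error correction are all omitted. The names `PHASE_STEP`, `modulate`, and `demodulate` are hypothetical.

```python
import cmath
import math

# Assumed Gray mapping from bit pairs to phase increments (not from the source).
PHASE_STEP = {
    (0, 0): math.pi / 4,
    (0, 1): 3 * math.pi / 4,
    (1, 1): -3 * math.pi / 4,
    (1, 0): -math.pi / 4,
}


def modulate(bits):
    """Encode bit pairs as phase increments relative to the previous symbol."""
    symbols = [1 + 0j]  # reference symbol that carries no data
    phase = 0.0
    for i in range(0, len(bits), 2):
        phase += PHASE_STEP[(bits[i], bits[i + 1])]
        symbols.append(cmath.exp(1j * phase))
    return symbols


def demodulate(symbols):
    """Recover bits from the phase difference between consecutive symbols."""
    bits = []
    for prev, cur in zip(symbols, symbols[1:]):
        d = cur * prev.conjugate()  # angle of d is the transmitted increment
        # Under the mapping above, the quadrant of d determines both bits.
        bits += [int(d.imag < 0), int(d.real < 0)]
    return bits
```

Because only phase differences carry information, a constant unknown phase rotation on the channel does not affect the decoded bits. Every transition is an odd multiple of π/4, so the signal never passes through the origin, which limits envelope variation at the power amplifier.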
ISBN: 9783319111940 (electronic)
ISBN: 9783319111940; 9783319111933 (print)
The Kirchhoff pre-stack depth migration (KPSDM) algorithm, one of the most widely used migration algorithms, plays an important part in obtaining a real image of the earth. However, the program takes considerable time because of its high computational cost; hence the working efficiency of the oil industry is affected. The general-purpose graphics processing unit (GPU) and the Compute Unified Device Architecture (CUDA) developed by NVIDIA have provided a new solution to this problem. In this study, we propose a parallel algorithm for Kirchhoff pre-stack depth migration and an optimization strategy based on CUDA technology. Our experiments indicate that for large data computations, the accelerated algorithm achieves a speedup of 8 to 15 times compared with NVIDIA GPU.
ISBN: 9789897580277 (print)
The exponential increase in the amount of data available in several domains, and the need to process such data, makes problems computationally intensive. Consequently, sequential analysis is infeasible, hence the need for parallel processing. Over the last few years, the widespread deployment of multicore architectures, accelerators, grids, clusters, and other powerful architectures such as FPGAs and ASICs has encouraged researchers to write parallel algorithms using the available parallel computing paradigms to solve such problems. The major challenge now is to take advantage of these architectures irrespective of their heterogeneity. This is because designing an execution model that can unify all computing resources is still very difficult. Moreover, scheduling tasks to run efficiently on heterogeneous architectures still needs a lot of research. Existing solutions tend to focus on individual architectures or deal with heterogeneity among CPUs and GPUs only, but in reality, heterogeneous systems often exist. Up to now, very cumbersome manual adaptation has been required to take advantage of these heterogeneous architectures. The aim of this paper is to propose a functional-level design of a multiagent-based framework to deal with the heterogeneity of hardware architectures and parallel computing paradigms deployed to solve those problems. Bioinformatics will be selected as a case study.