Floating-point computing with more than one TFLOPS of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGAs). General-Purpose Graphics Processing Units (GPGPUs) and recent many-core CPUs have also taken advantage of recent technological innovations in integrated circuit (IC) design and have likewise dramatically improved their peak performance. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performance for particular applications show that the gap between FPGAs and GPUs or many-core CPUs is widening, moving FPGAs away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and for parallel map-reduce problems.
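To make the fixed-point case concrete: a fixed-point format stores a fraction as a scaled integer. A multiply then reduces to an integer multiply plus a shift, which is typically far cheaper in FPGA logic than a full floating-point unit. The sketch below assumes the common Q15 format (16-bit signed, 15 fractional bits) purely for illustration. The survey does not prescribe a format, and the helper names are hypothetical.

```python
# Minimal Q15 fixed-point sketch (assumed format: 16-bit signed, 15 fractional bits).
# All helper names are hypothetical illustrations.
Q = 15
Q_MAX = (1 << Q) - 1   # 32767, largest representable value (~0.99997)
Q_MIN = -(1 << Q)      # -32768, represents -1.0


def saturate(v: int) -> int:
    """Clamp an integer to the signed 16-bit Q15 range."""
    return max(Q_MIN, min(Q_MAX, v))


def to_q15(x: float) -> int:
    """Convert a real number in [-1, 1) to Q15, rounding to nearest."""
    return saturate(round(x * (1 << Q)))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a real number."""
    return v / (1 << Q)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: integer product, add half an LSB, shift back by Q bits."""
    product = a * b  # Q30 intermediate (fits in 32 bits for Q15 inputs)
    return saturate((product + (1 << (Q - 1))) >> Q)


if __name__ == "__main__":
    a, b = to_q15(0.5), to_q15(-0.25)
    print(a, b, from_q15(q15_mul(a, b)))
```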
SQL query processing for analytics over Hadoop data has recently gained significant traction. Among many systems providing some SQL support over Hadoop, Hive is the first native Hadoop system that uses an underlying f...
ISBN: (print) 9781479951499
Phasor Measurement Units (PMUs) are being rapidly deployed in power grids due to their high sampling rates. They offer more current and accurate visibility of the power grids than traditional SCADA systems. However, the high sampling rates of PMUs bring two major challenges that must be addressed to fully benefit from these PMU measurements. On one hand, any transient events captured in the PMU measurements can negatively impact the performance of steady-state […]. On the other hand, processing the high volumes of PMU data in a timely manner poses another challenge in […]. This paper presents PDFA, a parallel detrended fluctuation analysis approach for fast detection of transient events on massive PMU measurements using a computer cluster. The performance of PDFA is evaluated in terms of speedup, scalability and accuracy in comparison with the standalone DFA approach.
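Detrended fluctuation analysis itself follows a standard recipe:

1. Integrate the mean-removed signal into a profile.
2. Cut the profile into windows of size n.
3. Remove a least-squares linear trend in each window.
4. Take the root-mean-square of the residuals as the fluctuation F(n).

Each window is detrended independently, so per-window work is a natural unit to distribute. The sketch below is a generic illustration under that assumption, not the PDFA implementation. It uses non-overlapping windows and first-order (linear) detrending, with Python's `ThreadPoolExecutor` standing in for cluster workers. The function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def profile(x):
    """Cumulative sum of the mean-removed signal (the DFA 'profile')."""
    mean = sum(x) / len(x)
    y, acc = [], 0.0
    for v in x:
        acc += v - mean
        y.append(acc)
    return y


def detrended_sq_residual(seg):
    """Fit y = a + b*t by least squares over t = 0..n-1; return the sum of squared residuals."""
    n = len(seg)
    t_mean = (n - 1) / 2
    y_mean = sum(seg) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(seg))


def dfa_fluctuation(x, n, workers=4):
    """F(n): RMS of linearly detrended residuals over non-overlapping windows of size n."""
    if n < 2 or n > len(x):
        raise ValueError("window size must satisfy 2 <= n <= len(x)")
    y = profile(x)
    # Only full windows are used; a trailing partial window is discarded.
    segments = [y[i:i + n] for i in range(0, len(y) - n + 1, n)]
    # Windows are independent, so they can be farmed out to workers (hypothetical stand-in
    # for cluster nodes; CPython threads will not speed up this pure-Python arithmetic).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        residuals = list(pool.map(detrended_sq_residual, segments))
    return math.sqrt(sum(residuals) / (len(segments) * n))
```

Computing F(n) for several window sizes and fitting log F(n) against log n gives the usual DFA scaling exponent. Real speedup would require replacing the thread pool with processes or distributed workers.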
ISBN: (print) 9781479961245
Biomedical imaging, digital communications, acoustic signal processing and many other digital signal processing (DSP) applications are increasingly present in the digital world. They process growing volumes of data represented with ever-increasing precision, and they use complex algorithms that must meet time constraints. Consequently, they demand high computing power. To meet this need, it is now unavoidable to use parallel and heterogeneous architectures to speed up processing; the best examples are today's supercomputers such as "Tianhe-2" and "Titan" in the Top500 ranking. These architectures, with their multi-core nodes supported by many-core accelerators, offer a good answer to this problem. However, they remain hard to program for performance for many reasons: expressing parallelism, synchronizing tasks, managing memory, handling hardware specifics and balancing load. In the present work, we characterize DSP applications and propose a programming model based on their distinctive features in order to implement them easily and efficiently on heterogeneous clusters.
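Many DSP kernels are regular streaming computations, which makes them amenable to a structured data-parallel decomposition. The sketch below is a generic illustration, not the programming model proposed in this work. It splits a long signal into blocks and applies an FIR filter to each block independently. Each block carries the last `len(taps) - 1` input samples of its predecessor as history, so separate workers can process the blocks and their outputs concatenate into exactly the serial result. The block size, the worker pool and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def fir_serial(x, taps):
    """Direct-form FIR: y[k] = sum_j taps[j] * x[k - j], treating x[k] = 0 for k < 0."""
    return [sum(h * x[k - j] for j, h in enumerate(taps) if k - j >= 0)
            for k in range(len(x))]


def fir_block(args):
    """Filter one block; `history` holds up to len(taps) - 1 samples preceding the block."""
    history, block, taps = args
    ext = history + block
    off = len(history)
    return [sum(h * ext[off + k - j] for j, h in enumerate(taps) if off + k - j >= 0)
            for k in range(len(block))]


def fir_parallel(x, taps, block_size=256, workers=4):
    """Block-parallel FIR (hypothetical helper): each block is an independent task."""
    m = len(taps) - 1
    jobs = []
    for start in range(0, len(x), block_size):
        history = x[max(0, start - m):start]  # overlap with the previous block
        jobs.append((history, x[start:start + block_size], taps))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(fir_block, jobs))
    return [v for b in blocks for v in b]
```

Moving the blocks to processes, or to nodes and accelerators of a heterogeneous cluster, changes only how the `map` is executed. On CPython, the thread pool shows the structure but will not speed up this pure-Python loop because of the global interpreter lock.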
This paper describes a high-speed software implementation of Elliptic Curve Cryptography (ECC) for GeForce GTX graphics cards equipped with an NVIDIA GT200 Graphics Processing Unit (GPU). In order to maximize throughp...
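The central ECC operation is scalar multiplication kP, built from point addition and point doubling. Because the abstract is truncated here, it is an assumption that the throughput-oriented design batches many independent scalar multiplications. The sketch below shows textbook affine double-and-add over a toy curve, y^2 = x^3 + 2x + 2 (mod 17), a common textbook example whose group has 19 points. It is purely illustrative: it is not constant-time, and it uses none of the paper's field representations or GPU-specific optimizations. It requires Python 3.8+ for `pow(x, -1, p)`.

```python
# Toy curve y^2 = x^3 + A*x + B over F_P (illustrative parameters, far too small for real use).
P = 17
A, B = 2, 2
INF = None  # point at infinity (group identity)


def point_add(p1, p2):
    """Affine point addition and doubling on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF  # P + (-P) = O; also covers doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P        # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Left-to-right double-and-add: scan the bits of k from the most significant end."""
    result = INF
    for bit in bin(k)[2:]:
        result = point_add(result, result)      # double
        if bit == "1":
            result = point_add(result, point)   # add
    return result
```

A batched implementation would run `scalar_mult` for many independent (k, P) pairs at once. Production code would use a standardized curve and constant-time arithmetic.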
To date, the low electrochemical stability of the catalyst is one of the major issues hindering the commercial application of proton exchange membrane fuel cells (PEMFCs). In this work, more stable support materials are prepared and applied in catalysts for the oxygen reduction reaction (ORR). These supports are based on functionalized graphene nanosheets (GNS), porous GNS, heteroatom-doped GNS and alternative GNS composites, including GNS/nano-carbon (or nano-ceramic) sandwiches, nano-ceramic-wedged GNS and core-shell graphene/amorphous carbon composites. Based on the idea that GNS plays a bifunctional role for Pt catalysts, highly active and stable Pt/reduced graphene oxide (RGO) catalysts are developed by tuning the O/C atom ratio of the RGO supports, and an optimized O/C atom ratio of 0.14 is determined. Meanwhile, both perfluorosulfonic acid (PFSA)-functionalized GNS and sulfonic acid group-grafted RGO supported Pt catalysts show higher catalytic activity and a lower loss rate of electrochemically active area (ECA) than the plain Pt/GNS and conventional Pt/C catalysts. In addition, an N-doped RGO supported Pt catalyst (Pt/NRGO) is synthesized using a lyophilisation-assisted N-doping method; it shows higher catalytic activity and lower ECA loss than the Pt/GO and Pt/C catalysts. To tackle the stacking of GNS, which leads to poor mass transport, porous GNS are synthesized. We also describe a new strategy to synthesize GNS hybrids, including GNS/nano-carbon (nano-ceramic) sandwiches and nano-ceramic-wedged GNS architectures. These unique architectures with highly dispersed Pt NPs exhibit much higher catalytic activity towards the ORR and excellent electrochemical stability. Finally, a new graphene@amorphous carbon core-shell material also shows excellent electrochemical properties. This work was supported financially by the National Natural Science Foundation of China (NSFC) (No. 5
ISBN: (print) 9781479904945; 9781479904938
Complex networks are a technique for modeling and analyzing large data sets in many scientific and engineering disciplines. Because of their excessive size, conventional algorithms and single-core processors struggle to process such networks efficiently. Multi-core graphics processing units (GPUs) could provide sufficient processing power for analyzing such networks. However, commonly designed algorithms cannot exploit this massively parallel processing power. In this paper, we present the Multi Layer Network Decomposition (MLND) approach, a general approach to parallel network analysis on multi-core processors through efficient partitioning and mapping of networks onto GPU architectures. An evaluation on a 336-core GPU graphics card demonstrated a 16x speed-up in complex network analysis relative to a CPU-based approach.
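The abstract does not detail how MLND decomposes a network, so the following is only a generic illustration of the partition-and-map idea:

- The graph is stored in compressed sparse row (CSR) form.
- Vertices are split into contiguous partitions, one per worker, standing in for GPU thread blocks.
- Each partition independently computes vertex degrees and counts the edges that cross its boundary, a simple measure of partition quality.

The partitioning scheme and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def to_csr(num_vertices, edges):
    """Build CSR arrays (row offsets, column indices) for an undirected edge list."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    offsets, cols = [0], []
    for nbrs in adj:
        cols.extend(sorted(nbrs))
        offsets.append(len(cols))
    return offsets, cols


def partition(num_vertices, parts):
    """Split vertex ids 0..n-1 into `parts` contiguous ranges of near-equal size."""
    step, extra = divmod(num_vertices, parts)
    bounds, start = [], 0
    for p in range(parts):
        end = start + step + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def analyze_partition(args):
    """Per-partition work: degrees of owned vertices and boundary-crossing edge endpoints."""
    offsets, cols, (lo, hi) = args
    degrees = [offsets[v + 1] - offsets[v] for v in range(lo, hi)]
    cut = sum(1 for v in range(lo, hi)
              for w in cols[offsets[v]:offsets[v + 1]] if not lo <= w < hi)
    return degrees, cut


def parallel_degrees(num_vertices, edges, parts=4):
    """Hypothetical driver: map partitions to workers, then gather the results."""
    offsets, cols = to_csr(num_vertices, edges)
    jobs = [(offsets, cols, b) for b in partition(num_vertices, parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(analyze_partition, jobs))
    degrees = [d for degs, _ in results for d in degs]
    cut_edges = sum(c for _, c in results) // 2  # each cut edge is seen from both sides
    return degrees, cut_edges
```

Fewer cut edges mean less communication between partitions. A real GPU mapping would balance partitions by edge count rather than vertex count.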
ISBN: (print) 9781467344265; 9781467344258
This paper presents Fast Fourier Transform (FFT) benchmark results that measure and compare the performance of various DSP and Intel processors for underwater signal processing applications. It aims to show the performance gains of Intel processors over DSP processors when parallel programming is used to implement signal processing functions in real time. The results show a significant decrease in FFT execution time on an Intel-based multicore processor using parallel programming. The comparative analysis of different processor architectures presented in this paper is therefore intended to help system designers select an optimal processor for underwater signal processing applications.
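For reference, a minimal sketch of the kind of kernel such a benchmark times is shown below: a recursive radix-2 Cooley-Tukey FFT, checked against a naive DFT. It also includes a small timing harness. The harness assumes a multi-channel input, for example several hydrophones; this is an assumption, not a detail from the paper. It times the channels either serially or through a worker pool. All names are hypothetical, and no timings from the paper are reproduced.

```python
import cmath
import time
from concurrent.futures import ThreadPoolExecutor


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """O(n^2) reference DFT used to sanity-check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def time_channels(channels, workers=1, repeats=3):
    """Best wall-clock time (seconds) to FFT every channel, serially or via a worker pool."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        if workers == 1:
            for c in channels:
                fft(c)
        else:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                list(pool.map(fft, channels))
        best = min(best, time.perf_counter() - start)
    return best
```

On CPython, the thread-pool branch mainly shows the structure of channel-level parallelism. Meaningful multicore timings would need a process pool or a native FFT library.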
ISBN: (print) 9789897581298
As CPU technology moves strongly towards multi-core architectures, HEVC tries to embrace the parallel processing trend to the extent possible. HEVC therefore provides parallel processing tools such as tiles, slices and wavefront parallel processing (WPP) at the frame level (Sullivan et al., 2012). Although slices, tiles and WPP can be used to achieve parallelism, they may end up degrading either visual quality or compression efficiency. To address this problem, this paper summarizes and exploits the parallel processing opportunities of HEVC at the Coding Tree Block (CTB) level with an insignificant compromise in video quality and compression.
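In HEVC's wavefront parallel processing, each CTB row is handled by its own thread. A CTB can start only after its left neighbour and its top-right neighbour are finished, so each row lags the row above by two CTBs. Assuming every CTB takes one unit of time (a simplification), the earliest step at which CTB (r, c) can run is c + 2r. The sketch groups CTBs into these wavefronts and checks the dependencies. It illustrates the standard WPP dependency pattern only, not the CTB-level scheme proposed in the paper. The function names are hypothetical.

```python
def wpp_schedule(rows, cols):
    """Group CTB coordinates (row, col) by the earliest wavefront step they can run in.

    Assumes unit cost per CTB and WPP dependencies on the left and top-right CTBs,
    which give step(r, c) = c + 2 * r.
    """
    steps = {}
    for r in range(rows):
        for c in range(cols):
            steps.setdefault(c + 2 * r, []).append((r, c))
    return [steps[s] for s in sorted(steps)]


def check_dependencies(rows, cols):
    """Verify that every CTB is scheduled after its left and top-right neighbours."""
    order = {ctb: i for i, wave in enumerate(wpp_schedule(rows, cols)) for ctb in wave}
    for (r, c), s in order.items():
        if c > 0 and order[(r, c - 1)] >= s:
            return False
        # The last column has no top-right neighbour, so it depends on the CTB above.
        if r > 0 and order[(r - 1, min(c + 1, cols - 1))] >= s:
            return False
    return True
```

For a frame W CTBs wide (W >= 2) and H CTB rows high, the schedule needs W + 2(H - 1) steps instead of W * H. This bounds the speedup WPP alone can deliver, and it is why finer-grained CTB-level parallelism is of interest.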
ISBN: (print) 9781479961245
Advanced SSDs employ a RAM-based write buffer to improve their write performance. The buffer intentionally delays write requests to reduce flash write traffic, and reorders them to minimize the cost of garbage collection. This work presents a novel buffer algorithm for page-mapping multichannel SSDs. We propose grouping temporally or spatially correlated buffer pages and writing each group of buffer pages to the same flash block. This strategy dramatically increases the probability of bulk data invalidations in flash blocks. In multichannel architectures, each channel is assigned its own groups of buffer pages to write, so channel striping does not split a group of correlated buffer pages into small pieces. We conducted simulations using an SSD simulator and experiments on a real SSD platform. Our results show that our design greatly outperforms existing buffer algorithms.
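Below is a minimal sketch of the grouping idea under explicit assumptions:

- Spatial correlation is approximated by logical page numbers that fall in the same aligned range of `pages_per_block` pages.
- Each group is flushed as a unit into one flash block.
- Groups are assigned to channels round-robin, so striping never splits a group.
- The least recently touched group is evicted first.

The class name, the grouping rule and the eviction and channel policies are hypothetical simplifications. The abstract does not give the paper's exact policy, and this sketch ignores temporal correlation.

```python
from collections import OrderedDict


class GroupedWriteBuffer:
    """Hypothetical RAM write buffer that groups pages by logical block and flushes whole groups."""

    def __init__(self, capacity, pages_per_block, num_channels):
        self.capacity = capacity              # max buffered pages
        self.pages_per_block = pages_per_block
        self.num_channels = num_channels
        self.groups = OrderedDict()           # group id -> set of buffered logical page numbers
        self.next_channel = 0
        self.flushed = []                     # log of (channel, sorted pages) block writes

    def size(self):
        return sum(len(g) for g in self.groups.values())

    def write(self, lpn):
        gid = lpn // self.pages_per_block     # spatial correlation: same aligned LPN range
        group = self.groups.setdefault(gid, set())
        group.add(lpn)                        # rewriting a buffered page is absorbed in RAM
        self.groups.move_to_end(gid)          # most recently touched group goes last
        while self.size() > self.capacity:
            self.flush_one()

    def flush_one(self):
        """Evict the least recently touched group into a single flash block on one channel."""
        gid, pages = self.groups.popitem(last=False)
        self.flushed.append((self.next_channel, sorted(pages)))
        self.next_channel = (self.next_channel + 1) % self.num_channels
```

A group never exceeds `pages_per_block` pages, so it always fits in one flash block. When that logical range is later overwritten, the whole block tends to become invalid at once, which keeps garbage-collection copying low.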