Parallel disk systems are capable of meeting rapidly increasing demands for both large storage capacity and high I/O performance. However, it is challenging to significantly increase disk I/O bandwidth for data-intensive workloads because of (1) the reliability and instant processing of data requests under dynamic workload conditions, and (2) the optimum tradeoff between system scalability and data reliability in data-intensive systems. To increase computing performance and reduce power consumption, graphics processing units (GPUs) will be used. As the architectures and data processing algorithms for GPU-based parallel disk systems are still in their infancy, this research will develop novel hardware and software architectures that include parallel GPUs, flash disks, and disk arrays for data-intensive applications. (c) 2014 Published by Elsevier B.V.
ISBN (print): 9783319038889
The proceedings contain 76 papers. The topics discussed include: clustering and change detection in multiple streaming time series; lightweight identification of captured memory for software transactional memory; layer-based scheduling of parallel tasks for heterogeneous cluster platforms; optimistic concurrency control for energy efficiency in the wireless environment; synchronization-reducing variants of the biconjugate gradient and the quasi-minimal residual methods; exploring irregular reduction support in transactional memory; coordinate task and memory management for improving power efficiency; hardware-assisted intrusion detection by preserving reference information integrity; towards automatic generation of hardware classifiers; a practical approach for finding small independent, distance dominating sets in large-scale graphs; and heterogeneous computing vs. big data: the case of cryptanalytical applications.
ISBN (print): 9783319038582
The proceedings contain 76 papers. The topics discussed include: clustering and change detection in multiple streaming time series; lightweight identification of captured memory for software transactional memory; layer-based scheduling of parallel tasks for heterogeneous cluster platforms; optimistic concurrency control for energy efficiency in the wireless environment; synchronization-reducing variants of the biconjugate gradient and the quasi-minimal residual methods; exploring irregular reduction support in transactional memory; coordinate task and memory management for improving power efficiency; hardware-assisted intrusion detection by preserving reference information integrity; towards automatic generation of hardware classifiers; a practical approach for finding small independent, distance dominating sets in large-scale graphs; and heterogeneous computing vs. big data: the case of cryptanalytical applications.
ISBN (print): 9781479944415
Despite the fact that physics-based sound synthesis is becoming more and more efficient in rich and natural high-quality sound synthesis, its high computational complexity limits its use in portable devices. This constraint motivated research on parallel processing architectures that support the physics-based sound synthesis of musical instruments. Since no general consensus has been reached on which grain sizes of many-core processors and memories provide the most efficient operation for sound synthesis, this paper explores a many-core processor with varying processing element (PE) configurations. To find the optimal PE configuration, each PE configuration is evaluated in terms of execution time, system power, and area. Experimental results indicate that the most efficient operation for synthesizing 44,100 six-note polyphonic acoustic guitar sound sampled at 44.1 kHz is achieved when the number of PEs equals 192. Likewise, all PE configurations used in this study satisfy the system requirements for implementing sound synthesis on a portable device.
Future converged fixed-mobile networks need high-speed radio links in deployment scenarios where fibre is not available or too expensive. In this paper, we present a field-programmable gate array (FPGA)-based real-time transmission system using standard 10G Ethernet interfaces. The system comprises two parallel complex-valued data channels in each direction. Standard FPGAs and low-cost multi-channel analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) have been used. For enhanced robustness and optimal usage of the power amplifier, π/4-shift differential quaternary phase-shift keying (DQPSK) modulation is used. All digital signal processing routines for synchronization, equalization, forward error correction, etc. have been fully implemented and tested. Using a protocol analyzer, error-free bidirectional transmission of Ethernet frames at 5 Gbit/s is verified. Error-vector magnitude (EVM) values below -30 dB indicate that even higher speeds could be realized.
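To make the modulation scheme concrete, here is a minimal Python sketch of π/4-shift DQPSK at the symbol level. It is not the FPGA implementation from the paper: the Gray-coded bit-to-phase mapping below is one common convention and is an assumption, as are the function names `modulate` and `demodulate`. Pulse shaping, equalization, and coding are left out.

```python
import cmath
import math

# Assumed Gray-coded phase increments (a common convention; the abstract
# does not state the mapping actually used in the paper).
PHASE_STEP = {
    (0, 0): math.pi / 4,
    (0, 1): 3 * math.pi / 4,
    (1, 1): -3 * math.pi / 4,
    (1, 0): -math.pi / 4,
}


def modulate(bits):
    """Map an even-length bit sequence to unit-magnitude complex symbols.

    Each bit pair selects a phase increment that is added to the running
    phase, so information is carried by phase differences only.
    """
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    phase = 0.0  # reference phase before the first symbol
    symbols = []
    for i in range(0, len(bits), 2):
        phase += PHASE_STEP[(bits[i], bits[i + 1])]
        symbols.append(cmath.exp(1j * phase))
    return symbols


def demodulate(symbols):
    """Recover bits from the phase difference of successive symbols."""
    bits = []
    prev = 1 + 0j  # same reference phase as the modulator
    for s in symbols:
        d = s * prev.conjugate()  # angle of d is the phase increment
        # The quadrant of d identifies the bit pair under the mapping above.
        bits.append(1 if d.imag < 0 else 0)
        bits.append(1 if d.real < 0 else 0)
        prev = s
    return bits
```

Because every increment is an odd multiple of π/4, consecutive symbols never differ by π, so the signal trajectory does not pass through the origin. This keeps envelope fluctuations smaller than in plain QPSK, which is why the scheme suits a power amplifier, and differential detection removes the need for an absolute phase reference.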
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
The Kirchhoff pre-stack depth migration (KPSDM) algorithm, as one of the most widely used migration algorithms, plays an important part in getting the real image of the earth. However, this program takes considerable time due to its high computational cost; hence the working efficiency of the oil industry is affected. The general-purpose graphics processing unit (GPU) and the Compute Unified Device Architecture (CUDA) developed by NVIDIA have provided a new solution to this problem. In this study, we propose a parallel algorithm for Kirchhoff pre-stack depth migration and an optimization strategy based on CUDA technology. Our experiments indicate that for large data computations, the accelerated algorithm achieves a speedup of 8 to 15 times compared with NVIDIA GPU.
ISBN (print): 9789897580277
The exponential increase in the amount of data available in several domains, and the need to process such data, make problems computationally intensive. Consequently, sequential analysis is infeasible, hence the need for parallel processing. Over the last few years, the widespread deployment of multicore architectures, accelerators, grids, clusters, and other powerful architectures such as FPGAs and ASICs has encouraged researchers to write parallel algorithms using available parallel computing paradigms to solve such problems. The major challenge now is to take advantage of these architectures irrespective of their heterogeneity. This is because designing an execution model that can unify all computing resources is still very difficult. Moreover, scheduling tasks to run efficiently on heterogeneous architectures still needs a lot of research. Existing solutions tend to focus on individual architectures or deal with heterogeneity among CPUs and GPUs only, but in reality, heterogeneous systems often exist. Up to now, very cumbersome manual adaptation has been required to take advantage of these heterogeneous architectures. The aim of this paper is to propose a functional-level design of a multiagent-based framework to deal with the heterogeneity of hardware architectures and parallel computing paradigms deployed to solve those problems. Bioinformatics will be selected as a case study.
ISBN (print): 9781479938445
Multiple threads running on a multi-core processor can improve the performance of a parallel application significantly. However, effective scaling of threads and cores plays a key role in achieving optimal performance, because performance does not necessarily improve with an increasing number of cores. Multi-threaded applications suffer from thread synchronization and from negative interference in shared memory, including the last-level cache and main memory. Memory bandwidth also often limits the performance of a multi-threaded workload. In this paper we propose a method to achieve optimal scalability on a multi-core platform and to predict the bandwidth requirement of parallel workloads for a given number of threads. We employ the proposed method to improve the performance of bandwidth-limited parallel applications. We find that DRAM access has various phases, and we use the highest bandwidth among all phases to predict the performance of a given workload in a multi-threaded environment. We evaluate our proposed method using the Gem5 multi-core simulator, and the experimental results show that the phase-based bandwidth utilization method can estimate the optimal number of threads for a given parallel workload and has low prediction error.
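The idea of sizing the thread count from the most bandwidth-hungry phase can be illustrated with a small Python sketch. This is not the paper's prediction model: it assumes, purely for illustration, that bandwidth demand scales linearly with the number of threads and that the peak-phase demand must fit within the available DRAM bandwidth. The function name `estimate_thread_limit` and all parameters are hypothetical.

```python
def estimate_thread_limit(phase_bandwidths_gbps, system_bandwidth_gbps, max_threads):
    """Estimate how many threads fit before DRAM bandwidth saturates.

    phase_bandwidths_gbps: per-thread bandwidth measured in each execution
        phase (hypothetical profiling input).
    system_bandwidth_gbps: sustainable DRAM bandwidth of the platform.
    max_threads: number of hardware threads available.

    Assumption (not from the paper): demand grows linearly with threads,
    so the limit is the system bandwidth divided by the peak-phase demand.
    """
    if not phase_bandwidths_gbps:
        raise ValueError("at least one phase measurement is required")
    per_thread_peak = max(phase_bandwidths_gbps)  # highest-bandwidth phase
    if per_thread_peak <= 0:
        return max_threads  # no measurable memory traffic: cores are the limit
    limit = int(system_bandwidth_gbps // per_thread_peak)
    return max(1, min(max_threads, limit))
```

Using the peak phase rather than the average is the conservative choice: a thread count that fits the average can still stall whenever all threads enter their most memory-intensive phase together.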
ISBN (print): 9781450332866
Processing complex queries on unbounded event streams in real time is a challenge for many data processing systems. These systems are expected to process data with reduced latency to generate real-time events, and at high throughput to minimize the required hardware. In this regard, Grand Challenge 2015 [6] focuses on evaluating two queries (frequent routes and profitable cells) in real time with low latency and high throughput. These queries involve processing windows of thousands of records. Firstly, such processing demands efficient data structures and algorithms to minimize the processing overhead. Secondly, the system should partition data to evaluate them in parallel to make it *** this paper, we present a set of data structures that we designed to evaluate the aforementioned queries with O(log n) time complexity and a data partitioning technique to evaluate them in parallel. We then evaluate our solution on a single machine as well as in a distributed setting on a commodity cluster of machines over a 1 Gbps LAN. We were able to process the frequent routes query with the 173 million trips dataset within 5 minutes with less than 4 milliseconds of latency, and the profitable cells query with the same dataset within 11 minutes with less than 5 milliseconds of latency.
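As a rough illustration of the frequent-routes style of query, the Python sketch below keeps per-route counts over a sliding time window. It is deliberately simpler than the structures the abstract describes: `top_k` here scans all live routes rather than achieving O(log n) updates, and the class name `SlidingRouteCounter`, the route representation, and the window length are assumptions made for the example.

```python
import heapq
from collections import Counter, deque


class SlidingRouteCounter:
    """Count routes seen within the last `window_seconds` of event time."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, route) in arrival order
        self.counts = Counter()  # route -> occurrences inside the window

    def add(self, timestamp, route):
        """Record one trip and drop trips that have left the window."""
        self.events.append((timestamp, route))
        self.counts[route] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Events are assumed to arrive in timestamp order, so expired
        # entries are always at the left end of the deque.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_route = self.events.popleft()
            self.counts[old_route] -= 1
            if self.counts[old_route] == 0:
                del self.counts[old_route]

    def top_k(self, k):
        """Return the k most frequent routes as (route, count) pairs."""
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])
```

One way to parallelize such a counter is to hash each route key to a worker so that every worker owns a disjoint set of routes and the per-worker top-k lists are merged afterwards; whether this matches the partitioning technique in the paper is not stated in the abstract.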
ISBN (print): 9781467367141
This paper explains the modeling and control of temperature dynamics in an induction furnace. The induction furnace is used for melting metal to process raw scrap material into ferrite steel with 0.22 wt% carbon. The dynamic response of the induction furnace temperature affects the resulting product. Therefore, a controller for the temperature dynamics is required to produce the desired process response. The induction furnace system consists of electrical and thermal system dynamics. The dynamics of the electrical system represent the induction furnace in the form of an electric circuit, i.e. a current-fed inverter with a parallel resonant circuit. Meanwhile, the thermal system dynamics represent the thermal energy transfer process, which is developed from the principle of energy balance, including generated heat energy and heat loss. The induction furnace system dynamics are modelled as a second-order system, with a coil time constant 1000 times faster than the temperature time constant. Thus, by ignoring the coil time constant, the induction furnace dynamic model can be transformed into a first-order nonlinear system. Then, a linear system can be obtained by a change of variable. Induction furnace temperature control has been implemented by adjusting the PWM to control the input power to the induction furnace. The PI controller is designed in three cases, namely a linear model with a linear PI controller, a saturated linear model with a linear PI controller, and a saturated linear model with an anti-windup PI controller. Each case is simulated in Simulink. To meet the appropriate specifications, the temperature should be controlled at 912 °C. The best results are achieved with a maximum overshoot of 7% and a rise time of 2.9 seconds.
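The third control case, a saturated first-order plant with an anti-windup PI controller, can be sketched in a few lines of Python. This is not the paper's Simulink model: the plant gain, time constant, controller gains, and actuator limits below are hypothetical values chosen only so the example runs, and the anti-windup scheme shown (conditional integration, also called clamping) is one common choice that the abstract does not confirm. Only the 912 °C setpoint comes from the text.

```python
def simulate_pi(setpoint=912.0, kp=2.0, ki=0.5, tau=5.0, gain=1.0,
                u_min=0.0, u_max=1200.0, dt=0.01, t_end=60.0,
                anti_windup=True):
    """Simulate tau * dy/dt = -y + gain * u under a saturated PI controller.

    All numeric defaults except the setpoint are hypothetical.
    Returns a list of (temperature, control_input) samples.
    """
    y = 0.0          # plant output (temperature, arbitrary offset)
    integral = 0.0   # integral of the error
    trace = []
    for _ in range(int(round(t_end / dt))):
        error = setpoint - y
        u_unsat = kp * error + ki * integral
        u = min(u_max, max(u_min, u_unsat))  # actuator saturation
        saturated = u != u_unsat
        # Clamping: while the actuator is saturated and the error would
        # push it further into saturation, stop integrating.
        winding_up = saturated and error * (u_unsat - u) > 0
        if not (anti_windup and winding_up):
            integral += error * dt
        # Forward-Euler step of the first-order plant.
        y += dt * (-y + gain * u) / tau
        trace.append((y, u))
    return trace
```

Calling `simulate_pi(anti_windup=False)` reproduces the second case (saturation with a plain PI controller) for comparison. How much the two differ depends strongly on the chosen gains and limits, so the sketch makes no claim about the overshoot or rise-time figures reported in the abstract.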