Compared with traditional disks, NAND flash offers higher performance and better shock resistance. However, NAND flash must erase old data before new data can be written. That is why NAND flash based Solid State Disks (SSDs) always u...
We develop a simple hierarchical model for the performance analysis of compute clusters assembled from multi-core compute nodes connected by a (high-speed) network. The performance is described by the dimensionless speed-up and efficiency as functions of important hardware and application parameters. The hardware parameters are the number of compute nodes and the bandwidth of the network, together with the number of cores per node, the theoretical performance of each core, and the bandwidth of the main memory. The application parameters are the total number of operations performed on a number of bytes and the total number of bytes communicated between the processing units. To exemplify our concept, we apply it to the scalar product of vectors, matrix multiplication, Linpack, and FFT. Our previous performance models are contained as special cases in the new, more comprehensive approach.
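The abstract does not give the model's equations, so the following is only a rough sketch of how speed-up and efficiency could be computed from the listed parameters. The time formula (the larger of compute time and memory time, plus a network term) and every function and parameter name are assumptions made for illustration, not the authors' model.

```python
def run_time(ops, mem_bytes, comm_bytes, nodes, cores, core_flops, mem_bw, net_bw):
    """Estimated run time in seconds under an assumed bound-and-add model."""
    compute = ops / (nodes * cores * core_flops)   # all cores at theoretical peak
    memory = mem_bytes / (nodes * mem_bw)          # each node streams its share
    comm = comm_bytes / net_bw if nodes > 1 else 0.0  # no network on one node
    return max(compute, memory) + comm


def speedup_and_efficiency(ops, mem_bytes, comm_bytes, nodes, cores,
                           core_flops, mem_bw, net_bw):
    """Dimensionless speed-up relative to one core, and efficiency per core."""
    t_one = run_time(ops, mem_bytes, 0.0, 1, 1, core_flops, mem_bw, net_bw)
    t_par = run_time(ops, mem_bytes, comm_bytes, nodes, cores,
                     core_flops, mem_bw, net_bw)
    speedup = t_one / t_par
    return speedup, speedup / (nodes * cores)


# Made-up numbers: 4 nodes with 8 cores each, 10 GFLOP/s per core,
# 50 GB/s memory bandwidth per node, 10 GB/s network bandwidth.
speedup, efficiency = speedup_and_efficiency(
    ops=1e12, mem_bytes=1e9, comm_bytes=1e9,
    nodes=4, cores=8, core_flops=1e10, mem_bw=5e10, net_bw=1e10)
print(f"speed-up {speedup:.2f}, efficiency {efficiency:.3f}")
```

In this toy setting the communication term is what keeps the efficiency below 1; raising `comm_bytes` or lowering `net_bw` lowers it further.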
The current development of high-performance parallel supercomputing infrastructures is pushing the boundaries of scientific applications and bringing new paradigms into engineering practice and simulation. Earthquake engineering is one of the major fields that benefits from this development, looking for solutions in grid computing and cloud computing techniques. Earthquake simulations generally involve the analysis of petabytes of data. Analyzing such large amounts of data in parallel on thousands of nodes in computer clusters yields high performance. Open-source cloud solutions such as Hadoop MapReduce, which is highly scalable and capable of rapidly processing large amounts of data in parallel on large clusters, provide a better solution than an RDBMS. Both GPUs and MapReduce are designed to support vast data parallelism. For performance reasons, GPU computing could be adopted in place of lower-performing CPU systems. This paper discusses MapReduce systems using Hadoop and Mars, a MapReduce framework for graphics processors. The proposition is to use GPU-based systems for earthquake simulations in which digital elevation model (DEM) 3D data sets are fully materialized, so that scientists can use these data for various analyses and simulations.
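As a reminder of the programming model that Hadoop and Mars both implement, here is a minimal single-process sketch of the map, shuffle, and reduce phases. It does not use the Hadoop or Mars APIs; the helper names and the tiny elevation samples are hypothetical and only illustrate the data flow.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def tile_mapper(sample):
    """Hypothetical mapper: key each (x, y, elevation) sample by its 10x10 tile."""
    x, y, elevation = sample
    yield (x // 10, y // 10), elevation


def max_reducer(key, values):
    """Hypothetical reducer: highest elevation seen in a tile."""
    return max(values)


# Made-up elevation samples (x, y, elevation in metres).
samples = [(3, 4, 120.0), (7, 9, 150.5), (12, 4, 90.0), (15, 8, 95.5)]
result = reduce_phase(shuffle(map_phase(samples, tile_mapper)), max_reducer)
print(result)
```

Because each mapper call and each per-key reduction is independent of the others, the same structure can be spread across cluster nodes or GPU threads.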
Multicore architectures make it possible to increase system performance through parallel processing. One of the challenges of a multicore embedded system is the correct usage of the processor cores. It is possible to achieve a balanced processor load across the cores, but the communication bandwidth between the cores is often a bottleneck. Passing large amounts of data between tasks mapped to different processor cores can cause cache misses in the local cache of a processor core. This paper introduces an analysis method based on data flow graphs generated at runtime to find the data paths of an algorithm. It shows that spectral cluster analysis can help to discover data-independent subsets in the algorithm under test. Finding the data-independent parts helps to partition the program into multiple slices such that inter-slice communication is kept as low as possible. With our proposed method, the communication bottleneck can be avoided in a multicore, multitask implementation, possibly resulting in better performance.
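The abstract does not say which spectral method is used, so the sketch below substitutes plain spectral bisection: it splits a weighted, undirected data flow graph by the sign of the Fiedler vector of its Laplacian, found here by power iteration. The function name, the edge format, and the example graph are all hypothetical.

```python
import math


def fiedler_partition(n, edges, iters=500):
    """Split nodes 0..n-1 into two groups by the sign of the Fiedler vector.

    edges: list of (u, v, weight); weight models the data volume between tasks.
    """
    deg = [0.0] * n
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        deg[u] += w
        deg[v] += w
    # shift*I - L is positive semidefinite because lambda_max(L) <= 2*max(deg).
    shift = 2.0 * max(deg)
    vec = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(vec) / n
        vec = [x - mean for x in vec]  # project out the constant eigenvector
        nxt = [(shift - deg[i]) * vec[i] + sum(w * vec[j] for j, w in adj[i])
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return [0 if x < 0 else 1 for x in vec]


# Made-up graph: two tightly coupled task groups joined by one light edge.
edges = [(0, 1, 10), (1, 2, 10), (0, 2, 10),
         (3, 4, 10), (4, 5, 10), (3, 5, 10),
         (2, 3, 1)]
labels = fiedler_partition(6, edges)
print(labels)
```

Mapping each label to its own core leaves only the light edge between tasks 2 and 3 crossing the partition, which is the kind of low inter-slice communication the abstract aims for.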
In China, expressways are not free. When a vehicle exits, the exit toll station needs to calculate the toll according to the vehicle trajectory obtained by sending a trajectory query task to the trajectory center re...
Every single communication on the Internet reveals private and sensitive information about the communicating parties if no further measures are applied. Various applications and measures are already available, e.g. to tunnel traffic through other nodes to obscure the original sender and receiver. Existing frameworks require external applications running on the particular nodes. We propose a flexible architecture for an anonymous communication framework that supports interoperability among different platforms. Our proof-of-concept implementation, based on web standards and web technologies, shows the feasibility of the framework in terms of usability and interoperability. The framework runs completely in the web browser and does not require external applications. The evaluation results show that our framework brings great benefits to users' privacy and security.
In this paper, we introduce an efficient method to accelerate flow simulations for an isothermal multiphase and multicomponent (MPMC) Lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. Our objective is to propose an efficient way to improve performance of multiphase and multicomponent Lattice Boltzmann simulations by the use of Nvidia GPUDirect technology and Peer-to-Peer (P2P) data transfers. Optimization of Peer-to-Peer communications is also studied in this work by the use of a clustering algorithm. Several simulations are shown and performance is discussed in order to validate the method.
ISBN: (print) 9781467385312
We study the traffic characteristics of parallel and high-performance computing applications in this paper. Applications that utilize multiple cores are increasingly common due to the emergence of multicore processors. However, the designs of single-threaded and multi-threaded applications can differ significantly. Furthermore, the on-chip communication profile of multicore systems should be analysed and modelled for characterization and simulation purposes. We investigate several applications running in a full-system simulation environment. The on-chip communication traces are gathered and analysed. We study the detailed low-level profiles of these applications. The applications are categorized into different groups according to various parallel programming paradigms. We discover that the trace data follow a power-law model with different parameters. The fitting problem is solved by applying least-squares linear regression. We propose a generic synthetic traffic model based on the analysis results.
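A common way to fit a power law with least-squares linear regression is to take logarithms, since y = c * x^alpha becomes log y = log c + alpha * log x. The sketch below shows that approach on synthetic data; whether the authors fit in log-log space is an assumption, and the function name and numbers are illustrative only.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**alpha by ordinary least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx                       # slope of the regression line
    c = math.exp(mean_y - alpha * mean_x)   # intercept, mapped back from logs
    return c, alpha


# Synthetic, noise-free data generated from y = 3 * x**-1.5 (not trace data).
xs = [1, 2, 4, 8, 16]
ys = [3.0 * x ** -1.5 for x in xs]
c, alpha = fit_power_law(xs, ys)
print(f"c = {c:.3f}, alpha = {alpha:.3f}")
```

All inputs must be strictly positive for the logarithms to be defined, so zero counts in real trace data would need to be filtered or binned first.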
In finite element optimization, the computational load limits the size of the problem that can be solved. Finite element computation involves the solution of large matrix equations, and optimization requires several such equations to be solved. Parallelization has been the preferred route to overcome this problem but was in turn limited by the cost of computers and the number of processors available. The graphics processing unit (GPU) on a PC provides a means of running these massive computations cheaply on numerous parallel threads. The purpose of this work is to review finite element matrix equation solution on the GPU and point out areas where further investigation is warranted. Our intention is to direct computational research and computer architecture development so that the GPU can be used more effectively for computational parallelization in finite element field computation.
ISBN: (print) 9781467394741
Deep neural networks (DNNs) are increasingly applied in data center applications such as speech recognition and image search. However, training a DNN is very time-consuming because of its deep structure. This paper presents FPGA-based acceleration of deep neural networks using a high-level method and proposes a parallel optimization strategy that exploits the features of the Kintex-7 FPGA board. Experimental results show that it can increase the utilization of FPGA computation units at small mini-batch sizes and effectively reduce the transfer cost. The optimized algorithm achieves up to 17.65x higher performance than a CPU.