Library functions and system calls have been major difficulties for automatic testing. Input/output (I/O) functions are a set of common library functions. Testers have to interact with the test procedures if the te...
Computer systems in the near future are expected to have Non-Volatile Main Memory (NVMM), enabled by a new generation of Non-Volatile Memory (NVM) technologies, such as Phase Change Memory (PCM), STT-MRAM, and Memris...
Hot data is very important for optimizing modern computer systems. For example, the identified hot data can be employed to extend the lifespan of flash memory. However, it is very challenging to effectively identify h...
The Kinetic Monte Carlo (KMC) algorithm has been widely applied to the simulation of radiation damage, grain growth, and chemical reactions. To simulate at large temporal and spatial scales, domain decomposition is commonly used to parallelize the KMC algorithm. However, through experimental analysis, we find that communication overhead is the main bottleneck: it affects overall performance and limits the scalability of the parallel KMC algorithm on large-scale clusters. To alleviate these problems, we present a communication aggregation approach that reduces the total number of messages and eliminates communication redundancy, and we further use neighborhood collective operations to optimize communication scheduling. Experimental results show that the optimized KMC algorithm exhibits better performance and scalability than the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the total execution time by 16%, reduces the communication time by 50% on average, and achieves a 24-fold speedup over single-node (20-core) execution.
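The aggregation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation and does not use MPI or SPPARKS; the function names, the `(destination, (site, value))` update format, and the sample data are all hypothetical. It only shows how batching boundary updates per neighbor reduces the message count, and how keeping the latest value per site drops redundant updates.

```python
from collections import defaultdict

# Hypothetical boundary updates: (destination rank, (lattice site id, new value)).
updates = [
    (1, (10, "A")),
    (2, (11, "B")),
    (1, (12, "A")),
    (1, (10, "V")),  # site 10 changes again; the earlier update becomes redundant
    (2, (11, "B")),  # exact repeat of an earlier update
]

def send_individually(updates):
    """Baseline: one message per boundary update."""
    return [(dest, [payload]) for dest, payload in updates]

def send_aggregated(updates):
    """One message per destination, keeping only the latest value per site."""
    buffers = defaultdict(dict)
    for dest, (site, value) in updates:
        buffers[dest][site] = value  # a later update overwrites an earlier one
    return [(dest, sorted(sites.items())) for dest, sites in sorted(buffers.items())]

baseline = send_individually(updates)
aggregated = send_aggregated(updates)
print(len(baseline), "messages without aggregation")
print(len(aggregated), "messages with aggregation")
```

In a real parallel code, each aggregated buffer would correspond to one exchange with a neighboring subdomain; the abstract states that the authors schedule these exchanges with neighborhood collective operations.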
Knowledge of the queue length for a radio link in a mobile data network has a significant effect on the performance of the communication protocol TCP. If the queue length can be accurately estimated and regulated to a...
ISBN: (print) 9781479984435
In-memory graph computation systems have been used to support many important applications, such as PageRank on the web graph and social network analysis. In this paper, we study the CPU cache performance of graph computation. We have implemented a graph computation system, called GraphLite, in C/C++ based on the description of Pregel. We analyze the CPU cache behavior of the internal data structures and operations of graph computation. Then we exploit CPU cache prefetching techniques to improve the cache performance. Real machine experimental results show that our solution achieves 1.9-2.2x speedups compared to the baseline implementation.
This paper proposes a multi-objective particle swarm optimization (PSO) algorithm with dynamic topology, named DTPSO, for solving multi-objective problems. One of the main drawbacks of classical multi-objective particl...
With the increasing diversity of application needs and computing units, servers with heterogeneous processors are increasingly widespread. However, the conventional SMP/ccNUMA server architecture introduces a communication bottleneck between heterogeneous processors and uses them only as coprocessors, which limits the efficiency and flexibility of using heterogeneous processors. To solve this problem, this paper proposes an intra-server interconnect fabric that supports both intra-server peer-to-peer interconnection and I/O resource sharing among heterogeneous processors. By connecting processors and I/O devices with the proposed fabric, heterogeneous processors can communicate directly with each other and run in stand-alone mode with shared intra-server resources. We design the proposed fabric by extending the de facto system I/O bus protocol PCIe (Peripheral Component Interconnect Express) and implement it with a single chip, cZodiac. By making full use of PCIe's original advantages, the interconnection and the I/O sharing mechanism are lightweight and efficient. Evaluations carried out on both the FPGA (Field Programmable Gate Array) prototype and the cycle-accurate simulator demonstrate that our design is feasible and scalable. In addition, our design suits not only heterogeneous servers but also high-density servers.
Machine-learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especia...
In this paper, a new method is proposed to evaluate the performance of concurrent systems. A concurrent system consisting of multiple processes that communicate via message passing mechanisms is modeled by a Petri net...