Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the outpu...
详细信息
Trace-oriented runtime monitoring is a very effective method to improve the reliability of distributed systems. However, for medium-scale distributed systems, existing trace-oriented monitoring frameworks are either n...
详细信息
Trace-oriented runtime monitoring is a very effective method to improve the reliability of distributed systems. However, for medium-scale distributed systems, existing trace-oriented monitoring frameworks are either not powerful or efficient enough, or too complex and expensive to deploy and maintain. In this paper, we present MTracer, which is a lightweight trace-oriented monitoring system for medium-scale distributed systems. We have proposed and implemented several optimizations to improve the efficiency of the monitor server in MTracer. A web-based frontend is also provided to visualize a monitored system from different perspectives. We have validated MTracer in a real medium-scale environment. The results indicate that MTracer has a very lower overhead, and can handle more than 4000 events per second.
Resource allocation for multi-user across multiple data centers is an important problem in cloud computing environments. Many geographically-distributed users may request virtualized resources simultaneously. And the ...
详细信息
The Embarrassingly parallel(EP) algorithm which is typical of many Monte Carloapplications provides an estimate of the upper achievable limits for double precision performance of parallel supercomputers. Recently, Int...
详细信息
The Embarrassingly parallel(EP) algorithm which is typical of many Monte Carloapplications provides an estimate of the upper achievable limits for double precision performance of parallel supercomputers. Recently, Intel released Many Integrated Core(MIC) architecture as a many-core co-processor. MIC often offers more than 50 cores each of which can run four hardware threads as well as 512-bit vector instructions. In this paper,we describe how the EP algorithm is accelerated effectively on the platforms containing MIC using the offload execution model. The result shows that the efficientimplementation of EP algorithm on MIC can take full advantage of MIC's computational resources and achieves a speedup of 3.06 compared with that on Intel Xeon E5-2670 CPU. Based on the EP algorithm on MIC and an effective task distribution model, the implementation of EP algorithm on a CPU-MIC heterogeneous platform achieves the performance of up to2134.86 Mop/s and 4.04 times speedup compared with that on Intel Xeon E5-2670 CPU.
With the development of high performance computing and Web 2.0 applications,unstructured data storage becomes more and more *** RDBMS isn't efficient for big data ***,RDBMS's scalability is ***' expansion ...
详细信息
With the development of high performance computing and Web 2.0 applications,unstructured data storage becomes more and more *** RDBMS isn't efficient for big data ***,RDBMS's scalability is ***' expansion often leads to a large scale of data *** paper designs and implements a high performance distributed key-value database,which is distributed Stage *** servers are organized by a consistent hashing ring and distributed with the support of Zookeeper,a distributed service *** has a high single-node read/write *** route information is calculated by clients,which reduces the expense of expansion.
The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrievin...
详细信息
In smart grid, privacy implications to individuals and their family is an important issue, due to the fine-grained usage data collection. Wireless communications are considered by many utility companies to obtain info...
详细信息
The double-precision matrix-matrix multiplication (DGEMM) on ARMv8 64-bit multi-core processor architecture was realized and optimized, and the optimal model for the purpose of maximizing the compute-to-memory access ...
详细信息
In the paper, a new implementation of a 3GPP LTE standards compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU-Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture h...
详细信息
ISBN:
(纸本)9781479944156
In the paper, a new implementation of a 3GPP LTE standards compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU-Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use various memory hierarchies to meet various kinds of data demands on speed and capacity. Simulation shows that our implementation is practical and it gets 76% improvement on throughput over the latest GPU implementation. The result demonstrates that the newest Kepler architecture is suitable for turbo decoding and it can be a promising reconfigurable platform for the communication system.
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event cha...
详细信息
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.
暂无评论