The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the width of apertures is very small, Finite Difference Time Domain (FDTD) simulation of t...
As the big data era arrives, it brings new challenges to massive data processing. Combining a GPU and a CPU on one chip is the trend for relieving the pressure of large scale *** found that there are different memory access characteristics between GPU and *** most important one is that GPU programs include a large number of threads, which leads to a higher cache access frequency than the CPU *** the LRU policy favors programs with high memory access frequency, GPU programs cannot obtain a corresponding performance boost even when more cache resources are *** LRU policy is not suitable for heterogeneous multi-core *** on the different memory access characteristics of GPU and CPU programs, this paper proposes a dynamic LLC replacement policy, DIPP (Dynamic Insertion/Promotion Policy), for heterogeneous multi-core *** core idea of the replacement policy is to reduce the miss rate of programs and enhance overall system performance by limiting the cache resources that the GPU can acquire and reducing the thread interference between *** compare the DIPP replacement policy with LRU and conduct a classified discussion according to the program results of *** programs improve by 23.29% in average performance (arithmetic mean). Programs with large working sets improve by 13.95%, compute-intensive programs by 9.66%, and streaming programs by 3.8%.
ISBN: 9781467368513 (print)
Simulation of particle transport is critical for a great many scientific and engineering domains. The Monte Carlo (MC) method is one of the most important numerical methods for simulating particle transport, and it can handle many complex types of particle transport. However, the computational requirements of MC simulation are very large. In 2010, Intel announced the Intel Many Integrated Core (MIC) architecture, which consists of many simple general-purpose cores and supports the well-known shared-memory execution model that is the basis of most nodes in HPC machines. Because each particle is simulated independently in the MC method, MC simulation is well suited to acceleration on MIC. In this paper, an algorithm named MCNP-MIC, based on MIC, is presented for MC simulation of neutron transport in the context of the deep penetration problem. It includes the development of a parallel random number generator, the assignment of particle counts based on the number of threads, and the design of high-efficiency data structures for parallelism. With the same problem scale and computational accuracy, the MCNP-MIC algorithm achieves a roughly 5.6-fold speedup on a 57-core MIC chip compared with the serial MCNP algorithm on an Intel Xeon E5-2670 CPU.
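The two parallelization ingredients named in this abstract, a per-thread random stream and a split of particle histories by thread count, can be illustrated with a toy problem. The sketch below is not the MCNP-MIC algorithm: it assumes a purely absorbing slab with no scattering, and all function names, the seeding scheme (`base_seed + worker index`), and the parameters `sigma` and `thickness` are hypothetical. Seeding with consecutive integers is a simplification rather than a rigorous parallel generator, and Python threads are used only to show the structure, not to obtain a speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def split_particles(total, workers):
    """Divide `total` particle histories as evenly as possible among workers."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def count_transmitted(job):
    """Simulate one worker's share using its own private random stream."""
    count, seed, sigma, thickness = job
    rng = random.Random(seed)  # one generator per worker, never shared
    # Toy physics (assumption): a particle crosses the slab if its first
    # exponentially distributed free path reaches the far side.
    return sum(1 for _ in range(count) if rng.expovariate(sigma) >= thickness)


def transmission(total, workers, sigma, thickness, base_seed=12345):
    """Estimate the transmitted fraction from independent per-worker tallies."""
    shares = split_particles(total, workers)
    jobs = [(n, base_seed + i, sigma, thickness) for i, n in enumerate(shares)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_transmitted, jobs))
    return hits / total


def analytic_transmission(sigma, thickness):
    """Exact answer for the toy absorber, useful as a sanity check."""
    return math.exp(-sigma * thickness)
```

Because each worker only reads its own job tuple and returns a tally, no locking is needed; the tallies are combined once at the end.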
ISBN: 9781479984923 (print)
Data races hidden in concurrent programs have caused severe failures. To improve reliability, many race detectors have been proposed. However, most of the reported races are not harmful, so identifying the harmful ones consumes manual effort. This paper proposes RaceChecker, which detects potential races and identifies the harmful ones effectively and efficiently. Unlike previous detectors, RaceChecker combines the happens-before relation and ad-hoc synchronization to prune infeasible races, so that fewer potential races need to be verified. Before verification, RaceChecker groups the remaining potential races, guaranteeing that the potential races in one group do not interfere with each other. Therefore, multiple potential races in one group can be verified together in one execution. To our knowledge, this is the first effective technique that groups potential races to improve efficiency. Unlike previous detectors that verify one potential race per execution, RaceChecker dynamically controls the thread scheduler to create real race conditions and verify multiple potential races in one execution, identifying the harmful races that cause program failures. We have implemented RaceChecker as a prototype tool and evaluated it on a number of real-world concurrent programs. Results show that 66% of the potential races are infeasible and that the grouping strategy eliminates nearly 48% of the executions. The known harmful races are also identified effectively. By pruning and grouping, RaceChecker identifies the harmful races more efficiently. Compared with RaceMob and RaceFuzzer, the time is reduced significantly, by an average of 45% and 81%, respectively.
Stragglers can delay jobs and reduce cluster efficiency *** research efforts have contributed to the solution, such as Blacklist [8], speculative execution [1,6], and Dolly [8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, name *** starts task clones only for high-risk delaying *** experiments have been carried out, and the results show that it can decrease the risk of job delay with fewer resources *** small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly, respectively.
Due to the uncertainty and unpredictability of environmental changes, developing self-adaptive systems in open environments is a great challenge. First, it is difficult for developers to clearly predict various enviro...
ISBN: 9781467365994 (print)
Data clustering is usually time-consuming because, by default, it needs to iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast results with quality guarantees. In this paper, we propose applying approximation techniques to data clustering to trade off clustering efficiency against result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Intensive evaluations show that IBL provides a 2x speed-up over the state-of-the-art solution with the same error bound.
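The idea of estimating the accuracy of a sample-based aggregate with bootstrap trials can be sketched as follows. This is a generic percentile bootstrap for a mean, written in plain Python rather than Spark; it is not IBL's implementation, and the function name `bootstrap_mean` and its parameters `trials`, `confidence`, and `seed` are hypothetical.

```python
import random
import statistics


def bootstrap_mean(sample, trials=200, confidence=0.95, seed=0):
    """Return (mean, lower, upper): the sample mean and a percentile
    bootstrap interval that indicates how accurate the estimate is."""
    if not sample:
        raise ValueError("sample must not be empty")
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(trials):
        # Each trial resamples n items with replacement and re-aggregates.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistics.fmean(resample))
    estimates.sort()
    lo_idx = int((1 - confidence) / 2 * trials)
    hi_idx = min(trials - 1, int((1 + confidence) / 2 * trials))
    return statistics.fmean(sample), estimates[lo_idx], estimates[hi_idx]
```

In an iterative clustering setting, an aggregate such as a centroid coordinate could be computed on a sample this way, and the interval width would indicate whether the sample is large enough for the requested error bound.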
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.
Principal component analysis (PCA) projects data onto the directions of maximal variance. Since PCA is quite effective for dimension reduction, it has been widely used in computer vision. However, conventional PCA suf...
Heavy ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has about a 92% decrease in SEU cross-section compared to the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows a much stronger frequency dependency than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in the SEU performance of the TMR design may warrant reconsideration of its use in hardened designs.
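The abstract states the static result as a percentage decrease and the dynamic result as a hardness factor, so the two are easier to compare once converted to the same unit. The short sketch below does that conversion, assuming that "N× harder" means the ratio of the DFF cross-section to the TMRFF cross-section; the abstract does not define the term, and the function names are hypothetical.

```python
def hardening_factor(decrease):
    """Cross-section ratio (reference / hardened) implied by a fractional decrease."""
    if not 0.0 <= decrease < 1.0:
        raise ValueError("decrease must be in [0, 1)")
    return 1.0 / (1.0 - decrease)


def implied_decrease(factor):
    """Fractional cross-section decrease implied by a hardening factor."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 - 1.0 / factor


static_factor = hardening_factor(0.92)     # about 12.5 in static mode
dynamic_decrease = implied_decrease(3.2)   # about 0.6875 at 160 MHz
```

Under that assumption, the static-mode advantage of roughly 12.5× falls to 3.2× at 160 MHz, which corresponds to a decrease of about 69% rather than 92%.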