The independent set ordering algorithm is a heuristic algorithm based on finding maximal independent sets of vertices in the matrix adjacency graph, which is commonly used for parallel matrix factorization. However, D...
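As an illustration of the building block this abstract names, the following is a minimal Python sketch of a greedy maximal independent set on an adjacency graph. It is not the authors' algorithm: the function name, the dictionary-of-sets graph representation, and the lowest-degree-first tie-breaking are all assumptions made for the example.

```python
def maximal_independent_set(adj):
    """Greedy maximal independent set.

    adj maps each vertex to the set of its neighbours (assumed symmetric),
    e.g. the off-diagonal nonzero pattern of a symmetric sparse matrix.
    """
    chosen = []
    blocked = set()
    # Assumption: visit low-degree vertices first; any visiting order
    # still yields a maximal independent set.
    for v in sorted(adj, key=lambda u: (len(adj[u]), u)):
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)          # v itself is now taken
            blocked.update(adj[v])  # its neighbours can no longer join
    return chosen


# Path graph 0-1-2-3-4, i.e. the pattern of a tridiagonal matrix.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mis = maximal_independent_set(path)
```

Vertices in the returned set share no edges, so the corresponding matrix rows can be eliminated independently of one another, which is what makes such sets attractive for parallel factorization.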
Program performance optimization often involves choosing the right parameters to minimize the program's runtime. Selecting optimization parameters by means of execution-driven search is guaranteed to find excellent re...
The force-directed approach is one of the most widely used methods in graph drawing research. However, its running time grows intolerably as the graph size increases, which restricts the algorith...
ISBN: (print) 9781424435449
The multi-island single-electron transistor (MISET) is an important kind of single-electron transistor that makes controllable room-temperature operation convenient to realize. A novel semi-empirical compact model for the MISET is proposed. The new approach combines the orthodox theory of single-electron tunneling through a single Coulomb island with a novel empirical analysis procedure for the chain of multiple Coulomb islands to solve for the current through the whole MISET. The tunneling rates are calculated from the orthodox theory of single-electron tunneling. The tunneling currents representing the first split peaks in the Coulomb oscillation curves are calculated under the assumption that the currents through all the Coulomb islands are equal at the stable states, while the currents representing the other split peaks are constructed and merged according to the empirical analysis. The model is verified against the traditional SET simulator SIMON and runs much faster than SIMON. The compact model is therefore suitable for large-scale MISET circuit simulation.
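The orthodox-theory tunneling rate mentioned in this abstract can be sketched in Python as follows. This shows only the standard textbook rate expression for a single tunnel junction, not the authors' compact model or their empirical peak-merging procedure. The function name, the argument names, and the overflow guard are assumptions made for the example; delta_f is the change in free energy caused by the tunnel event, negative when the event is energetically favourable.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
K_B = 1.380649e-23          # Boltzmann constant in joules per kelvin


def tunnel_rate(delta_f, r_t, temp):
    """Orthodox-theory tunneling rate in events per second.

    delta_f: free-energy change of the tunnel event in joules
    r_t:     tunnel resistance of the junction in ohms
    temp:    temperature in kelvin (must be positive)
    """
    if temp <= 0 or r_t <= 0:
        raise ValueError("temp and r_t must be positive")
    prefactor = 1.0 / (E_CHARGE ** 2 * r_t)
    x = delta_f / (K_B * temp)
    if abs(x) < 1e-12:
        # Limit of delta_f / expm1(x) as delta_f approaches zero.
        return prefactor * K_B * temp
    if x > 700:
        # Strongly unfavourable event: the rate underflows to zero,
        # and math.expm1 would overflow for such large arguments.
        return 0.0
    # Equivalent to -delta_f / (1 - exp(delta_f / (k_B * T))).
    return prefactor * delta_f / math.expm1(x)
```

For a strongly favourable event the rate approaches -delta_f / (e^2 * r_t), and the forward and reverse rates satisfy detailed balance, rate(delta_f) / rate(-delta_f) = exp(-delta_f / (k_B * temp)).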
In recent years, the problem of lake eutrophication has become increasingly severe. The monitoring and control of cyanobacteria in lakes are of great significance. The information obtained by existing monitoring metho...
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. It also offers a large set of configurable network parameters for topology and routing, and supports on/off states for routers. We compare the simulated execution time with the real execution time on a Tianhe-2 subsystem, and the mean error is only 2.7%. In addition, we simulate network behavior under different network structures and low-power modes. The results are consistent with the theoretical analyses.
Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file-sharing services and overlay construction. Several studies use hyperbolic space to model t...
Proximity ranking according to end-to-end network distances (e.g., Round-Trip Time, RTT) can reveal detailed proximity information, which is important in network management and performance diagnosis in distributed sys...
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and has therefore been used extensively. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may hurt rather than help performance, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can then be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
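The idea of removing a local-memory array and its barrier while coarsening work-items into a loop can be illustrated with a small Python emulation. This is a sketch under stated assumptions, not the authors' tool chain and not real OpenCL code: the two function names, the toy "add the next element in the tile" computation, and the use of Python lists in place of global and local memory are all hypothetical.

```python
def gpu_style_kernel(src, group_size):
    """Emulates a GPU-oriented kernel: stage a tile, synchronize, compute."""
    out = [0] * len(src)
    for start in range(0, len(src), group_size):   # one pass per work-group
        n = min(group_size, len(src) - start)
        local_tile = [0] * n                       # stands in for a __local array
        for lid in range(n):                       # phase 1: each work-item loads one element
            local_tile[lid] = src[start + lid]
        # On a GPU, a barrier would separate the two phases here.
        for lid in range(n):                       # phase 2: read a neighbour's slot
            out[start + lid] = local_tile[lid] + local_tile[(lid + 1) % n]
    return out


def cpu_style_kernel(src, group_size):
    """Coarsened form: work-items become a plain loop, no tile, no barrier."""
    out = [0] * len(src)
    for start in range(0, len(src), group_size):
        n = min(group_size, len(src) - start)
        for lid in range(n):
            # Read global data directly; valid because the tile was only
            # an unmodified copy of src.
            out[start + lid] = src[start + lid] + src[start + (lid + 1) % n]
    return out


data = list(range(10))
gpu_result = gpu_style_kernel(data, 4)
cpu_result = cpu_style_kernel(data, 4)
```

Both functions produce the same output. The second avoids the extra copy and the synchronization point, which is the kind of saving the abstract attributes to dropping local-memory arrays on CPUs.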
As an infrastructure for data distribution, overlay networks must feature efficient routing and adequate robustness to achieve fast and accurate data distribution in environments with node churn. Considering tha...