In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC)...
详细信息
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC).However,due to the complexity of programming on GPUs,porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big *** OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific *** effectively inherit existing OpenMP applications and reduce the transplant cost,we extend OpenMP with a group of compiler directives,which explicitly divide tasks among the CPU and the GPU,and map time-consuming computing fragments to run on the GPU,thus dramatically simplifying the *** have designed and implemented MPtoStream,a compiler of the extended OpenMP for AMD’s stream processing *** experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system,incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU,over the execution on the Xeon CPU alone.
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks ...
详细信息
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.
In diverse and self-governed multiple clouds context, the service management and discovery are greatly challenged by the dynamic and evolving features of services. How to manage the features of cloud services and supp...
详细信息
In diverse and self-governed multiple clouds context, the service management and discovery are greatly challenged by the dynamic and evolving features of services. How to manage the features of cloud services and support accurate and efficient service discovery has becomean open problem in the area of cloud computing. This paper proposes a field model of multiple cloud services and corresponding service discovery method to address the issue. Different from existing researches, our approach is inspired by Bohr atom model. We use the abstraction of energy level and jumping mechanism to describe services status and variations, and thereby to support the service demarcation and discovery. The contributions of this paper are threefold. First, we propose the abstraction of service energy level to represent the status of services, and service jumping mechanism to investigate the dynamic and evolving features as the variations and re-demarcation of cloud services according to their energy levels. Second, we present user acceptable service region to describe the services satisfying users' requests and corresponding service discovery method, which can significantly decrease services search scope and improve the speed and precision of service discovery. Third, a series of algorithms are designed to implement the generation of field model, user acceptable service regions, service jumping mechanism, and user-oriented service discovery. We have conducted an extensive experiments on QWS dataset to validate and evaluate our proposed models and algorithms. The results show that field model can well support the representation of dynamic and evolving aspects of services in multiple clouds context and the algorithms can improve the accuracy and efficiency of service discovery.
Dear editor,Application programming interface (API) documentation plays an important role in software development and reuse [1] for both API maintainers and API *** documentation helps developers understand and reuse ...
详细信息
Dear editor,Application programming interface (API) documentation plays an important role in software development and reuse [1] for both API maintainers and API *** documentation helps developers understand and reuse codes effectively [2] and focus their time on desired interfaces and functions instead of the entire system [3].Most high-quality open source projects maintain complete and informative official *** documentation typically conveys detailed specifications,such as class/inter face hierarchies and method descriptions,which can be of great benefit to developers [4].However,despite its authoritativeness and thoroughness,single-sourced official documentation does not always meet the developers'requirements [5].
Instance-specific algorithm selection technologies have been successfully used in many research fields,such as constraint satisfaction and planning. Researchers have been increasingly trying to model the potential rel...
详细信息
Instance-specific algorithm selection technologies have been successfully used in many research fields,such as constraint satisfaction and planning. Researchers have been increasingly trying to model the potential relations between different candidate algorithms for the algorithm selection. In this study, we propose an instancespecific algorithm selection method based on multi-output learning, which can manage these relations more *** kinds of multi-output learning methods are used to predict the performances of the candidate algorithms:(1)multi-output regressor stacking;(2) multi-output extremely randomized trees; and(3) hybrid single-output and multioutput trees. The experimental results obtained using 11 SAT datasets and 5 Max SAT datasets indicate that our proposed methods can obtain a better performance over the state-of-the-art algorithm selection methods.
Heavy ion experiments were performed on D flip-flop(DFF) and TMR flip-flop(TMRFF) fabricated in a 65-nm bulk CMOS process. The experiment results show that TMRFF has about 92% decrease in SEU crosssection compared to ...
详细信息
Heavy ion experiments were performed on D flip-flop(DFF) and TMR flip-flop(TMRFF) fabricated in a 65-nm bulk CMOS process. The experiment results show that TMRFF has about 92% decrease in SEU crosssection compared to the standard DFF design in static test mode. In dynamic test mode, TMRFF shows much stronger frequency dependency than the DFF design, which reduces its advantage over DFF at higher operation frequency. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such small improvement in the SEU performance of the TMR design may warrant reconsideration for its use in hardening design.
With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a rather promising candidate for future resistive memory (RRAM). Memristor...
详细信息
With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a rather promising candidate for future resistive memory (RRAM). Memristor's potential to store multiple bits of information as different resistance levels allows its application in multilevel cell (MCL) tech- nology, which can significantly increase the memory capacity. However, most existing memristor models are built for binary or continuous memristance switching. In this paper, we propose the simulation program with integrated circuits emphasis (SPICE) modeling of charge-controlled and flux-controlled memristors with multilevel resistance states based on the memristance versus state map. In our model, the memristance switches abruptly between neighboring resistance states. The proposed model allows users to easily set the number of the resistance levels as parameters, and provides the predictability of resistance switching time if the input current/voltage waveform is given. The functionality of our models has been validated in HSPICE. The models can be used in multilevel RRAM modeling as well as in artificial neural network simulations.
Many real-world networks are found to be scale-free. However, graph partition technology, as a technology capable of parallel computing, performs poorly when scale-free graphs are provided. The reason for this is that...
详细信息
Many real-world networks are found to be scale-free. However, graph partition technology, as a technology capable of parallel computing, performs poorly when scale-free graphs are provided. The reason for this is that traditional partitioning algorithms are designed for random networks and regular networks, rather than for scale-free networks. Multilevel graph-partitioning algorithms are currently considered to be the state of the art and are used extensively. In this paper, we analyse the reasons why traditional multilevel graph-partitioning algorithms perform poorly and present a new multilevel graph-partitioning paradigm, top down partitioning, which derives its name from the comparison with the traditional bottom-up partitioning. A new multilevel partitioning algorithm, named betweenness-based partitioning algorithm, is also presented as an implementation of top-down partitioning paradigm. An experimental evaluation of seven different real-world scale-free networks shows that the betweenness-based partitioning algorithm significantly outperforms the existing state-of-the-art approaches.
Charge sharing is becoming an important topic as the feature size scales down in fin field-effect-transistor (FinFET) technology. However, the studies of charge sharing induced single-event transient (SET) pulse q...
详细信息
Charge sharing is becoming an important topic as the feature size scales down in fin field-effect-transistor (FinFET) technology. However, the studies of charge sharing induced single-event transient (SET) pulse quenching with bulk FinFET are reported seldomly. Using three-dimensional technology computer aided design (3DTCAD) mixed-mode simulations, the effects of supply voltage and body-biasing on SET pulse quenching are investigated for the first time in bulk FinFET process. Research results indicate that due to an enhanced charge sharing effect, the propagating SET pulse width decreases with reducing supply voltage. Moreover, compared with reverse body-biasing (RBB), the circuit with forward body-biasing (FBB) is vulnerable to charge sharing and can effectively mitigate the propagating SET pulse width up to 53% at least. This can provide guidance for radiation-hardened bulk FinFET technology especially in low power and high performance applications.
This paper presents a new approach to generating configuration-oriented executable symbolic test sequences from Extended Finite State Machine (EFSM) models. The information about the values of the context variables an...
详细信息
暂无评论