As the fourth passive circuit component, a memristor is a nonlinear resistor that can "remember" the amount of charge passing through it. The characteristic of "remembering" the charge and non-volatility makes mem...
详细信息
As the fourth passive circuit component, a memristor is a nonlinear resistor that can "remember" the amount of charge passing through it. The characteristic of "remembering" the charge and non-volatility makes memristors great potential candidates in many fields. Nowadays, only a few groups have the ability to fabricate memristors, and most researchers study them by theoretic analysis and simulation. In this paper, we first analyse the theoretical base and characteristics of memristors, then use a simulation program with integrated circuit emphasis as our tool to simulate the theoretical model of memristors and change the parameters in the model to see the influence of each parameter on the characteristics. Our work supplies researchers engaged in memristor-based circuits with advice on how to choose the proper parameters.
With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a rather promising candidate for future resistive memory (RRAM). Memristor...
详细信息
With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a rather promising candidate for future resistive memory (RRAM). Memristor's potential to store multiple bits of information as different resistance levels allows its application in multilevel cell (MCL) tech- nology, which can significantly increase the memory capacity. However, most existing memristor models are built for binary or continuous memristance switching. In this paper, we propose the simulation program with integrated circuits emphasis (SPICE) modeling of charge-controlled and flux-controlled memristors with multilevel resistance states based on the memristance versus state map. In our model, the memristance switches abruptly between neighboring resistance states. The proposed model allows users to easily set the number of the resistance levels as parameters, and provides the predictability of resistance switching time if the input current/voltage waveform is given. The functionality of our models has been validated in HSPICE. The models can be used in multilevel RRAM modeling as well as in artificial neural network simulations.
Extraordinary large datasets of high performance computing applications require improvement in existing storage and retrieval mechanisms. Moreover, enlargement of the gap between data processing and I/O operations'...
详细信息
ISBN:
(纸本)9781479915194
Extraordinary large datasets of high performance computing applications require improvement in existing storage and retrieval mechanisms. Moreover, enlargement of the gap between data processing and I/O operations' throughput will bound the system performance to storage and retrieval operations and remarkably reduce the overall performance of high performance computing clusters. File replication is a way to improve the performance of I/O operations and increase network utilization by storing several copies of every file. Furthermore, this will lead to a more reliable and fault-tolerant storage cluster. In order to improve the response time of I/O operations, we have proposed a mechanism that estimates the required number of replicas for each file based on its popularity. Besides that, the remaining space of storage cluster is considered in the evaluation of replication factors and the number of replicas is adapted to the storage state. We have implemented the proposed mechanism using HDFS and evaluated it using MapReduce framework. Evaluation results prove its capability to improve the response time of read operations and increase network utilization. Consequently, this mechanism reduces the overall response time of read operations by considering files' popularity in replication process and adapts the replication factor to the cluster state.
High-performance computing (HPC) clusters are currently faced with two major challenges - namely, the dynamic nature of new generation of applications and the heterogeneity of platforms - if they are going to be usefu...
详细信息
High-performance computing (HPC) clusters are currently faced with two major challenges - namely, the dynamic nature of new generation of applications and the heterogeneity of platforms - if they are going to be useful for exascale computing. Processes running these applications may well demand unpredictable requirements and changes to system configuration and capabilities at runtime, thereby requiring fast system response without sacrificing the transparency and integrity of the reconfigured empowered system that is running on a heterogeneous platform. While a challenge in and of itself, platform heterogeneity is both useful and instrumental in the handling of unpredictable requests. The realization of such a dynamically reconfigurable and heterogeneous HPC cluster system for exascale computing requires a model to guide running processes to determine if they need empowerment of the current cluster, and if yes, by how much. To show the feasibility of empowerment of traditional HPC clusters for exascale computing, we have selected Beowulf as a noble candidate cluster and present a mathematical model for the empowerment of Beowulf clusters for exascale computing (EBEC). We have developed the model in line with Beowulf's cluster approach and by using vector space algebra. In contrast to traditional hardware-oriented approaches to improvise the performance of clusters, we use a software approach to the development of the proposed model by emphasizing processes, which act as the creators of the cluster and thus should decide on system (re)configuration, as the principal building blocks of the system. We have also adopted a new approach to heterogeneity by considering heterogeneity at different levels including hardware, system software, application software, and system functionality. In addition to support for heterogeneity and dynamic reconfiguration, the proposed model includes support for scalability that is crucial to exascale computing too.
GPGPUs are increasingly being used to as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer...
详细信息
GPGPUs are increasingly being used to as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method, for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method that recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based faulttolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC reduces significantly the fault recovery overheads incurred by FullRC, by 73.5% when errors occur earlier during execution and 74.6% when errors occur later on average. In addition, PartialRC also reduces error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC)...
详细信息
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC).However,due to the complexity of programming on GPUs,porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big *** OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific *** effectively inherit existing OpenMP applications and reduce the transplant cost,we extend OpenMP with a group of compiler directives,which explicitly divide tasks among the CPU and the GPU,and map time-consuming computing fragments to run on the GPU,thus dramatically simplifying the *** have designed and implemented MPtoStream,a compiler of the extended OpenMP for AMD’s stream processing *** experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system,incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU,over the execution on the Xeon CPU alone.
Low power is the first-class design requirement for HPC systems. Dynamic voltage and frequency scaling (DVFS) has become the commonly used and efficient technology to achieve a trade-off between power consumption and ...
详细信息
Low power is the first-class design requirement for HPC systems. Dynamic voltage and frequency scaling (DVFS) has become the commonly used and efficient technology to achieve a trade-off between power consumption and system performance. However, most the prior work using DVFS did not take into account the latency of voltage/frequency scaling, which is a critical factor in real hardware determining the power efficiency of the power management algorithm. This paper, firstly, investigate the latency features of DVFS on a real many-core hardware platform. Secondly, we propose a latency-aware DVFS algorithm for profile-based power management to avoid aggressive power state transitions. At last, we evaluate our algorithm on Intel SCC platform using a data-intensive benchmark, Graph 500 benchmark. The experimental results not only show impressive potential for energy saving in data-intensive applications (up to 31% energy saving and 60% EDP reduction), but also evaluate the efficiency of our latency-aware DVFS algorithm which achieves 12.0% extra energy saving and 5.0% extra EDP reduction, moreover, increases the execution performance by 22.4%.
Force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with the traditional force-directed algorithms. First, there is no mature theory to ensure the conv...
详细信息
Force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with the traditional force-directed algorithms. First, there is no mature theory to ensure the convergence of iteration sequence used in the algorithm and further, it is hard to estimate the rate of convergence even if the convergence is satisfied. Second, the running time cost is increased intolerablely in drawing largescale graphs, and therefore the advantages of the force-directed approach are limited in practice. This paper is focused on these problems and presents a sufficient condition for ensuring the convergence of iterations. We then develop a practical heuristic algorithm for speeding up the iteration in force-directed approach using a successive over-relaxation (SOR) strategy. The results of computational tests on the several benchmark graph datasets used widely in graph drawing research show that our algorithm can dramatically improve the performance of force-directed approach by decreasing both the number of iterations and running time, and is 1.5 times faster than the latter on average.
Recently, GPU has been widely used in High Performance Computing (HPC). In order to improve computational performance, several GPUs are integrated into one computer node in practical system. However, power consumption...
详细信息
ISBN:
(纸本)9783642283079;9783642283086
Recently, GPU has been widely used in High Performance Computing (HPC). In order to improve computational performance, several GPUs are integrated into one computer node in practical system. However, power consumption of GPUs is very high and becomes as bottleneck to its further development. In doing so, optimizing power consumption have been draw broad attention in the research area and industry community. In this paper, we present an energy optimization model considering performance constraint for homogeneous multi-GPUs, and propose a performance prediction model when task partitioning policy is specified. Experiment results validate that the model can accurately predict the execution of program for single or multiple GPUs, and thus reduce static power consumption by the guide of task partition.
Symbolic execution is an effective path oriented and constraint based program analysis technique. Recently, there is a significant development in the research and application of symbolic execution. However, symbolic e...
详细信息
暂无评论