With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a promising candidate for future resistive memory (RRAM). The memristor's potential to store multiple bits of information as different resistance levels allows its application in multilevel cell (MLC) technology, which can significantly increase memory capacity. However, most existing memristor models are built for binary or continuous memristance switching. In this paper, we propose Simulation Program with Integrated Circuit Emphasis (SPICE) models of charge-controlled and flux-controlled memristors with multilevel resistance states, based on the memristance versus state map. In our model, the memristance switches abruptly between neighboring resistance states. The proposed model lets users set the number of resistance levels as a parameter, and makes the resistance switching time predictable when the input current/voltage waveform is given. The functionality of our models has been validated in HSPICE. The models can be used in multilevel RRAM modeling as well as in artificial neural network simulations.
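The idea of a staircase memristance versus state map can be illustrated with a minimal Python sketch. This is not the SPICE model from the paper: the evenly spaced resistance levels, the uniform charge thresholds, the constant input current, and all function and parameter names (`make_levels`, `memristance`, `switching_times`, `q_max`) are assumptions made for illustration only.

```python
def make_levels(r_on, r_off, n_levels):
    """Assumed: evenly spaced resistance levels from r_off (state 0) down to r_on."""
    step = (r_off - r_on) / (n_levels - 1)
    return [r_off - k * step for k in range(n_levels)]


def memristance(q, q_max, levels):
    """Staircase map: the charge q (the state) selects one discrete level.

    The value jumps abruptly to the neighboring level each time q crosses
    a threshold k * q_max / n (assumed uniform thresholds).
    """
    n = len(levels)
    q = min(max(q, 0.0), q_max)            # clamp the state to [0, q_max]
    idx = min(int(q / q_max * n), n - 1)   # index of the active level
    return levels[idx]


def switching_times(current, q_max, n_levels):
    """Threshold-crossing times for a constant current, using q(t) = I * t."""
    return [k * q_max / n_levels / current for k in range(1, n_levels)]
```

For a charge-controlled device the state is the time integral of the input current, so once the waveform is known the threshold crossings, and hence the switching instants, follow directly. The constant-current case above is the simplest instance of that reasoning.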
We consider a wide range of non-convex regularized minimization problems in sparse learning, where the non-convex regularization term is composed with a linear function. Recent theoretical investigations have demonstrated the superiority of such formulations over their convex counterparts. The computational challenge lies in the fact that the proximal mapping associated with the non-convex regularization is not easily obtained because of the linear composition. Fortunately, the problem structure allows one to introduce an auxiliary variable and reformulate the problem as an optimization problem with linear constraints, which can be solved using the Linearized Alternating Direction Method of Multipliers (LADMM). Despite the success of LADMM in practice, it remains unknown whether LADMM converges when solving such non-convex compositely regularized optimization problems. In this research, we first present a detailed convergence analysis of the LADMM algorithm for solving a non-convex compositely regularized optimization problem with a large class of non-convex penalties. Furthermore, we propose an Adaptive LADMM (AdaLADMM) algorithm with a line-search criterion. Experimental results on several types of datasets validate the efficacy of the proposed algorithm.
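The reformulation described above can be written out as follows. The symbols f (smooth loss), r (non-convex penalty), A (linear map), z (auxiliary variable), λ (multiplier), ρ (penalty parameter) and η (linearization step parameter) are notation assumed here, and the update order shown is one common linearized ADMM scheme rather than necessarily the exact variant analyzed in the paper.

```latex
% Original composite problem and its linearly constrained reformulation
\min_{x} \; f(x) + r(Ax)
\quad\Longleftrightarrow\quad
\min_{x,\,z} \; f(x) + r(z) \quad \text{s.t.} \quad Ax - z = 0

% Augmented Lagrangian
\mathcal{L}_\rho(x, z, \lambda)
  = f(x) + r(z) + \langle \lambda,\, Ax - z \rangle
  + \frac{\rho}{2}\,\lVert Ax - z \rVert_2^2

% One generic linearized ADMM iteration (assumed form)
z^{k+1} \in \operatorname{prox}_{r/\rho}\!\left( A x^{k} + \lambda^{k}/\rho \right)
x^{k+1} = x^{k} - \frac{1}{\eta}\left[ \nabla f(x^{k})
          + A^{\top}\!\left( \lambda^{k} + \rho\,(A x^{k} - z^{k+1}) \right) \right]
\lambda^{k+1} = \lambda^{k} + \rho\,\left( A x^{k+1} - z^{k+1} \right)
```

The point of the auxiliary variable is visible in the z-update: the proximal mapping is now taken with respect to r alone, not r composed with A, so it is available in closed form for many common non-convex penalties.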
Edge extraction is an indispensable task in digital image processing. With the sharp increase in image data, real-time performance has become a limitation of state-of-the-art edge extraction algorithms. In this paper, QSobel, a novel quantum image edge extraction algorithm, is designed based on the flexible representation of quantum images (FRQI) and the well-known Sobel edge extraction algorithm. Because FRQI uses the superposition state of a qubit sequence to store all the pixels of an image, QSobel can calculate the Sobel gradients of the image intensity of all the pixels simultaneously, which is the main reason that QSobel can extract edges quickly. By designing and analyzing the quantum circuit of QSobel, we demonstrate that QSobel can extract edges with a computational complexity of O(n^2) for an FRQI quantum image of size 2^n × 2^n. Compared with all the classical edge extraction algorithms and the existing quantum edge extraction algorithms, QSobel can utilize quantum parallel computation to reach a significant and exponential speedup. Thus, QSobel would resolve the real-time problem of image edge extraction.
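For reference, the classical Sobel operator that QSobel builds on can be sketched in a few lines of Python. This shows only the classical per-pixel computation, not the quantum circuit; the function name `sobel_magnitude`, the list-of-lists image format, the zeroed border, and the |gx| + |gy| approximation of the gradient magnitude are choices made here for illustration.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return |gx| + |gy| for each interior pixel of a grayscale image.

    img is a list of equal-length rows of intensities; border pixels,
    which lack a full 3x3 neighborhood, are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * p
                    gy += GY[dy][dx] * p
            out[y][x] = abs(gx) + abs(gy)
    return out
```

The two outer loops visit every pixel, so the classical cost grows with the number of pixels, which is 2^n × 2^n for the image size considered in the abstract. This is the per-pixel work that the abstract says QSobel performs for all pixels simultaneously.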
It has been shown that clock distribution networks (CDNs) are becoming increasingly vulnerable to transient faults known as single event transients (SETs), owing to technology scaling [1]. In the deep submicron regime, CDNs contribute significantly to the chip-level soft error rate (SER) [2].
Feature-based image matching algorithms play an indispensable role in automatic target recognition (ATR). In this work, a fast image matching algorithm (FIMA) is proposed which utilizes the geometric feature of the extended centroid (EC) to build affine invariants. Based on the affine invariance of the length ratio of two parallel line segments, FIMA overcomes the invalidation problem of the state-of-the-art algorithms based on affine geometry features, and increases the feature diversity of different targets, thus reducing the misjudgment rate during target recognition. However, it is found that FIMA suffers from the parallelogram contour problem and the coincidence invalidation. An advanced FIMA is designed to cope with these problems. Experiments show that the proposed algorithms are more robust to Gaussian noise, gray-scale change, contrast change, illumination and small three-dimensional rotation. Compared with the latest fast image matching algorithms based on geometry features, FIMA reaches a speedup of approximately 1.75 times. Thus, FIMA would be more suitable for actual ATR applications.
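The geometric fact that FIMA relies on, that the length ratio of two parallel line segments is unchanged by an affine transformation, can be checked numerically with a short Python sketch. It does not reproduce the extended centroid construction or the matching algorithm itself; the helper names (`affine`, `length`, `length_ratio`) and the sample coordinates are illustrative assumptions.

```python
import math


def affine(p, a, b, c, d, tx, ty):
    """Apply the affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def length(p, q):
    """Euclidean length of the segment from p to q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def length_ratio(seg1, seg2):
    """Ratio of the lengths of two segments, each given as (start, end)."""
    return length(*seg1) / length(*seg2)


# Two parallel segments: directions (2, 1) and (4, 2).
seg1 = ((0.0, 0.0), (2.0, 1.0))
seg2 = ((1.0, 3.0), (5.0, 5.0))

# An arbitrary invertible affine map (rotation, shear, scale and shift mixed).
params = (2.0, 1.0, -1.0, 3.0, 5.0, -2.0)
seg1_t = tuple(affine(p, *params) for p in seg1)
seg2_t = tuple(affine(p, *params) for p in seg2)

before = length_ratio(seg1, seg2)
after = length_ratio(seg1_t, seg2_t)
```

Because an affine map sends a direction vector v to Mv for a fixed matrix M, two parallel segments with directions v and kv are mapped to Mv and kMv, so their length ratio stays 1/|k|. Non-parallel segments do not have this property, which is why the invariant is stated for parallel segments.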
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned as the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The design of a low-level message-passing infrastructure and upper-level message-passing services is also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.
Resources on the Internet have such intrinsic characteristics as growth, autonomy and diversity, which have brought many challenges to the efficient sharing and comprehensive utilization of these resources. This paper presents a novel approach for the construction of the Internet-based Virtual Computing Environment (iVCE), whose significant mechanisms are on-demand aggregation and autonomic collaboration. The iVCE is built on the open infrastructure of the Internet and provides harmonious, transparent and integrated services for end-users and applications. The concept of iVCE is presented and its architectural framework is described by introducing three core concepts, i.e., autonomic element, virtual commonwealth and virtual executor. Then the connotations, functions and related key technologies of each component of the architecture are analyzed in depth with a case study, iVCE for Memory.
A novel framework for parallel subgraph isomorphism on GPUs, named GPUSI, is proposed, which consists of GPU region exploration and GPU subgraph matching. GPUSI iteratively enumerates subgraph instances and solves the subgraph isomorphism in a divide-and-conquer fashion. The framework relies completely on graph traversal and avoids the explicit join operation. Moreover, in order to improve its performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and a warp-centric programming model is used to balance the workload among threads in a warp. A prototype of GPUSI is implemented, and comprehensive experiments with various graph isomorphism operations are carried out on diverse large graphs. The experiments clearly demonstrate that GPUSI has good scalability and can achieve a speedup of 1.4–2.6 times compared to the state-of-the-art solutions.
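As background for the graph layout mentioned above, the following Python sketch builds a plain compressed sparse row (CSR) adjacency structure and reads a vertex's neighbor list from it. It shows only standard CSR on the CPU; the virtual-CSR variant, the task queue, and the warp-level scheduling used by GPUSI are not modeled, and the names `build_csr` and `neighbors` are hypothetical.

```python
def build_csr(num_vertices, edges):
    """Build (row_offsets, col_indices) from a list of directed edges (u, v)."""
    # Count the out-degree of every vertex.
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1

    # Prefix sum: row_offsets[u] is where u's neighbor list starts.
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + degree[v]

    # Scatter each edge target into its source vertex's slice.
    col_indices = [0] * len(edges)
    cursor = row_offsets[:-1].copy()
    for u, v in edges:
        col_indices[cursor[u]] = v
        cursor[u] += 1
    return row_offsets, col_indices


def neighbors(csr, u):
    """Return the neighbors of u as a contiguous slice of col_indices."""
    row_offsets, col_indices = csr
    return col_indices[row_offsets[u]:row_offsets[u + 1]]
```

In plain CSR, each neighbor list is a contiguous slice whose length equals the vertex degree, so assigning one vertex per worker gives uneven work on graphs with skewed degrees. That imbalance is the kind of problem the abstract's workload-balancing techniques are described as addressing.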
The performance gap for high performance applications has been widening over time. High level program transformations are critical to improve applications' performance, many of which concern the determination of o...