Stencil computations are at the core of a wide range of scientific and engineering applications. Much effort has gone into improving the efficiency of stencil calculations on different platforms, but unfortunately the resulting optimizations are not easy to reuse. In this paper we present PADS, a PAttern-Driven Stencil compiler-based tool, together with a simple tuning system for reusing those well-optimized methods and codes. We also suggest extensions to OpenMP that describe high-level data structures in order to facilitate recognition of various stencil computation patterns. PADS allows programmers to rewrite stencil kernels, or to reuse source-to-source translator outputs, as optimized stencil template codes with related tuning parameters. In addition, PADS includes an OpenMP-to-CUDA translator and a code generator that uses the optimized template codes. It also obtains architecture-specific parameters to tune stencils across different GPU platforms. To demonstrate our system's flexibility and performance portability, we illustrate four different stencil computations: a Laplacian operator with the Jacobi iterative method, a divergence operator, a 3D 25-point stencil, and a 2D heat equation solved with the ADI method under periodic boundary conditions. PADS succeeds in generating all four stencil codes using different optimization strategies and delivers a promising performance improvement.
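The first benchmark, a Laplacian operator with the Jacobi iterative method, is the classic stencil pattern: the usual 2D discretization of the Laplacian is a 5-point stencil. The following is a minimal sequential sketch of what such a kernel computes, assuming a 2D grid with fixed (Dirichlet) boundary values. It is not PADS output, and the names `jacobi_step` and `jacobi_solve` are hypothetical.

```python
from typing import List

Grid = List[List[float]]

def jacobi_step(u: Grid) -> Grid:
    """One Jacobi sweep of the 5-point Laplacian stencil.

    Boundary values are copied unchanged (Dirichlet conditions); every
    interior point becomes the average of its four neighbours, read
    only from the previous iterate `u`.
    """
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # separate output buffer (double buffering)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new

def jacobi_solve(u: Grid, tol: float = 1e-8, max_iters: int = 10_000) -> Grid:
    """Iterate until the largest point-wise change drops below tol."""
    for _ in range(max_iters):
        new = jacobi_step(u)
        change = max(abs(a - b)
                     for row_new, row_old in zip(new, u)
                     for a, b in zip(row_new, row_old))
        u = new
        if change < tol:
            break
    return u
```

Because each point of the new iterate depends only on the previous iterate, the two interior loops can be parallelized freely (for example with an OpenMP `parallel for` or one CUDA thread per point), which is what makes Jacobi-style stencils attractive targets for template-based code generation.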
Coverage models are the main technique for evaluating the thoroughness of dynamic verification of a Design-under-Verification (DUV). However, rather than achieving high coverage, the essential purpose of verification is to expose as many bugs as possible. In this paper, we propose a novel verification methodology that leverages early bug prediction for a DUV to guide and assess the related verification process. Specifically, this methodology uses predictive models built on artificial neural networks (ANNs), which can model the relationship between the high-level attributes of a design and its associated bug information. To evaluate the performance of the constructed predictive models, we conduct experiments on several open-source projects. Moreover, we demonstrate the usability and effectiveness of the proposed methodology by describing experience from our industrial practice. Finally, we discuss the application of our methodology.
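To make the idea of an ANN-based bug predictor concrete, here is a minimal sketch of a one-hidden-layer network trained by stochastic gradient descent to map numeric design attributes to a bug count. Everything in it is an assumption for illustration: the attributes (imagine normalized size and complexity metrics), the network size, the learning rate, and the synthetic training data. The abstract does not specify the paper's actual features or model, and `train_bug_predictor` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (design attributes, observed bug count)

def train_bug_predictor(samples: List[Sample], hidden: int = 4, lr: float = 0.05,
                        epochs: int = 500, seed: int = 0):
    """Train a 1-hidden-layer tanh network with squared-error loss.

    Returns (predict, initial_mse, final_mse).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial = mse()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                g = err * w2[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * g * x[i]
                b1[j] -= lr * g
            b2 -= lr * err
    return (lambda x: forward(x)[1]), initial, mse()
```

In a real flow the training pairs would come from earlier designs whose bug records are known, and the trained predictor would then rank modules of a new DUV by expected bug count to focus verification effort.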
As FPGA feature sizes shrink to the nanometer scale, soft errors are becoming an increasingly important concern for SRAM-based FPGAs. Existing reliability-oriented placement and routing approaches analyze the soft error rate (SER) only at the physical level, without considering the application-level impact, and consequently complete the design with suboptimal soft error mitigation. Our analysis shows that the statistical variation of the application-level factor is significant. Hence, in this work we first propose a cube-based analysis to evaluate the application-level factor efficiently and accurately. We then propose a cross-layer optimized placement and routing algorithm that reduces the SER by incorporating the application-level and physical-level factors together. Experimental results show that the average difference in the application-level factor between our cube-based method and Monte Carlo golden simulation is less than 0.01. Moreover, compared with the baseline VPR placement and routing technique, the cross-layer optimized placement and routing algorithm reduces the SER by 14% with no area or performance overhead.
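To illustrate what a Monte Carlo estimate of an application-level factor can look like, the sketch below assumes, for illustration only, that the factor of a configuration bit is the probability that flipping that bit changes a primary output under uniformly random inputs. The netlist (one 2-input LUT ANDed with a primary input) and all function names are hypothetical, and the paper's cube-based method is not shown; the exhaustive count plays the role of the exact reference.

```python
import itertools
import random

N_INPUTS = 3

def lut2(table: int, a: int, b: int) -> int:
    """2-input LUT: bit ((a << 1) | b) of the 4-bit truth table is the output."""
    return (table >> ((a << 1) | b)) & 1

def circuit(table: int, x0: int, x1: int, x2: int) -> int:
    """Hypothetical mapped netlist: the LUT output is ANDed with primary input x2."""
    return lut2(table, x0, x1) & x2

def app_factor_exact(table: int, bit: int) -> float:
    """Fraction of all input vectors for which flipping one LUT bit changes the output."""
    faulty = table ^ (1 << bit)
    vectors = list(itertools.product((0, 1), repeat=N_INPUTS))
    return sum(circuit(table, *v) != circuit(faulty, *v) for v in vectors) / len(vectors)

def app_factor_monte_carlo(table: int, bit: int, samples: int, seed: int = 0) -> float:
    """The same quantity estimated from randomly drawn input vectors."""
    rng = random.Random(seed)
    faulty = table ^ (1 << bit)
    hits = 0
    for _ in range(samples):
        v = [rng.randint(0, 1) for _ in range(N_INPUTS)]
        hits += circuit(table, *v) != circuit(faulty, *v)
    return hits / samples
```

For this netlist, any single LUT bit matters only when the LUT inputs select that bit (probability 1/4) and x2 = 1 (probability 1/2), so the exact factor is 1/8. Exhaustive enumeration becomes infeasible for real designs, which is why sampling or analytical (for example cube-based) methods are needed.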
GPUs have recently been explored as a new general-purpose computing platform suitable for accelerating compute-intensive EDA applications. In this paper we describe a GPU-based one- to n-detection fault simulator for both stuck-at and transition faults, which achieves a 20X speedup over a commercial CPU-based fault simulator. We further show new fault-simulation-based test selection applications enabled by this accelerated fault simulation. Our results demonstrate that, compared with tests deterministically generated by commercial ATPG tools, the tests selected by these applications achieve higher fault coverage for 1- to n-detections, with steeper fault coverage curves, as well as better delay test quality.
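n-detection fault simulation records, for every fault, how many test patterns detect it; a fault counts toward n-detection coverage once it is detected at least n times. The sketch below is a CPU, pattern-parallel version of that bookkeeping, not the paper's GPU simulator: each Python integer packs all 8 input patterns of a hypothetical 3-input circuit (n1 = a AND b, out = n1 OR c), one pattern per bit, which is loosely analogous to the data parallelism a GPU simulator can exploit. Only stuck-at faults are modeled; transition faults would additionally need two-pattern tests.

```python
PATTERNS = 8                    # exhaustive patterns for the 3 inputs a, b, c
MASK = (1 << PATTERNS) - 1
SITES = ("a", "b", "c", "n1", "out")

def input_words():
    """Bit p of each word holds that input's value in pattern p (a is the MSB of p)."""
    return {name: sum(((p >> shift) & 1) << p for p in range(PATTERNS))
            for name, shift in (("a", 2), ("b", 1), ("c", 0))}

def simulate(fault=None):
    """Simulate n1 = a AND b, out = n1 OR c on all patterns at once.

    fault is None (fault-free) or a (site, stuck_value) pair.
    """
    def tap(site, word):
        # Inject a stuck-at value by forcing the whole word to all-0 or all-1.
        if fault is not None and fault[0] == site:
            return MASK if fault[1] else 0
        return word
    w = {name: tap(name, word) for name, word in input_words().items()}
    n1 = tap("n1", w["a"] & w["b"])
    return tap("out", n1 | w["c"])

def detection_counts():
    """Number of patterns detecting each stuck-at fault (output differs from good)."""
    good = simulate()
    return {(site, v): bin(good ^ simulate((site, v))).count("1")
            for site in SITES for v in (0, 1)}

def n_detect_coverage(counts, n):
    """Fraction of faults detected by at least n patterns."""
    return sum(1 for k in counts.values() if k >= n) / len(counts)
```

Here every fault is detected at least once, but only half are detected twice or more. Test selection for n-detection chooses patterns that raise these counts for weakly detected faults.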
Web workloads are known to vary dynamically over time, which makes allocating resources among applications challenging. In this paper, we argue that existing dynamic resource allocation based on resource utilization has drawbacks in virtualized servers. Dynamic resource allocation based directly on real-time user experience is more reasonable and also has practical significance. To address this problem, we propose a system architecture that combines real-time measurement and analysis of user experience for resource allocation. We evaluate our proposal using Webbench. The experimental results show that these techniques can allocate system resources judiciously.
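As a contrast with utilization-based policies, here is one simple way allocation could be driven directly by user experience. This is a minimal sketch, not the proposed architecture: it assumes user experience is summarized as a measured response time per VM, that each VM has a target response time, and that every VM keeps a guaranteed minimum share. The function name and policy are hypothetical.

```python
from typing import Dict

def allocate_shares(capacity: float, measured_ms: Dict[str, float],
                    target_ms: Dict[str, float], min_share: float) -> Dict[str, float]:
    """Split capacity among VMs in proportion to how badly each misses its target.

    Each VM first receives min_share; the remaining capacity is divided in
    proportion to measured / target response time.
    """
    vms = sorted(measured_ms)
    spare = capacity - min_share * len(vms)
    if spare < 0:
        raise ValueError("capacity cannot cover the minimum shares")
    # A ratio above 1 means users of that VM see worse-than-target latency.
    ratio = {vm: measured_ms[vm] / target_ms[vm] for vm in vms}
    total = sum(ratio.values())
    return {vm: min_share + spare * ratio[vm] / total for vm in vms}
```

Called once per control interval with fresh measurements, a VM whose users see twice its target latency receives twice the spare capacity of one that exactly meets its target, regardless of how busy either VM's CPU appears.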
In this paper, we consider hybrid wireless networks with a general node density λ ∈ [1, n], where n ad hoc nodes are uniformly distributed and m base stations (BSs) are regularly placed in a square region 𝒜(n, A) = [0, √A] × [0, √A] with A ∈ [1, n]. We focus on multicast sessions in which each ad hoc node, as a user, randomly chooses d ad hoc nodes as its destinations. In particular, when d = 1 (or d = n − 1), a multicast session is essentially a unicast (or broadcast) session. We study the asymptotic multicast throughput of such a hybrid wireless network in different cases in terms of m ∈ [1, n] and d ∈ [1, n], as n → ∞. To be specific, we design two types of multicast schemes, called the hybrid scheme and the BS-based scheme. For the hybrid scheme, there are two alternative routing backbones: sparse backbones and dense backbones. In particular, for different regimes of the node density λ = n/A, we derive thresholds in terms of m and d. Based on these thresholds, we determine which scheme achieves better network throughput.
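The network model can be made concrete by sampling one instance of it. This sketch assumes the deployment square has area A (side √A), so that the density is λ = n/A, and it assumes "regularly placed" means a √m × √m grid of BSs at cell centers, which requires m to be a perfect square. These choices and the function name `hybrid_network` are illustrative, not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def hybrid_network(n: int, m: int, d: int, A: float, seed: int = 0):
    """Sample one instance of the hybrid network model.

    Returns (nodes, base_stations, destinations, density), where
    destinations[u] lists the d distinct multicast destinations of node u.
    """
    if not 1 <= A <= n:
        raise ValueError("A must lie in [1, n]")
    if not 1 <= d <= n - 1:
        raise ValueError("d must lie in [1, n - 1]")
    k = math.isqrt(m)
    if k * k != m:
        raise ValueError("this sketch assumes m is a perfect square")
    side = math.sqrt(A)  # assumed square of area A
    rng = random.Random(seed)
    # Ad hoc nodes: uniformly distributed over the square.
    nodes: List[Point] = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    # Base stations: centers of a k x k grid of equal cells.
    cell = side / k
    bss: List[Point] = [((i + 0.5) * cell, (j + 0.5) * cell) for i in range(k) for j in range(k)]
    # Each node picks d distinct destinations other than itself.
    dests = [rng.sample([v for v in range(n) if v != u], d) for u in range(n)]
    return nodes, bss, dests, n / A
```

With d = 1 every session is unicast, and with d = n − 1 every node addresses all others (broadcast), matching the two extreme cases in the text.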
This work presents a method to detect the size and location of tumors in soft tissue using ultrasound. Quantitative ultrasound is used so that an ultrasound signal can be sent from a transmitter to multiple receivers. The received signal is analyzed to differentiate echogenic from echolucent tumors, and both from non-tumor samples, and its delay is studied to determine the size and location of the tumor. The proposed system uses the Low Transient Pulse (LTP) technique and is implemented with Field Programmable Gate Array (FPGA) and Digital Signal Processor (DSP) technologies. In this co-design architecture, the DSP analyzes the received demodulated signal at a lower speed, while the FPGA runs at a higher speed to generate the LTP signal and demodulate the bandpass ultrasonic signal. This work details the FPGA implementation of a Quadrature Amplitude Modulation (QAM) receiver for the signal received from an ultrasound detector. LTP is applied to the tumor samples through the transmitter, and the signal received at an ultrasonic receiver is passed through the QAM receiver to obtain different maxima, which the DSP then uses to compute the location and size of the tumor. This dual-platform co-design demonstrates a good application of an FPGA/DSP platform to LTP generation and received-signal processing.
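The receiver stage described above is, at its core, quadrature (I/Q) demodulation: mixing the bandpass echo with cosine and sine references at the carrier frequency, low-pass filtering, and taking the magnitude yields the envelope, whose maximum marks the arrival time. Below is a minimal floating-point sketch assuming a known carrier frequency, a moving-average low-pass filter, and a Gaussian-windowed test burst. An FPGA implementation would use fixed-point arithmetic and proper FIR filters, and all function names are hypothetical. The depth conversion additionally assumes pulse-echo (round-trip) geometry and c = 1540 m/s, the conventional average speed of sound in soft tissue.

```python
import math
from typing import List

def iq_envelope(signal: List[float], fs: float, fc: float, window: int) -> List[float]:
    """Quadrature-demodulate a bandpass signal and return its envelope."""
    w = 2 * math.pi * fc / fs
    i_mix = [s * math.cos(w * k) for k, s in enumerate(signal)]   # in-phase branch
    q_mix = [-s * math.sin(w * k) for k, s in enumerate(signal)]  # quadrature branch
    env = []
    acc_i = acc_q = 0.0
    for k in range(len(signal)):
        # Trailing moving average over the last `window` samples (low-pass filter).
        acc_i += i_mix[k]
        acc_q += q_mix[k]
        if k >= window:
            acc_i -= i_mix[k - window]
            acc_q -= q_mix[k - window]
        # The factor 2 restores the amplitude halved by mixing.
        env.append(2 * math.hypot(acc_i, acc_q) / window)
    return env

def estimate_delay(signal: List[float], fs: float, fc: float, window: int) -> float:
    """Arrival time (s) of the envelope maximum, corrected for filter delay."""
    env = iq_envelope(signal, fs, fc, window)
    peak = max(range(len(env)), key=env.__getitem__)
    # A trailing moving average delays its output by (window - 1) / 2 samples.
    return (peak - (window - 1) / 2) / fs

def pulse_echo_depth(delay_s: float, c: float = 1540.0) -> float:
    """Reflector depth (m), assuming the pulse travels to the target and back."""
    return c * delay_s / 2
```

Choosing the window as a whole number of carrier periods (8 samples at fs = 40 MHz, fc = 5 MHz) places a null of the moving average exactly at the 2fc mixing product, which is why such a crude filter is adequate here.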
An ISS (Instruction Set Simulator) plays an important role in pre-silicon software development for ASIPs. However, traditional simulation is too slow to effectively support full-scale software development. In this paper, we propose a hybrid simulation framework that improves on previous simulation methods by aggressively utilizing host machine resources. This is achieved by using binary instrumentation to categorize the instructions of an ASIP application into two types: custom instructions and basic instructions. In the hybrid simulation, only custom instructions are simulated on the ISS, while basic instructions are executed quickly and natively on the host machine. We implement this framework for an industrial ASIP to validate our approach. Experimental results show that when the implemented ISS, GS-Sim, is applied to practical multimedia decoders, an average simulation speed of up to 1058.5 MIPS can be achieved, which is 34.7 times that of the state-of-the-art dynamic binary translation simulator and, to the best of our knowledge, the fastest reported.
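The control structure of such a hybrid simulator can be sketched with a toy accumulator ISA. In this sketch, which is a hypothetical illustration rather than GS-Sim, a preparation pass stands in for binary instrumentation: it splits the program at custom instructions, pre-decodes each run of basic instructions into a single host function (standing in for native execution), and leaves only custom instructions to the interpretive ISS. The opcodes `add`, `mul`, and `sat8` are invented for illustration.

```python
from typing import Callable, List, Tuple

Instr = Tuple[str, int]  # (opcode, immediate) acting on one accumulator

# Hypothetical toy ISA: basic ops have direct host equivalents,
# custom ops model ASIP-specific instructions the host lacks.
BASIC = {"add": lambda acc, x: acc + x, "mul": lambda acc, x: acc * x}
CUSTOM = {"sat8": lambda acc, x: max(-128, min(127, acc + x))}  # saturating add

def iss_step(acc: int, op: str, x: int) -> int:
    """Interpretive ISS: decode the opcode, then execute it."""
    handler = CUSTOM.get(op) or BASIC[op]
    return handler(acc, x)

def compile_basic_block(block: List[Instr]) -> Callable[[int], int]:
    """Stand-in for native execution: decode a run of basic ops once, up front."""
    ops = [(BASIC[op], x) for op, x in block]
    def run(acc: int) -> int:
        for f, x in ops:
            acc = f(acc, x)
        return acc
    return run

def prepare(program: List[Instr]):
    """Stand-in for binary instrumentation: split the program at custom instructions."""
    plan, block = [], []
    for op, x in program:
        if op in CUSTOM:
            if block:
                plan.append(("native", compile_basic_block(block)))
                block = []
            plan.append(("iss", (op, x)))
        else:
            block.append((op, x))
    if block:
        plan.append(("native", compile_basic_block(block)))
    return plan

def run_hybrid(program: List[Instr], acc: int = 0) -> Tuple[int, int]:
    """Execute the program; return (final accumulator, instructions run on the ISS)."""
    iss_count = 0
    for kind, item in prepare(program):
        if kind == "native":
            acc = item(acc)
        else:
            acc = iss_step(acc, *item)
            iss_count += 1
    return acc, iss_count

def run_iss_only(program: List[Instr], acc: int = 0) -> int:
    """Reference: interpret every instruction on the ISS."""
    for op, x in program:
        acc = iss_step(acc, op, x)
    return acc
```

Both paths must produce identical architectural results; the hybrid one simply routes fewer instructions through the slow interpretive loop. The speedup comes from how rare custom instructions are relative to basic ones.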
Li and Zhou propose an important concept for Petri nets: elementary siphons. They partition siphons into elementary and dependent ones, and the controllability of the latter can be ensured by properly controlling the former. They give a sufficient condition for deciding whether a dependent siphon is controlled by its elementary siphons in S³PR. However, this condition is too conservative: in many cases it cannot establish the controllability of a dependent strict minimal siphon (SMS) even though the siphon is actually controlled. In this paper, we propose an improved condition for deciding the controllability of strongly dependent SMSs.
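In Li and Zhou's formulation, each siphon S is characterized by its characteristic T-vector η_S = [N]^T λ_S, where [N] is the incidence matrix and λ_S the characteristic P-vector of S. Elementary siphons are those whose η vectors form a maximal linearly independent set, and dependent siphons have η vectors that are linear combinations of those. The sketch below, for an ordinary (unit-arc-weight) net, checks the siphon property (•S ⊆ S•) and computes η_S. The three-transition example net is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Ordinary Petri net: transition -> (input places, output places), unit arc weights.
Net = Dict[str, Tuple[Set[str], Set[str]]]

def is_siphon(S: Set[str], net: Net) -> bool:
    """S is a siphon iff it is nonempty and every transition that puts
    tokens into S also takes tokens from S (preset of S within postset of S)."""
    pre = {t for t, (_, outs) in net.items() if outs & S}   # transitions feeding S
    post = {t for t, (ins, _) in net.items() if ins & S}    # transitions consuming from S
    return bool(S) and pre <= post

def characteristic_t_vector(S: Set[str], net: Net) -> List[int]:
    """eta_S = [N]^T lambda_S: for each transition (sorted by name),
    the net change in the number of tokens in S when it fires."""
    return [sum(int(p in outs) - int(p in ins) for p in S)
            for _, (ins, outs) in sorted(net.items())]
```

For the net t1: p1 → p2, t2: p2 → p1, t3: p3 → p1, the set {p1, p2} is not a siphon because t3 feeds it without consuming from it, while {p1, p2, p3} is a siphon whose η is the zero vector (no transition changes its token count).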
Radio frequency identification (RFID) is a technology in which a reader device can "sense" the presence of a nearby object by reading a tag device attached to the object. To guarantee coverage quality, multiple RFID readers can be deployed in a given region. In this paper, we consider the problem of scheduling reader activation in a multi-reader environment. In particular, we design a schedule that maximizes the number of tags served per time slot while avoiding various interferences. We first develop a centralized algorithm under the assumption that different readers may have different interference and interrogation radii. Next, we propose a novel algorithm that does not need any location information about the readers. Finally, we extend this algorithm to a distributed version for the case where no central entity exists. We conduct extensive simulations to study the performance of our proposed algorithms, and the evaluation results corroborate our theoretical analysis.
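To make the scheduling problem concrete, here is a simple greedy baseline, not the authors' algorithm. It assumes a hypothetical interference model: two readers may not be active in the same slot if their distance is below either one's interference radius, and a tag counts as served only when exactly one active reader covers it, which models reader-tag collisions. In each slot, readers are tried in decreasing order of pending tags covered, and a reader is activated only if it does not conflict and it increases the number of served tags.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple

Point = Tuple[float, float]

@dataclass
class Reader:
    x: float
    y: float
    interrogation: float  # radius within which the reader can read a tag
    interference: float   # radius within which it disturbs other readers

def covers(r: Reader, tag: Point) -> bool:
    return math.hypot(r.x - tag[0], r.y - tag[1]) <= r.interrogation

def conflicts(a: Reader, b: Reader) -> bool:
    # Assumed reader-reader interference model.
    return math.hypot(a.x - b.x, a.y - b.y) < max(a.interference, b.interference)

def served(active: List[int], readers: List[Reader], tags: List[Point], pending: Set[int]) -> Set[int]:
    # A tag is served only if exactly one active reader covers it.
    return {t for t in pending if sum(covers(readers[i], tags[t]) for i in active) == 1}

def greedy_schedule(readers: List[Reader], tags: List[Point]):
    """Return (list of active-reader sets per slot, tags that could not be served)."""
    pending = set(range(len(tags)))
    slots = []
    while pending:
        # Try readers covering the most pending tags first; sorted() keeps ties in index order.
        order = sorted(range(len(readers)),
                       key=lambda i: -sum(covers(readers[i], tags[t]) for t in pending))
        active: List[int] = []
        done: Set[int] = set()
        for i in order:
            if any(conflicts(readers[i], readers[j]) for j in active):
                continue
            candidate = served(active + [i], readers, tags, pending)
            if len(candidate) > len(done):
                active.append(i)
                done = candidate
        if not done:
            break  # remaining tags cannot be served by any reader
        slots.append(sorted(active))
        pending -= done
    return slots, pending
```

This baseline needs reader locations and a central coordinator, which are exactly the requirements the location-free and distributed algorithms described above remove.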