This paper presents a novel method, namely slicing execution, for model checking C programs with respect to temporal safety properties. The distinguished feature is that it shows a nice approach to the efficient reduc...
详细信息
This paper presents a novel method, namely slicing execution, for model checking C programs with respect to temporal safety properties. The distinguished feature is that it shows a nice approach to the efficient reduction of state space by abstraction and symbolic representation. Slicing execution is founded on an over-approximated semantics of C programs by variable abstraction, and executes symbolically only the relevant statements under abstraction criteria to construct over-approximated finite models of programs, which may be model checked. The variable abstraction criterion begins with a proper initial set of program variables and may be iteratively refined according to spurious counterexamples generated during model checking. In general, the properties to be verified often involve only a few variables in practical programs. In these cases, significant state space reduction, as well as considerable improvement of the scalability, may be achieved. The presented method has been used to verify the initial handshake process of SSL protocol based on the C source code of openssl-0.9.6c. The experiment results confirm that slicing execution is not only practical but also effective.
Model abstraction plays an important role in model checking of source codes of programs. Slicing execution is a lightweight symbolic execution procedure to extract the models of C programs in an over-approximated way....
详细信息
Model abstraction plays an important role in model checking of source codes of programs. Slicing execution is a lightweight symbolic execution procedure to extract the models of C programs in an over-approximated way. In this paper, we present an approach to improving slicing execution with a novel concept called partial weakest precondition (PWP) to alleviate the space explosion problem. PWPs specify the corresponding weakest precondition conservatively by only considering part of program variables. We present how to integrate PWP with slicing execution, which leads to a compact model with much smaller state space compared with the one obtained by the original slicing execution. A new PWP implementation is also presented to avoid possible exponential PWP formula size and support pointers and aliases as well. The distinguished features of the implementation are that it does not need to translate the program to the passive form beforehand, and it supports loops very well. Comparing with slicing execution without PWP, the experimentation on SSL protocol based on the C source code openssl-0.9.6c shows that the state space may be reduced to only 1/10 after applying PWP.
The audio classification task aims to discriminate between different audio signal types. In this task, deep neural networks have achieved better performance than the traditional shallow architecture-based machine-lear...
详细信息
The audio classification task aims to discriminate between different audio signal types. In this task, deep neural networks have achieved better performance than the traditional shallow architecture-based machine-learning method. However, deep neural networks often require huge computational and storage requirements that hinder the deployment in embedded devices. In this paper, we proposed a distillation method which transfers knowledge from well-trained networks to a small network, and the method can compress model size while improving audio classification precision. The contributions of the proposed method are two folds: a multi-level feature distillation method was proposed and an adversarial learning strategy was employed to improve the knowledge transfer. The extensive experiments are conducted on three audio classification tasks, audio scene classification, general audio tagging, and speech command recognition. The experimental results demonstrate that: the small network can provide better performance while achieves the calculated amount of floating-point operations per second (FLOPS) compression ratio of 76:1 and parameters compression ratio of 3:1.
The current paper explores the capability and flexibility of field programmable gate-arrays (FPGAs) to implement variable-precision floating-point (VP) arithmetic. First, the VP exact dot product algorithm, which uses...
详细信息
The current paper explores the capability and flexibility of field programmable gate-arrays (FPGAs) to implement variable-precision floating-point (VP) arithmetic. First, the VP exact dot product algorithm, which uses exact fixed-point operations to obtain an exact result, is presented. A VP multiplication and accumulation unit (VPMAC) on FPGA is then proposed. In the proposed design, the parallel multipliers generate the partial products of mantissa multiplication in parallel, which is the most time-consuming part in the VP multiplication and accumulation operation. This method fully utilizes DSP performance on FPGAs to enhance the performance of the VPMAC unit. Several other schemes, such as two-level RAM bank, carry-save accumulation, and partial summation, are used to achieve high frequency and pipeline throughput in the product accumulation stage. The typical algorithms in Basic Linear Algorithm Subprograms (i.e., vector dot product, general matrix vector product, and general matrix multiply product), LU decomposition, and Modified Gram-Schmidt QR decomposition, are used to evaluate the performance of the VPMAC unit. Two schemes, called the VPMAC coprocessor and matrix accelerator, are presented to implement these applications. Finally, prototypes of the VPMAC unit and the matrix accelerator based on the VPMAC unit are created on a Xilinx XC6VLX760 FPGA chip. Compared with a parallel software implementation based on OpenMP running on an Intel Xeon Quad-core E5620 CPU, the VPMAC coprocessor, equipped with one VPMAC unit, achieves a maximum acceleration factor of 18X. Moreover, the matrix accelerator, which mainly consists of a linear array of eight processing elements, achieves 12X-65X better performance.
This paper presents a novel method for shape analysis, which can deal with complex expressions in C language. It supports taking addresses of fields and stack variables. The concept of abstract evaluation path (AEP) i...
详细信息
This paper presents a novel method for shape analysis, which can deal with complex expressions in C language. It supports taking addresses of fields and stack variables. The concept of abstract evaluation path (AEP) is proposed, which is generated from the expression in the language. AEP is used to refine the abstract shape graph (ASG) to get a set of more precise ASGs, on which the semantics of the statement can be defined easily. The results can be used to determine "shape invariants" and detect memory leak conservatively. A prototype has been implemented and the results of the experiment are shown.
Simultaneous localization and map construction (SLAM) tasks have been proven to benefit greatly from the depth information of the environment. In this paper, we first present an unsupervised end-to-end learning framew...
详细信息
Simultaneous localization and map construction (SLAM) tasks have been proven to benefit greatly from the depth information of the environment. In this paper, we first present an unsupervised end-to-end learning framework for the task of monocular depth and camera motion estimation from video sequences. The difference between our work and the existing unsupervised methods is that we not only use image reconstruction for supervising but also exploit the pose estimation method used in traditional SLAM approaches to enhance the supervised signal and add extra training constraints for the task of monocular depth and camera motion estimation. Furthermore, we successfully exploit our unsupervised learning framework to assist the traditional ORB-SLAM system when the initialization module of ORB-SLAM method could not match enough features. Qualitative and quantitative experiments have shown that our unsupervised learning framework performs the depth estimation task superior to the supervised methods and outperforms the previous state-of-the-art unsupervised approach by 13.5% on KITTI dataset. For the pose estimation task, our method performs comparably to the supervised methods that use ground-truth pose data for training. Besides, our unsupervised learning framework can significantly accelerate the initialization process of the traditional ORB-SLAM system and effectively improve the accuracy of environmental mapping in strong lighting and weak texture scenes.
Audio classification aims to discriminate between different audio signal types, and it has received intensive attention due to its wide applications. In deep learning-based audio classification methods, researchers us...
详细信息
Audio classification aims to discriminate between different audio signal types, and it has received intensive attention due to its wide applications. In deep learning-based audio classification methods, researchers usually transform the raw signal of audios into different feature representations (such as Short Time Fourier Transform and Mel Frequency Cepstral Coefficients) as the inputs of networks. However, selecting the feature representation requires expert knowledge and extensive experimental verification. Besides, using a single type of feature representation may cause suboptimal results as the information implied in different kinds of feature representations may be complementary. Previous works show that ensembling the networks trained on different representations can greatly boost classification performance. However, making inferences using multiple networks is cumbersome and computation expensive. In this paper, we propose a novel end-to-end collaborative training framework for the audio classification task. The framework takes multiple representations as inputs to train the networks jointly with a knowledge distillation method. Consequently, our framework significantly promotes the performance of networks without increasing the computational overhead in the inference stage. Extensive experimental results demonstrate that the proposed approach improves classification performance and achieves competitive results on both acoustic scene classification tasks and general audio tagging tasks.
As an innovative grid computing technique for sharing the distributed memory resources in a high-speed wide-area network, RAM Grid exploits the distributed computing nodes, and provides remote memory for the user node...
详细信息
As an innovative grid computing technique for sharing the distributed memory resources in a high-speed wide-area network, RAM Grid exploits the distributed computing nodes, and provides remote memory for the user nodes which are short of memory. The performance of RAM Grid is constrained with the expensive network communication cost. In order to hide the latency of remote memory access and improve the performance, the authors proposed the push-based prefetching to enable the memory providers to push the potential useful pages to the user nodes. For each provider, it employs sequential pattern mining techniques, which adapts to the characteristics of memory page access sequences, on locating useful memory pages for prefetching. They have verified the effectiveness of the proposed method through trace-driven simulations.
Prefetching and prepromotion are two important techniques for hiding the memory access latency. Reference prediction tables (RPT) plays a significant role in the process of prefetching or prepromoting data with linear...
详细信息
Prefetching and prepromotion are two important techniques for hiding the memory access latency. Reference prediction tables (RPT) plays a significant role in the process of prefetching or prepromoting data with linear memory access patterns. The traditional RPT management, LRU replacement algorithm, can not manage RPT efficiently. This leads to that large RPT has to be used for the considerable performance. The cost brought from the large capacity limits the usage of RPT in real processors. This paper uses bimodal insert policy (BIP) and proposed scalar filter policy (SFP) in the RPT management. Owing to matching the using characteristics of RPT, BIP can reduce the RPT thrashing and SFP can filter the useless scalar instructions in it. After testing 8 NPB benchmarks on a fullsystem simulator, we find that our approaches improve the RPT hit rate by 53.81% averagely, and increases prefetching and prepromotion operations by 18.85% and 53.55% averagely over the traditional LRU management.
Skyline query processing over uncertain data streams has attracted considerable attention in database community recently, due to its importance in helping users make intelligent decisions over complex data in many rea...
详细信息
Skyline query processing over uncertain data streams has attracted considerable attention in database community recently, due to its importance in helping users make intelligent decisions over complex data in many real applications. Although lots of recent efforts have been conducted to the skyline computation over data streams in a centralized environment typically with one processor, they cannot be well adapted to the skyline queries over complex uncertain streaming data, due to the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they are all developed to address the problem of parallel skyline queries over static certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. Particularly, for the first time, we propose an effective framework, named distributedparallel framework to address the problem based on the sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and the grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques.
暂无评论