ISBN (print): 9781467365994
Data clustering is usually time-consuming because, by default, it must iteratively aggregate and process large volumes of data. Sample-based approximate aggregation delivers fast results with assured quality. In this paper, we propose applying approximation techniques to data clustering to balance clustering efficiency against result quality, together with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented it as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Extensive evaluations show that IBL provides a 2x speed-up over the state-of-the-art solution with the same error bound.
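The abstract does not describe IBL's internals or its Spark API. As a rough illustration of the general idea (cluster a sample instead of the full data, and use bootstrap trials to estimate accuracy online), the sketch below runs plain 1-D k-means on bootstrap resamples of a uniform sample and reports the spread of each centroid across trials as an error estimate. Everything here (`kmeans_1d`, `bootstrap_centroids`, the 1-D data, the number of trials) is a hypothetical stand-in, not IBL's method.

```python
import random
import statistics

def kmeans_1d(points, k, iters=10):
    """Lloyd's k-means on 1-D data; returns the centroids in ascending order."""
    pts = sorted(points)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [statistics.fmean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)

def bootstrap_centroids(sample, k, trials=30, seed=0):
    """Cluster `trials` bootstrap resamples of `sample`; return the mean of
    each centroid across trials and its bootstrap standard error."""
    rng = random.Random(seed)
    runs = []
    for _ in range(trials):
        resample = [rng.choice(sample) for _ in sample]  # draw with replacement
        runs.append(kmeans_1d(resample, k))
    means = [statistics.fmean([r[j] for r in runs]) for j in range(k)]
    errors = [statistics.stdev([r[j] for r in runs]) for j in range(k)]
    return means, errors

# Usage: two well-separated groups; only a 10% uniform sample is clustered.
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(5000)] + [rng.gauss(10, 1) for _ in range(5000)]
sample = rng.sample(data, 1000)
means, errors = bootstrap_centroids(sample, k=2)
```

A small bootstrap standard error suggests the sample already pins the centroids down; a large one suggests drawing a bigger sample before trusting the result.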
The contribution of parasitic bipolar amplification to single-event transients (SETs) is experimentally verified using two P-hit target chains, one in the normal layout and one in a special layout. For PMOS transistors in the normal layout, single-event charge collection consists of diffusion, drift, and the parasitic bipolar effect, whereas for PMOS transistors in the special layout the parasitic bipolar junction transistor cannot turn on. Heavy-ion experimental results show that PMOS transistors without parasitic bipolar amplification have a 21.4% smaller average SET pulse width and a roughly 40.2% smaller SET cross-section.
Principal component analysis (PCA) projects data onto the directions of maximal variance. Since PCA is very effective for dimensionality reduction, it has been widely used in computer vision. However, conventional PCA suf...
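The abstract is truncated, but its first sentence states what PCA does. A minimal pure-Python sketch of that step, assuming small dense data given as lists of tuples: centre the data, form the sample covariance matrix, and find its leading eigenvector (the direction of maximal variance) by power iteration. The function names are hypothetical, and this is conventional PCA, not the method the paper goes on to propose.

```python
import math
import random

def first_principal_component(data, iters=200, seed=0):
    """Unit vector along which `data` (equal-length tuples) has maximal
    variance, found by power iteration on the sample covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance C = X^T X / (n - 1) of the centred data X.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(data, direction):
    """Coordinates of the centred data along `direction` (1-D reduction)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [sum((row[j] - mean[j]) * direction[j] for j in range(d)) for row in data]

# Usage: points lying close to the line y = x.
pts = [(t, t + 0.1 * (-1) ** t) for t in range(-5, 6)]
pc = first_principal_component(pts)   # close to (0.707, 0.707)
coords = project(pts, pc)
```

Power iteration recovers only the top component; further components would need deflation or a full symmetric eigendecomposition such as `numpy.linalg.eigh`.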
Non-negative matrix factorization (NMF) has been a popular data analysis tool and has been widely applied in computer vision. However, conventional NMF methods cannot adaptively learn grouping structure froma *** pape...
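The abstract breaks off before describing the proposed method. For reference, conventional NMF, the baseline that the abstract says cannot adaptively learn grouping structure, is commonly computed with the Lee-Seung multiplicative updates for the Frobenius loss. The pure-Python sketch below assumes small dense matrices stored as lists of rows; `nmf` and its helpers are hypothetical names, and no grouping structure is modelled.

```python
import random

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n)
    with Lee-Seung multiplicative updates; eps avoids division by zero."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

# Usage: a 3 x 3 non-negative matrix of rank 2.
V = [[5, 3, 4], [5, 4, 7], [6, 4, 6]]
W, H = nmf(V, rank=2)
```

Because each update multiplies by a non-negative ratio, W and H stay non-negative, and for these updates the loss does not increase from one iteration to the next.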
Heavy-ion experiments were performed on D flip-flops (DFFs) and triple modular redundancy (TMR) flip-flops (TMRFFs) fabricated in a 65-nm bulk CMOS process. The results show that, in static test mode, the TMRFF has an SEU cross-section about 92% lower than that of the standard DFF design. In dynamic test mode, the TMRFF shows a much stronger frequency dependence than the DFF, which erodes its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in SEU performance may warrant reconsidering the use of TMR in hardened designs.
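The hardening in a TMR flip-flop comes from storing the bit three times and voting on the output, so a single upset copy is outvoted. The toy model below (hypothetical names, pure logic, no timing) shows only that static masking behaviour; it does not model the clock-frequency dependence that the experiments observed in dynamic test mode.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote, as performed by a TMR voter."""
    return (a & b) | (a & c) | (b & c)

class TMRRegister:
    """Toy TMR storage element: three copies of the stored value plus a voter."""

    def __init__(self, value: int = 0):
        self.copies = [value, value, value]

    def upset(self, index: int) -> None:
        """Flip the stored bit of one copy, modelling a single-event upset."""
        self.copies[index] ^= 1

    def read(self) -> int:
        return majority(*self.copies)

# Usage: one upset is masked; a second upset in another copy is not.
reg = TMRRegister(1)
reg.upset(0)
masked = reg.read()      # 1
reg.upset(2)
corrupted = reg.read()   # 0
```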
Monte Carlo (MC) simulation plays an important role in dose calculation for radiotherapy treatment planning. Because the accuracy of MC simulation depends on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which has more than 50 cores and supports many parallel programming models, offers an efficient way to accelerate MC dose calculation. This paper implements an OpenMP-based version of the MC Dose Planning Method (DPM) for radiotherapy treatment problems on the Intel MIC architecture. The implementation was verified on a target MIC coprocessor with 57 cores. The results show that the OpenMP-based DPM implementation produces very accurate results and achieves a maximum speedup of 10.53× over the original DPM running on a Xeon E5-2670 CPU. The speedup and efficiency of the implementation on different numbers of MIC cores are also reported.
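Two points in the abstract lend themselves to a sketch: the statistical uncertainty of an MC tally shrinks only as 1/√N in the number of histories N, which is why accurate runs are slow, and independent histories can be split across threads, each with its own random-number stream, which is what makes an OpenMP port natural. The toy below is not DPM: it assumes a 1-D slab in which each particle deposits one unit of energy at an exponentially distributed depth, and it runs the chunks sequentially where an OpenMP version would run them in parallel. All names and parameters (`mu`, `n_bins`, the seeds) are hypothetical, and seeding chunks with consecutive integers is a simplification of proper parallel RNG streams.

```python
import math
import random

def simulate_chunk(n_histories, seed, mu=0.2, n_bins=10, bin_width=1.0):
    """Tally energy deposits for one chunk of histories. Each particle travels
    an exponentially distributed depth (attenuation coefficient `mu`) and
    deposits one unit of energy in the depth bin where it interacts."""
    rng = random.Random(seed)   # independent stream per chunk / thread
    tally = [0.0] * n_bins
    for _ in range(n_histories):
        b = int(rng.expovariate(mu) / bin_width)
        if b < n_bins:          # particles that leave the slab deposit nothing
            tally[b] += 1.0
    return tally

def dose_profile(total_histories, n_chunks=4, **kw):
    """Split histories into chunks (one per would-be thread), sum the chunk
    tallies, and normalise per history."""
    per_chunk = total_histories // n_chunks
    total = [0.0] * kw.get("n_bins", 10)
    for c in range(n_chunks):
        chunk = simulate_chunk(per_chunk, seed=1000 + c, **kw)
        total = [a + b for a, b in zip(total, chunk)]
    return [x / (per_chunk * n_chunks) for x in total]

def bin_standard_error(p, n):
    """Standard error of a bin whose per-history score is 1 with probability p
    and 0 otherwise: sqrt(p(1-p)/n), so quadrupling n only halves the error."""
    return math.sqrt(p * (1 - p) / n)

# Usage: 40,000 histories; bin 0 should approach 1 - exp(-0.2), about 0.181.
profile = dose_profile(40_000)
```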
ISBN (print): 9781467386692
This paper presents an accurate and efficient implementation of compensated double-precision general matrix multiplication (DGEMM), based on OpenBLAS, for 64-bit ARMv8 multi-core processors. Because of cancellation in floating-point arithmetic, DGEMM results may be less accurate than expected. To increase the accuracy of DGEMM, we compensate for the error introduced by its dot-product kernel (GEBP) by applying an error-free transformation, rewriting the kernel in assembly language. We optimize the computations in the inner kernel with loop unrolling, instruction scheduling, and software-implemented register rotation to exploit instruction-level parallelism (ILP). We also conduct an a priori error analysis of the resulting CompDGEMM. Our compensated DGEMM is as accurate as the existing quadruple-precision GEMM in MBLAS, but up to 6.4x faster. Our parallel implementation achieves good performance and scalability across varying thread counts and across the range of matrix sizes evaluated.
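The abstract does not give the kernel's assembly, but the error-free transformations behind compensated dot products are standard: Knuth's TwoSum and Dekker's TwoProduct recover the exact rounding error of an addition and of a multiplication, and the Ogita-Rump-Oishi Dot2 algorithm accumulates those errors in a second variable. The Python sketch below shows only that scalar algorithm; it is not the paper's ARMv8 GEBP kernel, and it uses Veltkamp splitting rather than a fused multiply-add. The function names are illustrative.

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

SPLITTER = 134217729.0  # 2**27 + 1, for IEEE 754 double precision

def split(a):
    """Veltkamp split of a into non-overlapping high and low halves."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    """Dekker's TwoProduct: returns (p, e) with a * b == p + e exactly
    (barring overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

def dot2(x, y):
    """Compensated dot product (Dot2): about as accurate as evaluating in
    twice the working precision and rounding once."""
    s, c = 0.0, 0.0
    for a, b in zip(x, y):
        p, ep = two_product(a, b)
        s, es = two_sum(s, p)
        c += ep + es          # accumulate the rounding errors separately
    return s + c

def naive_dot(x, y):
    """Plain left-to-right accumulation, for comparison."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s

# Usage: a sum with catastrophic cancellation.
xs, ys = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
naive = naive_dot(xs, ys)      # 0.0: the 1.0 is absorbed, then cancelled
compensated = dot2(xs, ys)     # 1.0
```

The naive loop loses the 1.0 to rounding before the two large terms cancel, while `dot2` carries it in the compensation term.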
To use the shared last-level cache (LLC) in chip multiprocessors (CMPs) more efficiently, the partitioning of LLC resources among cores should offer low access latency, fine-grained migration, and low hardware implementation complexity. This paper proposes a dynamic LLC management scheme that achieves these goals. The scheme migrates cache resources among cores at the granularity of cache blocks rather than ways. The number of victim cache blocks that each victim core may migrate to other (target) cores is governed by an eviction probability, which is calculated according to the performance goal. The victim cache blocks for a target core are then taken from the nearest victim core with a non-zero eviction probability, using a novel E-Table structure introduced into the CMP. The eviction probabilities are updated periodically. With the help of the E-Tables, the proposal achieves low-latency access by always keeping the required cache blocks near the target cores, and fine granularity is guaranteed by maintaining an eviction probability for each core. In addition, only minor changes to the traditional cache structure are required. Simulation results suggest significant performance improvements, from 6.8% to 22.7%, over related work.
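The abstract does not specify the E-Table format or the on-chip topology. Assuming that cores sit on a square 2-D mesh, that distance is the Manhattan hop count, and that an E-Table can be modelled as a per-core list of eviction probabilities, the selection rule it describes (take blocks from the nearest victim core with a non-zero eviction probability) might look like the sketch below. All names are hypothetical, and how the probabilities are derived from the performance goal is not modelled.

```python
import random

def mesh_distance(a, b, width):
    """Manhattan hop count between cores a and b on a width x width mesh."""
    ar, ac = divmod(a, width)
    br, bc = divmod(b, width)
    return abs(ar - br) + abs(ac - bc)

def pick_victim_core(target, eviction_prob, width):
    """Nearest core (other than `target`) with a non-zero eviction probability,
    or None if there is none. Ties go to the lower core id."""
    candidates = [c for c, p in enumerate(eviction_prob) if c != target and p > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (mesh_distance(target, c, width), c))

def should_donate_block(victim, eviction_prob, rng):
    """Give one block away with probability eviction_prob[victim]."""
    return rng.random() < eviction_prob[victim]

# Usage: a 2 x 2 mesh where only cores 2 and 3 may give up blocks.
probs = [0.0, 0.0, 0.5, 0.3]
victim_for_core0 = pick_victim_core(0, probs, width=2)   # 2 (one hop away)
victim_for_core1 = pick_victim_core(1, probs, width=2)   # 3 (one hop away)
```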
ISBN (print): 9781467372220
Hyperspectral remote sensing is one of the frontier techniques in remote sensing research, and applying sparse coding models to hyperspectral remote sensing image processing is an active topic in hyperspectral information processing. To improve the accuracy of hyperspectral image classification, we propose a classification method based on spatial-spectral joint contextual sparse coding. First, a dictionary is trained using samples selected from the ground-truth reference data. Then, the sparse coefficients of each pixel are calculated with the learned dictionary. Finally, the sparse coefficients are fed into a classifier to obtain the classification result. A visible and near-infrared hyperspectral remote sensing image of Chaoyang District, Beijing, collected by Tiangong-1, is used to evaluate the proposed approach. Experimental results show that, compared with other classification methods, the proposed method yields the best classification performance, with an overall accuracy of 95.74% and a Kappa coefficient of 0.9476.
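The pipeline above (learned dictionary, per-pixel sparse coefficients, classifier) can be illustrated in a much simpler form. The sketch below swaps in plain greedy matching pursuit for the sparse-coding step, with no spatial context and no dictionary training, and replaces the classifier with a toy rule that picks the class whose atoms carry the most coefficient magnitude. The atoms, class labels, and function names are hypothetical; this is not the paper's spatial-spectral joint contextual method.

```python
import math

def normalise(v):
    """Scale v to unit Euclidean norm (matching pursuit assumes unit atoms)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matching_pursuit(signal, atoms, n_nonzero=3):
    """Greedy matching pursuit: approximate `signal` (one pixel's spectrum) as a
    sparse combination of unit-norm `atoms`; returns one coefficient per atom."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        dots = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(dots[i]))
        coeffs[k] += dots[k]
        residual = [r - dots[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs

def classify(coeffs, atom_labels):
    """Toy classifier: the class whose atoms carry the most |coefficient|."""
    score = {}
    for c, label in zip(coeffs, atom_labels):
        score[label] = score.get(label, 0.0) + abs(c)
    return max(score, key=score.get)

# Usage: a 3-band toy dictionary whose atoms are labelled with classes.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["water", "water", "vegetation"]
pixel_code = matching_pursuit([0.1, 0.2, 3.0], atoms, n_nonzero=1)  # [0.0, 0.0, 3.0]
pixel_class = classify(pixel_code, labels)                          # "vegetation"
```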
ISBN (print): 9781479919352
Performance problems in software systems are receiving increasing attention. Among the diagnosis methods based on system traces, principal component analysis (PCA) based methods are widely used because their diagnoses are highly accurate and they require no domain-specific knowledge. However, our experiments revealed several shortcomings of PCA-based methods: they require traces with the same call sequence, they are inefficient when traces are long, and they can miss performance problems. To address these issues, this poster introduces a segmentation-based online diagnosis method.