The application of Support Vector Machine (SVM) over data stream is growing with the increasing real-time processing requirements in classification field, like anomaly detection and real-time image processing. However...
详细信息
ISBN:
(纸本)9781538637913
The application of Support Vector Machine (SVM) over data stream is growing with the increasing real-time processing requirements in classification field, like anomaly detection and real-time image processing. However, the dynamic live data with high volume and fast arrival rate in data streams make it challenging to apply SVM in data stream processing. Existing SVM implementations are mostly designed for batch processing and hardly satisfy the efficiency requirement of stream processing for its inherent complexity. To address the challenges, we propose a high efficiency distributed SVM framework over data stream (HDSVM), which consists of two main algorithms, incremental learning algorithm and distributed algorithm. Firstly, we propose a partial support vectors reserving incremental learning algorithm (PSVIL). By selecting a subset of support vectors based on their distances to classification hyperplane instead of the universal set to update SVM, the algorithm achieves lower time overhead while ensuring accuracy. Secondly, we propose a distribution remaining partition and fast aggregation distributed algorithm (DRPFA) for SVM. The real-time data is partitioned based on the original distribution with clustering instead of random partition, and historical support vectors are partitioned based on their distances to the classification hyperplane. The global hyperplane can be obtained by averaging the parameters of local hyperplanes due to the above partition strategy. Extensive experiments on Apache Storm show that the proposed HDSVM achieve lower time overhead and similar accuracy compared with the state-of-art. Speed-up ratio is increased by 2-8 times within 1% accuracy deviation.
Performance and energy consumption of high performance computing (HPC) interconnection networks have a great significance in the whole supercomputer, and building up HPC interconnection network simulation plat- form...
详细信息
Performance and energy consumption of high performance computing (HPC) interconnection networks have a great significance in the whole supercomputer, and building up HPC interconnection network simulation plat- form is very important for the research on HPC software and hardware technologies. To effectively evaluate the per- formance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation plat- form, called HPC-NetSim. HPC-NetSim uses application- driven workloads and inherits the characteristics of the de- tailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports router's on/off states. We compare the simulated execution time with the real execution time of Tianhe-2 subsystem and the mean error is only 2.7%. In addition, we simulate the network behaviors with different network structures and low-power modes. The results are also consistent with the theoretical analyses.
In this paper, the effect of floating body effect (FBE) on a single event transient generation mechanism in fully depleted (FD) silicon-on-insulator (SOI) technology is investigated using three-dimensional techn...
详细信息
In this paper, the effect of floating body effect (FBE) on a single event transient generation mechanism in fully depleted (FD) silicon-on-insulator (SOI) technology is investigated using three-dimensional technology computer-aided design (3D- TCAD) numerical simulation. The results indicate that the main SET generation mechanism is not carder drift/diffusion but floating body effect (FBE) whether for positive or negative channel metal oxide semiconductor (PMOS or NMOS). Two stacking layout designs mitigating FBE are investigated as well, and the results indicate that the in-line stacking (IS) layout can mitigate FBE completely and is area penalty saving compared with the conventional stacking layout.
Charge sharing is becoming an important topic as the feature size scales down in fin field-effect-transistor (FinFET) technology. However, the studies of charge sharing induced single-event transient (SET) pulse q...
详细信息
Charge sharing is becoming an important topic as the feature size scales down in fin field-effect-transistor (FinFET) technology. However, the studies of charge sharing induced single-event transient (SET) pulse quenching with bulk FinFET are reported seldomly. Using three-dimensional technology computer aided design (3DTCAD) mixed-mode simulations, the effects of supply voltage and body-biasing on SET pulse quenching are investigated for the first time in bulk FinFET process. Research results indicate that due to an enhanced charge sharing effect, the propagating SET pulse width decreases with reducing supply voltage. Moreover, compared with reverse body-biasing (RBB), the circuit with forward body-biasing (FBB) is vulnerable to charge sharing and can effectively mitigate the propagating SET pulse width up to 53% at least. This can provide guidance for radiation-hardened bulk FinFET technology especially in low power and high performance applications.
Aircraft detection in remote sensing images is an intractable *** current aircraft detection methods have limited representative capabilities and heavy computational *** paper studies how to apply multi-scale feature ...
详细信息
Aircraft detection in remote sensing images is an intractable *** current aircraft detection methods have limited representative capabilities and heavy computational *** paper studies how to apply multi-scale feature representation of Convolutional Neural Networks(CNN) to aircraft detection by qualitatively and quantitatively analyzing the performance of Single Shot Detection(SSD) *** first,we find that low-level detectors are not robust enough to detect as the semantic gap *** we propose a data driven hyper-parameter selection method to alleviate this problem by determining appropriate hyper-parameters of sliding window and default box ***,we employ a multi-scale training strategy to enhance low-level predictive ***,we propose an accurate and efficient aircraft detection *** results illustrate that our method could achieve 96.84% AP at 20 FPS on NVIDIA TITAN *** with original SSD method,our proposed approach achived 2.13% AP improvement.
GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning...
详细信息
ISBN:
(纸本)9781509049318
GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in an architecture-independent manner, without involving domain experts. The key insight is to leverage a wide range of architecture-specific abstractions already available in LLVM, by first generating a vectorized micro-kernel in the architecture-independent LLVM IR and then improving its performance by applying a series of domain-specific yet architecture-independent optimizations. The optimized micro-kernel drops easily in existing GEMM frameworks such as BLIS and OpenBLAS. Validation focuses on optimizing GEMM in double precision on two architectures. On Intel Sandybridge and AArch64 Cortex-A57, Poca's micro-kernels outperform expert-crafted assembly code by 2.35% and 7.54%, respectively, and both BLIS and OpenBLAS achieve competitive or better performance once their micro-kernels are replaced by Poca's.
The scale of global data center market has been explosive in recent years. As the market grows, the demand for fast provisioning of the virtual resources to support elas- tic, manageable, and economical computing over...
详细信息
The scale of global data center market has been explosive in recent years. As the market grows, the demand for fast provisioning of the virtual resources to support elas- tic, manageable, and economical computing over the cloud becomes high. Fast provisioning of large-scale virtual ma- chines (VMs), in particular, is critical to guarantee quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them in three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techn...
详细信息
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use postprocessing deduplication running in system idle time to avoid the negative impact on I/O performance. However, neither of them works well in the cloud servers running multiple services or applications for the following two reasons: Firstly, the temporal locality of duplicate data writes may not exist in some primary storage workloads thus inline caching often fails to achieve good deduplication ratio. Secondly, the post-processing deduplication allows duplicate data to be written to disks, therefore does not provide the benefit of I/O deduplication and requires high peak storage capacity. This paper presents HPDedup, a Hybrid Prioritized data Deduplication mechanism to deal with the storage system shared by applications running in co-located virtual machines or containers by fusing an inline and a post-processing process for exact deduplication. In the inline deduplication phase, HPDedup gives a fingerprint caching mechanism that estimates the temporal locality of duplicates in data streams from different VMs or applications and prioritizes the cache allocation for these streams based on the estimation. HPDedup also allows different deduplication threshold for streams based on their spatial locality to reduce the disk fragmentation. The post-processing phase removes duplicates whose fingerprints are not able to be cached due to weak temporal locality from disks. The hybrid deduplication mechanism significantly reduces the amount of redundant data written to the storage system while maintaining inline data writing performance. Our experimental results show that HPDedup clearly outperforms the state-of-the-art primary storage deduplication techniques in terms of inline cac
Image diffusion plays a fundamental role for the task of image denoising. Recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. Howeve...
详细信息
In the Internet of Things, it is important to detect the various relations among objects for mining useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for...
详细信息
暂无评论