Video anomaly detection is an essential task in computer vision, focused on spotting unexpected or unusual events in surveillance footage. This study presents a hybrid framework that combines a Deep Autoencoder (AE) w...
In an updated version of Agerwala's July 2004 keynote address at the International Symposium on Computer Architecture, the authors urge the computer architecture community to devise innovative ways of delivering continuing improvement in system performance and price-performance while simultaneously solving the power problem.
ISBN: 9781728199245 (print)
Silicon-photonics architectures have enabled high-speed hardware implementations of reservoir computing (RC). With a delayed feedback reservoir (DFR) model, a single non-linear node suffices to perform RC. However, the delay is often provided by off-chip fiber optics, which not only takes up space but also becomes an architectural bottleneck and hinders scalability. In this paper, we propose a completely on-chip photonic RC architecture for high-performance computing, employing multiple electronically tunable delay lines and a micro-ring resonator (MRR) switch for multi-tasking. The proposed architecture yields 84% less error than the state-of-the-art standalone architecture in [8] when executing the NARMA task. For multi-tasking, the proposed architecture shows 80% better performance than [8]. The architecture also outperforms all other proposed architectures. The on-chip area and power overheads of the proposed architecture due to the delay lines and MRR switch are 0.0184 mm^2 and 26 mW, respectively.
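For readers unfamiliar with the NARMA benchmark mentioned in the abstract above, the following is a minimal sketch of how such a target series is commonly generated. It assumes the widely used tenth-order variant (NARMA-10) with inputs drawn uniformly from [0, 0.5]; the abstract does not say which order or input range the authors used, and the function name `narma10` and its parameters are hypothetical.

```python
import random


def narma10(length, seed=0):
    """Generate an input series u and a NARMA-10 target series y.

    Assumption: the standard tenth-order recurrence
        y[t+1] = 0.3*y[t] + 0.05*y[t]*sum(y[t-9..t]) + 1.5*u[t-9]*u[t] + 0.1
    with u[t] drawn uniformly from [0, 0.5]. The paper may use a different variant.
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(length)]
    y = [0.0] * length  # the first ten outputs are left at zero
    for t in range(9, length - 1):
        history = sum(y[t - 9:t + 1])  # y[t-9] + ... + y[t]
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * history + 1.5 * u[t - 9] * u[t] + 0.1
    return u, y


if __name__ == "__main__":
    u, y = narma10(200)
    print(len(u), len(y))
```

A reservoir is then trained to predict `y` from `u`, and the prediction error on held-out samples is the kind of figure the abstract compares.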
Future high-performance computing will undoubtedly reach petascale and beyond. Today's HPC is tomorrow's personal computing. What are the evolving processor architectures towards multi-core and many-core for t...
ISBN: 9781728199245 (print)
Following the reinvention of the FFT algorithm by Cooley and Tukey in 1965, considerable effort has been invested in optimizing this algorithm and its variations. In this paper, we discuss its use and optimization for current and future radar applications and give a brief survey of implementations that have claimed relatively large performance advantages over existing solutions. We also present an in-depth analysis of state-of-the-art solutions and of our own implementation, which allows the reader to evaluate the performance improvements on a fair basis. To that end, we discuss the development of a high-performance Fast Fourier Transform (FFT) using an enhanced radix-4 decimation-in-frequency (DIF) algorithm and compare it against the autotuned Fastest Fourier Transform in the West (FFTW) library as well as other solutions and frameworks.
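As an illustration of the radix-4 decimation-in-frequency idea named in the abstract above, here is a minimal textbook sketch. It is not the authors' enhanced implementation: it is an unoptimized recursive version for input lengths that are powers of 4, and the function names `fft_radix4_dif` and `dft_naive` are hypothetical.

```python
import cmath


def fft_radix4_dif(x):
    """Textbook recursive radix-4 DIF FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 4 != 0:
        raise ValueError("input length must be a power of 4")
    q = n // 4
    s0, s1, s2, s3 = [], [], [], []
    for m in range(q):
        a, b, c, d = x[m], x[m + q], x[m + 2 * q], x[m + 3 * q]
        w1 = cmath.exp(-2j * cmath.pi * m / n)  # twiddle factor W_N^m
        w2 = w1 * w1
        w3 = w2 * w1
        # Radix-4 butterfly, then twiddle multiplication (DIF order).
        s0.append(a + b + c + d)
        s1.append((a - 1j * b - c + 1j * d) * w1)
        s2.append((a - b + c - d) * w2)
        s3.append((a + 1j * b - c - 1j * d) * w3)
    out = [0j] * n
    # Sub-transform p produces the output bins with index 4k + p.
    out[0::4] = fft_radix4_dif(s0)
    out[1::4] = fft_radix4_dif(s1)
    out[2::4] = fft_radix4_dif(s2)
    out[3::4] = fft_radix4_dif(s3)
    return out


def dft_naive(x):
    """O(N^2) reference DFT used only to check the FFT result."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    signal = [complex(i % 5, -(i % 3)) for i in range(16)]
    err = max(abs(p - r) for p, r in zip(fft_radix4_dif(signal), dft_naive(signal)))
    print("max abs error vs naive DFT:", err)
```

An optimized implementation would typically be iterative and in-place, with precomputed twiddle factors; the recursion here only keeps the butterfly structure easy to read.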
ISBN: 9780769530147 (print)
Component-based programming has been applied to address the requirements of applications in high-performance computing (HPC). However, the usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding support for parallelism. This paper looks at extensions to the usual notion of service connector to meet such requirements, using the # component model as a substratum and demonstrating its expressiveness.
ISBN: 9781728199245 (print)
Reducing energy consumption and achieving high energy efficiency in computation has become the top priority in high-performance computing. High energy efficiency generally requires high resource utilization, since energy demand for many applications and architectures depends on active time. We show that, by using DMA, the Myriad-2 Vision Processing Unit (28 nm CMOS node) can achieve 25 GFLOPS/W for FP32 matrix multiplication. Our main contributions are: (i) an analysis of data transfer needs for inner- and outer-product formulations of matrix multiplication with respect to the Myriad-2 memory hierarchy; (ii) an efficient use of DMA for managing matrix block transfers between on-chip and main memory; and (iii) a detailed analysis of the effects of matrix block shapes and DRAM page faults on performance and energy efficiency.
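The inner- and outer-product formulations contrasted in the abstract above differ only in loop order, which in turn changes which operands are streamed and which are reused. The sketch below shows the two orderings on plain Python lists; it does not model the Myriad-2 memory hierarchy, DMA, or blocking, and the function names are hypothetical.

```python
def matmul_inner(A, B):
    """Inner-product form: each C[i][j] is a dot product of a row of A and a column of B."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # One output element is finished before moving on to the next.
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C


def matmul_outer(A, B):
    """Outer-product form: C accumulates k rank-1 updates (column p of A times row p of B)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for p in range(k):
        for i in range(n):
            a = A[i][p]
            for j in range(m):
                # Every output element is revisited once per value of p.
                C[i][j] += a * B[p][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_inner(A, B))
    print(matmul_outer(A, B))
```

In the inner-product order the output is written once while both inputs are re-read; in the outer-product order each input element is read once per update while the whole output is repeatedly read and written. That trade-off is what makes the choice sensitive to where each matrix block resides in memory.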
ISBN: 9781728161495 (print)
Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements alongside throughput-oriented background jobs, which need to achieve high performance. Previously proposed methods cannot co-locate multiple latency-critical jobs with multiple background jobs while (1) meeting the QoS requirements of all latency-critical jobs and (2) maximizing the performance of the background jobs. This paper proposes CLITE, a Bayesian Optimization-based, multi-resource partitioning technique which achieves these goals.