ISBN:
(digital) 9781728197104
ISBN:
(print) 9781728197111
With the growing performance and wide application of deep neural networks (DNNs), recent years have seen enormous efforts on DNN accelerator hardware design for platforms from mobile devices to data centers. The systolic array has been a popular architectural choice for many proposed DNN accelerators with hundreds to thousands of processing elements (PEs) for parallel computing. Systolic array-based DNN accelerators for datacenter applications have high power consumption and nonuniform workload distribution, which makes power delivery network (PDN) design challenging. Server-class multicore processors have benefited from distributed on-chip voltage regulation and heterogeneous voltage regulation (HVR) for improving energy efficiency while guaranteeing power delivery integrity. This paper presents the first work on HVR-based PDN architecture and control for systolic array-based DNN accelerators. We propose to employ a PDN architecture comprising heterogeneous on-chip and off-chip voltage regulators and multiple power domains. By analyzing patterns of typical DNN workloads via a modeling framework, we propose a DNN workload-aware dynamic PDN control policy to maximize system energy efficiency while ensuring power integrity. We demonstrate significant energy efficiency improvements brought by the proposed PDN architecture, dynamic control, and power gating, which lead to a more than five-fold reduction of leakage energy and PDN energy overhead for systolic array DNN accelerators.
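A workload-aware gating policy of the kind the abstract describes can be sketched in a few lines: power domains covering tiles of the systolic array are gated off when the mapped DNN layer leaves them idle. The domain names, utilization inputs, and threshold below are illustrative assumptions, not details from the paper:

```python
# Sketch of a DNN workload-aware PDN control step: gate power domains whose
# systolic-array tiles are (nearly) idle for the current layer. The 5% gate
# threshold and the per-domain utilization inputs are illustrative.

def pdn_control(tile_utilization, gate_threshold=0.05):
    """Map per-domain utilization in [0, 1] to a state: 'on' or 'gated'."""
    return {domain: ("gated" if util < gate_threshold else "on")
            for domain, util in tile_utilization.items()}

# Example: a small layer exercises only half of a four-domain array,
# so two domains can be power-gated to cut leakage energy.
states = pdn_control({"d0": 0.9, "d1": 0.8, "d2": 0.0, "d3": 0.02})
```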
ISBN:
(print) 9781538664612
The line loss in a power distribution network is an important index that affects the economic benefit of power supply enterprises. To ensure the accuracy and stability of line loss calculation based on large amounts of power measurement data, a distributed parallel processing method is applied to the line loss computing service, and the line loss calculation model of the power distribution network is obtained by fitting a BP neural network. Furthermore, many examples are given to test the algorithm proposed in this paper; the results show that the method guarantees the stability and accuracy of the line loss calculation results.
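As a rough illustration of fitting a loss model with a backpropagation (BP) network, the following minimal NumPy sketch trains a one-hidden-layer network on synthetic line-loss data. The features, target function, and hyperparameters are invented for the example and are not the paper's model:

```python
import numpy as np

# Minimal single-hidden-layer BP network fitted by full-batch gradient
# descent on a synthetic line-loss target. Illustrative only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 3))                         # e.g. normalized load, length, voltage
y = (0.5 * X[:, 0] ** 2 + 0.1 * X[:, 1]).reshape(-1, 1)  # synthetic "line loss"

W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.3

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # forward pass, tanh hidden layer
    pred = h @ W2 + b2
    err = pred - y                      # gradient of 0.5 * MSE w.r.t. pred
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)    # backpropagate through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```

After training, `mse` should fall well below the variance of the target, i.e. the network has learned more than the mean.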
Nowadays, web servers often face the threat of distributed denial of service attacks and their intrusion prevention systems cannot detect those attacks effectively. Many existing intrusion prevention systems detect at...
ISBN:
(print) 9783030235017
The proceedings contain 24 papers. The special focus in this conference is on Cloud Computing. The topics include: exploiting the spam correlations in scalable online social spam detection; dynamic network anomaly detection system by using deep learning techniques; heterogeneity-aware data placement in hybrid clouds; towards automated configuration of cloud storage gateways: a data-driven approach; the case for physical memory pools: a vision paper; a parallel algorithm for Bayesian text classification based on noise elimination and dimension reduction in the Spark computing environment; on the optimal number of computational resources in MapReduce; class indistinguishability for outsourcing equality conjunction search; a hybrid approach for synchronizing clocks in distributed systems; a method and tool for automated induction of relations from quantitative performance logs; JCallGraph: tracing microservices in very large scale container cloud platforms; an overview of cloud computing testing research; a robust multi-terminal support method based on tele-immersion multimedia technology; CMonitor: a monitoring and alarming platform for container-based clouds; CPR: client-side processing of range predicates; systematic construction, execution, and reproduction of complex performance benchmarks; multiple workflow scheduling with offloading tasks to edge cloud; min-edge P-cycles: an efficient approach for computing P-cycles in optical data center networks; toward accurate and efficient emulation of public blockchains in the cloud; teleportation of VM disk images over WAN; live migration of virtual machines in OpenStack: a perspective from reliability evaluation; and an approach to failure prediction in cluster by self-updating cause-and-effect graph.
ISBN:
(print) 9780769543284
The proceedings contain 88 papers. The topics discussed include: job scheduling with license reservation: a semantic approach; a deadline satisfaction enhanced workflow scheduling algorithm; distributed load balancing for parallel agent-based simulations; a failure handling framework for distributed data mining services on the grid; balancing workloads of servers maintaining scalable distributed data structures; high performance matrix inversion on a multi-core platform with several GPUs; parallelization of the AdaBoost algorithm through hybrid MPI/OpenMP and transactional memory; scalable sparse matrix-vector multiplication with functional memory and GPUs; accelerating parameter sweep applications using CUDA; FFT implementation on a streaming architecture; and multi-core desktop processors make possible real-time electron tomography.
ISBN:
(print) 9781538673089
With the rapid development of big data technology, the demands on data processing capacity and efficiency have caused a number of legacy security technologies to fail, especially in the data security domain. Data security risks have therefore become extremely important for big data usage. We introduce a novel method to perform big data security control, comprising three steps: user context recognition based on zero trust, fine-grained data access authentication control, and data access audit based on full network traffic, in order to recognize and intercept risky data access in a big data environment. Experiments conducted with this fine-grained, zero-trust-based big data security method on a drug-related information analysis system demonstrate that the method can identify the majority of data security risks.
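The three-step control described in the abstract can be sketched as a toy access-decision function: score the user context, check a fine-grained grant, and log every decision for audit. The field names, trust threshold, and grant representation are illustrative assumptions, not the paper's implementation:

```python
# Toy zero-trust access check: every request is verified against the user's
# current context score AND a fine-grained (resource, action) grant, and
# every decision is appended to an audit trail. Names are illustrative.

audit_log = []

def allow_access(context, resource, action, trust_threshold=0.7):
    trusted = context.get("trust_score", 0.0) >= trust_threshold  # zero trust: no implicit trust
    permitted = (resource, action) in context.get("grants", set())  # fine-grained grant check
    decision = trusted and permitted
    audit_log.append((context.get("user"), resource, action, decision))  # audit step
    return decision

ok = allow_access({"user": "alice", "trust_score": 0.9,
                   "grants": {("drug_db", "read")}}, "drug_db", "read")
```

A request from a low-trust context is denied even when a grant exists, which is the zero-trust property the method relies on.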
EDDY (Evaluation of Differential DependencY) interrogates transcriptomic data to identify differential genetic dependencies within a biological pathway. Through its probabilistic framework with resampling and permutation, aided by the incorporation of annotated gene sets, EDDY has demonstrated superior sensitivity compared to other methods. However, this statistical rigor incurs considerable computational cost, limiting its application to larger datasets. The ample and independent computation, coupled with a manageable memory footprint, positioned EDDY as a strong candidate for graphics processing unit (GPU) implementation. Custom kernels decompose the independence-test loop, network construction, network enumeration, and Bayesian network scoring to accelerate the computation. GPU-accelerated EDDY consistently exhibits a two-orders-of-magnitude performance improvement, allowing the statistical rigor of the EDDY algorithm to be applied to larger datasets.
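To illustrate the permutation-based resampling idea that underlies EDDY's statistical framework (this is a generic permutation test, not EDDY's actual dependency statistic), condition labels are shuffled and the observed group difference is compared against the resulting null distribution:

```python
import random

# Generic permutation test: the kind of resampling loop that dominates the
# cost of methods like EDDY, and that parallelizes well because every
# permutation is independent. Statistic here is a simple mean difference.

def perm_pvalue(a, b, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # permute condition labels
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)              # add-one-smoothed p-value
```

Each of the `n_perm` iterations is independent, which is exactly the structure that maps onto per-thread GPU kernels.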
ISBN:
(print) 9781728125848
Deep neural network (DNN) training is generally performed on cloud computing platforms. However, cloud-based training has several problems, such as network bottlenecks, server management cost, and privacy. To overcome these problems, one of the most promising solutions is distributed DNN model training, which trains the model not only with high-performance servers but also with low-end, power-efficient mobile edge or user devices. However, due to the lack of a framework that can provide an optimal cluster configuration (i.e., determining which computing devices participate in DNN training tasks), it is difficult to perform efficient DNN model training that accounts for DNN service providers' preferences, such as training time or energy efficiency. In this paper, we introduce a novel framework for distributed DNN training that determines the best training cluster configuration with the available heterogeneous computing resources. Our proposed framework utilizes pre-training with a small number of training steps and estimates training time, power, energy, and energy-delay product (EDP) for each possible training cluster configuration. Based on the estimated metrics, our framework performs DNN training for the remaining steps with the best cluster configuration chosen according to the DNN service provider's preferences. Our framework is implemented in TensorFlow and evaluated with three heterogeneous computing platforms and five widely used DNN models. According to our experimental results, in 76.67% of the cases our framework chooses the best cluster configuration for the DNN service provider's preferences, with only a small training-time overhead.
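The selection step can be sketched as follows: given metrics estimated from short pre-training runs for each candidate cluster configuration, pick the one that minimizes the provider's chosen objective. The configuration names and numbers below are illustrative, not measured results from the paper:

```python
# Choose a training-cluster configuration by the provider's preference:
# minimize time, energy, or the energy-delay product (EDP = energy * time).
# Estimates would come from short pre-training runs; these are made up.

def best_config(estimates, preference="edp"):
    """estimates: {name: {"time_s": ..., "energy_j": ...}};
    preference is "time_s", "energy_j", or "edp"."""
    def objective(item):
        m = item[1]
        return m["time_s"] * m["energy_j"] if preference == "edp" else m[preference]
    return min(estimates.items(), key=objective)[0]

est = {"server_only": {"time_s": 100, "energy_j": 5000},
       "server+edge": {"time_s": 120, "energy_j": 3000},
       "edge_only":   {"time_s": 300, "energy_j": 2500}}
choice = best_config(est)   # EDP by default
```

Note how the three preferences pick three different configurations on this toy data, which is why preference-aware selection matters.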
The performance of a data center is a function of three features: bandwidth, latency, and reliability. By adopting optical technology in the data center network, bandwidth is increased while transmission latency and power consumption are reduced. Unfortunately, the fault tolerance of optical networks has received little attention so far. In this paper, we therefore propose O-FTF, a fault-tolerant, scalable, and high-performance optical architecture built upon the previously proposed O-TF network, with the goals of optimizing redundancy and reducing the minimum number of wavelength channels required for non-blocking operation of the network. Moreover, the reduced network diameter of O-FTF compared to the O-TF and WaveCube networks leads to higher network performance in the presence of network failures.
Today, Graphics Processing Units (GPUs) are being used for more than traditional graphics processing. Large supercomputers such as Titan are utilizing GPUs to solve problems that bear little resemblance to their original purpose. To improve the performance of these applications, GPU architects are increasing cache sizes to lower the latency of the non-uniform memory references found in these programs. In this paper, we investigate an alternative approach in which a victim buffer is added to the first-level cache. Our studies show that a 256-line victim cache can increase the L1 hit rate by 15% and improve IPC by 7.5% over the baseline. This victim cache outperforms increasing the cache size by 400% while being a less costly solution in terms of area.
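A victim buffer of this kind can be modeled in a few lines. The toy model below uses a direct-mapped L1 and a tiny FIFO victim buffer holding recently evicted lines, far smaller than the 256-line buffer evaluated in the paper; sizes and replacement policy are illustrative assumptions:

```python
from collections import deque

# Toy direct-mapped L1 with a small fully-associative victim buffer:
# lines evicted from L1 are retained, and a later conflict miss that finds
# its line in the buffer swaps it back instead of going to the next level.

class L1WithVictim:
    def __init__(self, sets=4, victim_lines=2):
        self.sets = [None] * sets                 # one line per set (direct-mapped)
        self.victim = deque(maxlen=victim_lines)  # FIFO victim buffer

    def access(self, tag):
        idx = tag % len(self.sets)
        if self.sets[idx] == tag:
            return "l1_hit"
        if tag in self.victim:                    # victim hit: swap back into L1
            self.victim.remove(tag)
            evicted = self.sets[idx]
            self.sets[idx] = tag
            if evicted is not None:
                self.victim.append(evicted)
            return "victim_hit"
        if self.sets[idx] is not None:            # true miss: evict into buffer
            self.victim.append(self.sets[idx])
        self.sets[idx] = tag
        return "miss"
```

Two tags that conflict in the same set (e.g. 0 and 4 with four sets) ping-pong between L1 and the victim buffer, turning repeated conflict misses into victim hits, which is the effect behind the reported L1 hit-rate gain.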