ISBN:
(Print) 9781450365239
Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities. Waiting time, response time, slowdown and weighted slowdown are classical metrics used to compare the performance of different scheduling algorithms. This paper investigates the effects of four artefacts, namely non-determinism, shuffling, time shrinking and sampling, on these metrics. We present a scheduling framework based on emulation, that is, using a real scheduler (Slurm) with a sleep program able to take into account periods of suspension. The framework is able to emulate a 50K-core cluster using 10 virtualized nodes, with the scheduler running on an isolated node. We find that the non-determinism in repeatedly running a workload has a small but discernible effect on these metrics, and that shuffling the job order in a workload increases this by a factor of 5-10. Experiments with shuffled workloads indicate that the average difference between the performance of the Backfill and Suspend-Resume strategies is within this variation. We also propose methodologies for time shrinking and sampling to decrease the duration of emulations, while aiming to keep these metrics invariant (or linearly variant) with respect to the original workload. We find that time shrinking by a factor of up to 90% can have a similar effect on the metrics as non-determinism. For sampling, our methodology preserved the distribution of job sizes to a high extent, but had a variation in the metrics somewhat greater than for shuffling. Finally, we use our framework to study in depth Slurm's scheduling performance, and discover a deficiency in the Suspend-Resume implementation.
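As a rough illustration of the metrics compared above, the Python sketch below computes average waiting time, response time, slowdown and bounded ("weighted") slowdown from a simple job trace. The Job fields, the 10-second bound and the per-job formulas are generic textbook definitions assumed for illustration, not the paper's exact setup.

# Minimal sketch: classical scheduling metrics from a job trace.
# The Job fields and the bound for weighted (bounded) slowdown are
# illustrative assumptions, not the paper's definitions.
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # submission time
    start: float    # time the scheduler started the job
    end: float      # completion time

def metrics(jobs, bound=10.0):
    waits, responses, slowdowns, wslowdowns = [], [], [], []
    for j in jobs:
        wait = j.start - j.submit
        run = j.end - j.start
        response = j.end - j.submit          # wait + run
        waits.append(wait)
        responses.append(response)
        slowdowns.append(response / max(run, 1e-9))
        # bounded slowdown limits the influence of very short jobs
        wslowdowns.append(max(response / max(run, bound), 1.0))
    n = len(jobs)
    return {
        "avg_wait": sum(waits) / n,
        "avg_response": sum(responses) / n,
        "avg_slowdown": sum(slowdowns) / n,
        "avg_weighted_slowdown": sum(wslowdowns) / n,
    }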
The use of traffic signs is vital, especially when travelling on highways or in the hills in adverse environmental conditions. The pace of advancements in the field of machine learning has opened doors for improvements in the performance of Convolutional Neural Network architectures dedicated to the classification of traffic signs. Speed is as important as accuracy for such problems in the field of Advanced Driver Assistance Systems, and the use of a GPU instead of a CPU gives the benefit of parallel processing. Gradient descent helps navigate towards the minima of the loss function, and the purpose of the various gradient descent optimizing algorithms is to help achieve quicker convergence. The proposed approach comprises a compact Convolutional Neural Network architecture that was trained on a GPU using the RMSProp, Adam and Nadam optimizers on the BelgiumTS dataset. RMSProp and Adam caused either underfitting or over-fitting, which was resolved by Nadam used with an appropriate dropout, yielding 97.51% training accuracy and 96.78% testing accuracy. The predictions on test images convey that the architecture trained using Nadam works perfectly for blurry images, positionally challenging images and images with uneven illumination.
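Under stated assumptions, a compact CNN trained with the Nadam optimizer and dropout might look like the Keras sketch below; the layer sizes, dropout rate, input resolution (32x32) and class count (62 for BelgiumTS) are illustrative guesses, not the authors' architecture.

# Minimal sketch (not the authors' exact architecture): a compact CNN for
# traffic-sign classification trained with the Nadam optimizer and dropout.
# Input size, filter counts, dropout rate and class count are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=62):
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),          # dropout to curb over-fitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_model()
# model.fit(x_train, y_train, epochs=30, validation_split=0.1)  # runs on GPU if available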
ISBN:
(Print) 9783030050511;9783030050504
Monte-Carlo rendering algorithms are known for producing highly realistic images, but at a significant computational cost, because they rely on tracing up to trillions of light paths through a scene to simulate physically based light transport. For this reason, a large body of research exists on various techniques for accelerating these costly algorithms. As one of the Monte-Carlo rendering algorithms, PSSMLT (Primary Sample Space Metropolis Light Transport) is widely used nowadays for photorealistic rendering. Unfortunately, the computational cost of PSSMLT is still very high, since the space of light paths is high-dimensional and up to trillions of paths are typically required in such a path space. Recent research on PSSMLT has proposed a variety of optimized methods for single-node rendering; however, multi-node rendering for PSSMLT is rarely mentioned, due in large part to the complicated mathematical model, complicated physical processes, irregular memory access patterns, and imbalanced workload of light-carrying paths. In this paper, we present a highly scalable distributed parallel simulation framework for PSSMLT. Firstly, based on the light transport equation, we propose the notion of a sub-image with a certain property for multi-node rendering and theoretically prove that the whole set of sub-images can be combined to produce the final image; then we further propose a sub-image based assignment partitioning algorithm for multi-node rendering, since the traditional demand-driven assignment partitioning algorithm does not work well. Secondly, we propose a physically based parallel simulation for the PSSMLT algorithm, which is realized on a parallel computer system in the master-worker paradigm. Finally, we discuss the issue of granularity of the assignment partitioning and some optimization strategies for improving overall performance, and then a static/dynamic hybrid scheduling strategy is described. Experiments show that the framework achieves a nearly linear speedup along wi...
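To make the sub-image idea concrete, here is a minimal master-worker sketch in Python: the image plane is split into tiles, each worker renders one tile, and the master accumulates the partial results into the final image. The tile size and the placeholder render_subimage function are assumptions for illustration; a real worker would run PSSMLT mutations restricted to its region of the image plane.

# Schematic sketch of sub-image based master-worker rendering (illustrative only).
import numpy as np
from multiprocessing import Pool

WIDTH, HEIGHT, TILE = 256, 256, 64   # assumed image and tile sizes

def render_subimage(tile):
    """Worker: stand-in for rendering one sub-image; a real worker would run
    PSSMLT restricted to this region and return its radiance estimates."""
    x0, y0 = tile
    return (x0, y0, np.random.rand(TILE, TILE, 3))  # placeholder radiance values

if __name__ == "__main__":
    tiles = [(x, y) for x in range(0, WIDTH, TILE) for y in range(0, HEIGHT, TILE)]
    image = np.zeros((HEIGHT, WIDTH, 3))
    with Pool() as pool:                              # master distributes assignments
        for x0, y0, sub in pool.map(render_subimage, tiles):
            image[y0:y0 + TILE, x0:x0 + TILE] += sub  # combine sub-images into the final image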
ISBN:
(Print) 9781538660263
The proceedings contain 146 papers. The topics discussed include: a survey on stock market prediction; designing a green data processing device using different input/output standards on FPGA; tensor decomposition of biometric data using singular value decomposition; parallelization of a multipartite graph matching algorithm for tracking multiple football players; an analysis of biometric based security systems; brain tumor segmentation by texture feature extraction with the parallel implementation of fuzzy C-means using CUDA on GPU; predictive data modeling: educational data classification and comparative analysis of classifiers using Python; a high capacity framework for reversible information embedding in medical images; and machine learning-based voltage dip measurement of smart energy meters.
ISBN:
(Print) 9781728125312
Modern FPGAs (Field Programmable Gate Arrays) are becoming increasingly important when it comes to embedded system development. Within these FPGAs, soft-core processors are often used to solve a wide range of different tasks. Soft-core processors are a cost-effective and time-efficient way to realize embedded systems. The trend for soft-core processors, as well as for mainstream CPUs (central processing units), leads to multi-core architectures. Both the necessary memory architectures and the compilers play an important role in this process. In this paper, a novel method that aims at minimizing the necessary memory resources on the FPGA while maximizing the processing speed of any given algorithm is described. In the first step, an application-specializable multi-soft-core processor architecture is presented that is capable of solving problems while adhering to hard real-time deadlines. Its special architecture and other necessary features are discussed. Furthermore, a method for the generation of optimized machine code for each processor core as well as hard real-time compatible deadlock handling mechanisms are presented. Selected algorithms are implemented to demonstrate the functionality and efficiency of the realized approach for different configurations of the multi-soft-core processor architecture.
ISBN:
(Print) 9783030050511;9783030050504
With the increasing size of high performance computing systems, the expensive communication overhead between processors has become a key factor leading to performance bottlenecks. However, default process-to-processor mapping strategies do not take into account the topology of the interconnection network, and thus the distance spanned by communication messages may be particularly far. In order to enhance communication locality, we propose a new topology-aware mapping method called TAMM. By generating an accurate description of the communication pattern and network topology, TAMM employs a two-step optimization strategy to obtain an efficient mapping solution for various parallel applications. This strategy first extracts an appropriate subset of all idle computing resources on the underlying system and then constructs an optimized one-to-one mapping with a refined iterative algorithm. Experimental results demonstrate that TAMM can effectively improve communication performance on the Tianhe-2A supercomputer.
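A toy sketch of the general idea behind topology-aware mapping (not TAMM itself) is given below: given a process communication matrix and a hop-distance matrix between processors, a greedy heuristic places heavily communicating processes on nearby processors. The matrices, the seeding rule and the cost function are illustrative assumptions.

# Toy sketch of topology-aware process-to-processor mapping (not TAMM itself):
# greedily place the most heavily communicating unmapped process on the free
# processor that minimizes comm_volume * hop_distance to already-placed peers.
import numpy as np

def greedy_map(comm, dist):
    """comm[i][j]: message volume between processes i and j.
    dist[p][q]: hop distance between processors p and q (same size assumed)."""
    n = comm.shape[0]
    mapping = {}                       # process -> processor
    free = set(range(n))
    order = np.argsort(-comm.sum(axis=1))   # heaviest communicators first
    mapping[order[0]] = 0                   # seed on processor 0
    free.discard(0)
    for proc in order[1:]:
        best_p, best_cost = None, float("inf")
        for p in free:
            cost = sum(comm[proc, q] * dist[p, mapping[q]] for q in mapping)
            if cost < best_cost:
                best_p, best_cost = p, cost
        mapping[proc] = best_p
        free.discard(best_p)
    return mapping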
Glaucoma is a disease of the retina of the eye. Presently, millions of human beings are suffering from this disease. Early detection of this disease can save people from blindness. Therefore, various methods ...
Visible signs of climate change call for urgent action in the food retail industry, since this sector is characterized by an abundant carbon footprint. With CO2 (or R744) recognized across the world as the most promising working fluid for supermarket applications, commercial transcritical R744 refrigeration systems have emerged as leading hydrofluorocarbon (HFC)-free technologies. This study is intended to provide an in-depth review covering the most important aspects of state-of-the-art pure R744 refrigeration plants for food retail applications, including the evolution of system architectures, some field measurements, the main available results from an energy, environmental and economic perspective, as well as the indispensable future investigations. It can be concluded that, in spite of some persisting barriers which still prevent such technologies from wider adoption, the usage of R744 as the only refrigerant in supermarkets is no longer open to dispute, even in warm locations. (C) 2018 Elsevier Ltd and IIR. All rights reserved.
ISBN:
(Print) 9781538649756
In this work, we address the challenge of designing an efficient warp scheduler for throughput processors by proposing SAWS (Simple and Adaptive Warp Scheduler). Unlike previous approaches, which target a particular type of application, SAWS considers several simple scheduling algorithms and tries to use the one that best fits each application or phase within an application. Through detailed simulations, we demonstrate that a practical implementation of SAWS can obtain IPC values that closely match the best scheduling algorithm in each case.
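A simplified sketch of the adaptive idea, under assumptions about the simulator interface, could look as follows: periodically sample every candidate policy for a short window, measure IPC, and keep the best performer for the current phase. The run_window callback, window lengths and re-evaluation period are illustrative, not SAWS's actual mechanism.

# Simplified sketch of an adaptive warp-scheduling loop in the spirit of SAWS
# (illustrative only). `policies` maps a policy name to an opaque policy object,
# and `run_window(policy, cycles)` is an assumed simulator hook that executes
# `cycles` cycles under that policy and returns the measured IPC.
def adaptive_schedule(policies, run_window, total_cycles,
                      sample_cycles=1_000, phase_cycles=50_000):
    cycle = 0
    while cycle < total_cycles:
        # Sampling phase: try each simple policy for a short window and score it.
        scores = {}
        for name, policy in policies.items():
            scores[name] = run_window(policy, sample_cycles)
            cycle += sample_cycles
        best = max(scores, key=scores.get)
        # Steady phase: keep the best-scoring policy until the next re-evaluation.
        steady = min(phase_cycles, max(total_cycles - cycle, 0))
        if steady:
            run_window(policies[best], steady)
            cycle += steady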
ISBN:
(Digital) 9783030050573
ISBN:
(Print) 9783030050573;9783030050566
In the field of signal processing, the Fast Fourier Transform (FFT) is a widely used algorithm to transform signal data from the time domain to the frequency domain. Unfortunately, with the exponential growth of data, traditional methods cannot meet the demand of large-scale computation on these big data because of three main challenges of large-scale FFT, i.e., big data size, real-time data processing and high utilization of compute resources. To satisfy these requirements, an optimized FFT algorithm in the Cloud is badly needed. In this paper, we introduce a new method to conduct FFT in the Cloud with the following contributions: first, we design a parallel FFT algorithm for large-scale signal data in the Cloud; second, we propose a MapReduce-based mechanism to distribute data to compute nodes using a big data processing framework; third, an optimal method of distributing compute resources is implemented to accelerate the algorithm by avoiding redundant data exchange between compute nodes. The algorithm is designed in the MapReduce computation framework and contains three steps: data preprocessing, local data transform, and parallel data transform to integrate the processing results. The parallel FFT is implemented in a 16-node Cloud to process real signal data. The experimental results reveal an obvious improvement in the algorithm's speed: our parallel FFT is approximately five times faster than FFT in Matlab when the data size reaches 10 GB.
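The decomposition that makes FFT amenable to such a map/reduce split can be illustrated with the standard Cooley-Tukey "four-step" algorithm: local row FFTs (a map-like stage), a twiddle-factor multiplication, and column FFTs that integrate the results (a reduce-like stage). The NumPy sketch below demonstrates the decomposition only and is not the paper's MapReduce implementation.

# Small NumPy sketch of the Cooley-Tukey "four-step" FFT decomposition:
# row FFTs can run locally on each node (map), twiddle multiplication and
# column FFTs combine the partial transforms (reduce). Illustrative only.
import numpy as np

def four_step_fft(x, P, Q):
    """FFT of length N = P*Q via two stages of smaller FFTs."""
    assert len(x) == P * Q
    A = np.asarray(x, dtype=complex).reshape(P, Q)   # A[n1, n2] = x[Q*n1 + n2]
    B = np.fft.fft(A, axis=0)                        # P-point FFTs: "map" stage
    k1 = np.arange(P).reshape(P, 1)
    n2 = np.arange(Q).reshape(1, Q)
    C = B * np.exp(-2j * np.pi * k1 * n2 / (P * Q))  # twiddle factors
    D = np.fft.fft(C, axis=1)                        # Q-point FFTs: "reduce" stage
    return D.T.reshape(-1)                           # X[k1 + P*k2] = D[k1, k2]

# sanity check against NumPy's own FFT
x = np.random.rand(1024)
assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))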