ISBN:
(Print) 9781450365239
Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities. Waiting time, response time, slowdown and weighted slowdown are classical metrics used to compare the performance of different scheduling algorithms. This paper investigates the effects of four artefacts, namely non-determinism, shuffling, time shrinking and sampling, on these metrics. We present a scheduling framework based on emulation, that is, using a real scheduler (Slurm) with a sleep program able to take into account periods of suspension. The framework is able to emulate a 50K-core cluster using 10 virtualized nodes, with the scheduler running on an isolated node. We find that the non-determinism in repeatedly running a workload has a small but discernible effect on these metrics, and that shuffling the job order in a workload increases this by a factor of 5-10. Experiments with shuffled workloads indicate that the average difference between the performance of the Backfill and Suspend-Resume strategies is within this variation. We also propose methodologies for time shrinking and sampling to decrease the duration of emulations, while aiming to keep these metrics invariant (or linearly variant) with respect to the original workload. We find that time shrinking by a factor of up to 90% can have a similar effect on the metrics as non-determinism. For sampling, our methodology preserved the distribution of job sizes to a high extent, but had a variation in the metrics somewhat greater than for shuffling. Finally, we use our framework to study in depth Slurm's scheduling performance, and discover a deficiency in the Suspend-Resume implementation.
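As a rough illustration of the metrics compared above, the Python sketch below computes average waiting time, response time, slowdown and bounded ("weighted") slowdown from a simple job trace. The Job fields, the 10-second bound and the per-job formulas are generic textbook definitions assumed for illustration, not the paper's exact setup.

# Minimal sketch: classical scheduling metrics from a job trace.
# The Job fields and the bound for weighted (bounded) slowdown are
# illustrative assumptions, not the paper's definitions.
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # submission time
    start: float    # time the scheduler started the job
    end: float      # completion time

def metrics(jobs, bound=10.0):
    waits, responses, slowdowns, wslowdowns = [], [], [], []
    for j in jobs:
        wait = j.start - j.submit
        run = j.end - j.start
        response = j.end - j.submit          # wait + run
        waits.append(wait)
        responses.append(response)
        slowdowns.append(response / max(run, 1e-9))
        # bounded slowdown limits the influence of very short jobs
        wslowdowns.append(max(response / max(run, bound), 1.0))
    n = len(jobs)
    return {
        "avg_wait": sum(waits) / n,
        "avg_response": sum(responses) / n,
        "avg_slowdown": sum(slowdowns) / n,
        "avg_weighted_slowdown": sum(wslowdowns) / n,
    }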
The use of traffic signs is vital, especially when travelling on highways or in the hills in adverse environmental conditions. The pace of advancements in the field of machine learning has opened doors for improvements in the performance of Convolutional Neural Network architectures dedicated to the classification of traffic signs. Speed is as important as accuracy for such problems in the field of Advanced Driver Assistance Systems, and the use of a GPU instead of a CPU gives the benefit of parallel processing. Gradient descent helps navigate towards the minima of the loss function, and the purpose of the various gradient descent optimizing algorithms is to help achieve quicker convergence. The proposed approach comprises a compact Convolutional Neural Network architecture that was trained on a GPU using the RMSProp, Adam and Nadam optimizers on the BelgiumTS dataset. RMSProp and Adam caused either underfitting or over-fitting, which was resolved by Nadam used with an appropriate dropout, yielding 97.51% training accuracy and 96.78% testing accuracy. The predictions on test images convey that the architecture trained using Nadam works perfectly for blurry images, positionally challenging images and images with uneven illumination.
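Under stated assumptions, a compact CNN trained with the Nadam optimizer and dropout might look like the Keras sketch below; the layer sizes, dropout rate, input resolution (32x32) and class count (62 for BelgiumTS) are illustrative guesses, not the authors' architecture.

# Minimal sketch (not the authors' exact architecture): a compact CNN for
# traffic-sign classification trained with the Nadam optimizer and dropout.
# Input size, filter counts, dropout rate and class count are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=62):
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),          # dropout to curb over-fitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_model()
# model.fit(x_train, y_train, epochs=30, validation_split=0.1)  # runs on GPU if available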
ISBN:
(Print) 9783030050511;9783030050504
Monte-Carlo rendering algorithms are known for producing highly realistic images, but at a significant computational cost, because they rely on tracing up to trillions of light paths through a scene to simulate physically based light transport. For this reason, a large body of research exists on various techniques for accelerating these costly algorithms. As one of the Monte-Carlo rendering algorithms, PSSMLT (Primary Sample Space Metropolis Light Transport) is widely used nowadays for photorealistic rendering. Unfortunately, the computational cost of PSSMLT is still very high, since the space of light paths is high-dimensional and up to trillions of paths are typically required in such a path space. Recent research on PSSMLT has proposed a variety of optimized methods for single-node rendering; however, multi-node rendering for PSSMLT is rarely mentioned, due in large part to the complicated mathematical model, complicated physical processes, irregular memory access patterns, and imbalanced workload of light-carrying paths. In this paper, we present a highly scalable distributed parallel simulation framework for PSSMLT. Firstly, based on the light transport equation, we propose the notion of a sub-image with a certain property for multi-node rendering and theoretically prove that the whole set of sub-images can be combined to produce the final image; then we further propose a sub-image based assignment partitioning algorithm for multi-node rendering, since the traditional demand-driven assignment partitioning algorithm does not work well. Secondly, we propose a physically based parallel simulation for the PSSMLT algorithm, which is realized on a parallel computer system in the master-worker paradigm. Finally, we discuss the issue of granularity of the assignment partitioning and some optimization strategies for improving overall performance, and then a static/dynamic hybrid scheduling strategy is described. Experiments show that the framework achieves a nearly linear speedup along wi...
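To make the sub-image idea concrete, here is a minimal master-worker sketch in Python: the image plane is split into tiles, each worker renders one tile, and the master accumulates the partial results into the final image. The tile size and the placeholder render_subimage function are assumptions for illustration; a real worker would run PSSMLT mutations restricted to its region of the image plane.

# Schematic sketch of sub-image based master-worker rendering (illustrative only).
import numpy as np
from multiprocessing import Pool

WIDTH, HEIGHT, TILE = 256, 256, 64   # assumed image and tile sizes

def render_subimage(tile):
    """Worker: stand-in for rendering one sub-image; a real worker would run
    PSSMLT restricted to this region and return its radiance estimates."""
    x0, y0 = tile
    return (x0, y0, np.random.rand(TILE, TILE, 3))  # placeholder radiance values

if __name__ == "__main__":
    tiles = [(x, y) for x in range(0, WIDTH, TILE) for y in range(0, HEIGHT, TILE)]
    image = np.zeros((HEIGHT, WIDTH, 3))
    with Pool() as pool:                              # master distributes assignments
        for x0, y0, sub in pool.map(render_subimage, tiles):
            image[y0:y0 + TILE, x0:x0 + TILE] += sub  # combine sub-images into the final image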
ISBN:
(Print) 9781538660263
The proceedings contain 146 papers. The topics discussed include: a survey on stock market prediction; designing a green data processing device using different input/output standards on FPGA; tensor decomposition of biometric data using singular value decomposition; parallelization of a multipartite graph matching algorithm for tracking multiple football players; an analysis of biometric based security systems; brain tumor segmentation by texture feature extraction with the parallel implementation of fuzzy C-means using CUDA on GPU; predictive data modeling: educational data classification and comparative analysis of classifiers using Python; a high capacity framework for reversible information embedding in medical images; and machine learning-based voltage dip measurement of smart energy meters.
ISBN:
(Print) 9781728125312
Modern FPGAs (Field Programmable Gate Arrays) are becoming increasingly important when it comes to embedded system development. Within these FPGAs, soft-core processors are often used to solve a wide range of different tasks. Soft-core processors are a cost-effective and time-efficient way to realize embedded systems. The trend for soft-core processors, as well as for mainstream CPUs (central processing units), leads to multi-core architectures. Both the necessary memory architectures and the compilers play an important role in this process. In this paper, a novel method that aims at minimizing the necessary memory resources on the FPGA while maximizing the processing speed of any given algorithm is described. In the first step, an application-specializable multi-soft-core processor architecture is presented that is capable of solving problems while adhering to hard real-time deadlines. Its special architecture and other necessary features are discussed. Furthermore, a method for the generation of optimized machine code for each processor core as well as hard real-time compatible deadlock handling mechanisms are presented. Selected algorithms are implemented to demonstrate the functionality and efficiency of the realized approach for different configurations of the multi-soft-core processor architecture.
ISBN:
(Print) 9783030050511;9783030050504
With the increasing size of high performance computing systems, the expensive communication overhead between processors has become a key factor leading to performance bottlenecks. However, default process-to-processor mapping strategies do not take into account the topology of the interconnection network, and thus the distance spanned by communication messages may be particularly far. In order to enhance communication locality, we propose a new topology-aware mapping method called TAMM. By generating an accurate description of the communication pattern and network topology, TAMM employs a two-step optimization strategy to obtain an efficient mapping solution for various parallel applications. This strategy first extracts an appropriate subset of all idle computing resources on the underlying system and then constructs an optimized one-to-one mapping with a refined iterative algorithm. Experimental results demonstrate that TAMM can effectively improve communication performance on the Tianhe-2A supercomputer.
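A toy sketch of the general idea behind topology-aware mapping (not TAMM itself) is given below: given a process communication matrix and a hop-distance matrix between processors, a greedy heuristic places heavily communicating processes on nearby processors. The matrices, the seeding rule and the cost function are illustrative assumptions.

# Toy sketch of topology-aware process-to-processor mapping (not TAMM itself):
# greedily place the most heavily communicating unmapped process on the free
# processor that minimizes comm_volume * hop_distance to already-placed peers.
import numpy as np

def greedy_map(comm, dist):
    """comm[i][j]: message volume between processes i and j.
    dist[p][q]: hop distance between processors p and q (same size assumed)."""
    n = comm.shape[0]
    mapping = {}                       # process -> processor
    free = set(range(n))
    order = np.argsort(-comm.sum(axis=1))   # heaviest communicators first
    mapping[order[0]] = 0                   # seed on processor 0
    free.discard(0)
    for proc in order[1:]:
        best_p, best_cost = None, float("inf")
        for p in free:
            cost = sum(comm[proc, q] * dist[p, mapping[q]] for q in mapping)
            if cost < best_cost:
                best_p, best_cost = p, cost
        mapping[proc] = best_p
        free.discard(best_p)
    return mapping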
Glaucoma is a disease of the retina of the eye. Presently, millions of human beings are suffering from this disease. Early detection of this disease can save people from blindness. Therefore, various methods ...
Visible signs of climate change call for urgent action in the food retail industry, since this sector is characterized by an abundant carbon footprint. With CO2 (or R744) recognized across the world as the most promising working fluid for supermarket applications, commercial transcritical R744 refrigeration systems have emerged as leading hydrofluorocarbon (HFC)-free technologies. This study is intended to provide an in-depth review covering the most important aspects of state-of-the-art pure R744 refrigeration plants for food retail applications, including the evolution of system architectures, some field measurements, the main available results from an energy, environmental and economic perspective, as well as the indispensable future investigations. It can be concluded that, in spite of some persisting barriers which still prevent such technologies from wider adoption, the usage of R744 as the only refrigerant in supermarkets is no longer open to dispute, even in warm locations. (C) 2018 Elsevier Ltd and IIR. All rights reserved.
ISBN:
(Print) 9781538649756
In this work, we address the challenge of designing an efficient warp scheduler for throughput processors by proposing SAWS (Simple and Adaptive Warp Scheduler). Unlike previous approaches, which target a particular type of application, SAWS considers several simple scheduling algorithms and tries to use the one that best fits each application or phase within an application. Through detailed simulations, we demonstrate that a practical implementation of SAWS can obtain IPC values that closely match the best scheduling algorithm in each case.
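A simplified sketch of the adaptive idea, under assumptions about the simulator interface, could look as follows: periodically sample every candidate policy for a short window, measure IPC, and keep the best performer for the current phase. The run_window callback, window lengths and re-evaluation period are illustrative, not SAWS's actual mechanism.

# Simplified sketch of an adaptive warp-scheduling loop in the spirit of SAWS
# (illustrative only). `policies` maps a policy name to an opaque policy object,
# and `run_window(policy, cycles)` is an assumed simulator hook that executes
# `cycles` cycles under that policy and returns the measured IPC.
def adaptive_schedule(policies, run_window, total_cycles,
                      sample_cycles=1_000, phase_cycles=50_000):
    cycle = 0
    while cycle < total_cycles:
        # Sampling phase: try each simple policy for a short window and score it.
        scores = {}
        for name, policy in policies.items():
            scores[name] = run_window(policy, sample_cycles)
            cycle += sample_cycles
        best = max(scores, key=scores.get)
        # Steady phase: keep the best-scoring policy until the next re-evaluation.
        steady = min(phase_cycles, max(total_cycles - cycle, 0))
        if steady:
            run_window(policies[best], steady)
            cycle += steady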
ISBN:
(Digital) 9783030050573
ISBN:
(Print) 9783030050573;9783030050566
In the field of signal processing, the Fast Fourier Transform (FFT) is a widely used algorithm to transform signal data from the time domain to the frequency domain. Unfortunately, with the exponential growth of data, traditional methods cannot meet the demand of large-scale computation on these big data because of three main challenges of large-scale FFT, i.e., big data size, real-time data processing and high utilization of compute resources. To satisfy these requirements, an optimized FFT algorithm in the Cloud is badly needed. In this paper, we introduce a new method to conduct FFT in the Cloud with the following contributions: first, we design a parallel FFT algorithm for large-scale signal data in the Cloud; second, we propose a MapReduce-based mechanism to distribute data to compute nodes using a big data processing framework; third, an optimal method of distributing compute resources is implemented to accelerate the algorithm by avoiding redundant data exchange between compute nodes. The algorithm is designed in the MapReduce computation framework and contains three steps: data preprocessing, local data transform, and parallel data transform to integrate the processing results. The parallel FFT is implemented in a 16-node Cloud to process real signal data. The experimental results reveal an obvious improvement in the algorithm's speed: our parallel FFT is approximately five times faster than FFT in Matlab when the data size reaches 10 GB.
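The decomposition that makes FFT amenable to such a map/reduce split can be illustrated with the standard Cooley-Tukey "four-step" algorithm: local row FFTs (a map-like stage), a twiddle-factor multiplication, and column FFTs that integrate the results (a reduce-like stage). The NumPy sketch below demonstrates the decomposition only and is not the paper's MapReduce implementation.

# Small NumPy sketch of the Cooley-Tukey "four-step" FFT decomposition:
# row FFTs can run locally on each node (map), twiddle multiplication and
# column FFTs combine the partial transforms (reduce). Illustrative only.
import numpy as np

def four_step_fft(x, P, Q):
    """FFT of length N = P*Q via two stages of smaller FFTs."""
    assert len(x) == P * Q
    A = np.asarray(x, dtype=complex).reshape(P, Q)   # A[n1, n2] = x[Q*n1 + n2]
    B = np.fft.fft(A, axis=0)                        # P-point FFTs: "map" stage
    k1 = np.arange(P).reshape(P, 1)
    n2 = np.arange(Q).reshape(1, Q)
    C = B * np.exp(-2j * np.pi * k1 * n2 / (P * Q))  # twiddle factors
    D = np.fft.fft(C, axis=1)                        # Q-point FFTs: "reduce" stage
    return D.T.reshape(-1)                           # X[k1 + P*k2] = D[k1, k2]

# sanity check against NumPy's own FFT
x = np.random.rand(1024)
assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))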