ISBN: 9781728165820 (print)
Emerging general-purpose graphics processing units (GPGPUs) use a memory hierarchy very similar to that of modern multi-core processors: they typically have multiple levels of on-chip caches and a DDR-like off-chip main memory. In such massively parallel architectures, caches are expected to reduce the average data access latency by reducing the number of off-chip memory accesses; however, our extensive experimental studies confirm that not all applications utilize the on-chip caches efficiently. Even though GPGPUs are adopted to run a wide range of general-purpose applications, conventional cache management policies cannot achieve optimal performance across the applications' differing memory characteristics. This paper first investigates the underlying reasons for the inefficiency of common cache management policies in GPGPUs. To address those issues, we then propose (i) a characterization mechanism to analyze each kernel at runtime and (ii) a selective caching policy to manage the flow of cache accesses. Evaluation results on the studied platform show that our proposed dynamically reconfigurable cache hierarchy improves system performance by up to 105% (27% on average) over a wide range of modern GPGPU applications, which is within 10% of the optimal improvement.
ISBN: 9783030389611; 9783030389604 (print)
Deep reinforcement learning (DRL) algorithms still take a long time to train models in many applications. Parallelization has the potential to improve the efficiency of DRL algorithms. In this paper, we propose a parallel approach (ParaA2C) for the popular actor-critic (AC) algorithms in DRL to accelerate the training process. Our work considers the parallelization of the basic advantage actor-critic (Serial-A2C) among AC algorithms. Specifically, we use multiple actor-learners to mitigate the strong correlation of data and the instability of updates, and ultimately reduce the training time. Note that we assign each actor-learner MPI process to a CPU core in order to prevent resource contention between MPI processes and make our ParaA2C approach more scalable. We demonstrate the effectiveness of ParaA2C through experiments on the Arcade Learning Environment (ALE) platform. Notably, our ParaA2C approach takes less than 10 min to train on some commonly used Atari games when using 512 CPU cores.
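The abstract does not spell out the update rule used by ParaA2C, so the following is only a minimal sketch of the advantage term that advantage actor-critic methods are generally built on: a bootstrapped discounted return minus the critic's value estimate. The function name `n_step_advantages`, its parameters, and the sample numbers are assumptions made for illustration and are not taken from the paper.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    rewards[t]      : reward received after acting in state t
    values[t]       : critic estimate V(s_t) (hypothetical inputs)
    bootstrap_value : critic estimate for the state after the last step,
                      or 0.0 if the episode ended there
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than predicted.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    rets, advs = n_step_advantages(
        rewards=[1.0, 0.0, 1.0],
        values=[0.5, 0.5, 0.5],
        bootstrap_value=0.0,
        gamma=0.5,
    )
    print(rets, advs)
```

In a multi-learner setting such as the one the abstract describes, each actor-learner would compute a quantity like this on its own rollouts before updates are combined; how ParaA2C combines them is not stated here.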
ISBN: 9781728187501 (print)
Breast cancer is the leading cancer among women. Usually, pathologists have to examine whole-slide histological images of tissue at different magnifications to determine the tumor's malignancy and then its grade. Interpreting these images is a time- and effort-consuming task on the way to an accurate diagnosis. Consequently, computer-aided diagnosis (CAD) systems are in high demand. However, histological images have pervasive variability in tissue texture, which is a major challenge and makes them hard for a computer to interpret. Deep learning algorithms are promising architectures for such complex objects, but the scarcity of datasets remains a constraint on building an efficient medical image classification system. In this work, we propose a solution based on combining two different datasets for breast cancer grade detection. Our proposed method adds a new class (grade 0) to the three known classes of breast cancer grades, which lets our model detect both the malignancy and the grade of breast tumors. Furthermore, the images in both datasets have the same magnification factor, which helps our models avoid overfitting. Our models are trained using two different convolutional neural network architectures, ResNet50 and MobileNet, to compare a heavyweight and a lightweight architecture. The obtained results show the best accuracy among state-of-the-art methods.
ISBN: 9783030723699 (digital)
ISBN: 9783030723682; 9783030723699 (print)
Recently, distributed computing capabilities have been brought to the edge of the Internet, permitting Internet-of-Things (IoT) applications to process computation more locally and therefore more efficiently, and this has brought a whole new range of tools and functionality. The most significant defining features of edge computing are low latency, location awareness, wide geographic distribution, versatility, support for a very large number of nodes, etc. We aim to limit the latency and delay in edge-based architectures. We focus on an advanced architectural setting that considers communication and processing delays and the management effort in addition to the actual request execution time in an operational-efficiency scenario. Our design is based on a multi-cluster edge layer with local autonomous edge node clusters. We argue that particle swarm optimization, as a bio-inspired optimization approach, is an ideal candidate for distributed IoT load handling in self-managed edge clusters. By designing a controller and utilizing a particle swarm optimization algorithm, we show that delay and end-to-end latency can be reduced.
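For readers unfamiliar with particle swarm optimization, the following is a minimal sketch of the textbook algorithm, not the controller described in the abstract. The cost function `toy_delay`, its target load split, and all parameter values (`w`, `c1`, `c2`, swarm size) are hypothetical choices made only so the example runs; the paper's actual delay model is not given here.

```python
import random


def pso_minimize(cost, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def toy_delay(load):
    """Hypothetical stand-in for a delay model over three edge clusters:
    squared distance from an assumed ideal load split."""
    ideal = (0.2, 0.5, 0.3)
    return sum((x - t) ** 2 for x, t in zip(load, ideal))


if __name__ == "__main__":
    best, best_cost = pso_minimize(toy_delay, dim=3, bounds=(0.0, 1.0))
    print(best, best_cost)
```

Each particle stands for one candidate load assignment; in a real edge deployment the cost would come from measured or modeled communication and processing delays rather than a fixed quadratic.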
ISBN: 9783030050535 (print)
The proceedings contain 191 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: Incentivizing multimedia data acquisition for machine learning system; Toward performance prediction for multi-BSP programs in ML; Exploiting the table of energy and power leverages; A semantic web based intelligent IoT model; Accelerating CNNs using optimized scheduling strategy; Data analysis of blended learning in Python programming; APs deployment optimization for indoor fingerprint positioning with adaptive particle swarm algorithm; Deployment optimization of indoor positioning signal sources with fireworks algorithm; A study of sleep stages threshold based on multiscale fuzzy entropy; QoS-driven service matching algorithm based on user requirements; Blind estimation algorithm over fast-fading multipath OFDM channels; Facial shape and expression transfer via non-rigid image deformation; P-schedule: Erasure coding schedule strategy in big data storage system; Answer aggregation of crowdsourcing employing an improved EM-based approach; A parallel fast Fourier transform algorithm for large-scale signal data using Apache Spark in cloud; Task offloading in edge-clouds with budget constraint; Motion trajectory sequence-based map matching assisted indoor autonomous mobile robot positioning; Towards the independent spanning trees in the line graphs of interconnection networks; POEM: Pricing longer for edge computing in the device cloud; Mobility analysis and response for software-defined Internet of Things; Research on overload classification method for bus images based on image processing and SVM; DStore: A distributed cloud storage system based on smart contracts and blockchain; Towards an efficient and real-time scheduling platform for mobile charging vehicles; Streaming ETL in polystore era.
ISBN: 9783030050566 (print)
The proceedings contain 191 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: Incentivizing multimedia data acquisition for machine learning system; Toward performance prediction for multi-BSP programs in ML; Exploiting the table of energy and power leverages; A semantic web based intelligent IoT model; Accelerating CNNs using optimized scheduling strategy; Data analysis of blended learning in Python programming; APs deployment optimization for indoor fingerprint positioning with adaptive particle swarm algorithm; Deployment optimization of indoor positioning signal sources with fireworks algorithm; A study of sleep stages threshold based on multiscale fuzzy entropy; QoS-driven service matching algorithm based on user requirements; Blind estimation algorithm over fast-fading multipath OFDM channels; Facial shape and expression transfer via non-rigid image deformation; P-schedule: Erasure coding schedule strategy in big data storage system; Answer aggregation of crowdsourcing employing an improved EM-based approach; A parallel fast Fourier transform algorithm for large-scale signal data using Apache Spark in cloud; Task offloading in edge-clouds with budget constraint; Motion trajectory sequence-based map matching assisted indoor autonomous mobile robot positioning; Towards the independent spanning trees in the line graphs of interconnection networks; POEM: Pricing longer for edge computing in the device cloud; Mobility analysis and response for software-defined Internet of Things; Research on overload classification method for bus images based on image processing and SVM; DStore: A distributed cloud storage system based on smart contracts and blockchain; Towards an efficient and real-time scheduling platform for mobile charging vehicles; Streaming ETL in polystore era.
ISBN: 9783030050504 (print)
The proceedings contain 191 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: Incentivizing multimedia data acquisition for machine learning system; Toward performance prediction for multi-BSP programs in ML; Exploiting the table of energy and power leverages; A semantic web based intelligent IoT model; Accelerating CNNs using optimized scheduling strategy; Data analysis of blended learning in Python programming; APs deployment optimization for indoor fingerprint positioning with adaptive particle swarm algorithm; Deployment optimization of indoor positioning signal sources with fireworks algorithm; A study of sleep stages threshold based on multiscale fuzzy entropy; QoS-driven service matching algorithm based on user requirements; Blind estimation algorithm over fast-fading multipath OFDM channels; Facial shape and expression transfer via non-rigid image deformation; P-schedule: Erasure coding schedule strategy in big data storage system; Answer aggregation of crowdsourcing employing an improved EM-based approach; A parallel fast Fourier transform algorithm for large-scale signal data using Apache Spark in cloud; Task offloading in edge-clouds with budget constraint; Motion trajectory sequence-based map matching assisted indoor autonomous mobile robot positioning; Towards the independent spanning trees in the line graphs of interconnection networks; POEM: Pricing longer for edge computing in the device cloud; Mobility analysis and response for software-defined Internet of Things; Research on overload classification method for bus images based on image processing and SVM; DStore: A distributed cloud storage system based on smart contracts and blockchain; Towards an efficient and real-time scheduling platform for mobile charging vehicles; Streaming ETL in polystore era.
ISBN: 9783030680343 (print)
The proceedings contain 15 papers. The special focus in this conference is on Latin American High Performance Computing. The topics include: Electricity Demand Forecasting Using Computational Intelligence and High Performance Computing; Parallel/Distributed Generative Adversarial Neural Networks for Data Augmentation of COVID-19 Training Images; Analysis of Regularization in Deep Learning Models on Testbed Architectures; Computer Application for the Detection of Skin Diseases in Photographic Images Using Convolutional Neural Networks; Neocortex and Bridges-2: A High Performance AI+HPC Ecosystem for Science, Discovery, and Societal Good; Fostering Remote Visualization: Experiences in Two Different HPC Sites; High Performance Computing Simulations of Granular Media in Silos; Performance Analysis of Main Public Cloud Big Data Services Processing Brazilian Government Data; Accelerating Machine Learning Algorithms with TensorFlow Using Thread Mapping Policies; Methodology for Design and Implementation an Efficient HPC Cluster; Estimating the Execution Time of the Coupled Stage in Multiscale Numerical Simulations; Using HPC as a Competitive Advantage in an International Robotics Challenge; A Survey on Privacy-Preserving Machine Learning with Fully Homomorphic Encryption.
ISBN: 9783030050627 (print)
The proceedings contain 191 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: Incentivizing multimedia data acquisition for machine learning system; Toward performance prediction for multi-BSP programs in ML; Exploiting the table of energy and power leverages; A semantic web based intelligent IoT model; Accelerating CNNs using optimized scheduling strategy; Data analysis of blended learning in Python programming; APs deployment optimization for indoor fingerprint positioning with adaptive particle swarm algorithm; Deployment optimization of indoor positioning signal sources with fireworks algorithm; A study of sleep stages threshold based on multiscale fuzzy entropy; QoS-driven service matching algorithm based on user requirements; Blind estimation algorithm over fast-fading multipath OFDM channels; Facial shape and expression transfer via non-rigid image deformation; P-schedule: Erasure coding schedule strategy in big data storage system; Answer aggregation of crowdsourcing employing an improved EM-based approach; A parallel fast Fourier transform algorithm for large-scale signal data using Apache Spark in cloud; Task offloading in edge-clouds with budget constraint; Motion trajectory sequence-based map matching assisted indoor autonomous mobile robot positioning; Towards the independent spanning trees in the line graphs of interconnection networks; POEM: Pricing longer for edge computing in the device cloud; Mobility analysis and response for software-defined Internet of Things; Research on overload classification method for bus images based on image processing and SVM; DStore: A distributed cloud storage system based on smart contracts and blockchain; Towards an efficient and real-time scheduling platform for mobile charging vehicles; Streaming ETL in polystore era.