ISBN (digital): 9781665451550
ISBN (print): 9781665451550
Stream Processing Systems (SPSs) can experience significant fluctuations in input rate. To address this issue, some existing solutions propose reconfiguring the SPS by replicating its operators. However, such reconfiguration usually incurs a high system downtime cost. Moreover, reconfiguration decisions are based only on resource utilization, without balancing the load between replicas. In this paper, we propose a predictive SPS that dynamically determines the necessary number of replicas of each operator based not only on the current resource utilization and input rate variation, but also on the events that, because the operator is overloaded, could not yet be processed and are therefore kept in the operator's queue. In addition, our SPS implements a load balancer that distributes incoming events more evenly among the replicas of an operator. Our solution has been integrated into Storm. To avoid system reconfiguration downtime, our SPS preallocates a pool of replicas, each of which can be activated or deactivated based on per-operator input load predictions. Using real traffic traces with different applications, we have conducted experiments on Google Cloud Platform (GCP), evaluating our SPS and comparing it with Storm and DABS-Storm.
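As a rough illustration of the scaling rule described above, the sketch below (Python, with hypothetical names such as service_rate and pool_size; the paper's actual implementation extends Storm) derives the number of active replicas from the current input rate, the backlog in the operator's queue, and the preallocated pool size, and pairs it with one possible least-loaded dispatch policy.

import math

# Illustrative sketch only: the names and the linear service model are assumptions,
# not the paper's API.
def target_replicas(input_rate, queued_events, service_rate, drain_window, pool_size):
    """Replicas needed to absorb the incoming rate plus the queued backlog
    within one decision window of `drain_window` seconds."""
    demand = input_rate + queued_events / drain_window   # events/s that must be cleared
    needed = math.ceil(demand / service_rate)            # service_rate = events/s per replica
    return max(1, min(needed, pool_size))                # never exceed the preallocated pool

def dispatch(active_replicas, queue_length):
    """Send the next event to the least-loaded active replica."""
    return min(active_replicas, key=lambda r: queue_length[r])

# Example: 1200 ev/s arriving, 5000 events queued, 300 ev/s per replica,
# 10 s window, pool of 8 -> target_replicas(1200, 5000, 300, 10, 8) == 6.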
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
For centuries, theory and experiment have served as the two pillars of science that underpin our understanding of the physical and natural world. Scientists collect measurements and make observations to, in turn, construct theories to describe, explain, and predict the phenomena that occur in the world. More recently, computing has evolved as the third pillar of science that “fills the gap” between theory and experiment by simulating theory (and validating experiment), thus accelerating discoveries that would otherwise take much longer and require significantly more personnel resources, e.g., the Human Genome Project. Today, while computing continues to accelerate discoveries by simulating theory, the doubling of bio{logical | medical | health | informatics} data is far outstripping our ability to compute on the data. Thus, the computing pillar of science must evolve to not only simulate theory (and be validated by experiment), but to also be leveraged to tackle and intelligently mine the “biodata deluge” via traditional data analytics as well as via machine learning and deep learning. In short, we need to not only compute faster but also smarter. As such, this talk re-visits (bio)computing, the third pillar of science, and its evolution into a (bio)computing mosaic, which I refer to here as synergistic (bio)computing.
As the availability of SAR images continues to grow, efficient coregistration of massive SAR images presents a greater challenge. Traditional serial coregistration methods impose an unbearable time overhead. To reduce this overhead and make full use of computing resources, a parallel coregistration strategy based on Hadoop is proposed for SAR images. The Hadoop Distributed File System (HDFS) is used to store SAR image data in chunks, and Hadoop's distributed computing framework, MapReduce, is used to realize distributed parallel processing of SAR images. Two distributed parallel coregistration methods are presented with the proposed parallel strategy: one based on the maximum correlation method and the other on the DEM-assisted coregistration method. These methods are evaluated through coregistration experiments on the same dataset, and they are verified by comparing the coregistration results and processing time.
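A minimal sketch of the maximum-correlation step framed as a Hadoop Streaming mapper (illustrative only; the tiling scheme, the .npy file layout, and the per-scene offset model are assumptions, not the paper's pipeline):

#!/usr/bin/env python3
# Each input record is assumed to name one master/slave tile pair already chunked on HDFS:
# "<scene_id>\t<master_tile.npy>\t<slave_tile.npy>"
import sys
import numpy as np

def max_correlation_offset(master, slave):
    """Estimate the (dy, dx) shift of `slave` relative to `master` by locating the peak
    of their FFT-based cross-correlation (the 'maximum correlation method')."""
    spectrum = np.fft.fft2(master) * np.conj(np.fft.fft2(slave))
    corr = np.abs(np.fft.ifft2(spectrum))
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    dy = py if py <= master.shape[0] // 2 else py - master.shape[0]
    dx = px if px <= master.shape[1] // 2 else px - master.shape[1]
    return dy, dx

if __name__ == "__main__":
    for line in sys.stdin:
        scene_id, master_path, slave_path = line.rstrip("\n").split("\t")
        dy, dx = max_correlation_offset(np.load(master_path), np.load(slave_path))
        # A reducer keyed on scene_id would fit a per-scene offset/warp model from these samples.
        print(f"{scene_id}\t{dy}\t{dx}")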
Automatic Term Recognition is used to extract terms that are specific to a given domain. In order to be accurate, these corpus- and language-dependent methods require large volumes of textual data, which must be processed to extract candidate terms that are afterward scored according to a given metric. To improve text preprocessing and candidate term extraction and scoring, we propose a distributed Spark-based architecture to automatically extract domain-specific terms. The main contributions are as follows: (1) propose a novel distributed automatic domain-specific multi-word term recognition architecture built on top of the Spark ecosystem; (2) perform an in-depth analysis of our architecture in terms of accuracy and scalability; (3) design an easy-to-integrate Python implementation that enables the use of Big Data processing in fields such as Computational Linguistics and Natural Language Processing. We prove empirically the feasibility of our architecture by performing experiments on two real-world datasets.
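A minimal PySpark sketch of the extraction-and-scoring stage (illustrative; the corpus path, the crude n-gram filter, and the toy termhood score below are assumptions, and the paper's linguistic filters and metrics are richer):

import math
import re
from pyspark.sql import SparkSession

def candidate_ngrams(text, max_len=3):
    """Emit lower-cased word n-grams (n = 2..max_len) as multi-word term candidates."""
    tokens = re.findall(r"[a-zA-Z][a-zA-Z-]+", text.lower())
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

spark = SparkSession.builder.appName("atr-sketch").getOrCreate()
docs = spark.sparkContext.textFile("hdfs:///corpus/*.txt")      # hypothetical corpus location

top_terms = (docs.flatMap(candidate_ngrams)
                 .map(lambda term: (term, 1))
                 .reduceByKey(lambda a, b: a + b)
                 # Toy termhood score: frequency weighted by log2 of the candidate length,
                 # a simplification of C-value-style measures.
                 .map(lambda tc: (tc[0], math.log2(len(tc[0].split())) * tc[1]))
                 .top(50, key=lambda tc: tc[1]))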
In this talk, I will be presenting our on-going project entitled Quantum-Annealing Assisted Next-Generation HPC Infrastructure. In this project, we aim to realize transparent access, in a unified fashion, not only to classical HPC resources with heterogeneous computing platforms such as x86 and vector accelerators, but also to quantum computing resources. We are also developing next-generation applications in computational science and data science, and their fusion, best suited for this infrastructure. The target applications are three digital twins: a Digital Twin for Disaster Resilience, a Digital Twin for Stable Power Generation, and a Digital Twin for Soft-Material Design. In developing these applications, we introduce quantum annealing into optimal evacuation route analysis and simulated annealing accelerated by classical HPC into data clustering. A performance comparison between quantum and classical annealers is also presented in this talk.
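As a toy, self-contained illustration (not the project's code) of the kind of optimisation the annealers target, the sketch below performs 2-way data clustering with classical simulated annealing by minimising within-cluster squared distances; all parameters are illustrative assumptions:

import math
import random

def cluster_energy(points, labels):
    """Sum of squared distances between points assigned to the same cluster."""
    e = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                e += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return e

def simulated_annealing(points, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in points]
    energy = cluster_energy(points, labels)
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling schedule
        i = rng.randrange(len(points))
        labels[i] ^= 1                                      # propose flipping one assignment
        new_energy = cluster_energy(points, labels)
        if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / t):
            energy = new_energy                             # accept the move (Metropolis rule)
        else:
            labels[i] ^= 1                                  # reject: undo the flip
    return labels, energy

# e.g. simulated_annealing([(0, 0), (0, 1), (5, 5), (5, 6)]) separates the two groups.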
With the emergence of the Internet of Things (IoT), the Cloud computing (CC) paradigm has started to exhibit several limitations, caused by the fact that it is too far away from the tens of billions of devices. Thus, the edge computing paradigm has come to the fore, aiming to move processing power to the edge of the network, closer to the data sources. Furthermore, the fog computing paradigm comes as an addition, providing Cloud services decentralized at the level of a geographic region. This trend of migrating from a centralized to a decentralized approach gives way to a new paradigm, Drop computing (DC), which creates opportunistic, decentralized ad-hoc social networks consisting of edge and mobile devices. The main goal of this paper is to provide a scheduling model for the Drop computing paradigm, both at the level of the local end device and at the level of the global ad-hoc network. We implement various scheduling solutions using the proposed model and, through simulations in a mobile network environment, we analyze their behavior in multiple scenarios to understand the requirements and outcomes of scheduling in DC.
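A minimal sketch (Python; the device attributes and the completion-time model are assumptions, not the paper's scheduling model) of the core decision at both levels: estimate where a task would finish earliest, on the local device or on a reachable neighbour in the ad-hoc network, and book the work there:

from dataclasses import dataclass

@dataclass
class Device:
    name: str
    cpu_speed: float      # instructions per second
    queued_work: float    # instructions already waiting on this device
    link_rate: float      # bytes per second to reach this device (0 means the local device)

@dataclass
class Task:
    work: float           # instructions
    data_size: float      # bytes to transfer if offloaded

def completion_time(task, device):
    transfer = task.data_size / device.link_rate if device.link_rate else 0.0
    return transfer + (device.queued_work + task.work) / device.cpu_speed

def schedule(task, local, neighbours):
    """Pick the device with the earliest estimated completion time, then book the work."""
    best = min([local, *neighbours], key=lambda d: completion_time(task, d))
    best.queued_work += task.work
    return best.name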
ISBN (digital): 9798350387117
ISBN (print): 9798350387124
Modern high-performance computing (HPC) and cloud computing systems are integrating powerful GPUs to accelerate increasingly demanding deep learning workloads. To improve cluster efficiency and better understand user behavior and job characteristics, system operators collect operational data for trace analysis. However, previous efforts on these system logs have lacked interpretability, and there is no systematic approach that can be widely applied to different datacenter traces and return interpretable results. In this work, we propose a workflow to discover hidden association relationships between the collected features of system jobs. The outcome of our analysis approach is a set of association rules that can be directly interpreted into operational insights. Using this approach, we have conducted case studies on the traces of three large-scale multi-tenant GPU clusters running production machine learning workloads. We have focused on observations of GPU underutilization and job failures, revealing possible reasons for these job behaviors and suggesting solutions to mitigate them. Our case studies demonstrate the feasibility of our interpretable analysis workflow, which can be widely adopted by HPC and cloud computing system operators.
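A small sketch of the kind of directly interpretable rule such an analysis yields (pure pandas, single-antecedent rules only; the column names are hypothetical and the paper's mining workflow is more general):

import itertools
import pandas as pd

def mine_rules(jobs, columns, min_support=0.05, min_confidence=0.8):
    """Find rules like 'gpu_request_8 -> final_status_FAILED' from a job trace."""
    onehot = pd.get_dummies(jobs[columns].astype(str))   # one boolean column per feature value
    n = len(onehot)
    rules = []
    for a, b in itertools.permutations(onehot.columns, 2):
        both = (onehot[a] & onehot[b]).sum()
        support_a = onehot[a].sum()
        if both / n < min_support:
            continue
        confidence = both / support_a
        if confidence >= min_confidence:
            lift = confidence / (onehot[b].sum() / n)
            rules.append({"rule": f"{a} -> {b}", "support": both / n,
                          "confidence": confidence, "lift": lift})
    result = pd.DataFrame(rules, columns=["rule", "support", "confidence", "lift"])
    return result.sort_values("lift", ascending=False)

# e.g. mine_rules(trace_df, ["gpu_request", "gpu_util_bin", "final_status"])  # hypothetical columns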
ISBN (print): 9781665435741
The architecture based on global-local parameter servers (PSs) is an appealing framework to accelerate the training of the global model in distributed deep learning. However, accuracy and training time are influenced by the placement strategy of the PSs, due to the adopted gradient descent methods and the heterogeneous computation and communication resources. Therefore, this paper formulates a novel PS placement problem under dynamically available storage capacity, with the objective of minimizing the training time of distributed deep learning under constraints on storage capacity and on the number of local PSs. We then prove the NP-hardness of the proposed problem. To solve it, a parameter server placement algorithm, followed by an adjustment algorithm, is proposed, which continuously makes placement decisions for the PSs to decrease the training time of the global model. Simulation results show that the proposed combined algorithm outperforms existing works in all cases in terms of the training time of the global model. Moreover, its performance is close to that of a brute-force approach in terms of the training time of the global model.
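The placement problem can be pictured with a small greedy sketch (an illustration under simplified assumptions, not the paper's algorithm or cost model): pick storage-feasible hosts one at a time so that the slowest worker-to-PS exchange, used here as a proxy for per-iteration time, shrinks the most.

def iteration_time(placement, workers, comm_cost, model_size):
    """Per-iteration proxy: each worker syncs with its cheapest local PS."""
    return max(min(comm_cost[w][p] * model_size for p in placement) for w in workers)

def place_parameter_servers(workers, storage, comm_cost, model_size, k):
    """Greedy heuristic: repeatedly add the local-PS host that most reduces the
    per-iteration proxy, until k local PSs are placed or no candidate helps."""
    candidates = [w for w in workers if storage[w] >= model_size]   # storage-feasible hosts
    placement = set()
    while len(placement) < k and candidates:
        best = min(candidates,
                   key=lambda c: iteration_time(placement | {c}, workers, comm_cost, model_size))
        if placement and iteration_time(placement | {best}, workers, comm_cost, model_size) \
                >= iteration_time(placement, workers, comm_cost, model_size):
            break                                                   # no further improvement
        placement.add(best)
        candidates.remove(best)
    return placement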
ISBN (print): 9781665435772
We present our development of load balancing algorithms to efficiently distribute and parallelize the execution of large-scale, complex agent-based modeling (ABM) simulators on High-Performance Computing (HPC) resources. Our algorithm is based on partitioning the co-location network that emerges from an ABM's underlying synthetic population. Variations of this algorithm are applied experimentally to investigate how algorithmic choices affect two factors that determine run-time performance. We report the results of these experiments on the CityCOVID ABM, built to model the spread of COVID-19 in the Chicago metropolitan region.
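In outline, the partitioning step can be sketched as follows (Python/networkx, with Kernighan-Lin bisection standing in for the partitioner; the function names and the recursive-bisection choice are illustrative assumptions, not the CityCOVID implementation):

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def build_colocation_graph(visits):
    """visits: iterable of (agent_id, place_id); agents sharing a place get a weighted edge."""
    by_place = {}
    for agent, place in visits:
        by_place.setdefault(place, []).append(agent)
    g = nx.Graph()
    for members in by_place.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                w = g.edges[a, b]["weight"] + 1 if g.has_edge(a, b) else 1
                g.add_edge(a, b, weight=w)
    return g

def partition(graph, levels=3):
    """Recursive bisection into 2**levels balanced parts (levels chosen to match the rank count)."""
    parts = [set(graph.nodes)]
    for _ in range(levels):
        parts = [half for p in parts
                 for half in kernighan_lin_bisection(graph.subgraph(p), weight="weight")]
    return parts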
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Fully Connected Neural Networks (FCNNs) are widely used in image recognition and natural language processing. However, the time cost of training on large datasets is high. The Optical Network-on-Chip (ONoC) has been proposed to accelerate the parallel computing of FCNNs owing to its high bandwidth and low communication latency. Therefore, this paper proposes an accelerated FCNN model based on ONoC. We first design an FCNN-aware mapping strategy, and then propose a group-based inter-core communication scheme with low wavelength requirements, according to the distribution of the mapped cores. The optimal number of cores in each period is obtained by balancing communication and computation time. Simulation results show that the proposed scheme offers low wavelength requirements, short training time, and good scalability.
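The communication/computation trade-off can be sketched numerically (the linear cost model and all parameter names below are assumptions; the paper's analytical model is more detailed):

def period_time(n_cores, layer_macs, core_rate, bytes_exchanged, link_bw, per_core_overhead):
    """Estimated time of one training period on n_cores cores."""
    t_comp = layer_macs / (n_cores * core_rate)                        # work divides across cores
    t_comm = bytes_exchanged / link_bw + per_core_overhead * n_cores   # coordination grows with cores
    return t_comp + t_comm

def optimal_cores(max_cores, **model):
    """Pick the core count that minimises the per-period time under the toy model."""
    return min(range(1, max_cores + 1), key=lambda n: period_time(n, **model))

# e.g. optimal_cores(64, layer_macs=4e9, core_rate=1e9, bytes_exchanged=2e7,
#                    link_bw=1e10, per_core_overhead=2e-5)   # hypothetical parameters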