ISBN:
(Print) 9781728171272
The energy consumption of modern data centers is currently growing rapidly due to the development of new services and the spread of ICT (Information and Communication Technologies) into all areas of human life. This creates a huge demand for new energy-efficient computing technologies that improve data-processing performance while still meeting SLA requirements.
ISBN:
(Print) 9781728153179
This paper introduces a simple method of building a prototype cluster of Raspberry Pi boards. The Pi cluster is a powerful, low-cost tool for teaching the complex concepts of parallel and distributed computing to undergraduate E&CS students. The performance of the presented Pi cluster is assessed using two computationally expensive applications: face recognition and image encryption. The paper explains how to compare the performance of the Pi cluster against a traditional high-performance cluster; the comparison is designed to help undergraduate students understand the state of the art in clusters. The paper also explains how the Pi clusters fit into the CS curriculum at Old Dominion University. The presented project-based learning is an effective teaching approach that supports struggling engineering and computer-science students, including students from minority groups and women, at ODU.
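The abstract does not name the parallel programming interface used on the cluster; as a minimal illustrative sketch, assuming MPI via mpi4py, an image-processing workload of the kind described (face recognition, encryption) could be split across Pi nodes like this:

```python
# Illustrative sketch only: splitting an image-processing workload across a
# Raspberry Pi cluster with mpi4py. The paper does not specify its
# parallelization library; MPI is an assumption made for this example.
from mpi4py import MPI

def process_image(path):
    # Placeholder for an expensive per-image task such as face recognition
    # or encryption; here we just return the path length as dummy "work".
    return len(path)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Hypothetical list of input images; each node processes an interleaved slice.
all_images = [f"img_{i:04d}.jpg" for i in range(1000)]
my_results = [process_image(p) for p in all_images[rank::size]]

# Gather partial results on rank 0 for final aggregation and timing.
gathered = comm.gather(my_results, root=0)
if rank == 0:
    total = sum(len(chunk) for chunk in gathered)
    print(f"Processed {total} images on {size} nodes")
```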
ISBN:
(Print) 9781728110516
Executing complicated computations in parallel increases computing speed and improves the user experience. Decomposing a program into several smaller programs and running them on multiple parallel processors is modeled by a directed acyclic graph (DAG). Scheduling the nodes of this task graph is an important problem whose solution speeds up computations. Since task scheduling on such a graph is NP-hard, various algorithms have been developed for node scheduling to contribute to quality service delivery. The present study proposes a heuristic algorithm, the look-ahead sequencing algorithm (LASA), to handle static scheduling in heterogeneous distributed computing systems with the goal of minimizing the schedule length of the user application. In the proposed algorithm, look-ahead is used as a criterion for prioritizing tasks. A property called the Emphasized Processor has also been added to the algorithm to favor task execution on a particular processor. The effectiveness of the algorithm is demonstrated on several workflow-type applications, and the results of the implementation are compared with two other heuristic and meta-heuristic algorithms.
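The abstract does not give LASA itself; the following is a generic list-scheduling sketch for the problem it addresses (a DAG mapped to heterogeneous processors by prioritizing tasks and greedily picking the earliest-finishing processor). The look-ahead criterion and the Emphasized Processor property are only hinted at in comments, since their definitions are not in the abstract; the graph and cost values are invented.

```python
# Illustrative sketch of generic list scheduling on a heterogeneous system.
# This is NOT the paper's LASA algorithm; it only shows the kind of problem
# the paper addresses. Task graph and costs are made up for the example.

# Task graph: task -> list of successors; costs[task][proc] = execution time.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
costs = {"A": [2, 3], "B": [4, 2], "C": [3, 3], "D": [5, 4]}
preds = {t: [u for u, succ in dag.items() if t in succ] for t in dag}

def rank_upward(task):
    # Average cost of the task plus the heaviest path below it (a common
    # prioritization; a look-ahead criterion would refine this choice).
    avg = sum(costs[task]) / len(costs[task])
    if not dag[task]:
        return avg
    return avg + max(rank_upward(s) for s in dag[task])

ready_time = [0.0, 0.0]          # when each processor becomes free
finish = {}                      # task -> (finish time, processor)
for task in sorted(dag, key=rank_upward, reverse=True):
    best = None
    for p in range(len(ready_time)):
        start = max([ready_time[p]] + [finish[u][0] for u in preds[task]])
        cand = start + costs[task][p]
        if best is None or cand < best[0]:
            best = (cand, p)    # an "Emphasized Processor" bias would go here
    finish[task] = best
    ready_time[best[1]] = best[0]

print("schedule length:", max(f for f, _ in finish.values()))
```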
ISBN:
(Digital) 9781728160726
ISBN:
(Print) 9781728160726
The aim of this work was to determine the relationship between the workload of a microcontroller measuring electrocardiogram signals and the data stream from the microcontroller to a central server via the Internet, depending on how computational operations are distributed between them. The electrocardiogram signals were processed by a sequence of independently controlled digital filters, which gradually reduced the data flow. The paper presents the results of modeling the operation of digital electrocardiogram signal-processing algorithms for a Harvard-architecture 8-bit RISC single-chip microcontroller in the MATLAB environment, with an estimation of time computational complexity. Estimates are given of the computational complexity of each signal-processing stage, of the load on the microcontroller that implements these stages, and of the change in the data stream at the microcontroller output when a complete or reduced sequence of stages is implemented.
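As a rough sketch of the kind of estimate described (per-stage computational load versus the shrinking output data stream of a decimating filter chain), the toy model below uses Python rather than the paper's MATLAB, and all stage parameters and sampling rates are invented for illustration.

```python
# Illustrative sketch (Python rather than the paper's MATLAB) of estimating
# per-stage computational load and output data rate for a chain of decimating
# ECG filters. All stage parameters are invented.

stages = [
    # (name, multiply-accumulates per input sample, decimation factor)
    ("anti-alias FIR", 16, 2),
    ("baseline-wander IIR", 6, 1),
    ("powerline notch", 8, 1),
    ("smoothing FIR + decimate", 12, 2),
]

fs = 1000.0           # hypothetical input sampling rate, Hz
bytes_per_sample = 2  # 16-bit samples

rate = fs
total_macs_per_s = 0.0
for name, macs, decim in stages:
    macs_per_s = rate * macs
    total_macs_per_s += macs_per_s
    rate /= decim
    print(f"{name:26s} {macs_per_s:10.0f} MAC/s -> {rate:6.1f} samples/s")

print(f"total load: {total_macs_per_s:.0f} MAC/s, "
      f"output stream: {rate * bytes_per_sample:.0f} B/s")
```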
ISBN:
(Print) 9781728182988
In this work, we propose hybrid offloading of computing tasks simultaneously to edge servers (vertical offloading) via LTE communication and to nearby cars (horizontal offloading) via V2V communication, in order to increase the rate at which tasks are processed compared to local processing. Our main contribution is an optimized resource assignment and scheduling framework for hybrid offloading of computing tasks. The framework optimally utilizes the computational resources in the edge and in the micro cloud, while taking into account communication constraints and task requirements. While cooperative perception is the primary use case of our framework, the framework is applicable to other cooperative vehicular applications with high computing demand and significant transmission overhead. The framework is tested in a simulated environment built on top of car traces and communication rates exported from the Veins vehicular networking simulator. We observe a significant increase in the processing rate of cooperative perception sensor frames when hybrid offloading with optimized resource assignment is adopted. Furthermore, the processing rate increases with V2V connectivity, as more computing tasks can be offloaded horizontally.
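The paper's optimized assignment framework is not reproduced in the abstract; the toy greedy sketch below only illustrates the setting: each sensor-frame task goes to the resource (local CPU, edge server, or neighboring vehicle) that finishes it earliest under a simple transfer-time model. All task sizes, link rates, and compute speeds are invented.

```python
# Toy sketch of the hybrid offloading idea, not the paper's optimization
# framework: greedily assign each task to the local CPU, the edge server
# (vertical) or a neighboring vehicle (horizontal), whichever finishes first.

tasks = [{"id": i, "cycles": 2e9, "bits": 8e6} for i in range(6)]

resources = {
    "local":    {"cps": 1e9, "rate_bps": None, "free_at": 0.0},  # no transfer
    "edge":     {"cps": 8e9, "rate_bps": 50e6, "free_at": 0.0},  # vertical
    "vehicle1": {"cps": 3e9, "rate_bps": 20e6, "free_at": 0.0},  # horizontal
}

assignment = {}
for t in tasks:
    best_name, best_finish = None, float("inf")
    for name, r in resources.items():
        # Simplistic model: transfer (if any) then execution on the resource.
        tx = 0.0 if r["rate_bps"] is None else t["bits"] / r["rate_bps"]
        finish = max(r["free_at"], tx) + t["cycles"] / r["cps"]
        if finish < best_finish:
            best_name, best_finish = name, finish
    assignment[t["id"]] = best_name
    resources[best_name]["free_at"] = best_finish

print(assignment)
print("makespan:", max(r["free_at"] for r in resources.values()))
```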
ISBN:
(Print) 9781728159430
In Internet-of-Things (IoT) networks, enormous numbers of low-power IoT devices execute latency-sensitive yet computation-intensive machine learning tasks. However, energy is usually scarce for IoT devices, especially for those without a battery that rely on solar power or other forms of renewable energy. In this paper, we introduce a cross-layer optimization framework for distributed computing among low-power IoT devices. Specifically, a programming-layer design for distributed IoT networks is presented that addresses application partitioning, task scheduling, and communication-overhead mitigation. Furthermore, the associated federated learning and local differential privacy schemes are developed in the communication layer to enable distributed machine learning with privacy preservation. In addition, we illustrate a three-dimensional network architecture with various network components to facilitate efficient and reliable information exchange among IoT devices. Moreover, a model quantization design for IoT devices is illustrated to reduce the cost of information exchange. Finally, a parallel and scalable neuromorphic computing system for IoT devices is established to achieve energy-efficient distributed computing platforms in the hardware layer. Based on the introduced cross-layer optimization framework, IoT devices can execute their machine learning tasks in an energy-efficient way while guaranteeing data privacy and reducing communication costs.
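Two of the building blocks named here, local differential privacy and model quantization, can be sketched generically as follows; the noise mechanism (Laplace on a clipped update), bit width, and scales are illustrative assumptions, not the paper's design.

```python
# Illustrative sketch of two generic building blocks mentioned in the abstract:
# (1) local differential privacy via clipping + Laplace noise on a client's
# model update, and (2) uniform quantization to shrink the bytes exchanged.
import numpy as np

def privatize_update(update, clip=1.0, epsilon=1.0):
    # Clip the update's L2 norm, then add Laplace noise calibrated to the
    # clipping bound (sensitivity) and the privacy budget epsilon.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))
    noise = np.random.laplace(scale=clip / epsilon, size=update.shape)
    return clipped + noise

def quantize(update, bits=8):
    # Map float values to unsigned integers over the observed range.
    lo, hi = update.min(), update.max()
    levels = 2 ** bits - 1
    q = np.round((update - lo) / (hi - lo + 1e-12) * levels).astype(np.uint8)
    return q, lo, hi

def dequantize(q, lo, hi, bits=8):
    return lo + q.astype(np.float64) / (2 ** bits - 1) * (hi - lo)

local_update = np.random.randn(1000) * 0.01        # pretend model delta
private = privatize_update(local_update)
q, lo, hi = quantize(private)
print("bytes before:", local_update.nbytes, "after:", q.nbytes + 16)
print("reconstruction error:", np.abs(dequantize(q, lo, hi) - private).mean())
```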
ISBN:
(Print) 9781728171272
The modern pace of Information and Communication Technologies (ICT) progress requires the rapid development of completely new approaches to data processing. One of the most important criteria that every computing system nowadays must take into account is its energy efficiency. Strict requirements on service availability and timeliness must be taken into account as well. In this paper, a new comprehensive energy-efficient approach to workload processing in a distributed computing environment is proposed. The goal of this approach is to get as close as possible to the energy-proportional computing model. The proposed approach combines the advantages of horizontal scaling and energy-efficient scheduling, taking into account the individual power-consumption characteristics of computing nodes and the dynamics of workload in modern computing systems. The efficiency of the proposed approach is demonstrated using MATLAB modeling.
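A minimal sketch of the general idea (not the paper's exact scheduling method): pack the current workload onto the nodes with the best performance per watt and leave the rest powered off, approximating energy-proportional behavior. Node capacities and power figures below are made up.

```python
# Illustrative sketch only: consolidate workload onto the most energy-efficient
# nodes and power down the rest. Node data are invented for the example.

nodes = [  # (name, capacity in requests/s, power at full load in watts)
    ("n1", 500, 120), ("n2", 400, 150), ("n3", 800, 160), ("n4", 300, 140),
]

def plan(workload_rps):
    # Prefer nodes with the best capacity-per-watt; activate nodes until the
    # workload fits, leaving the remaining nodes powered off.
    active, remaining = [], workload_rps
    for name, cap, watts in sorted(nodes, key=lambda n: n[1] / n[2], reverse=True):
        if remaining <= 0:
            break
        active.append(name)
        remaining -= cap
    power = sum(w for n, c, w in nodes if n in active)
    return active, power

print(plan(900))   # low load: only the most efficient nodes are active
print(plan(1900))  # high load: more nodes are switched on
```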
Process mining aims to support the understanding of business processes. To this end, information is extracted from event logs in an automated fashion using machine learning methods. Large-scale machine learning methods allow handling massive volumes of event-log data without the need for costly human (expert) labor. This dissertation studies efficient methods for large-scale machine learning problems arising within process mining and predictive process analytics.
Machine learning is a research area examining techniques that allow a computer to learn from past data and create a mathematical model based on it. A common application of machine learning in process mining is the continuous forecasting of events within long-term business processes. This dissertation presents a method for performing structural feature selection from process instances. The performance of different feature selection techniques is compared using a gradient boosting machine (GBM) as a benchmark classification method for binary classification tasks. The best results were achieved by a k-means clustering-based feature selection algorithm developed in the dissertation.
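The dissertation's exact algorithm is not given in this summary; the sketch below only shows the general pattern of clustering-based feature selection followed by a GBM benchmark, using scikit-learn and synthetic data. The choice of keeping the feature closest to each cluster centroid is an assumption for illustration.

```python
# Illustrative sketch of clustering-based feature selection followed by a
# gradient boosting classifier (scikit-learn). General pattern only; not the
# dissertation's exact algorithm or data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                  # pretend process-instance features
y = (X[:, 0] + 0.5 * X[:, 7] > 0).astype(int)   # pretend binary outcome

# Cluster the *features* (columns) and keep the one closest to each centroid.
k = 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X.T)
selected = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X.T[members] - km.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])

Xtr, Xte, ytr, yte = train_test_split(X[:, selected], y, random_state=0)
clf = GradientBoostingClassifier().fit(Xtr, ytr)
print("selected features:", sorted(selected))
print("test accuracy:", clf.score(Xte, yte))
```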
An alternative to combining explicit feature selection with standard classification methods (such as GBM) is to feed raw data into a deep neural network, which performs feature selection implicitly during training. Since event logs have an intrinsic temporal ordering, recurrent neural networks (RNN) are a popular choice of deep learning method in process mining. It is found that RNNs using gated recurrent units (GRU) compare favorably to the long short-term memory (LSTM) network structure for this task. This dissertation also presents a novel method for efficiently encoding event-attribute data into the input vectors used to train RNN models, which provides a user-configurable trade-off between prediction accuracy and the time needed for model training and prediction.
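As a minimal sketch of the general setup, assuming PyTorch and simple one-hot attribute encoding (the dissertation's encoding method and framework are not specified here), a GRU predicting the next activity from a trace of encoded events could look like this:

```python
# Minimal sketch: encode each event's attributes into a fixed-size vector and
# feed the sequence to a GRU that predicts the next activity. Framework
# (PyTorch), dimensions, and data are assumptions made for illustration.
import torch
import torch.nn as nn

n_activities, n_resources = 8, 5

def encode_event(activity_id, resource_id):
    # One-hot encode two categorical event attributes and concatenate them.
    v = torch.zeros(n_activities + n_resources)
    v[activity_id] = 1.0
    v[n_activities + resource_id] = 1.0
    return v

class NextEventGRU(nn.Module):
    def __init__(self, input_dim, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_activities)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        _, h = self.gru(x)                # h: (1, batch, hidden)
        return self.out(h[-1])            # logits over the next activity

# Fake trace of 4 events: (activity, resource) pairs.
trace = [(0, 1), (2, 1), (3, 4), (5, 0)]
x = torch.stack([encode_event(a, r) for a, r in trace]).unsqueeze(0)
model = NextEventGRU(input_dim=n_activities + n_resources)
print(model(x).softmax(dim=-1))           # predicted next-activity distribution
```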
Complementary to the d
Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 computing centers all over the world execute tens of millions of computing jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous, and distributed computing environment. Statistically, about 10-12% of computing jobs end in failure: network faults, service failures, authorization failures, and other error conditions trigger error messages that provide detailed information about the issue, which can be used for diagnosis and proactive fault handling. However, this analysis is complicated by the sheer scale of the textual log data and is often exacerbated by the lack of a well-defined structure: human experts have to interpret the detected messages and create parsing rules manually, which is time-consuming and does not allow identifying previously unknown error conditions without further human intervention. This paper describes a pipeline of methods for the unsupervised clustering of multi-source error messages. The pipeline is data-driven, based on machine learning algorithms, and executed fully automatically, allowing error messages to be categorized according to textual patterns and meaning.
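A toy sketch of one generic way to group error messages without hand-written parsing rules is shown below (TF-IDF vectorization plus k-means); the messages are invented and this is not the paper's pipeline, which the abstract does not detail.

```python
# Toy sketch: group error messages without manual parsing rules using TF-IDF
# vectorization and k-means. Generic approach for illustration only; not the
# paper's pipeline. Messages are invented.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

messages = [
    "Connection timed out while contacting storage element SE-17",
    "Connection timed out while contacting storage element SE-03",
    "Authorization failed: proxy certificate expired",
    "Authorization failed: proxy certificate not found",
    "Job killed: memory limit exceeded on worker node",
    "Job killed: wall-clock limit exceeded on worker node",
]

X = TfidfVectorizer().fit_transform(messages)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for label, msg in sorted(zip(labels, messages)):
    print(label, msg)
```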
ISBN:
(Print) 9783031125973; 9783031125966
This paper introduces a new approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. Our approach enables AOT source-to-source transformation of Python programs, driven by the inclusion of type hints for function parameters and return values. These hints can be supplied by the programmer or obtained by dynamic profiler tools; multi-version code generation guarantees the correctness of our AOT transformation in all cases. Our compilation framework performs automatic parallelization and sophisticated high-level code optimizations for the target distributed heterogeneous hardware platform. It introduces novel extensions to the polyhedral compilation framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants. Finally, the parallelized output code generated by our approach is deployed using the Ray runtime for scheduling distributed tasks across multiple heterogeneous nodes in a cluster, thereby enabling both intra-node and inter-node parallelism. Our empirical evaluation shows significant performance improvements relative to sequential Python in both single-node and multi-node experiments, with a performance improvement of over 20,000x when using 24 nodes and 144 GPUs in the OLCF Summit supercomputer for the Space-Time Adaptive Processing (STAP) radar application.
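As a hand-written illustration of the kind of transformation described (a sequential, type-hinted function alongside a Ray-parallel counterpart), the sketch below shows the target shape of such code; the paper's compiler produces this automatically, and the function and data here are invented.

```python
# Illustrative sketch: a sequential, type-hinted function and a hand-written
# Ray-parallel counterpart of the kind of code the paper's framework generates
# automatically. This is not the paper's compiler output.
import numpy as np
import ray

def row_norms_seq(m: np.ndarray) -> np.ndarray:
    # Sequential baseline: L2 norm of every row.
    return np.sqrt((m * m).sum(axis=1))

@ray.remote
def _chunk_norms(chunk: np.ndarray) -> np.ndarray:
    return np.sqrt((chunk * chunk).sum(axis=1))

def row_norms_ray(m: np.ndarray, n_chunks: int = 4) -> np.ndarray:
    # Split rows into chunks and process them as distributed Ray tasks.
    futures = [_chunk_norms.remote(c) for c in np.array_split(m, n_chunks)]
    return np.concatenate(ray.get(futures))

if __name__ == "__main__":
    ray.init()
    data = np.random.rand(10_000, 64)
    assert np.allclose(row_norms_seq(data), row_norms_ray(data))
```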