ISBN: 9780769546766 (print)
Conventional distributed systems courses follow a syllabus in which topics are discussed independently and at different levels of abstraction. We propose using a wireless sensor network environment to tie all topics to concrete applications and to keep issues such as fault tolerance and coordination continuously present. We describe a syllabus with eight conceptual modules, each associated with a hands-on experience with wireless sensor networks, which may be assigned either as homework or as a hands-on class, depending on the number of classroom hours available.
ISBN: 9781665497473 (print)
Memory caching has long been used to bridge the performance gap between processor and disk, reducing the data access time of data-intensive computations. Previous studies on caching mostly focus on optimizing the hit rate of a single machine. In this paper, however, we argue that caching decisions in a distributed memory system should be made cooperatively for parallel data analytic applications, which emerging technologies such as Big Data and AI (Artificial Intelligence) commonly use to perform data mining and sophisticated analytics on larger data volumes in less time. A parallel data analytic job consists of multiple parallel tasks, so a job's completion time is bounded by its slowest task: the job cannot benefit from caching until the inputs of all its tasks are cached. To address this problem, we propose a cooperative caching design that periodically rearranges cache placement among nodes according to the data access pattern, taking task dependency and network locality into account. We evaluate our approach with a trace-driven simulator using both synthetic workloads and real-world traces. The results show that it reduces average completion time by up to 33% compared with non-collaborative caching policies and 25% compared with other state-of-the-art collaborative caching policies.
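The abstract's key observation is that a parallel job speeds up only once every task's input is cached. A few lines of Python show this. The sketch uses made-up read times and a hypothetical cache capacity of four blocks. It compares splitting the cache across two jobs with caching one job completely. It is not the paper's placement algorithm.

```python
# Illustrative numbers only (not from the paper): time for a task to read its input block.
DISK_READ = 10.0
MEM_READ = 1.0

def job_completion_time(task_inputs, cached):
    """A parallel job ends when its slowest task ends; all tasks start at time 0."""
    return max(MEM_READ if block in cached else DISK_READ for block in task_inputs)

def average_completion(jobs, cached):
    return sum(job_completion_time(job, cached) for job in jobs) / len(jobs)

jobs = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]

# Same cache capacity (4 blocks), two different placements.
split = {"a1", "a2", "b1", "b2"}           # each job half cached: no job gets faster
all_or_nothing = {"a1", "a2", "a3", "a4"}  # one job fully cached

print(average_completion(jobs, split))           # 10.0
print(average_completion(jobs, all_or_nothing))  # 5.5
```

Both placements give the same hit count. A policy that reasons about whole jobs, rather than per-node hit rates, can still tell that the second placement is better.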
ISBN: 9781509035052 (print)
Developing a good parallel simulator is challenging: it must achieve sufficient performance, scalability, and fault tolerance. We propose using general-purpose data processing engines, such as MapReduce implementations, for parallel simulation. Widely used, mature engines eliminate much of the development effort and provide scalability and fault tolerance. We demonstrate that a parallel discrete-event simulator can be implemented on two such engines, Apache Hadoop and Apache Spark, by modeling the message passing of distributed systems with the MapReduce key-value processing model. The implemented simulators handled 10^8 nodes on 10 computers. A preliminary evaluation showed that our Spark-based simulator is about 20 times as fast as an existing simulator, thanks to Time Warp.
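This pure-Python sketch shows one possible way to map message passing onto the MapReduce key-value model. Map emits messages keyed by destination node. The shuffle groups them by key. Reduce delivers them and computes each node's next state. The ring topology and the max-propagation protocol are hypothetical examples. The sketch is also time-stepped: it does not implement the discrete-event scheduling or Time Warp rollback that the paper relies on.

```python
from collections import defaultdict

def map_phase(node_id, value, num_nodes):
    """Map: a node emits its own state plus one message to its right-hand ring neighbor."""
    yield node_id, ("state", value)
    yield (node_id + 1) % num_nodes, ("msg", value)

def shuffle(pairs):
    """Group emitted key-value pairs by key (the destination node), as the engine's shuffle does."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups

def reduce_phase(node_id, values):
    """Reduce: deliver all messages addressed to node_id and compute its next state."""
    state = next(v for kind, v in values if kind == "state")
    received = [v for kind, v in values if kind == "msg"]
    return max([state] + received)

def simulate(initial, rounds):
    states = dict(enumerate(initial))
    n = len(initial)
    for _ in range(rounds):
        pairs = [kv for nid, val in states.items() for kv in map_phase(nid, val, n)]
        states = {nid: reduce_phase(nid, vals) for nid, vals in shuffle(pairs).items()}
    return [states[i] for i in range(n)]

print(simulate([3, 9, 1, 4, 7], rounds=4))  # [9, 9, 9, 9, 9]
```

Because each round is an independent map/shuffle/reduce pass, an engine such as Hadoop or Spark can distribute the nodes across machines. The engine also supplies fault tolerance through re-execution.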
ISBN: 9781665435741 (print)
Recent years have witnessed the rapid growth of smart devices and mobile applications. However, mobile applications are typically computation-intensive and delay-sensitive, while User Devices (UDs) are usually resource-limited. Mobile Edge Computing (MEC) has been proposed as a promising paradigm to ease this tension: a UD's tasks can be executed either locally on the device itself or remotely on an edge server via computation offloading. Many efficient computation offloading scheduling approaches have been proposed, but most rely on centralized scheduling, which can run into trouble in large-scale MEC. To address this issue, this paper proposes a distributed scheduling framework built on the idea of "centralized training and distributed scheduling". The framework adopts Actor-Critic reinforcement learning, in which the Actor performs distributed scheduling and the Critic performs centralized training. Extensive simulations verify the effectiveness and efficiency of the proposed framework.
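The split between a centralized Critic and distributed Actors can be shown with a deliberately tiny, stateless toy. This Python sketch is a heavily simplified illustration, not the paper's framework. Each hypothetical user device owns a one-parameter Actor that sets its probability of offloading. A single central Critic learns a scalar value estimate, which serves as the baseline for every Actor's policy-gradient update. The latency model and all hyperparameters are invented for illustration.

```python
import math
import random

# Hypothetical latency model (not from the paper), in arbitrary time units.
LOCAL_LATENCY = 5.0   # running the task on the device itself
EDGE_BASE = 1.0       # offloading to an idle edge server
EDGE_PER_USER = 1.0   # extra delay per device sharing the edge server

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def train(num_devices=6, episodes=3000, actor_lr=0.05, critic_lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * num_devices  # one Actor per device: P(offload) = sigmoid(theta[i])
    value = 0.0                  # central Critic: learned value of the (stateless) system
    for _ in range(episodes):
        # Distributed scheduling: each device samples a decision from its own Actor only.
        probs = [sigmoid(t) for t in theta]
        actions = [1 if rng.random() < p else 0 for p in probs]
        edge_latency = EDGE_BASE + EDGE_PER_USER * sum(actions)
        latencies = [edge_latency if a else LOCAL_LATENCY for a in actions]
        reward = -sum(latencies) / num_devices  # shared reward: negative mean latency
        # Centralized training: the Critic evaluates the joint outcome ...
        advantage = reward - value
        value += critic_lr * advantage
        # ... and each Actor takes a policy-gradient step; d/dtheta log pi(a) = a - p.
        for i, (a, p) in enumerate(zip(actions, probs)):
            theta[i] += actor_lr * advantage * (a - p)
    return [sigmoid(t) for t in theta]

print(train())  # learned offloading probability for each device
```

In this toy model, edge latency grows with the number of offloading devices, so the Actors have to learn to share the edge server. At run time each device needs only its own Actor, which is what makes the scheduling distributed.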
ISBN: 9781538637906 (print)
Functional safety aims to avoid unacceptable risks and harm caused by system functional failures, and it is a critical requirement for automotive embedded systems. For safety-critical distributed automotive functions, reliability is an important functional safety requirement, and the reliability goal must be assured. In general, the key to reliability goal assurance is to translate the reliability goal of a distributed function into reliability goals for each of its tasks. This study proposes an effective reliability goal assurance method, called RGAGM, for automotive functional safety. The core idea of the method is to define two kinds of geometric mean, one for tasks and one for the function, and to preassign geometric-mean-based reliability values to unassigned tasks, thereby saving system resources. The correctness of RGAGM is proved. Experimental results on a real-life automotive function and on randomly generated distributed automotive functions show that, compared with the state-of-the-art MRCRG method, the proposed method effectively ensures the reliability goal while reducing resource consumption cost.
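Assume, as is common, that a distributed function succeeds only if all of its tasks succeed. Then the function's reliability is the product of its task reliabilities. Giving each of the n remaining tasks the n-th root of the remaining goal therefore meets the goal exactly. The Python sketch below shows that geometric-mean preassignment. It is one plausible reading of the idea; the paper defines RGAGM's two geometric means and its assignment order, which are not reproduced here.

```python
import math

def preassign_reliability(goal, assigned, num_tasks):
    """Give every unassigned task the same reliability so the product meets the goal.

    Assumes function reliability = product of task reliabilities (series model).
    `assigned` holds reliabilities already fixed for some of the `num_tasks` tasks.
    """
    remaining = num_tasks - len(assigned)
    if remaining <= 0:
        return []
    achieved = math.prod(assigned)  # product of an empty list is 1
    per_task = (goal / achieved) ** (1.0 / remaining)  # geometric mean of the remaining budget
    if per_task > 1.0:
        raise ValueError("goal unreachable: assigned tasks are already too unreliable")
    return [per_task] * remaining

targets = preassign_reliability(goal=0.9, assigned=[0.99], num_tasks=4)
print(targets)                      # three equal values of about 0.9687
print(math.prod([0.99] + targets))  # about 0.9
```

`math.prod` requires Python 3.8 or later.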
ISBN: 9781728111414 (print)
Microprocessor design space exploration is an essential step in the early stages of microprocessor design. Work [1] proposes a design space exploration method based on critical path analysis. Critical path analysis on the instruction dependence graph is often used in research on the micro-architecture of a microprocessor's instruction pipeline. Previous analysis methods had to process the huge log file serially, so analysis took a very long time. In this paper, we present a parallel analysis algorithm based on multithreading. By partitioning the log file into multiple blocks and processing them in parallel with multiple threads, the algorithm achieves nearly linear speedup in the number of threads.
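The block-partitioning idea can be sketched in Python as follows. The sketch assumes a hypothetical log format with one instruction per line (`<id> <latency> <dependences>`) and assumes dependences always point to earlier instructions. Worker threads parse contiguous blocks in parallel. The longest-path (critical path) computation then runs in a single serial pass, because the abstract does not say how the paper resolves dependences that cross block boundaries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log format, one instruction per line: "<id> <latency> <dep,dep,...>" or "-" for none.
def parse_block(lines):
    """Parse one block of the log into (id, latency, deps) records."""
    records = []
    for line in lines:
        inst_id, latency, deps = line.split()
        dep_ids = [] if deps == "-" else [int(d) for d in deps.split(",")]
        records.append((int(inst_id), int(latency), dep_ids))
    return records

def critical_path_length(log_lines, num_threads=4):
    # Partition the log into contiguous blocks and parse them with a pool of threads.
    size = max(1, -(-len(log_lines) // num_threads))  # ceiling division
    blocks = [log_lines[i:i + size] for i in range(0, len(log_lines), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parsed = list(pool.map(parse_block, blocks))  # map preserves block order
    # Serial pass: dependences point backward, so trace order is a topological order.
    finish = {}
    for block in parsed:
        for inst_id, latency, deps in block:
            finish[inst_id] = latency + max((finish[d] for d in deps), default=0)
    return max(finish.values(), default=0)

trace = ["0 3 -", "1 1 0", "2 4 -", "3 2 1,2", "4 1 3"]
print(critical_path_length(trace, num_threads=2))  # 7
```

In CPython the global interpreter lock prevents CPU-bound threads from running in parallel. This sketch therefore shows the structure rather than the speedup; a native implementation or a process pool would be needed for real gains.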
ISBN: 9781509036820 (print)
The ever-increasing architectural complexity of supercomputers emphasizes the need for high-level parallel programming paradigms. Among such paradigms, task-based programming abstracts away much of the architectural complexity while efficiently meeting the performance challenge, even at large scale. Dynamic run-time systems are typically used to execute task-based applications, scheduling computation resource usage and memory allocations. While computation scheduling has been well studied, the dynamic management of memory subscription inside such run-times has received little attention. This paper studies the cooperation between a task-based distributed application and a run-time system engine to control memory subscription levels throughout execution. We show that the task paradigm makes it possible to control the application's memory footprint by throttling the task submission rate, striking a compromise between the performance benefits of anticipative task submission and the resulting memory consumption. We illustrate the benefits of our contribution on a compressed dense linear algebra distributed application.
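Throttling task submission against a memory budget can be sketched with a condition variable. The thread that submits tasks blocks whenever the declared footprints of in-flight tasks would exceed the budget. Each task returns its share when it finishes. This Python sketch uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for the paper's run-time system. The budget, the task sizes, and the `MemoryThrottle` class are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class MemoryThrottle:
    """Hypothetical helper: admits tasks only while their declared footprints fit a budget."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.in_use = 0
        self.peak = 0
        self.cond = threading.Condition()

    def acquire(self, nbytes):
        if nbytes > self.budget:
            raise ValueError("a single task exceeds the whole memory budget")
        with self.cond:
            # Block the submitting thread until this task's footprint fits.
            self.cond.wait_for(lambda: self.in_use + nbytes <= self.budget)
            self.in_use += nbytes
            self.peak = max(self.peak, self.in_use)

    def release(self, nbytes):
        with self.cond:
            self.in_use -= nbytes
            self.cond.notify_all()

def run_tasks(task_sizes, budget_bytes, workers=4):
    throttle = MemoryThrottle(budget_bytes)

    def task(nbytes):
        try:
            buf = bytearray(nbytes)  # stand-in for the task's working data
            return len(buf)
        finally:
            throttle.release(nbytes)  # hand the memory back to the budget

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for nbytes in task_sizes:
            throttle.acquire(nbytes)  # throttling point: submission waits when memory is tight
            futures.append(pool.submit(task, nbytes))
        results = [f.result() for f in futures]
    return results, throttle.peak

results, peak = run_tasks([1000] * 20, budget_bytes=3000)
print(len(results), peak)  # peak never exceeds the 3000-byte budget
```

A larger budget lets the submitter run further ahead: more anticipation, more memory. A smaller budget trades performance for a smaller footprint, which is the compromise the abstract describes.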
ISBN: 9781665440660 (print)
The Euler tour technique is a classical tool for designing parallel graph algorithms, originally proposed for the PRAM model. We ask whether it can be adapted to run efficiently on GPUs. We focus on two established applications of the technique: (1) finding the lowest common ancestors (LCA) of pairs of nodes in trees, and (2) finding bridges in undirected graphs. In our experiments, we compare theoretically optimal algorithms using the Euler tour technique against simpler heuristics expected to perform particularly well on typical instances. We show that the Euler tour-based algorithms not only fulfill their theoretical promise and outperform practical heuristics on hard instances, but also perform on par with them on easy instances.
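For the LCA application, the Euler tour reduces the problem to a range-minimum query. Record each node every time the depth-first walk visits it; the LCA of u and v is then the shallowest node between their first appearances. The sketch below is sequential Python and uses a sparse table for the range minimum. A GPU version would build the tour and the table in parallel, which is not shown here.

```python
def euler_tour(children, root=0):
    """Return the Euler tour, the depth at each tour position, and each node's first position."""
    tour, depth, first = [], [], {}
    stack = [(root, 0, 0)]  # (node, depth, index of the next child to visit)
    while stack:
        node, d, i = stack.pop()
        if node not in first:
            first[node] = len(tour)
        tour.append(node)
        depth.append(d)
        kids = children.get(node, [])
        if i < len(kids):
            stack.append((node, d, i + 1))    # revisit node after this child's subtree
            stack.append((kids[i], d + 1, 0))
    return tour, depth, first

class LCA:
    def __init__(self, children, root=0):
        self.tour, self.depth, self.first = euler_tour(children, root)
        n = len(self.tour)
        # sparse[k][i] = tour position of minimum depth in tour[i : i + 2**k]
        self.sparse = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.sparse[-1], 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if self.depth[a] <= self.depth[b] else b)
            self.sparse.append(row)
            k += 1

    def query(self, u, v):
        """Lowest common ancestor of u and v via a range-minimum query on the tour."""
        lo, hi = sorted((self.first[u], self.first[v]))
        k = (hi - lo + 1).bit_length() - 1
        a, b = self.sparse[k][lo], self.sparse[k][hi - (1 << k) + 1]
        return self.tour[a if self.depth[a] <= self.depth[b] else b]

tree = {0: [1, 2], 1: [3, 4], 2: [5]}
lca = LCA(tree)
print(lca.query(3, 4), lca.query(3, 5), lca.query(4, 1))  # 1 0 1
```

The tour has 2n - 1 entries. After O(n log n) preprocessing, each query takes constant time.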
ISBN: 9781509021406 (print)
We present a novel trace-based analysis tool that rapidly classifies an MPI application as bandwidth-bound, latency-bound, load-imbalance-bound, or computation-bound for different interconnection networks. The tool uses an extension of Lamport's logical clock to track application progress during trace replay. It has two unique features. First, it predicts application performance for many latency and bandwidth parameters from a single replay of the trace. Second, it infers the performance characteristics of an application and classifies it using the predicted performance trend across a range of network configurations, rather than the predicted performance for a single network configuration. We describe the techniques used in the tool and its design and implementation, and report our performance study of the tool and our experience classifying nine applications and mini-apps from the DOE Design Forward project as well as the NAS Parallel Benchmarks.
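The "single replay, many network configurations" feature can be illustrated by giving each rank a vector of Lamport-style clocks, one entry per (latency, bandwidth) pair. This Python sketch is a simplification, not the tool's actual design. A send stamps the message with clock + latency + bytes/bandwidth for every configuration. A receive advances each entry to the maximum of the local clock and the arrival time. The trace format is hypothetical, sends are assumed to be eager and non-blocking, and sender overheads are ignored.

```python
from collections import defaultdict, deque

def replay(traces, configs):
    """Replay per-rank traces once, keeping one clock per (latency_s, bandwidth_Bps) config.

    Hypothetical trace events: ("compute", seconds), ("send", dst, nbytes), ("recv", src).
    Returns the predicted completion time for each configuration.
    """
    clocks = [[0.0] * len(configs) for _ in traces]
    pos = [0] * len(traces)
    inbox = defaultdict(deque)  # (src, dst) -> FIFO of arrival-time vectors
    progress = True
    while progress:
        progress = False
        for rank, events in enumerate(traces):
            while pos[rank] < len(events):
                event = events[pos[rank]]
                if event[0] == "compute":
                    clocks[rank] = [t + event[1] for t in clocks[rank]]
                elif event[0] == "send":
                    _, dst, nbytes = event
                    inbox[(rank, dst)].append(
                        [t + lat + nbytes / bw for t, (lat, bw) in zip(clocks[rank], configs)]
                    )
                elif event[0] == "recv":
                    queue = inbox[(event[1], rank)]
                    if not queue:
                        break  # matching send not replayed yet; let other ranks advance
                    arrival = queue.popleft()
                    clocks[rank] = [max(t, a) for t, a in zip(clocks[rank], arrival)]
                pos[rank] += 1
                progress = True
    if any(p < len(events) for p, events in zip(pos, traces)):
        raise RuntimeError("replay stalled: a receive has no matching send")
    return [max(rank_clocks[k] for rank_clocks in clocks) for k in range(len(configs))]

traces = [
    [("compute", 1.0), ("send", 1, 1_000_000)],
    [("recv", 0), ("compute", 1.0)],
]
configs = [(1e-6, 1e9), (1e-6, 1e8), (1e-4, 1e9)]
print(replay(traces, configs))  # one predicted run time per configuration
```

The tool classifies an application by how its predicted times move as latency or bandwidth is swept. That classification rule is not sketched here.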
ISBN: 9781538653302 (print)
The growing complexity of VLSI designs demands continuous performance improvement of Electronic Design Automation (EDA) applications. Traditionally, part of this performance gain has come from improvements in the single-threaded performance of common processors. Unfortunately, processor speeds have mostly plateaued in recent years. However, the advent of freely programmable GPUs has enabled their use as highly parallel systems for a variety of computational use cases, making them attractive for reaching performance goals. In this paper, we introduce STP, a quadratic placement implementation that leverages the computational power of GPUs as well as multicore CPUs to speed up execution.
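STP's exact formulation is not given here. Textbook quadratic placement minimizes the sum of weighted squared wire lengths. For two-pin nets, this reduces to solving a sparse, symmetric positive-definite system A x = b for each axis. The Python sketch below assembles that system along one axis, with hypothetical nets and fixed pads. It solves the system with conjugate gradient, a solver built mostly from matrix-vector products and dot products, which parallelize well.

```python
import math

def build_system(num_cells, nets):
    """Assemble A x = b for 1-D quadratic placement (minimize sum of w * (x_a - x_b)**2).

    Each net is (a, b, w): a two-pin connection with weight w. An endpoint is either a
    movable cell index (int) or a fixed pad written as ("pad", position).
    """
    A = [[0.0] * num_cells for _ in range(num_cells)]
    rhs = [0.0] * num_cells
    for a, b, w in nets:
        if isinstance(a, tuple):
            a, b = b, a  # make `a` the movable endpoint if there is one
        if isinstance(a, tuple):
            continue  # pad-to-pad net: constant cost, no unknowns
        A[a][a] += w
        if isinstance(b, tuple):
            rhs[a] += w * b[1]  # pull toward the fixed pad position
        else:
            A[b][b] += w
            A[a][b] -= w
            A[b][a] -= w
    return A, rhs

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    x = [0.0] * len(b)
    r = list(b)  # residual b - A x with x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Two cells chained between pads at x = 0 and x = 10 (every cell must reach a pad, or A is singular).
nets = [(("pad", 0.0), 0, 1.0), (0, 1, 1.0), (1, ("pad", 10.0), 1.0)]
A, b = build_system(2, nets)
print(conjugate_gradient(A, b))  # approximately [3.333, 6.667]
```

The x and y problems are independent, so a real placer solves two such systems, usually with sparse matrices rather than the dense lists used here for brevity.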