ISBN (print): 9780769534343
This paper introduces a new approach to building application-layer multicast (ALM) overlays: Multiple Shared Trees. The multiple-shared-trees approach balances traditional source-based trees against a single shared tree, and transmission efficiency against protocol overhead. Based on this, we propose two protocols to build ALM overlays among end users and among media-forwarding gateways, respectively. The latter borrows the design idea of Aggregated Multicast to share multicast trees among groups.
ISBN (print): 9781538643686
A myriad of problems in science and engineering involve the solution of sparse triangular linear systems. They arise frequently as part of direct and iterative solvers for linear systems and eigenvalue problems, and hence can be considered a key building block of sparse numerical linear algebra. This is why, since the early days, their parallel solution has been exhaustively studied, and efficient implementations of this kernel can be found for almost every hardware platform. In the GPU context, the most widespread implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to aggregate the unknowns of the triangular system into level sets. This determines an execution schedule for the solution of the system, where the level sets have to be processed sequentially while the unknowns that belong to one level set can be solved in parallel. One of the disadvantages of the CUSPARSE implementation is that this preprocessing stage is often extremely slow in comparison to the runtime of the solving phase. In this work, we present a parallel GPU algorithm that computes the same level sets as CUSPARSE but takes significantly less runtime. Our experiments on a set of matrices from the SuiteSparse collection show acceleration factors of up to 44x. Additionally, we provide a routine capable of solving a triangular linear system in the same pass used to calculate the level sets, yielding important performance benefits.
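The level-set schedule described in this abstract can be illustrated with a small serial sketch. It is not the paper's GPU algorithm or the CUSPARSE routine: it is a plain Python reference that assumes a lower-triangular matrix in CSR form with a nonzero diagonal, and the function names `level_sets` and `solve_by_levels` are hypothetical. An unknown belongs to level 0 if it depends on no other unknown; otherwise its level is one more than the highest level among its dependencies.

```python
def level_sets(indptr, indices):
    """Group the rows of a sparse lower-triangular CSR matrix into level sets.

    Assumption: indptr/indices are the usual CSR row-pointer and
    column-index arrays, and every off-diagonal column index j of row i
    satisfies j < i.
    """
    n = len(indptr) - 1
    if n == 0:
        return []
    level = [0] * n
    for i in range(n):
        # Levels of the unknowns that row i depends on (off-diagonal entries).
        deps = [level[j] for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    sets = [[] for _ in range(max(level) + 1)]
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets


def solve_by_levels(indptr, indices, data, b):
    """Solve L x = b by processing the level sets in order.

    Rows inside one level are independent of each other, so the inner
    loop is the part a parallel implementation would distribute.
    """
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x
```

For a 4x4 example in which row 2 depends on row 0 and row 3 depends on rows 1 and 2, rows 0 and 1 form the first level, row 2 the second, and row 3 the third, so only the first level offers any parallelism.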
ISBN (digital): 9781728174457
ISBN (print): 9781728174457
Just like every ecosystem, the computing one is subject to permanent evolution. In this paper we identify three major challenges resulting from this evolution. Those challenges stem from both the hardware and the application layers. For one, we have entered the era of many-* hardware architectures, which imposes new requirements on upper layers. And with the proliferation of the computing continuum (e.g., cloud, fog, and edge computing), applications become more demanding and dynamic: the system needs to be able to satisfy application-intrinsic requirements and counter application-extrinsic uncertainties. As part of our contribution we present the current and ongoing research topics of our system-software stack for future many-* architectures. We further present the various mechanisms and concepts we employ within our system software and describe how the system software collaborates with other layers to tackle those challenges. Those concepts include a fundamentally different execution model and control-flow abstraction, allowing for massive micro-parallelism to efficiently utilize the hardware. Since the system-software research is performed as part of a collaborative research centre, we are able to approach the challenges on all layers of the technology stack and verify our solutions on an FPGA-based prototype platform. This allows us to design mechanisms in collaboration with every layer of the technology stack that, when put together, cooperate across layer boundaries.
ISBN (print): 9781450379625
Today's computing world features a growing number of cyber-physical systems that require the cooperation of many physical devices. Examples include autonomous cars and co-working robots, which are expected to adapt appropriately to any context they find themselves in (e.g., the presence of a nearby human). However, the controlling software continues to be developed using established object-oriented modelling techniques like UML, which do not natively possess a notion of context and thus may introduce accidental complexity. As complexity increases, so does the probability of introducing software errors, which can have fatal consequences in cyber-physical systems. To address this, we envision a model-driven architecture for self-adaptive cyber-physical systems that explicitly models structured context. Entities are modelled as message-passing parallel processes and can play roles in specific contexts, which dynamically alter their behaviour and relationships with other parts of the system. Since the planning of complex adaptations can be cumbersome in real-world scenarios, we envision an intuitive formulation of adaptations as graph rewriting rules on the context model. This paper discusses the current state of research and identifies open research challenges. Based on this, the envisioned architecture as well as an evaluation strategy are presented.
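The idea of expressing an adaptation as a graph rewriting rule on a context model can be sketched in a few lines. This is only an illustration under assumptions: the context model is represented here as a set of (subject, relation, object) triples, and the role names `FastMover` and `CautiousMover`, the relation names, and the function `adapt_to_nearby_humans` are all hypothetical rather than taken from the paper.

```python
def adapt_to_nearby_humans(graph):
    """Apply one rewriting rule to a context graph of (subject, relation, object) triples.

    Rule (hypothetical): whenever a Human is "near" a Robot, the robot
    stops playing the FastMover role and plays CautiousMover instead.
    """
    rewritten = set(graph)
    for subject, relation, obj in graph:
        matches = (
            relation == "near"
            and (subject, "is_a", "Human") in graph
            and (obj, "is_a", "Robot") in graph
        )
        if matches:
            # Right-hand side of the rule: swap the role edge.
            rewritten.discard((obj, "plays", "FastMover"))
            rewritten.add((obj, "plays", "CautiousMover"))
    return rewritten
```

Robots with no human nearby keep their current role, since the left-hand side of the rule does not match them.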
ISBN (print): 9781479942930
Multi- and many-core processors have emerged as the dominant solution for processing across the whole range of computer systems, from small devices to large-scale installations. Chip multi-processors, which are homogeneous multi- and many-core processors, offer an unprecedented amount of on-chip shared resources and bring a unique set of challenges. Given the importance of Last-Level Cache management techniques for achieving near-perfect isolation, we survey the state of the art and propose research directions to address the most pressing issues in modern computer systems. To better understand the various research directions in the field, we propose a classification of the presented techniques. Finally, we discuss possible research directions.
ISBN (print): 9781479936052
The proceedings contain 31 papers. The topics discussed include: BarrierPoint: sampled simulation of multi-threaded applications; exploiting spatial architectures for edit distance algorithms; a top-down method for performance analysis and counters architecture; Moby: a mobile benchmark suite for architectural simulators; optimized hardware for suboptimal software: the case for SIMD-aware benchmarks; extending statistical cache models to support detailed pipeline simulators; manifold: a parallel simulation framework for multicore systems; prime: a parallel and distributed simulator for thousand-core chips; steps towards wider use of concurrency code patterns; power modeling and other new features in the graphite simulator; evaluating trace aggregation for performance visualization of large distributed systems; reverse engineering of cache replacement policies in Intel microprocessors and their evaluation; and a software based profiling method for obtaining speedup stacks on commodity multi-cores.
ISBN (print): 0769509339
The Tatami project is building a system to support software engineering over the internet, exploiting recent advances in web technology, interface design, and specification. Our effort to improve the usability of such systems led us into algebraic semiotics, while our effort to develop better formal methods for distributed concurrent systems led us into hidden algebra. We discuss the Tatami system design, especially user interface issues, and sketch an extension of algebraic semiotics for interface dynamics.
ISBN (print): 9781424437511
Technical advances have made image capture and storage very convenient, which has resulted in an explosion in the amount of visual information. It has become difficult to find useful information in this tremendous volume of data. Content-based Visual Information Retrieval (CBVIR) is emerging as one of the best solutions to this problem. Unfortunately, CBVIR is a very compute-intensive task. Nowadays, with the boom in multi-core processors, CBVIR can be accelerated by exploiting multi-core processing capability. In this paper, we propose a parallel implementation of a CBVIR system targeting server applications and use several serial and parallel optimization techniques to improve its performance on an 8-core and a 16-core system. Experimental results show that the optimized implementation can achieve very fast retrieval on the two multi-core systems. We also compare the performance of the application on the two multi-core systems and explain the performance difference between them. Furthermore, we conduct detailed scalability and memory performance analysis to identify possible bottlenecks in the application. Based on these experimental results and performance analysis, we gain many insights into developing efficient applications on future multi-core architectures.
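A minimal sketch of how content-based retrieval can be split across workers is shown below. The abstract does not state which features or parallelization strategy the authors use, so everything here is an assumption: images are flat lists of 8-bit grey values, the feature is a normalized intensity histogram, similarity is the L1 distance, and the names `histogram`, `l1_distance`, and `retrieve` are hypothetical. A thread pool is used to keep the example self-contained; in CPython the interpreter lock limits the speed-up for pure-Python work, so a real system would more likely rely on processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of grey values."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin differences; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve(query_pixels, database, top_k=2, workers=4):
    """Return the names of the top_k database images closest to the query."""
    query_hist = histogram(query_pixels)

    def score_chunk(chunk):
        # Each worker extracts features and scores its own share of images.
        return [(l1_distance(query_hist, histogram(px)), name) for name, px in chunk]

    items = list(database.items())
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = [s for part in pool.map(score_chunk, chunks) for s in part]
    return [name for _, name in sorted(scored)[:top_k]]
```

The database is partitioned in a strided fashion so that each worker receives a similar number of images, and the partial score lists are merged before the final ranking.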
ISBN (print): 0769523129
A novel bitstream generation algorithm and its software implementation are introduced. Although this tool was developed for configuring the AMDREL FPGA reconfigurable platform [13], it could be used to program any other compatible device. It is the only known academic implementation for FPGA configuration with such features. Among them are run-time, partial, and dynamic reconfiguration, memory management, bitstream compression and encryption, the read-back technique, bitstream reallocation, the low-power techniques used, and the Graphical User Interface.
ISBN (print): 9781467371483
Researchers increasingly rely on web-based systems for accessing and running scientific applications across distributed computing resources. However, existing systems lack a number of important features, such as publication and sharing of scientific applications as online services, decoupling of applications from computing resources, and remote programmatic access. This paper presents Everest, a web-based platform for researchers that supports publication, execution, and composition of applications running across distributed computing resources. Everest addresses the described challenges by relying on modern web technologies and cloud computing models. It follows the Platform as a Service (PaaS) cloud delivery model by providing all its functionality via remote web and programming interfaces. Any application added to Everest is automatically published both as a user-facing web form and as a web service. Another distinctive feature of Everest is that any user can attach external computing resources and flexibly use them for running applications. The paper provides an overview of the platform's architecture and its main components, describes recent developments, presents the results of an experimental evaluation of the platform, and discusses remaining challenges.