ISBN: 9781665431682 (print)
Concurrency and consistency are two inherent and complex characteristics of distributed systems. Their types, levels and implementation procedures determine the nature and efficiency of a distributed system. Both concepts are difficult to understand, yet without a comprehensive understanding of them a complete system cannot be designed and built. Applying such an understanding to the design of a distributed system yields a system that is more closely aligned with the desired outcomes. This paper analyses both concurrency and consistency in distributed systems to present a comprehensive understanding of their requirements, types, levels, benefits and limitations. First, it analyses concurrency and compares it with parallelism to distinguish the two related but distinct terms. It then analyses consistency and different consistency models, including a comparative analysis of strong and weak consistency models, and of data-centric and client-centric consistency models.
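The distinction between concurrency and parallelism mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper; the names `worker` and `run_concurrently` are hypothetical. Two tasks make interleaved progress on a single thread, which is concurrency without parallelism.

```python
import asyncio
import threading


async def worker(name, steps, log, thread_ids):
    """Record progress, yielding to the event loop after every step."""
    for i in range(steps):
        log.append((name, i))
        thread_ids.add(threading.get_ident())
        await asyncio.sleep(0)  # hand control back so the other task can run


async def run_concurrently():
    """Run two workers concurrently on one event loop (hence one thread)."""
    log, thread_ids = [], set()
    await asyncio.gather(
        worker("a", 2, log, thread_ids),
        worker("b", 2, log, thread_ids),
    )
    return log, thread_ids


if __name__ == "__main__":
    log, thread_ids = asyncio.run(run_concurrently())
    print(log)              # steps of "a" and "b" appear interleaved
    print(len(thread_ids))  # number of distinct threads that did the work
```

Parallelism would additionally require the tasks to execute at the same instant on separate cores, for example through separate processes.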
This paper considers the design of systolic array processors for computing the 2-dimensional discrete Fourier transform (2-D DFT). We investigate three different computational schemes for designing systolic array processors using a systematic approach. The systematic approach is guaranteed to find optimal systolic array processors from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
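For readers unfamiliar with the 2-D DFT, the following sketch shows its standard row-column decomposition: 1-D DFTs over the rows followed by 1-D DFTs over the columns. This is a well-known property of the transform and is not claimed to be one of the three schemes studied in the paper; the function names are hypothetical, and the code is a plain software reference rather than a systolic design.

```python
import cmath


def dft_1d(x):
    """Direct 1-D DFT: X[j] = sum_k x[k] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_2d_row_column(a):
    """2-D DFT computed as 1-D DFTs over rows, then over columns."""
    rows = [dft_1d(row) for row in a]
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to [u][v]


def dft_2d_direct(a):
    """2-D DFT evaluated straight from its double-sum definition."""
    n1, n2 = len(a), len(a[0])
    return [
        [
            sum(
                a[m][n] * cmath.exp(-2j * cmath.pi * (u * m / n1 + v * n / n2))
                for m in range(n1)
                for n in range(n2)
            )
            for v in range(n2)
        ]
        for u in range(n1)
    ]
```

Both functions compute the same transform; the row-column form simply reuses a 1-D building block, which is the kind of regular structure that array processors can exploit.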
Reconfigurable computing systems have already shown their ability to accelerate embedded hardware/software systems. Since standard processor-based embedded applications have reached their limits, we need new concept...
ISBN: 9781728169583 (print)
The high-performance requirements needed to implement the most advanced functionalities of current and future cyber-physical systems (CPSs) are challenging the development processes of CPSs. On one side, CPSs rely on model-driven engineering (MDE) to satisfy the non-functional constraints and to ensure a smooth and safe integration of new features. On the other side, the use of complex parallel and heterogeneous embedded processor architectures becomes mandatory to cope with the performance requirements. In this regard, parallel programming models, such as OpenMP or CUDA, are a fundamental building block for fully exploiting the performance capabilities of these architectures. However, parallel programming models are not compatible with current MDE approaches, creating a gap between the MDE used to develop CPSs and the parallel programming models supported by novel and future embedded platforms. The AMPERE project will bridge this gap by implementing a novel software architecture for the development of advanced CPSs. To do so, the proposed software architecture will capture the definition of the components and communications described in the MDE framework, together with the non-functional properties, and transform them into key parallel constructs present in current parallel models, which may require extensions. These features will allow efficient use of the underlying parallel and heterogeneous architectures, while ensuring compliance with non-functional requirements, including those on real-time performance of the system.
ISBN: 159593748X (print)
The proceedings contain 8 papers. The topics discussed include: a debugger for flow graph based parallel applications; organizing processes and threads for debugging; techniques for specifying bug patterns; testing patterns for software transactional memory engines; semantics driven dynamic partial-order reduction of MPI-based parallel programs; and healing data races on-the-fly.
ISBN: 9781728174457 (print)
Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach is good for deploying machine learning models, where the network topology is constant, but machine learning research often involves changing network topologies and hyperparameters. Traditional kernel auto-tuning has limited impact in this case; a more general selection of kernels is required for libraries to accelerate machine learning research. In this paper we present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries that target a range of heterogeneous devices from desktop GPUs to embedded accelerators. The techniques investigated apply more generally and could similarly be integrated with other heterogeneous programming systems. By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
ISBN: 9783540756972 (print)
This paper introduces a new architecture description language called ArchC#. ArchC# is an extension of ArchJava for C#. It is mainly focused on describing the architecture of distributed systems. ArchC# provides built-in constructs for describing distributed components and their interconnections. Specific features of distributed code, such as remote asynchronous calls and activation of remote objects, can be described in ArchC#. ArchC# unifies software architecture with an object-oriented implementation.
ISBN: 9781665440660 (print)
With the DDR standard facing density challenges and the emergence of non-volatile memory technologies such as Cross-Point, phase change, and fast FLASH media, compute and memory vendors are contending with a paradigm shift in the datacenter space. The decades-long status quo of designing servers with DRAM technology as an exclusive memory solution is likely coming to an end. Future systems will increasingly employ tiered memory architectures (TMAs) in which multiple memory technologies work together to satisfy applications' ever-growing demands for more memory, less latency, and greater bandwidth. Exactly how to expose each memory type to software is an open question. Recent systems have focused on hardware caching to leverage faster DRAM memory while exposing slower non-volatile memory to OS-addressable space. The hardware approach that deals with the non-uniformity of TMA, however, requires complex changes to the processor and cannot use fast memory to increase the system's overall memory capacity. Mapping an entire TMA as OS-visible memory alleviates the challenges of the hardware approach but pushes the burden of managing data placement in the TMA to the software layers. The software, however, does not see the memory accesses by default; in order to make informed memory-scheduling decisions, software must rely on hardware methods to gain visibility into the load/store address stream. The OS then uses this information to place data in the most suitable memory location. In this paper, we evaluate different methods of memory-access collection and propose a hybrid tiered-memory approach that offers comprehensive visibility into TMA.
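The idea of using observed memory accesses to decide data placement can be sketched as a simple hotness-based policy. This is only an illustration under stated assumptions, not the hybrid approach proposed in the paper: the function `place_pages`, the two tier labels, and the policy of ranking pages by access count are all hypothetical.

```python
from collections import Counter


def place_pages(access_trace, fast_capacity):
    """Assign the most frequently accessed pages to the fast tier.

    access_trace: iterable of page identifiers, one per observed access
                  (a stand-in for a sampled load/store address stream).
    fast_capacity: number of pages the fast tier can hold (assumed value).
    Returns a dict mapping each observed page to "fast" or "slow".
    """
    counts = Counter(access_trace)
    # Rank by access count (descending); break ties by page id for determinism.
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    fast = set(ranked[:fast_capacity])
    return {page: ("fast" if page in fast else "slow") for page in counts}
```

For example, `place_pages([1, 2, 1, 3, 1, 2], fast_capacity=2)` keeps pages 1 and 2 in the fast tier and leaves page 3 in the slow tier. A real policy would also have to account for migration cost and for how the access information is sampled.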
The concept of software architecture, also called system structure or system configuration, is especially important for designing complex software systems, providing a model of the large-scale structural properties of systems. Module interconnection languages (MILs) introduced the idea of creating program modules and connecting them to form larger structures. However, MILs do not support the description of important architectural elements. A new class of description languages, referred to as architectural description languages (ADLs), has recently emerged. Most ADLs, however, support only the description of static software architectures and not dynamic or reconfigurable software architectures. A further limitation of current ADLs is that they focus mainly on the formal notation and usually do not offer proof systems and tools to enable designers to formally verify the properties of their designs. We have developed the ZCL framework, a formal framework specified in Z, to describe and reason about dynamic distributed software architectures. In this paper, we use a simple case study, the client-server system, to demonstrate how our formal framework ZCL can be used to specify and verify reconfigurable software architectures.
ISBN: 9780769561493 (print)
While multi-core platforms are now ubiquitous in all areas of information technology, from enterprise software engineering to mobile app development, parallel computing education is still lagging behind the demand for skilled parallel programmers. At many universities today, parallel and concurrent computing is still not part of the core curriculum because of resistance to major curriculum changes. Many other universities lack the necessary educators or infrastructure to teach a comprehensive parallel computing course. Furthermore, even addressing these issues would do nothing towards supporting software professionals who have already entered the work force and have no plans to return to school. To address this broad need for a standalone, publicly available, comprehensive, and easily accessible course on parallel computing, we have developed an online offering packaged as a Coursera Specialization on Parallel, Concurrent, and Distributed Computing in Java. In this paper, we describe the preparations for this online course and the unique challenges we encountered in terms of both curriculum development and technical infrastructure. We describe how lessons learned from an on-campus parallelism course at Rice University helped to shape the Coursera specialization, and summarize our experience with implementing this specialization on the Coursera platform at scale.