ISBN: 9781467380119 (print)
Highly available metadata services of distributed file systems are essential to cloud applications. However, existing highly available metadata designs lack client-oriented features that treat metadata differentially, leading to a single metadata fault domain and low availability. After investigating the workload characteristics of Hadoop, we propose Client-Oriented METadata (COMET), a novel highly available metadata service design that divides and distributes metadata into independent regions according to clients. These regions are inherently isolated fault domains, and failures in one region will not break file operations in other regions. A prototype of COMET was implemented based on HDFS, and the experimental results show that COMET can significantly improve metadata availability of HDFS without obvious performance degradation. It can also deliver scalable performance and faster metadata recovery due to its decentralized architecture.
ISBN: 9781479989300 (print)
Applications with large amounts of data, real-time constraints, ultra-low power requirements, and heavy computational complexity present significant challenges for modern computing systems, and often fall within the category of high-performance computing (HPC). As such, computer architects have looked to high-performance single instruction multiple data (SIMD) architectures, such as accelerator-rich platforms, for handling these workloads. However, since the results of these applications do not always require exact precision, approximate computing may also be leveraged. In this work, we introduce BRAINIAC, a heterogeneous platform that combines precise accelerators with neural-network-based approximate accelerators. These reconfigurable accelerators are leveraged in a multi-stage flow that begins with simple approximations and resorts to more complex ones as needed. We employ high-level, application-specific light-weight checks (LWCs) to throttle this multi-stage acceleration flow and reliably ensure user-specified accuracy at runtime. Evaluation of the performance and energy of our heterogeneous platform for error tolerance thresholds of 5%-25% demonstrates an average of 3x gain over computation that only includes precise acceleration, and 15x-35x gain over software-based computation.
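The multi-stage flow in the abstract above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function names, the two approximation stages, and the residual-based check are illustrative placeholders, not BRAINIAC's accelerators or its actual light-weight checks. The sketch only shows the control pattern of trying cheap approximations first and falling back to a precise computation when a check rejects the result.

```python
import math
from typing import Callable, Sequence, Tuple


def staged_compute(
    x: float,
    approx_stages: Sequence[Callable[[float], float]],
    precise: Callable[[float], float],
    check: Callable[[float, float], bool],
) -> Tuple[float, str]:
    """Try approximations from cheapest to most complex; fall back to precise."""
    for i, stage in enumerate(approx_stages):
        y = stage(x)
        if check(x, y):  # light-weight check accepts this stage's result
            return y, f"approx-{i}"
    return precise(x), "precise"


# Hypothetical stages for sqrt(x): a tangent line at x = 1, then one Newton step.
def linear_sqrt(x: float) -> float:
    return 0.5 * (1.0 + x)


def newton_sqrt(x: float) -> float:
    y = 0.5 * (1.0 + x)
    return 0.5 * (y + x / y)


def make_check(tolerance: float) -> Callable[[float, float], bool]:
    """Assumed check: the residual |y*y - x| must stay within tolerance * |x|.

    Squaring is cheaper than a square root, so the check costs less than
    the precise computation it guards.
    """
    def check(x: float, y: float) -> bool:
        return abs(y * y - x) <= tolerance * abs(x)
    return check


stages = [linear_sqrt, newton_sqrt]
check_5pct = make_check(0.05)
for value in (1.2, 3.0, 100.0):
    print(value, staged_compute(value, stages, math.sqrt, check_5pct))
```

In this toy setup, inputs near 1 are accepted at the first stage, while inputs far from 1 fail both checks and are computed precisely, which mirrors the throttling role the abstract assigns to the LWCs.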
ISBN: 9781509012336 (print)
In this paper we present VCube-PS, a topic-based Publish/Subscribe system built on top of a virtual hypercube-like topology. Membership information and published messages to subscribers (members) of a topic group are broadcast over dynamically built spanning trees rooted at the message's source. For a given topic, delivery of published messages respects causal order. Performance results of experiments conducted on the PeerSim simulator confirm the efficiency of VCube-PS in terms of scalability, latency, and the number and size of messages when compared to an approach based on a single-rooted tree that is not built dynamically.
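As a rough illustration of a spanning tree rooted at the message's source, the Python sketch below builds a binomial broadcast tree on a plain, fault-free hypercube. This is a textbook construction used here as an assumption; it is not the VCube-PS algorithm, which works on a virtual hypercube-like topology and builds its trees dynamically. The function names are hypothetical.

```python
from typing import Dict, List


def children(node: int, source: int, dim: int) -> List[int]:
    """Children of `node` in the binomial tree rooted at `source`."""
    rel = node ^ source  # node id relative to the root
    # The root forwards along every dimension; any other node forwards
    # only along dimensions below its lowest set bit.
    limit = dim if rel == 0 else (rel & -rel).bit_length() - 1
    return [(rel | (1 << k)) ^ source for k in range(limit)]


def broadcast_tree(source: int, dim: int) -> Dict[int, List[int]]:
    """Map every node of a 2**dim hypercube to its children."""
    tree: Dict[int, List[int]] = {}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        kids = children(node, source, dim)
        tree[node] = kids
        frontier.extend(kids)
    return tree


tree = broadcast_tree(source=5, dim=3)
for parent in sorted(tree):
    print(parent, "->", tree[parent])
```

Because the tree is derived from the source's identifier, each publisher gets its own tree, in contrast to a single fixed root shared by all publishers.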
ISBN: 9798350305487 (print)
In this paper, we present WCSim, Workflow Cloud Simulator. Firstly, we argue that this cloud simulation tool offers a high level of accessibility by allowing the description of various components, such as users, infrastructures, and workload, of a given scenario simply by providing parameters at launch time, without requiring the extension of the simulator code. Then, we explain how we conceived the components for the simulation models and provide a detailed description of the implemented software. Additionally, we compare the results of a small scenario obtained from two other simulation tools with those provided by WCSim. Finally, we present a case study that illustrates the usage of WCSim. The paper also introduces an abstraction to model workflows as a Directed Acyclic Graph of Bags of Tasks.
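One way to picture a workflow modeled as a Directed Acyclic Graph of Bags of Tasks is the Python sketch below. It is not WCSim's data model or API: the class, its fields, and the scheduling rule are assumptions made for illustration. In particular, it assumes unlimited workers, so all tasks in a bag run in parallel and a bag finishes when its longest task does.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List


@dataclass
class BagOfTasks:
    """A set of independent tasks that may start once all dependencies finish."""
    name: str
    durations: List[float]  # run time of each independent task
    depends_on: List[str] = field(default_factory=list)


def finish_times(bags: List[BagOfTasks]) -> Dict[str, float]:
    """Finish time of each bag, assuming unlimited parallel workers."""
    by_name = {bag.name: bag for bag in bags}
    graph = {bag.name: set(bag.depends_on) for bag in bags}
    finish: Dict[str, float] = {}
    # static_order() yields each bag after its predecessors and raises
    # graphlib.CycleError if the dependencies do not form a DAG.
    for name in TopologicalSorter(graph).static_order():
        bag = by_name[name]
        start = max((finish[dep] for dep in bag.depends_on), default=0.0)
        finish[name] = start + max(bag.durations, default=0.0)
    return finish


workflow = [
    BagOfTasks("prepare", [2.0, 3.0]),
    BagOfTasks("simulate", [5.0, 1.0, 4.0], depends_on=["prepare"]),
    BagOfTasks("report", [1.0], depends_on=["prepare", "simulate"]),
]
print(finish_times(workflow))
```

A real simulator would replace the unlimited-worker assumption with a model of the infrastructure, which is where the users and infrastructure parameters mentioned in the abstract would come in.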
An interconnection technology is described that utilizes excimer laser drilled vias and computer-controlled plating to provide vertical (Z-axis) electrical connections in high-performance flexible circuits. Specifically, solid vias and hemispherical microcontacts are created with a 1-μm nearest-neighbor height precision for the microcontacts. A novel structural architecture is employed which simplifies the ground plane connections for impedance-controlled flex circuits. The technology is particularly suitable in the dc to 2-GHz frequency range, where large numbers of parallel connections and/or multiple make-and-break connections are desirable. This technology was implemented with a polyimide substrate and nickel contacts, although the technology is applicable to other substrate and contact metallurgies.
ISBN: 9781509049851 (print)
Pushing computing to exaflop levels is difficult given the desired targets for memory capacity, memory bandwidth, power efficiency, reliability, and cost. This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The ENA consists of an Exascale Heterogeneous Processor (EHP) coupled with an advanced memory system. The EHP provides a high-performance accelerated processing unit (CPU+GPU), in-package high-bandwidth 3D memory, and aggressive use of die-stacking and chiplet technologies to meet the requirements for exascale computing in a balanced manner. We present initial experimental analysis to demonstrate the promise of our approach, and we discuss remaining open research challenges for the community.
ISBN: 9781509012336 (print)
The development of new technologies is ushering in a new era characterized, among other factors, by the rise of sophisticated mobile devices containing CPUs and GPUs. This emerging scenario of heterogeneous mobile architectures brings challenging issues regarding the use of the available computing resources. Such issues are mainly related to the intrinsic complexity of coordinating these processors in order to increase application performance. In this sense, this paper presents a high-level programming model to implement parallel patterns that can be executed in a coordinated way by heterogeneous mobile architectures. A comparative analysis of performance and programming complexity is presented, contrasting code generated automatically from the proposed programming model with low-level manually-optimized implementations.
ISBN: 9781509012336 (print)
For a long time the Instruction Set Architecture (ISA) has been the firm contract between software and hardware. This firm contract plays an important role by decoupling the development of software from hardware micro-architectural features, enabling both to evolve independently. Nonetheless, it also condemns the ISA to become larger, more cluttered and inefficient as new instructions are incorporated over the years and deprecated instructions are left untouched to keep legacy compatibility. In this work we propose OpenISA, a flexible ISA that enables both the software and the hardware to evolve independently, and discuss how OpenISA 1.0 was designed to enable efficient OpenISA software emulation on alien ISAs, which is key to freeing the user from hardware lock-ins. Our results show that software compiled to OpenISA can later be emulated on x86 and ARM processors with very little overhead (under 10% for the majority of programs), achieving near-native performance.
ISBN: 9781665451550 (digital)
ISBN: 9781665451550 (print)
We address the efficient design and implementation of dense matrix factorizations and inversion (DMFI) on modern multicore processors with several NUMA (non-uniform memory access) nodes. Our approach enhances the DMFI routines with a look-ahead strategy in order to overcome the "panel factorization bottleneck". In addition, it exploits hybrid task- and loop-level parallelization while taking into account the NUMA organization of the memory hierarchy. The experiments on a Huawei Kunpeng-based server, with two sockets and 48 cores per socket, for three representative dense linear algebra operations, expose the necessity of adapting both the codes and their execution environment parameters to improve data access locality. With these changes, performance across inter- and intra-socket NUMA configurations is superior to that of reference implementations from state-of-the-art libraries for this platform.
ISBN: 9781509049851 (print)
Convolutional Neural Networks (CNNs) are very computation-intensive. Recently, many CNN accelerators based on the intrinsic parallelism of CNNs have been proposed. However, we observed that there is a big mismatch between the parallel types supported by the computing engine and the dominant parallel types of CNN workloads. This mismatch seriously degrades resource utilization of existing accelerators. In this paper, we propose a flexible dataflow architecture (FlexFlow) that can leverage the complementary effects among feature map, neuron, and synapse parallelism to mitigate the mismatch. We evaluated our design with six typical practical workloads; it achieves 2-10x performance speedup and 2.5-10x power efficiency improvement compared with three state-of-the-art accelerator architectures. Meanwhile, FlexFlow is highly scalable with growing computing engine scale.