ISBN: 9781509066117 (print)
Processing In Memory (PIM), the concept of integrating processing directly with memory, has attracted considerable attention because it can help overcome the throughput limitation caused by data movement between CPU and memory. The challenge, however, is that programmers need a deep understanding of the PIM architecture to maximize benefits such as data locality and parallel thread execution on multiple PIM devices. In this study, we present AnalyzeThat, a programmable shared-memory system for parallel data processing with PIM devices. Central to AnalyzeThat is a rich PIM-Aware Data Structure (PADS), an encapsulation that ties together the data, the analysis tasks, and the runtime needed to interface with the PIM device array. The PADS abstraction provides (i) a key-value data container that allows programmers to easily store data on multiple PIMs, (ii) a suite of parallel operations with which users can easily implement data analysis applications, and (iii) a runtime, hidden from programmers, that provides the mechanisms needed to overlay both the data and the tasks on the PIM device array intelligently, based on PIM-specific information collected from the hardware. We have developed a PIM emulation framework called AnalyzeThat. Our experimental evaluation with representative data analytics applications suggests that the proposed system can significantly reduce the PIM programming effort without losing the benefits of the technology.
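The abstract describes a key-value container spread over several devices, with parallel operations applied where the data lives. A minimal sketch of that idea follows, assuming plain Python dictionaries stand in for PIM devices and threads stand in for device-side execution. The class `ShardedKV` and its methods are hypothetical names for illustration only; they are not the PADS API, and the placement rule (hash modulo device count) is an assumption rather than the runtime described above.

```python
from concurrent.futures import ThreadPoolExecutor


class ShardedKV:
    """Hypothetical key-value container split across emulated devices."""

    def __init__(self, num_devices):
        # One dictionary per emulated device (assumption: no real PIM hardware).
        self.shards = [dict() for _ in range(num_devices)]

    def _device_for(self, key):
        # Assumed placement policy: hash the key onto a device.
        return hash(key) % len(self.shards)

    def put(self, key, value):
        self.shards[self._device_for(key)][key] = value

    def get(self, key):
        return self.shards[self._device_for(key)][key]

    def map_reduce(self, map_fn, reduce_fn, initial):
        """Apply map_fn to every pair on its own shard, then merge the results.

        reduce_fn must be associative and `initial` must be its identity.
        """
        def run_on_shard(shard):
            acc = initial
            for key, value in shard.items():
                acc = reduce_fn(acc, map_fn(key, value))
            return acc

        # Each shard is processed by its own worker, mimicking data locality.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(run_on_shard, self.shards))

        # Combine the per-shard partial results on the host side.
        result = initial
        for partial in partials:
            result = reduce_fn(result, partial)
        return result


store = ShardedKV(num_devices=4)
for i in range(100):
    store.put(i, i)
sum_of_squares = store.map_reduce(lambda k, v: v * v, lambda a, b: a + b, 0)
```

The point of the sketch is the division of labor: the caller only supplies `map_fn` and `reduce_fn`, while placement and per-shard execution stay inside the container.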
ISBN: 9780769548791 (print)
A class of simple network topologies is proposed and analyzed in this paper as part of the research on a polymorphous array architecture for graphics and image processing. The topologies are extended from the mesh network topology and are amenable to VLSI implementation. Simulation and theoretical analyses show that these topologies have many advantages over the mesh and Xmesh topologies. Routing algorithms are also proposed and analyzed for these new network topologies.
ISBN: 9781424494743 (print)
Baseline wander (BLW) is caused by transmission zeros at DC in the frequency response of the communication channel [1]. One of its common sources is a high-pass filter used to AC-couple the received signal. Numerous BLW compensation (BLWC) schemes for baseband and homodyne communication systems have been proposed in the literature. However, a preferred architecture in modern coherent optical communications is the intradyne receiver, because it avoids complex optical phase-locked loops. Because of the carrier frequency offset experienced at the input of intradyne coherent receivers, existing BLWC techniques are not sufficient. In this work we propose an adaptive parallel-processing BLW compensation architecture for high-speed coherent intradyne fiber optic receivers. Parallel processing in the BLWC implementation is imposed by the high data rates required in current optical applications. The new compensator estimates the BLW interference based on both the detected symbols and the carrier frequency offset provided by the carrier recovery loop. Numerical results demonstrate the effectiveness of the proposed technique in mitigating BLW effects.
ISBN: 9781509051465 (print)
The aim of this paper is to present a new distributed computing middleware for High Performance Computing (HPC) based on cloud micro-services. The main challenge is to maintain the scalability and efficiency of a massively parallel and distributed computational system as the volume of big data processed by its applications grows sharply. The proposed middleware implements a new cooperative micro-services teamwork model for massively parallel and distributed computing. This model consists of distributed micro-services acting as Micro-service Virtual Processing Units (MsVPUs), with an integrated load balancing service and an AMQP communication protocol that together enable HPC. The paper presents the proposed distributed computational scheme and its integrated middleware, accompanied by some experimental results.
In a distributed system, executing a program often requires access to remote data files. An efficient data transmission strategy is thus important for real-time applications. Since data files may be replicated and their locations are transparent to the executing program, it becomes the system's responsibility to select a proper file server so that data can be transmitted effectively. In this paper, we consider the scenario in which a particular program needs several data files simultaneously for its execution. Each data file may have replicas across the network. The problem is how to find a collection of file servers and routing paths such that the data transmission time is minimized. An exhaustive approach could of course find the optimal solution, but at a high computational cost. We thus propose a heuristic method. Results obtained from the proposed method are encouraging.
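To make the cost of the exhaustive approach concrete, here is a small sketch that enumerates every assignment of files to replica servers. It rests on a simplified cost model that is an assumption, not the paper's: each server sends its assigned files one after another, servers work in parallel, and routing paths are ignored. The function name `exhaustive_selection` and the input layout are hypothetical, and the input is assumed to contain at least one file.

```python
from itertools import product


def exhaustive_selection(replicas):
    """Try every server choice per file and keep the fastest assignment.

    replicas: {file_name: {server_name: transfer_time}} (assumed layout).
    Returns (completion_time, {file_name: chosen_server}).
    """
    files = list(replicas)
    best_time, best_choice = float("inf"), None

    # One candidate server per file; the number of combinations is the
    # product of the replica counts, which grows exponentially with files.
    for choice in product(*(replicas[f] for f in files)):
        load = {}
        for file_name, server in zip(files, choice):
            # Assumed model: a server serializes the files assigned to it.
            load[server] = load.get(server, 0) + replicas[file_name][server]
        finish = max(load.values())  # servers transmit in parallel
        if finish < best_time:
            best_time, best_choice = finish, dict(zip(files, choice))
    return best_time, best_choice


example = {
    "a": {"s1": 3, "s2": 5},
    "b": {"s1": 3, "s2": 4},
}
best_time, best_choice = exhaustive_selection(example)
```

With two files and two replicas each there are only four combinations, but the search space multiplies with every additional file, which is why a heuristic becomes attractive.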
Computational Grids are emerging as a new infrastructure for Internet-based parallel and distributed computing. They enable the sharing, exchange, discovery, and aggregation of resources distributed across multiple ad...
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes...
ISBN: 9781538668733 (print)
We present a novel partial reduction algorithm to aggregate sparsely distributed intermediate results that are generated by data-parallel analysis and visualization algorithms. Applications of partial reduction include flow trajectory analysis, big data online analytical processing, and volume rendering. Unlike traditional full parallel reduction, which exchanges dense data across all processes, partial reduction exchanges only intermediate results that correspond to the same query, such as line segments of the same flow trajectory. To this end, we design a three-stage algorithm that minimizes the communication cost: (1) partitioning the result space into groups; (2) constructing and optimizing the reduction partners for each group; and (3) initiating collective reduction operations for all groups concurrently. Both theoretical and empirical analyses show that our algorithm outperforms the traditional methods when the intermediate results are sparsely distributed. We also demonstrate the effectiveness of our algorithm for flow visualization, big log data analysis, and volume rendering.
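The three stages can be illustrated with a sequential simulation in which a list of dictionaries stands in for the per-process sparse results. This is only a sketch under stated assumptions: groups are taken to be one per query id, the partner set is simply every rank holding data for that query, and no partner optimization or real message passing is performed. The function name `partial_reduce` is hypothetical and is not taken from the paper.

```python
from collections import defaultdict


def partial_reduce(local_results, combine):
    """Simulate partial reduction over sparse per-process results.

    local_results: list indexed by rank; each entry maps a query id to
                   that rank's intermediate result for the query.
    combine: associative function that merges two intermediate results.
    Returns (reduced results per query, partner ranks per query).
    """
    # Stages 1 and 2 (simplified): one group per query id, whose partners
    # are exactly the ranks that hold a result for that query.
    partners = defaultdict(list)
    for rank, results in enumerate(local_results):
        for query_id in results:
            partners[query_id].append(rank)

    # Stage 3 (simulated): reduce each group on its own; ranks without
    # data for a query never take part in that group's reduction.
    reduced = {}
    for query_id, ranks in partners.items():
        acc = local_results[ranks[0]][query_id]
        for rank in ranks[1:]:
            acc = combine(acc, local_results[rank][query_id])
        reduced[query_id] = acc
    return reduced, dict(partners)


# Three ranks holding segments of three trajectories (illustrative data).
local = [
    {"t0": [1], "t1": [10]},
    {"t0": [2]},
    {"t1": [20], "t2": [5]},
]
reduced, partners = partial_reduce(local, lambda a, b: a + b)
```

In this example trajectory `t2` involves a single rank and `t0` only two, whereas a full reduction would involve all three ranks for every trajectory.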
The aim of this paper is to propose a numerical controller design methodology. The methodology consists of two steps. In the first step, the tensor product (TP) model transformation is applied, which is capable of transforming a given nonlinear state-space dynamic model into TP model form. In the second step, linear matrix inequality (LMI) theorems are used within the parallel distributed compensation (PDC) controller design framework. The main novelty of this paper is the TP model transformation of the first step, which is also capable of handling the tradeoff between the complexity and the accuracy of the resulting TP model. The TP model transformation is a numerical method with the following advantages: it can work with models given either in analytic explicit form or by various soft-computing-based identification techniques; and it does not need problem-dependent analytic derivations, but can be executed "automatically" by computers. Numerical simulations are used to provide empirical validation of the proposed control design methodology. To demonstrate the effectiveness of the TP model transformation, a controller is derived for the prototypical aeroelastic wing section, which exhibits limit cycle oscillation and chaotic behavior.