ISBN: 1932415599 (print)
Code migration from serial to parallel platforms using a distributed-memory model involves a number of changes in the serial code. The most significant change is the implementation of a message passing interface (MPI) scheme for domain partitioning, data coherence among processors, and data communication for spatially distinct multicomponent applications. The DBuilder software has been developed as a toolkit for application developers to leverage code migration efforts. This paper details the DBuilder software design, the coordination of objects when partitioning the domain with two existing objects, e.g., vertex and element, the synchronization algorithms, and the coupler setup for multicomponent applications with partially overlapped distinct dimensional domains. Furthermore, the integration of legacy parallel linear solver software is also discussed.
ISBN: 9781538637906 (print)
As a promising architectural paradigm for applications that demand high I/O bandwidth, processing-in-memory (PIM) computing techniques have been adopted in designing Convolutional Neural Networks (CNNs). However, due to the notorious memory wall problem, PIM based on existing device memory still cannot handle complex CNN applications under the constraints of memory bandwidth and processing latency. To mitigate this problem, this paper proposes an efficient PIM architecture based on skyrmion and domain-wall racetrack memories, which can further exploit the potential of PIM architectures in terms of processing latency and energy efficiency. By adopting full adders and multipliers developed using skyrmion and domain-wall nanowires, our proposed PIM architecture can accommodate complex CNNs at different scales. Experimental results show that, compared with both traditional and state-of-the-art PIM architectures, our proposed PIM architecture can drastically improve the processing latency and energy efficiency of CNNs.
ISBN: 9781467387767 (print)
Avionics applications need to be certified for the highest criticality standard. This certification includes schedulability analysis and worst-case execution time (WCET) analysis. WCET analysis is only possible when the software is written to be WCET analyzable and when the platform is time-predictable. In this paper we present prototype avionics applications that have been ported to the time-predictable T-CREST platform. The applications are WCET analyzable, and T-CREST is supported by the aiT WCET analyzer. This combination allows us to provide WCET bounds of avionic tasks, even when executing on a multicore processor.
ISBN: 1892512459 (print)
This paper presents a synchronization methodology for the thread-pool model. In our approach, when an execution needs to be blocked, the synchronization code releases the executing thread and returns it to the thread pool, rather than blocking the thread, so that the thread can execute another job. This is done by returning from one function and calling another function. As a result, switching from one job to another is extremely fast. When applied to Jacobi iteration in grid simulations, our approach shows a dramatic performance improvement over the traditional thread-per-request model using OS primitives for synchronization. This approach is particularly attractive for heavily loaded systems where the number of requests far exceeds the number of processors, such as web servers.
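The idea of returning a thread to the pool instead of blocking it can be illustrated with a continuation-style barrier. The following is a minimal sketch, not the paper's implementation: the class `ContinuationBarrier`, the helper `run_demo`, and the two-phase job structure are hypothetical names chosen for illustration, and Python's standard `ThreadPoolExecutor` stands in for whatever pool the authors used. A job that reaches the barrier registers the function to run next and simply returns; the last arrival resubmits all registered continuations.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ContinuationBarrier:
    """Hypothetical barrier that never blocks a pool thread."""

    def __init__(self, parties, pool):
        self._parties = parties
        self._pool = pool
        self._lock = threading.Lock()
        self._waiting = []

    def arrive(self, continuation):
        """Register the code to run after the barrier, then return at once."""
        with self._lock:
            self._waiting.append(continuation)
            if len(self._waiting) < self._parties:
                return  # the worker thread goes back to the pool
            ready, self._waiting = self._waiting, []
        # Last arrival: schedule every continuation as a fresh job.
        for cont in ready:
            self._pool.submit(cont)


def run_demo(jobs=4, workers=1):
    """Run `jobs` two-phase jobs on `workers` threads and return the event log."""
    log = []
    log_lock = threading.Lock()
    done = threading.Event()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        barrier = ContinuationBarrier(jobs, pool)

        def phase_two(i):
            with log_lock:
                log.append(("phase2", i))
                if sum(1 for tag, _ in log if tag == "phase2") == jobs:
                    done.set()

        def phase_one(i):
            with log_lock:
                log.append(("phase1", i))
            # Instead of blocking here, hand over the rest of the job.
            barrier.arrive(lambda: phase_two(i))

        for i in range(jobs):
            pool.submit(phase_one, i)
        done.wait(timeout=5)  # keep the pool open until phase two finishes
    return log
```

With a blocking barrier, four jobs on a single worker thread would deadlock, because the first job would hold the only thread while waiting for the others. In this sketch the same workload completes on one thread, since each job gives the thread back when it reaches the barrier.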
ISBN: 0769524052 (print)
This paper presents a performance study of a nonrigid registration algorithm for investigating lung disease on clusters. Our algorithm combines two conventional acceleration techniques in order to achieve fast registration: a data-parallel processing technique for accelerating the registration procedure, and a precomputation technique for reducing the computational complexity. We perform experiments on three clusters with different CPU and network performance to clarify which acceleration techniques and computing environments provide higher performance. The results show that a cluster with a Gigabit Ethernet (GbE) network is the most cost-effective solution, reducing registration time from ten hours to ten minutes with a linear speedup.
ISBN: 9780769549392; 9781467353212 (print)
Underlay-unawareness in P2P systems can result in sub-optimal peer selection for overlay routing and hence poor performance. The majority of underlay-aware proposals for peer selection focus on finding the shortest overlay routes by selecting the nearest peers according to proximity. However, in the case of multiple and parallel downloads, if the underlay paths between a downloader and its selected nearest peers share a bottleneck, this can cause congestion, leading to performance deterioration instead of improvement. This effect was neglected in previous work because, in today's Internet, the bottleneck is usually not shared, as it is the end user's access link. This is no longer the case in more modern scenarios, e.g., with FTTH or with upcoming in-network caching techniques such as DECADE. We propose an improved peer selection approach for P2P applications called Fewest Common Hops (FCH) that ensures proximity-based node selection with maximum path disjointness. It is a client-based, infrastructure-independent heuristic to optimize download time for multiple and parallel downloads in P2P content distribution applications. Simulations show that, even when FCH is implemented in the simplest possible fashion (using only traceroute), it can significantly decrease the download time.
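The abstract does not spell out the FCH selection rule, so the following is only an illustrative sketch of one way a "fewest common hops" heuristic could work. It assumes that each candidate peer comes with a list of router hops (for example, as reported by traceroute) and greedily picks the peer whose path shares the fewest routers with the paths already chosen, breaking ties by path length as a stand-in for proximity. The function names `common_hops` and `select_peers` and the tie-breaking rule are assumptions, not taken from the paper.

```python
from typing import Dict, List


def common_hops(path_a: List[str], path_b: List[str]) -> int:
    """Count the routers that appear on both paths."""
    return len(set(path_a) & set(path_b))


def select_peers(paths: Dict[str, List[str]], k: int) -> List[str]:
    """Greedily pick up to k peers whose paths overlap as little as possible.

    `paths` maps a peer name to the hops between the downloader and that
    peer. Ties on overlap are broken by hop count, then by name.
    """
    chosen: List[str] = []
    remaining = dict(paths)
    while remaining and len(chosen) < k:

        def cost(peer: str):
            shared = sum(common_hops(remaining[peer], paths[c]) for c in chosen)
            return (shared, len(remaining[peer]), peer)

        best = min(remaining, key=cost)
        chosen.append(best)
        del remaining[best]
    return chosen


# Hypothetical hop lists: A and B share router r1, C is fully disjoint.
example_paths = {
    "A": ["r1", "r2"],
    "B": ["r1", "r3", "r4"],
    "C": ["r5", "r6", "r7"],
}
selected = select_peers(example_paths, k=2)
```

On this example data, a purely proximity-based choice would pair A with B, whose paths both cross r1. The sketch instead pairs A with C, since C's path shares no router with A's.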
ISBN: 1932415262 (print)
The development of high-speed local networks and cheap but powerful PCs has led to an extensive use of PC clusters as building blocks of modern grid computing systems. In order to exploit the available resources as well as possible, any program or packet of interdependent jobs submitted to the grid has to be split into parallel executable tasks, which have to be scheduled onto the available processing elements. The need for data communication between these tasks leads to dependencies, which strongly affect the schedule. In this paper we consider task graphs that take computation and communication costs into account. For a completely meshed homogeneous computing system with a fixed number of processing elements, we compute schedules with minimum schedule length. Our contribution consists of parallelizing an informed search algorithm for calculating optimal schedules based on the IDA*-algorithm, a memory-saving derivative of the well-known A*-algorithm. Due to the resulting memory requirements, the application of the A*-algorithm is restricted to task graph scheduling problems with a quite small number of tasks. In contrast, the IDA*-algorithm can compute optimal schedules for up to 20 tasks (jobs) in real time. Thus, it can be used as an online sub-scheduler within grid computing systems.
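For readers unfamiliar with IDA*, the sketch below shows the generic, sequential algorithm on a small weighted graph. It is not the paper's parallel scheduler: the graph, the heuristic values, and the names `ida_star`, `edges`, and `h` are made up for illustration. The point is the memory behavior the abstract relies on: IDA* keeps only the current path and repeats a depth-first search with a growing cost bound, whereas A* stores the whole open list.

```python
import math


def ida_star(start, is_goal, successors, heuristic):
    """Return (cost, path) of a cheapest solution, or None if there is none.

    `successors(node)` yields (next_node, step_cost) pairs; `heuristic`
    must never overestimate the remaining cost (admissibility).
    """
    path = [start]  # the only state kept in memory

    def search(g, bound):
        node = path[-1]
        f = g + heuristic(node)
        if f > bound:
            return f, False  # report the smallest f that exceeded the bound
        if is_goal(node):
            return g, True
        minimum = math.inf
        for nxt, step in successors(node):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            t, found = search(g + step, bound)
            if found:
                return t, True
            path.pop()
            minimum = min(minimum, t)
        return minimum, False

    bound = heuristic(start)
    while True:
        t, found = search(0, bound)
        if found:
            return t, list(path)
        if t == math.inf:
            return None
        bound = t  # deepen to the next candidate cost


# Hypothetical graph and admissible heuristic, for illustration only.
edges = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 1)],
    "G": [],
}
h = {"S": 3, "A": 2, "B": 1, "G": 0}

result = ida_star("S", lambda n: n == "G", lambda n: edges[n], h.__getitem__)
```

Here `result` is `(4, ['S', 'A', 'B', 'G'])`: the first iteration with bound 3 fails, and the second iteration with bound 4 finds the cheapest route. In a scheduling setting, a node would instead represent a partial schedule and the heuristic a lower bound on the remaining schedule length.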
In this paper, we examine the suitability of CORBA-based solutions for meeting application requirements in the field of distributed parallel programming. We outline concepts defined within CORBA which are helpful for ...
ISBN: 1932415262 (print)
We propose an algorithm that performs data distribution and parallelization simultaneously. The objectives of the simultaneous algorithm are to reduce the length of the critical path and the total memory size. Needless to say, memory usage for each processor must be balanced. To obtain an optimal solution, we first adopted a branch and bound method. Since the branch and bound method often fails in the case of a large task graph, we adopt a multi-objective genetic algorithm, which provides a near-optimal solution. For effective simultaneous partitioning, we employ some edge sorting and ordering methods. The effectiveness of our simultaneous partitioning algorithms is shown by experimental results.
ISBN: 1932415262 (print)
Continuously monitoring the event history of a distributed application for the occurrence of interesting event predicates is a difficult problem. For many predicates, once an event matches a portion of a predicate, it cannot be discarded, since it may match future events in one or more solutions. In this paper we address the problem of multi-agent state monitoring and develop a weak conjunctive predicate detection protocol on the multi-agent platform JADE.