ISBN (print): 9783540680673
Image registration is the classical problem of finding a geometric transformation that best aligns two images. Because the volume of multisensor remote sensing imagery is growing rapidly and the search for a matching transformation with mutual information is very time-consuming, fast and automatic registration of images from different sensors has become critical in remote sensing. The implementation of automatic mutual-information-based image registration methods on high-performance machines therefore needs to be investigated. This paper first presents a parallel implementation of a mutual-information-based image registration algorithm, which takes advantage of cluster machines by partitioning the data according to the characteristics of the algorithm. The parallel registration method is then evaluated in theory and in experiments, and the results show that the parallel algorithm has good parallel performance and scalability.
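As a rough illustration of why mutual information lends itself to data partitioning, the sketch below computes it from a joint intensity histogram that is accumulated chunk by chunk. The abstract does not describe the partitioning scheme, so the function names, the bin count, the intensity range, and the flat (1-D) image layout are all assumptions made for illustration, not the authors' method. The chunks are processed sequentially here; on a cluster each partial histogram could be computed by a different node and then summed.

```python
import math
from collections import Counter


def partial_joint_histogram(ref_chunk, flt_chunk, bins=16, max_value=256):
    """Joint intensity histogram of one chunk of two aligned pixel sequences."""
    width = max_value / bins
    hist = Counter()
    for r, f in zip(ref_chunk, flt_chunk):
        i = min(int(r / width), bins - 1)  # bin index in the reference image
        j = min(int(f / width), bins - 1)  # bin index in the floating image
        hist[(i, j)] += 1
    return hist


def mutual_information(joint):
    """Mutual information in bits, computed from joint histogram counts."""
    n = sum(joint.values())
    row, col = Counter(), Counter()
    for (i, j), count in joint.items():
        row[i] += count
        col[j] += count
    mi = 0.0
    for (i, j), count in joint.items():
        # p_ij * log2(p_ij / (p_i * p_j)), written with raw counts
        mi += (count / n) * math.log2(count * n / (row[i] * col[j]))
    return mi


def partitioned_mi(ref, flt, parts, bins=16, max_value=256):
    """Split the pixels into `parts` chunks, sum the partial histograms, then score."""
    if len(ref) != len(flt) or not ref or parts < 1:
        raise ValueError("need equal-length, non-empty inputs and parts >= 1")
    size = math.ceil(len(ref) / parts)
    total = Counter()
    for start in range(0, len(ref), size):
        # Hypothetical worker task: histograms are additive, so they merge by summing.
        total.update(partial_joint_histogram(
            ref[start:start + size], flt[start:start + size], bins, max_value))
    return mutual_information(total)
```

Because the joint histogram is additive over pixels, the score does not depend on how the pixels are split, which is what makes this step easy to distribute. A registration search would call such a function once per candidate transformation.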
Cloud computing is beginning to play a dominant role in scientific computing. However, there are still several challenges that need to be addressed before data-intensive scientific applications make the transition to...
ISBN (print): 9798350364613; 9798350364606
The likelihood of unanticipated node failures in large-scale parallel computers increases with growing numbers of nodes. Furthermore, global reduction operations become major bottlenecks due to their limited parallel scalability. The Preconditioned Conjugate Gradient (PCG) method faces these challenges.
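To make the reduction bottleneck concrete, here is a minimal textbook PCG sketch with a Jacobi (diagonal) preconditioner. The abstract does not say which preconditioner or resilience scheme the paper uses, so the preconditioner choice, the dense list-of-lists matrix, and all function names are assumptions; no fault tolerance is implemented. The comments mark the inner products that would each require a global reduction in a distributed run.

```python
def dot(x, y):
    # In a distributed setting each call is a global reduction (e.g. an all-reduce).
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    # Dense matrix-vector product; a real code would use a distributed sparse format.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A using Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)                                 # reduction
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:                # reduction (convergence check)
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)                         # reduction
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Every iteration contains several such reductions, and each one synchronizes all nodes, which is why their limited scalability matters at large node counts.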
ISBN (print): 9781467325790
We study a basic information ranking problem in networks where each node holds an individual preference over a set of items and the goal for each node is to identify a sorted list of items with the largest aggregate preference. We would like to achieve this with a fully decentralized algorithm that uses limited per-node memory and limited pairwise communication. We show how this problem can be reduced to a plurality selection problem, where the goal for each node is to identify an item with the largest aggregate ranking score, and show that solving the reduced problem solves the original ranking problem with high probability. Then we introduce a simple and natural plurality selection algorithm for selection over m > 1 items that uses only log2(m) + 1 bits of per-node memory and per pairwise communication. We prove correctness of the algorithm with high probability as the number of nodes grows large for the case when each node communicates with any other node, and establish tight convergence time bounds. The information ranking problem studied in this paper is a basic ranking problem that arises in various applications, such as sorting elements in distributed computing systems and parallel databases, and may also serve as a model of decentralized inference and opinion formation in distributed environments.
ISBN (print): 0769526616
This paper presents a novel reconfigurable data flow processing architecture that promises high performance by explicitly targeting both fine- and coarse-grained parallelism. This architecture is based on multiple FPGAs organized in a scalable direct network that is substantially more interconnect-efficient than currently used crossbar technology. In addition, we discuss several ancillary issues and propose solutions required to support this architecture and achieve maximal performance for general-purpose applications; these include supporting IP, mapping techniques, and routing policies that enable greater flexibility for architectural evolution and code portability.
Loop tiling is an important compiler transformation used for enhancing data locality and exploiting coarse-grained parallelism. Tiled codes in which tile sizes are runtime parameters - called parametrically-tiled codes...
ISBN (print): 9781538621295
In-situ analytics have been increasingly adopted by leadership scientific applications to gain fast insights into massive output data of simulations. Current practice buffers the output data in DRAM for analytics processing, constraining it to the DRAM capacity left unused by the simulation. The rapid growth of data size requires alternative approaches to accommodating data-rich analytics, such as using solid-state disks (SSDs) to increase effective memory capacity. For this purpose, this paper explores software solutions for exploring the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions-of-interest) hidden in the data being processed, the approach incorporates knowledge of the data features into in-situ data management. It uses adaptive index creation/refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and improve load balance by controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.
ISBN (print): 9781467398978
Standard Doppler ultrasound investigations are limited to detecting the axial blood velocity component, as they cannot directly estimate the flow direction. A typical approach for obtaining velocity vectors consists of combining the Doppler shifts detected by receiving the echoes from two (or more) different directions. Together with plane wave transmission, this strategy can assess the velocity data over a 2D region. Real-time performance is achievable, provided that the electronics are capable of beamforming and processing the data acquired from several probe apertures in between consecutive transmissions. Recently, the ULA-OP 256 research scanner was equipped with a beamformer that, exploiting a parallel/serial processing strategy, provides the computing power for beamforming and processing multiple lines. In this work we present a vector Doppler method, based on the transmission of plane waves, which detects the velocity vectors on 8 parallel lines distributed over a 2D region 1 cm wide. The method is implemented on the ULA-OP 256 scanner and achieves, in real time, a refresh rate higher than 20 Hz when combined in duplex with a standard B-mode. Experiments on the carotid artery of a volunteer are reported, which show the effectiveness of the real-time implementation in detecting the complex flow patterns present in the carotid.
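The idea of combining Doppler shifts from two receive directions can be sketched as a 2x2 linear solve. This is a generic first-order model, not the processing chain of the ULA-OP 256: the function names, the sign convention (transmit direction is the propagation direction of the plane wave, receive direction points from the scatterer to the receiving aperture), and the default speed of sound of 1540 m/s are assumptions made for illustration.

```python
import math


def doppler_shift(v, tx_dir, rx_dir, f0, c=1540.0):
    """First-order Doppler shift (Hz) for a scatterer with velocity v = (vx, vz) in m/s.

    tx_dir: unit vector along which the transmitted plane wave propagates.
    rx_dir: unit vector pointing from the scatterer to the receiving aperture.
    """
    kx = rx_dir[0] - tx_dir[0]
    kz = rx_dir[1] - tx_dir[1]
    return f0 / c * (v[0] * kx + v[1] * kz)


def velocity_from_two_shifts(fd1, fd2, tx_dir, rx_dir1, rx_dir2, f0, c=1540.0):
    """Recover (vx, vz) from two Doppler shifts measured along two receive directions."""
    a11, a12 = rx_dir1[0] - tx_dir[0], rx_dir1[1] - tx_dir[1]
    a21, a22 = rx_dir2[0] - tx_dir[0], rx_dir2[1] - tx_dir[1]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("receive directions are too close to separate the components")
    scale = c / f0
    # Cramer's rule for the system A v = (c / f0) * [fd1, fd2]
    vx = scale * (fd1 * a22 - a12 * fd2) / det
    vz = scale * (a11 * fd2 - a21 * fd1) / det
    return vx, vz


# Hypothetical geometry: wave travels along +z (depth); apertures sit at +/-15 degrees.
theta = math.radians(15.0)
tx = (0.0, 1.0)
rx_a = (math.sin(theta), -math.cos(theta))
rx_b = (-math.sin(theta), -math.cos(theta))
```

With a single receive direction only one projection of the velocity is observable, which is the limitation of standard Doppler noted in the abstract; the second direction makes the system solvable.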
ISBN (print): 0769519652
Divisible workload applications arise in many fields of science and engineering. They can be parallelized in master-worker fashion and relevant scheduling strategies have been proposed to reduce application makespan. Our goal is to develop a practical divisible workload scheduling strategy. This requires that previous work be revisited as several usual assumptions about the computing platform do not hold in practice. We have partially addressed this concern in a previous paper via an algorithm that achieves high performance with realistic resource latency models. In this paper we extend our approach to account for performance prediction errors, which are expected for most real-world performance and applications. In essence, we combine ideas from multi-round divisible workload scheduling, for performance, and from factoring-based scheduling, for robustness. We present simulation results to quantify the benefits of our approach compared to our original algorithm and to other previously proposed algorithms.
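For readers unfamiliar with factoring-based scheduling, the sketch below shows the classic factoring rule only: each batch hands out roughly half of the remaining work, split evenly across the workers, so chunks shrink as the run progresses and late prediction errors affect only small chunks. It does not reproduce the combined algorithm of the paper, whose details the abstract does not give; the function name and the `min_chunk` parameter are assumptions.

```python
import math


def factoring_chunks(total_units, workers, min_chunk=1):
    """Return a list of batches; each batch is a list of chunk sizes, one per worker."""
    if total_units < 0 or workers < 1 or min_chunk < 1:
        raise ValueError("need total_units >= 0, workers >= 1, min_chunk >= 1")
    remaining = total_units
    batches = []
    while remaining > 0:
        # Half of the remaining work, divided evenly among the workers.
        chunk = max(min_chunk, math.ceil(remaining / (2 * workers)))
        batch = []
        for _ in range(workers):
            if remaining == 0:
                break
            size = min(chunk, remaining)
            batch.append(size)
            remaining -= size
        batches.append(batch)
    return batches
```

In a master-worker run, the master would dispatch the next chunk to whichever worker becomes idle first, so faster workers naturally absorb more of the later, smaller chunks.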
In this paper we consider the operator mapping problem for in-network stream processing applications. In-network stream processing consists of applying a tree of operators in steady state to multiple data objects that are continually updated at various locations on a network. Examples of in-network stream processing include the processing of data in a sensor network, or of continuous queries on distributed relational databases. We study the operator mapping problem in a "constructive" scenario, i.e., a scenario in which one builds a platform dedicated to the application by purchasing processing servers with various costs and capabilities. The objective is to minimize the cost of the platform while ensuring that the application achieves a minimum steady-state throughput. The first contribution of this paper is the formalization of a set of relevant operator-placement problems, and a proof that even simple versions of the problem are NP-complete. Our second contribution is the design of several polynomial-time heuristics, which are evaluated via extensive simulations and compared to theoretical bounds for optimal solutions.