ISBN (print): 9781479986484
We present a parallel treecode for fast kernel summation in high dimensions, a common problem in data analysis and computational statistics. Fast kernel summations can be viewed as approximation schemes for dense kernel matrices. Treecode algorithms (or simply treecodes) construct low-rank approximations of certain off-diagonal blocks of the kernel matrix. These blocks are identified with the help of spatial data structures, typically trees. There is extensive work on treecodes and their parallelization for kernel summations in three dimensions, but there is little work on high-dimensional problems. Recently, we introduced a novel treecode, ASKIT, which resolves most of the shortcomings of existing methods. We introduce novel parallel algorithms for ASKIT, derive complexity estimates, and demonstrate scalability on synthetic, scientific, and image datasets. In particular, we introduce a local essential tree construction that extends to arbitrary dimensions in a scalable manner. We introduce data transformations for memory locality and use GPU acceleration. We report results on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center. Our largest computations involve two billion points in 64 dimensions on 32,768 x86 cores and 8 million points in 784 dimensions on 16,384 x86 cores.
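The premise behind any treecode, stated in isolation, is that a kernel block between well-separated point sets is numerically low-rank. The toy sketch below (plain NumPy, not ASKIT; the point sets, bandwidth, and rank are arbitrary choices for illustration) compares an exact dense summation over one off-diagonal Gaussian-kernel block with a rank-r truncated-SVD approximation of that block.

```python
# Toy illustration (not ASKIT): approximate the contribution of a well-separated
# block of a Gaussian kernel matrix with a rank-r truncated SVD, then compare
# against the exact dense summation.
import numpy as np

def gaussian_kernel(X, Y, h=4.0):
    """Dense Gaussian kernel block K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

rng = np.random.default_rng(0)
targets = rng.normal(loc=0.0, size=(500, 8))    # target points
sources = rng.normal(loc=3.0, size=(500, 8))    # well-separated source points
weights = rng.normal(size=500)

K = gaussian_kernel(targets, sources)           # off-diagonal block
exact = K @ weights                             # exact kernel summation

# Low-rank approximation of the block: keep only the top-r singular triplets.
r = 10
U, s, Vt = np.linalg.svd(K, full_matrices=False)
approx = U[:, :r] @ (s[:r] * (Vt[:r] @ weights))

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error with rank {r}: {rel_err:.2e}")
```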
ISBN (print): 9781467371483
The main contribution of this paper is a new GPU implementation of digital halftoning by the local exhaustive search, which can generate high-quality binary images. We have considered programming issues of the GPU architecture in implementing both the local exhaustive search and the partial exhaustive search on the GPU. The experimental results show that our GPU implementation of the local exhaustive search on an NVIDIA GeForce GTX 980 runs in 732 seconds for a 512x512 gray-scale image, while the CPU implementation runs in 37,364 seconds; our GPU implementation thus attains a speed-up factor of 50.98. Additionally, we propose a GPU implementation of digital halftoning by the partial exhaustive search, in which the search space of the local exhaustive search is reduced; it similarly accelerates the computation by a factor of 30.73.
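As a rough illustration of error-minimizing halftoning, the sketch below implements a simplified, CPU-only greedy variant: it accepts a single-pixel flip whenever the flip lowers the energy of the difference between the gray image and a Gaussian-blurred version of the binary image. It is not the authors' local or partial exhaustive search and not a GPU kernel; the blur sigma, sweep count, and image size are assumptions.

```python
# Simplified, CPU-only sketch of error-minimizing halftoning in the spirit of
# local search (not the paper's GPU local/partial exhaustive search): flip one
# pixel at a time and keep the flip when it lowers the blurred-error energy.
import numpy as np
from scipy.ndimage import gaussian_filter

def energy(gray, binary, sigma=1.0):
    # Perceived error: difference between the gray image and a low-pass
    # filtered binary image (a crude human-visual-system model).
    return np.sum((gray - gaussian_filter(binary.astype(float), sigma)) ** 2)

def greedy_halftone(gray, sweeps=2, sigma=1.0):
    binary = (gray > 0.5).astype(np.uint8)       # initial threshold halftone
    e = energy(gray, binary, sigma)
    for _ in range(sweeps):
        for i in range(gray.shape[0]):
            for j in range(gray.shape[1]):
                binary[i, j] ^= 1                # trial flip
                e_new = energy(gray, binary, sigma)
                if e_new < e:
                    e = e_new                    # keep the improvement
                else:
                    binary[i, j] ^= 1            # revert the flip
    return binary

gray = np.random.default_rng(1).random((32, 32))  # small test image in [0, 1]
print(greedy_halftone(gray).mean())
```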
ISBN (print): 9781467390064
The scale of functional magnetic resonance imaging (fMRI) data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 2062x speedups on the two methods and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x, respectively, with 20 nodes on real datasets. We demonstrate weak scaling on a synthetic dataset with 1024 subjects, equivalent in size to the largest fMRI dataset collected to date, on up to 1024 nodes and 32,768 cores.
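For readers unfamiliar with the Shared Response Model, the sketch below shows a stripped-down deterministic variant, not the probabilistic, optimized implementation evaluated in the paper: each subject's data X_i is factored as an orthonormal map W_i times a shared response S, fitted by alternating orthogonal Procrustes updates with averaging. All sizes and the synthetic data are illustrative.

```python
# Minimal deterministic sketch of a shared-response-style factorization
# (a simplification; not the paper's optimized multi-node code): alternate
# orthogonal Procrustes updates for W_i with averaging for the shared
# response S, so that X_i ~= W_i @ S with orthonormal W_i.
import numpy as np

def fit_shared_response(X_list, k=10, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    t = X_list[0].shape[1]
    S = rng.normal(size=(k, t))                     # shared time courses
    W = [None] * len(X_list)
    for _ in range(n_iter):
        for i, X in enumerate(X_list):
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            W[i] = U @ Vt                           # orthonormal subject map
        S = np.mean([Wi.T @ X for Wi, X in zip(W, X_list)], axis=0)
    return W, S

# Tiny synthetic example: 4 "subjects", 200 voxels, 50 time points.
rng = np.random.default_rng(2)
S_true = rng.normal(size=(10, 50))
X_list = [np.linalg.qr(rng.normal(size=(200, 10)))[0] @ S_true +
          0.01 * rng.normal(size=(200, 50)) for _ in range(4)]
W, S = fit_shared_response(X_list)
print(np.linalg.norm(X_list[0] - W[0] @ S) / np.linalg.norm(X_list[0]))
```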
ISBN (print): 9781467373494
The modern digital world produces massive amounts of data, generally referred to as Big Data, which play an important role in shaping the quality of our lives. Relationships among such data are highly valuable but extremely complex to establish. The medical field is one of the major sources of big data. Modern surgical tools can record high-definition (HD) video during surgical procedures, enabling post-surgical review. Such tools produce gigabytes (GB) of video footage after every surgery, which requires mass storage and complex processing. A major solution to this problem is parallel distributed processing using the Hadoop-based MapReduce framework. This paper proposes a surgical video analysis framework using Hadoop to analyze large surgical videos and identify the surgical instruments used. The framework first converts videos into a large number of frames, which are packed into HIPI Image Bundles (HIBs) using the Hadoop Image Processing Interface (HIPI). The images in each bundle are processed in parallel by mappers, and information about frames with identified instruments is logged. Three different feature extraction methods are used in the mappers for local image processing: the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF) with Support Vector Machines (SVM), and the Haralick texture descriptor with Support Vector Machines (SVM).
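The sketch below illustrates only the spirit of the per-frame map step, not the HIPI/Hadoop plumbing: a stand-in feature extractor (an intensity histogram plus gradient statistics instead of SIFT/SURF/Haralick) feeds a scikit-learn SVM, and the mapper emits a (label, 1) pair per frame. The function names, labels, and toy data are hypothetical.

```python
# Sketch of a per-frame "map" step only (not the paper's HIPI/Hadoop pipeline):
# extract a simple texture-style feature vector from a frame and classify it
# with an SVM. The descriptor is a stand-in for SIFT/SURF/Haralick features.
import numpy as np
from sklearn.svm import SVC

def frame_features(frame, bins=32):
    """Stand-in descriptor: intensity histogram plus gradient-magnitude stats."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255), density=True)
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    return np.concatenate([hist, [grad.mean(), grad.std()]])

def map_frame(frame, model):
    """Mapper body: emit (instrument_label, 1) for one video frame."""
    label = model.predict(frame_features(frame)[None, :])[0]
    return label, 1

# Train a toy SVM on random "frames" with fake labels, then run the mapper.
rng = np.random.default_rng(3)
frames = rng.integers(0, 256, size=(40, 64, 64, 3))
labels = rng.integers(0, 2, size=40)              # e.g. scalpel vs. forceps
model = SVC(kernel="rbf").fit([frame_features(f) for f in frames], labels)
print(map_frame(frames[0], model))
```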
ISBN (print): 9789898533388
Aiming at TB-scale time-varying scientific datasets, this paper presents a novel static load-balancing scheme based on information entropy to enhance the efficiency of a parallel adaptive volume rendering algorithm. An information-theoretic model is proposed first, and the information entropy of each data patch is then calculated and taken as a pre-estimate of the computational cost of ray sampling. The data patches are distributed to the processing cores according to their estimated computational costs, which reduces load imbalance in parallel rendering. Compared with existing methods such as random assignment and ray estimation, the proposed entropy-based load-balancing scheme achieves rendering speedup ratios of 1.23 to 2.84. Its speedup performance and view independence make it the best choice for interactive volume rendering.
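A minimal sketch of the scheme's two ingredients, entropy as a cost proxy and balanced assignment of patches to cores, is shown below. It uses a greedy largest-cost-first heuristic and synthetic patches; it is not the paper's renderer or its exact assignment policy.

```python
# Sketch of entropy-based static load balancing (not the paper's renderer):
# use the Shannon entropy of each data patch as a proxy for its ray-sampling
# cost, then assign patches to cores greedily, largest estimated cost first.
import heapq
import numpy as np

def patch_entropy(patch, bins=64):
    hist, _ = np.histogram(patch, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def balance(patches, n_cores):
    costs = [patch_entropy(p) for p in patches]
    # Min-heap of (accumulated cost, core id); place heaviest patches first.
    heap = [(0.0, core) for core in range(n_cores)]
    assignment = {core: [] for core in range(n_cores)}
    for idx in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, core = heapq.heappop(heap)
        assignment[core].append(idx)
        heapq.heappush(heap, (load + costs[idx], core))
    return assignment

rng = np.random.default_rng(4)
patches = [rng.normal(scale=s, size=(16, 16, 16)) for s in rng.uniform(0.1, 3, 40)]
print({core: len(ids) for core, ids in balance(patches, 4).items()})
```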
As an important function of a distributed decision support system, model composition aims to aggregate model functions to solve complex decision problems. Most existing methods for model composition apply only to models that have the same types of input and output data, so that the models can be linked together directly. Those methods are inadequate for heterogeneous models, since a heterogeneous model may have different types of input and output data, represented in either a qualitative or a quantitative manner. This paper addresses the problem of heterogeneous model composition by employing techniques based on semantic web services and artificial intelligence planning. The heterogeneous model composition problem is converted to the problem of planning in nondeterministic domains under partial observability. An automatic composition method is presented to generate the composite model based on the planning-as-model-checking technique. Experimental results are also presented to show the feasibility and capability of our approach in dealing with complex problems involving heterogeneous models.
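To make the composition problem concrete, the sketch below solves a heavily simplified, fully deterministic variant: models are chained by matching produced data types to required data types with breadth-first search. The model repository and data-type names are invented for illustration; the paper's planning-as-model-checking approach additionally handles nondeterminism and partial observability.

```python
# Deterministic toy version of model composition (the paper tackles the much
# harder nondeterministic, partially observable case via planning as model
# checking): chain models by matching output data types to input data types.
from collections import deque

# Hypothetical model repository: name -> (required input types, produced outputs).
MODELS = {
    "demand_forecast": ({"sales_history"}, {"demand_estimate"}),
    "qualitative_assess": ({"expert_survey"}, {"risk_level"}),
    "inventory_plan": ({"demand_estimate", "risk_level"}, {"order_plan"}),
}

def compose(available, goal):
    """Return an ordered list of model names turning `available` data into `goal`."""
    queue = deque([(frozenset(available), [])])
    seen = {frozenset(available)}
    while queue:
        have, plan = queue.popleft()
        if goal <= have:
            return plan
        for name, (needs, gives) in MODELS.items():
            if needs <= have and name not in plan:
                nxt = frozenset(have | gives)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [name]))
    return None

print(compose({"sales_history", "expert_survey"}, {"order_plan"}))
```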
This dissertation addresses the growing challenge of visualizing and modifying massive 3D geometric models in a collaborative workspace by presenting a new scalable data partitioning algorithm in conjunction with a robust system architecture. The goal is to motivate the idea that a distributed architecture can solve many performance-related challenges in the visualization of large 3D data. Drawing data from modeling, simulation, interaction, and data fusion to deliver a starting point for scientific discovery, we present a collaborative visual analytics framework that provides the ability to render, display, and interact with data at massive scale on high-resolution collaborative display environments. This framework allows users to connect to data when it is needed, where it is needed, and in a format suitable for productivity, while providing a means to interactively define a workspace that suits their needs. The presented framework uses a distributed architecture to display content on tiled display walls of arbitrary shape, size, and resolution. These techniques manage the data storage, the communication, and the interaction among the many processing nodes that make up the display wall, hiding the complexity from the user while offering an intuitive means of interacting with the system. Multi-modal methods are presented that enable the user to interact with the system in a natural way, from hand gestures to laser pointers. The combination of this scalable display method with natural interaction modalities provides a robust foundation for a multitude of visualization and interaction applications. The final output of the system is an image on a large display composed of either projection-based or LCD-based displays. Such a system has many different components working together in parallel to produce an output. By incorporating computer graphics theory with classical parallel processing techniques, performance limitations typically associated with the display
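One concrete building block of such a system is the mapping from a single virtual framebuffer to per-node viewports on the wall; a minimal sketch is given below. It is an assumption-laden toy, not the dissertation's architecture, and ignores bezels, synchronization, and inter-node communication.

```python
# Sketch of one building block of a tiled display wall (not the dissertation's
# full framework): given a global framebuffer size and a rows x cols wall of
# displays, compute the pixel viewport each node is responsible for.
def tile_viewports(global_w, global_h, rows, cols):
    viewports = {}
    for r in range(rows):
        for c in range(cols):
            x0 = c * global_w // cols
            x1 = (c + 1) * global_w // cols
            y0 = r * global_h // rows
            y1 = (r + 1) * global_h // rows
            viewports[(r, c)] = (x0, y0, x1 - x0, y1 - y0)  # x, y, width, height
    return viewports

# A 3x4 wall sharing a 7680x3240 virtual desktop: each node renders its region.
for tile, vp in tile_viewports(7680, 3240, 3, 4).items():
    print(tile, vp)
```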
ISBN (print): 9783662480960; 9783662480953
Synchrotron (x-ray) light sources permit investigation of the structure of matter at extremely small length and time scales. Advances in detector technologies enable increasingly complex experiments and more rapid data acquisition. However, analysis of the resulting data then becomes a bottleneck, preventing near-real-time error detection or experiment steering. We present here methods that leverage highly parallel computers to improve the performance of iterative tomographic image reconstruction applications. We apply these methods to the conventional per-slice parallelization approach and use them to implement a novel in-slice approach that can use many more processors. To address programmability, we implement the introduced methods in high-performance MapReduce-like computing middleware, which is further optimized for reconstruction operations. Experiments with four reconstruction algorithms and two large datasets show that our methods can scale up to 8K cores on an IBM BG/Q supercomputer with almost perfect speedup and can reduce total reconstruction times for large datasets by more than 95.4% on 32K cores relative to 1K cores. Moreover, the average reconstruction times are improved from approximately 2 hours (256 cores) to approximately 1 minute (32K cores), thus enabling near-real-time use.
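The per-slice parallelization pattern can be sketched in a few lines: each worker reconstructs whole slices independently. The sketch below uses Python multiprocessing and a naive unfiltered backprojection as a stand-in for the paper's iterative algorithms and MapReduce-like middleware; the angle set and sinogram sizes are made up.

```python
# Sketch of the per-slice parallelization pattern only (the paper's middleware,
# in-slice decomposition, and iterative solvers are far more involved): each
# worker reconstructs whole slices independently, here with a naive unfiltered
# backprojection standing in for the real reconstruction algorithms.
import numpy as np
from multiprocessing import Pool
from scipy.ndimage import rotate

ANGLES = np.linspace(0.0, 180.0, 64, endpoint=False)

def backproject_slice(sinogram):
    """sinogram: (n_angles, n_detectors) for one slice -> square image."""
    n = sinogram.shape[1]
    image = np.zeros((n, n))
    for proj, angle in zip(sinogram, ANGLES):
        smear = np.tile(proj, (n, 1))                     # smear the projection
        image += rotate(smear, angle, reshape=False, order=1)
    return image / len(ANGLES)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    sinograms = rng.random((8, len(ANGLES), 64))          # 8 synthetic slices
    with Pool(processes=4) as pool:                       # per-slice parallelism
        slices = pool.map(backproject_slice, sinograms)
    print(len(slices), slices[0].shape)
```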