ISBN (print): 9781479986484
We present a parallel treecode for fast kernel summation in high dimensions, a common problem in data analysis and computational statistics. Fast kernel summations can be viewed as approximation schemes for dense kernel matrices. Treecode algorithms (or simply treecodes) construct low-rank approximations of certain off-diagonal blocks of the kernel matrix. These blocks are identified with the help of spatial data structures, typically trees. There is extensive work on treecodes and their parallelization for kernel summations in three dimensions, but there is little work on high-dimensional problems. Recently, we introduced a novel treecode, ASKIT, which resolves most of the shortcomings of existing methods. We introduce novel parallel algorithms for ASKIT, derive complexity estimates, and demonstrate scalability on synthetic, scientific, and image datasets. In particular, we introduce a local essential tree construction that extends to arbitrary dimensions in a scalable manner. We introduce data transformations for memory locality and use GPU acceleration. We report results on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center. Our largest computations involve two billion points in 64 dimensions on 32,768 x86 cores and 8 million points in 784 dimensions on 16,384 x86 cores.
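The premise behind any treecode, stated in isolation, is that a kernel block between well-separated point sets is numerically low-rank. The toy sketch below (plain NumPy, not ASKIT; the point sets, bandwidth, and rank are arbitrary choices for illustration) compares an exact dense summation over one off-diagonal Gaussian-kernel block with a rank-r truncated-SVD approximation of that block.

```python
# Toy illustration (not ASKIT): approximate the contribution of a well-separated
# block of a Gaussian kernel matrix with a rank-r truncated SVD, then compare
# against the exact dense summation.
import numpy as np

def gaussian_kernel(X, Y, h=4.0):
    """Dense Gaussian kernel block K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

rng = np.random.default_rng(0)
targets = rng.normal(loc=0.0, size=(500, 8))    # target points
sources = rng.normal(loc=3.0, size=(500, 8))    # well-separated source points
weights = rng.normal(size=500)

K = gaussian_kernel(targets, sources)           # off-diagonal block
exact = K @ weights                             # exact kernel summation

# Low-rank approximation of the block: keep only the top-r singular triplets.
r = 10
U, s, Vt = np.linalg.svd(K, full_matrices=False)
approx = U[:, :r] @ (s[:r] * (Vt[:r] @ weights))

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error with rank {r}: {rel_err:.2e}")
```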
ISBN (print): 9781467371483
The main contribution of this paper is a new GPU implementation of digital halftoning by the local exhaustive search, which can generate high-quality binary images. We have considered programming issues of the GPU architecture in implementing both the local exhaustive search and the partial exhaustive search on the GPU. The experimental results show that our GPU implementation of the local exhaustive search on an NVIDIA GeForce GTX 980 runs in 732 seconds for a 512x512 gray-scale image, while the CPU implementation runs in 37,364 seconds; our GPU implementation thus attains a speed-up factor of 50.98. Additionally, we propose a GPU implementation of digital halftoning by the partial exhaustive search, in which the search space of the local exhaustive search is reduced; it similarly accelerates the computation by a factor of 30.73.
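As a rough illustration of error-minimizing halftoning, the sketch below implements a simplified, CPU-only greedy variant: it accepts a single-pixel flip whenever the flip lowers the energy of the difference between the gray image and a Gaussian-blurred version of the binary image. It is not the authors' local or partial exhaustive search and not a GPU kernel; the blur sigma, sweep count, and image size are assumptions.

```python
# Simplified, CPU-only sketch of error-minimizing halftoning in the spirit of
# local search (not the paper's GPU local/partial exhaustive search): flip one
# pixel at a time and keep the flip when it lowers the blurred-error energy.
import numpy as np
from scipy.ndimage import gaussian_filter

def energy(gray, binary, sigma=1.0):
    # Perceived error: difference between the gray image and a low-pass
    # filtered binary image (a crude human-visual-system model).
    return np.sum((gray - gaussian_filter(binary.astype(float), sigma)) ** 2)

def greedy_halftone(gray, sweeps=2, sigma=1.0):
    binary = (gray > 0.5).astype(np.uint8)       # initial threshold halftone
    e = energy(gray, binary, sigma)
    for _ in range(sweeps):
        for i in range(gray.shape[0]):
            for j in range(gray.shape[1]):
                binary[i, j] ^= 1                # trial flip
                e_new = energy(gray, binary, sigma)
                if e_new < e:
                    e = e_new                    # keep the improvement
                else:
                    binary[i, j] ^= 1            # revert the flip
    return binary

gray = np.random.default_rng(1).random((32, 32))  # small test image in [0, 1]
print(greedy_halftone(gray).mean())
```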
ISBN (print): 9781467390064
The scale of functional magnetic resonance imaging (fMRI) data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 2062x speedups on the two methods and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x, respectively, with 20 nodes on real datasets. We demonstrate weak scaling on a synthetic dataset with 1024 subjects, equivalent in size to the largest fMRI dataset collected to date, on up to 1024 nodes and 32,768 cores.
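For readers unfamiliar with the Shared Response Model, the sketch below shows a stripped-down deterministic variant, not the probabilistic, optimized implementation evaluated in the paper: each subject's data X_i is factored as an orthonormal map W_i times a shared response S, fitted by alternating orthogonal Procrustes updates with averaging. All sizes and the synthetic data are illustrative.

```python
# Minimal deterministic sketch of a shared-response-style factorization
# (a simplification; not the paper's optimized multi-node code): alternate
# orthogonal Procrustes updates for W_i with averaging for the shared
# response S, so that X_i ~= W_i @ S with orthonormal W_i.
import numpy as np

def fit_shared_response(X_list, k=10, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    t = X_list[0].shape[1]
    S = rng.normal(size=(k, t))                     # shared time courses
    W = [None] * len(X_list)
    for _ in range(n_iter):
        for i, X in enumerate(X_list):
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            W[i] = U @ Vt                           # orthonormal subject map
        S = np.mean([Wi.T @ X for Wi, X in zip(W, X_list)], axis=0)
    return W, S

# Tiny synthetic example: 4 "subjects", 200 voxels, 50 time points.
rng = np.random.default_rng(2)
S_true = rng.normal(size=(10, 50))
X_list = [np.linalg.qr(rng.normal(size=(200, 10)))[0] @ S_true +
          0.01 * rng.normal(size=(200, 50)) for _ in range(4)]
W, S = fit_shared_response(X_list)
print(np.linalg.norm(X_list[0] - W[0] @ S) / np.linalg.norm(X_list[0]))
```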
ISBN (print): 9781467373494
The modern digital world produces massive amounts of data, generally referred to as Big Data, which play an important role in shaping the quality of our lives. Relationships among such data are highly valuable but extremely complex to establish. The medical field is one of the major sources of big data. Modern surgical tools can record high-definition (HD) video during surgical procedures, enabling post-surgical review. Such tools produce gigabytes (GB) of video footage after every surgery, which requires mass storage and complex processing. A major solution to this problem is parallel distributed processing using the Hadoop-based MapReduce framework. This paper proposes a surgical video analysis framework using Hadoop to analyze large surgical videos and identify the surgical instruments used. The framework first converts videos into a large number of frames, which are packed into HIPI Image Bundles (HIBs) using the Hadoop Image Processing Interface (HIPI). The images in each bundle are processed in parallel by mappers, and information about frames with identified instruments is logged. Three different feature extraction methods are used in the mappers for local image processing: the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF) with Support Vector Machines (SVM), and the Haralick texture descriptor with Support Vector Machines (SVM).
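The sketch below illustrates only the spirit of the per-frame map step, not the HIPI/Hadoop plumbing: a stand-in feature extractor (an intensity histogram plus gradient statistics instead of SIFT/SURF/Haralick) feeds a scikit-learn SVM, and the mapper emits a (label, 1) pair per frame. The function names, labels, and toy data are hypothetical.

```python
# Sketch of a per-frame "map" step only (not the paper's HIPI/Hadoop pipeline):
# extract a simple texture-style feature vector from a frame and classify it
# with an SVM. The descriptor is a stand-in for SIFT/SURF/Haralick features.
import numpy as np
from sklearn.svm import SVC

def frame_features(frame, bins=32):
    """Stand-in descriptor: intensity histogram plus gradient-magnitude stats."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255), density=True)
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    return np.concatenate([hist, [grad.mean(), grad.std()]])

def map_frame(frame, model):
    """Mapper body: emit (instrument_label, 1) for one video frame."""
    label = model.predict(frame_features(frame)[None, :])[0]
    return label, 1

# Train a toy SVM on random "frames" with fake labels, then run the mapper.
rng = np.random.default_rng(3)
frames = rng.integers(0, 256, size=(40, 64, 64, 3))
labels = rng.integers(0, 2, size=40)              # e.g. scalpel vs. forceps
model = SVC(kernel="rbf").fit([frame_features(f) for f in frames], labels)
print(map_frame(frames[0], model))
```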
ISBN (print): 9789898533388
Aiming at TB-scale time-varying scientific datasets, this paper presents a novel static load-balancing scheme based on information entropy to enhance the efficiency of a parallel adaptive volume rendering algorithm. An information-theoretic model is proposed first, and the information entropy of each data patch is then calculated and taken as a pre-estimate of the computational cost of ray sampling. The data patches are distributed to the processing cores according to their estimated computational costs, which reduces load imbalance in parallel rendering. Compared with existing methods such as random assignment and ray estimation, the proposed entropy-based load-balancing scheme achieves rendering speedup ratios of 1.23 to 2.84. Its speedup performance and view independence make it the best choice for interactive volume rendering.
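A minimal sketch of the scheme's two ingredients, entropy as a cost proxy and balanced assignment of patches to cores, is shown below. It uses a greedy largest-cost-first heuristic and synthetic patches; it is not the paper's renderer or its exact assignment policy.

```python
# Sketch of entropy-based static load balancing (not the paper's renderer):
# use the Shannon entropy of each data patch as a proxy for its ray-sampling
# cost, then assign patches to cores greedily, largest estimated cost first.
import heapq
import numpy as np

def patch_entropy(patch, bins=64):
    hist, _ = np.histogram(patch, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def balance(patches, n_cores):
    costs = [patch_entropy(p) for p in patches]
    # Min-heap of (accumulated cost, core id); place heaviest patches first.
    heap = [(0.0, core) for core in range(n_cores)]
    assignment = {core: [] for core in range(n_cores)}
    for idx in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, core = heapq.heappop(heap)
        assignment[core].append(idx)
        heapq.heappush(heap, (load + costs[idx], core))
    return assignment

rng = np.random.default_rng(4)
patches = [rng.normal(scale=s, size=(16, 16, 16)) for s in rng.uniform(0.1, 3, 40)]
print({core: len(ids) for core, ids in balance(patches, 4).items()})
```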
As an important function of a distributed decision support system, model composition aims to aggregate model functions to solve complex decision problems. Most existing methods for model composition apply only to models that have the same types of input and output data, so that the models can be linked together directly. Those methods are inadequate for heterogeneous models, since a heterogeneous model may have different types of input and output data, represented in either a qualitative or a quantitative manner. This paper addresses the problem of heterogeneous model composition by employing techniques based on semantic web services and artificial intelligence planning. The heterogeneous model composition problem is converted to the problem of planning in nondeterministic domains under partial observability. An automatic composition method is presented to generate the composite model based on the planning-as-model-checking technique. Experimental results are also presented to show the feasibility and capability of our approach in dealing with complex problems involving heterogeneous models.
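To make the composition problem concrete, the sketch below solves a heavily simplified, fully deterministic variant: models are chained by matching produced data types to required data types with breadth-first search. The model repository and data-type names are invented for illustration; the paper's planning-as-model-checking approach additionally handles nondeterminism and partial observability.

```python
# Deterministic toy version of model composition (the paper tackles the much
# harder nondeterministic, partially observable case via planning as model
# checking): chain models by matching output data types to input data types.
from collections import deque

# Hypothetical model repository: name -> (required input types, produced outputs).
MODELS = {
    "demand_forecast": ({"sales_history"}, {"demand_estimate"}),
    "qualitative_assess": ({"expert_survey"}, {"risk_level"}),
    "inventory_plan": ({"demand_estimate", "risk_level"}, {"order_plan"}),
}

def compose(available, goal):
    """Return an ordered list of model names turning `available` data into `goal`."""
    queue = deque([(frozenset(available), [])])
    seen = {frozenset(available)}
    while queue:
        have, plan = queue.popleft()
        if goal <= have:
            return plan
        for name, (needs, gives) in MODELS.items():
            if needs <= have and name not in plan:
                nxt = frozenset(have | gives)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [name]))
    return None

print(compose({"sales_history", "expert_survey"}, {"order_plan"}))
```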
This dissertation addresses the growing challenge of visualizing and modifying massive 3D geometric models in a collaborative workspace by presenting a new scalable data partitioning algorithm in conjunction with a robust system architecture. The goal is to motivate the idea that a distributed architecture can solve many performance-related challenges in the visualization of large 3D data. Drawing data from modeling, simulation, interaction, and data fusion to deliver a starting point for scientific discovery, we present a collaborative visual analytics framework that provides the ability to render, display, and interact with data at massive scale on high-resolution collaborative display environments. This framework allows users to connect to data when it is needed, where it is needed, and in a format suitable for productivity, while providing a means to interactively define a workspace that suits their needs. The presented framework uses a distributed architecture to display content on tiled display walls of arbitrary shape, size, and resolution. These techniques manage the data storage, the communication, and the interaction among the many processing nodes that make up the display wall, hiding the complexity from the user while offering an intuitive means of interacting with the system. Multi-modal methods are presented that enable the user to interact with the system in a natural way, from hand gestures to laser pointers. The combination of this scalable display method with natural interaction modalities provides a robust foundation for a multitude of visualization and interaction applications. The final output of the system is an image on a large display composed of either projection-based or LCD-based displays. Such a system has many different components working together in parallel to produce an output. By incorporating computer graphics theory with classical parallel processing techniques, performance limitations typically associated with the display
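One concrete building block of such a system is the mapping from a single virtual framebuffer to per-node viewports on the wall; a minimal sketch is given below. It is an assumption-laden toy, not the dissertation's architecture, and ignores bezels, synchronization, and inter-node communication.

```python
# Sketch of one building block of a tiled display wall (not the dissertation's
# full framework): given a global framebuffer size and a rows x cols wall of
# displays, compute the pixel viewport each node is responsible for.
def tile_viewports(global_w, global_h, rows, cols):
    viewports = {}
    for r in range(rows):
        for c in range(cols):
            x0 = c * global_w // cols
            x1 = (c + 1) * global_w // cols
            y0 = r * global_h // rows
            y1 = (r + 1) * global_h // rows
            viewports[(r, c)] = (x0, y0, x1 - x0, y1 - y0)  # x, y, width, height
    return viewports

# A 3x4 wall sharing a 7680x3240 virtual desktop: each node renders its region.
for tile, vp in tile_viewports(7680, 3240, 3, 4).items():
    print(tile, vp)
```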
ISBN (print): 9783662480960; 9783662480953
Synchrotron (x-ray) light sources permit investigation of the structure of matter at extremely small length and time scales. Advances in detector technologies enable increasingly complex experiments and more rapid data acquisition. However, analysis of the resulting data then becomes a bottleneck, preventing near-real-time error detection or experiment steering. We present here methods that leverage highly parallel computers to improve the performance of iterative tomographic image reconstruction applications. We apply these methods to the conventional per-slice parallelization approach and use them to implement a novel in-slice approach that can use many more processors. To address programmability, we implement the introduced methods in high-performance MapReduce-like computing middleware, which is further optimized for reconstruction operations. Experiments with four reconstruction algorithms and two large datasets show that our methods can scale up to 8K cores on an IBM BG/Q supercomputer with almost perfect speedup and can reduce total reconstruction times for large datasets by more than 95.4% on 32K cores relative to 1K cores. Moreover, the average reconstruction times are improved from approximately 2 hours (256 cores) to approximately 1 minute (32K cores), thus enabling near-real-time use.
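The per-slice parallelization pattern can be sketched in a few lines: each worker reconstructs whole slices independently. The sketch below uses Python multiprocessing and a naive unfiltered backprojection as a stand-in for the paper's iterative algorithms and MapReduce-like middleware; the angle set and sinogram sizes are made up.

```python
# Sketch of the per-slice parallelization pattern only (the paper's middleware,
# in-slice decomposition, and iterative solvers are far more involved): each
# worker reconstructs whole slices independently, here with a naive unfiltered
# backprojection standing in for the real reconstruction algorithms.
import numpy as np
from multiprocessing import Pool
from scipy.ndimage import rotate

ANGLES = np.linspace(0.0, 180.0, 64, endpoint=False)

def backproject_slice(sinogram):
    """sinogram: (n_angles, n_detectors) for one slice -> square image."""
    n = sinogram.shape[1]
    image = np.zeros((n, n))
    for proj, angle in zip(sinogram, ANGLES):
        smear = np.tile(proj, (n, 1))                     # smear the projection
        image += rotate(smear, angle, reshape=False, order=1)
    return image / len(ANGLES)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    sinograms = rng.random((8, len(ANGLES), 64))          # 8 synthetic slices
    with Pool(processes=4) as pool:                       # per-slice parallelism
        slices = pool.map(backproject_slice, sinograms)
    print(len(slices), slices[0].shape)
```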