Many important parallel applications are data-parallel and may be efficiently implemented on a workstation cluster by allocating each workstation a contiguous partition of the data domain. Implementation on non-dedicated clusters, however, is complicated by the possibility of changes in workstation availability. For example, a personal workstation may be reclaimed by its primary user for interactive use. In such situations, a node must be removed from the collection of workstations forming the "virtual parallel machine" allocated to the application, and the data redistributed accordingly. Conversely, workstations may become available to join the virtual parallel machine. This paper identifies fundamental characteristics of efficient policies for data redistribution following the addition or removal of workstations from the cluster. The following conclusions are obtained based on mathematical analysis and simulations: (a) allocating data to a new node from the center of the data domain substantially reduces data migration costs compared to allocation from the edge; (b) addition in groups is beneficial compared to repeated single additions; and (c) even a large number of incremental adjustments of the data domain partitions, owing to successive additions and removals of nodes, does not appear to substantially degrade partition quality compared to that obtained by partitioning from scratch. We believe that these observations can be fruitfully incorporated in the design of workstation cluster support systems for data-parallel computing. (C) 2004 Elsevier Inc. All rights reserved.
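Conclusion (a) can be illustrated with a minimal sketch: a balanced 1-D block partition where the cost of adding a node is the number of elements that change owner. This is an illustration only, not the paper's model; the function and its parameters are hypothetical.

    def migration_cost(n, p, insert_pos):
        """Elements changing owner when a node is inserted at position
        `insert_pos` in a balanced 1-D block partition of n elements
        over p nodes (sketch, not the paper's cost model)."""
        def owners(k, order):
            # Owner of element i under a balanced block partition over
            # k nodes, visited in the order given by `order`.
            return [order[min(i * k // n, k - 1)] for i in range(n)]

        old = owners(p, list(range(p)))
        new_order = list(range(insert_pos)) + [p] + list(range(insert_pos, p))
        new = owners(p + 1, new_order)
        return sum(o != w for o, w in zip(old, new))

    # n = 12 elements, p = 3 nodes: inserting at the edge moves 6
    # elements, inserting at the center only 4.
    print(migration_cost(12, 3, 0), migration_cost(12, 3, 1))   # 6 4

Even in this toy setting, edge insertion shifts every downstream block while center insertion pulls data from both sides, matching the paper's qualitative conclusion.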
Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes must be proportional to the amount of work performed. For data-parallelism and fault-tolerance purposes, most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering subset is a group of nodes that together contain at least one replica of each data block needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering subset that minimizes energy consumption while placing the remaining nodes in low-power standby mode in a data-parallel computing cluster. Our algorithms can also discover covering subsets in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover a k-covering subset, i.e., a set of nodes that contains at least k replicas of the data blocks. Our experimental results show that we can achieve substantial energy savings without significant performance loss in diverse cluster configurations and working environments. (C) 2013 Elsevier Inc. All rights reserved.
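The covering-subset idea can be pictured with a standard greedy set-cover heuristic. This is a sketch under assumed inputs (a node-to-blocks map), not the paper's algorithm, which additionally accounts for energy heterogeneity and k-coverage.

    def covering_subset(replicas):
        """Greedy sketch: choose nodes until every block has at least one
        replica in the chosen set.  `replicas` maps node id -> set of
        block ids stored on that node (hypothetical layout)."""
        needed = set().union(*replicas.values())
        chosen, covered = [], set()
        while covered != needed:
            # Pick the node covering the most still-uncovered blocks.
            node = max(replicas, key=lambda n: len(replicas[n] - covered))
            chosen.append(node)
            covered |= replicas[node]
        return chosen

    nodes = {"n1": {1, 2}, "n2": {2, 3}, "n3": {3, 4}, "n4": {1, 4}}
    print(covering_subset(nodes))   # e.g. ['n1', 'n3']

Nodes outside the returned subset are the candidates for low-power standby.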
Associative computation is characterized by the intertwining of search by content and data-parallel computation. An algebra for associative computation is described. A compilation-based model and a novel abstract machine for associative logic programming are presented. The model uses loose coupling of the left-hand side of the program, treated as data, and the right-hand side of the program, treated as low-level code. This representation achieves efficiency through associative computation and data alignment during goal reduction and during the execution of low-level abstract instructions; data alignment reduces the overhead of data movement. Novel schemes are presented for the associative manipulation of aliased uninstantiated variables and for data-parallel goal reduction in the presence of multiple occurrences of the same variable in a goal. The architecture, behavior, and performance evaluation of the model are presented.
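The intertwining of search by content and data-parallel computation can be sketched as follows, using a hypothetical tuple encoding of clause heads; the paper's abstract machine operates on compiled low-level code, not on Python structures.

    def associative_select(goal, heads):
        """Search by content, data-parallel in spirit: the goal is
        compared against every clause head 'at once'.  None stands for
        an unbound variable and matches anything."""
        def matches(head):
            return len(head) == len(goal) and all(
                h is None or g is None or h == g
                for h, g in zip(head, goal))
        return [i for i, head in enumerate(heads) if matches(head)]

    heads = [("append", "nil", None, None), ("append", None, None, None)]
    print(associative_select(("append", "nil", "Xs", "Ys"), heads))  # [0, 1]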
The modeling and simulation (M&S) of large crowds has become increasingly important in the domain of public security, for applications such as facility planning, disaster response, and anti-terrorism operations. The behavior of a large crowd is highly complex, and the M&S of a large crowd at the individual level therefore demands the support of a scalable and efficient computing technology. In this study, a method was proposed to formulate crowd behavior with cellular automata and multi-agent models, which were successfully mapped onto the MapReduce programming model. A simulation framework was developed upon Hadoop to simulate large-crowd scenarios over a cluster, transforming the simulation process into a series of parallel operations on data streams. Simulation studies on a large-scale evacuation scenario indicated that the framework preserves the logical correctness of the simulation process. Experimental results also showed that the Hadoop-based simulation framework could complete five times more tasks while consuming only 19% of the CPU time in comparison with conventional simulation technology.
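One simulated timestep under such a mapping might look like the following map/reduce pair. The agent layout (dicts with "id", "pos", and a desired "target" cell) is an assumption for illustration; the actual framework runs steps of this shape as Hadoop jobs over data streams.

    def map_step(agent):
        """Map: key each agent by the cell it wants to enter."""
        yield agent["target"], agent

    def reduce_step(cell, agents):
        """Reduce: at most one agent may occupy a cell (a cellular-
        automaton-style exclusion rule); the rest stay where they are."""
        agents = sorted(agents, key=lambda a: a["id"])   # deterministic winner
        moved = [dict(agents[0], pos=cell)]
        stayed = [dict(a, target=a["pos"]) for a in agents[1:]]
        return moved + stayed

Because each reduce key is an independent cell, conflict resolution across the whole crowd parallelizes trivially.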
This work presents a shared-memory parallel version of the hybrid classification algorithm IGSCR (iterative guided spectral class rejection) to facilitate the transition from serial to parallel processing. This transition is motivated by a demonstrated need for more computing power, driven by the increasing size of remote sensing data sets due to higher resolution sensors, larger study regions, and the like. Parallel IGSCR was developed to produce fast and portable code using Fortran 95, OpenMP, and the Hierarchical Data Format version 5 (HDF5) with its accompanying data access library. The intention of this work is to provide an efficient implementation of the established IGSCR classification algorithm. The applicability of the faster parallel IGSCR algorithm is demonstrated by classifying Landsat data covering most of Virginia, USA into forest and non-forest classes with approximately 90% accuracy. Parallel results are given using the SGI Altix 3300 shared memory computer and the SGI Altix 3700 with as many as 64 processors, reaching speedups of almost 77. Parallel IGSCR allows an analyst to perform and assess multiple classifications to refine parameters. As an example, parallel IGSCR was used for a factorial analysis consisting of 42 classifications of a 1.2 GB image to select the number of initial classes (70) and the class purity (70%) used for the remaining two images. (C) 2007 Elsevier Ltd. All rights reserved.
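The rejection step at the core of IGSCR can be sketched as below. This is a simplified illustration: the published algorithm uses a statistical homogeneity test rather than a raw purity fraction, and the real code runs these loops in parallel with OpenMP in Fortran.

    from collections import Counter

    def igscr_reject(clusters, labels, purity=0.70):
        """One IGSCR-style pass: accept spectral classes whose dominant
        informational class reaches the purity threshold; pixels of the
        rejected classes are re-clustered on the next iteration.
        `clusters` maps cluster id -> pixel indices; `labels` gives a
        training label per pixel (hypothetical layout)."""
        accepted, rejected = {}, []
        for cid, pixels in clusters.items():
            counts = Counter(labels[p] for p in pixels)
            label, hits = counts.most_common(1)[0]
            if hits / len(pixels) >= purity:
                accepted[cid] = label        # pure enough: class -> label
            else:
                rejected.extend(pixels)      # re-cluster these next pass
        return accepted, rejected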
A smart city's efficiency depends on mining large amounts of data generated by cyber-physical systems and electronic platforms using large-scale data processing frameworks in the cloud. Many cloud services rely on data-parallel computing frameworks that run on hundreds of interconnected nodes. These frameworks divide computationally intensive and data-intensive jobs into smaller tasks and run them concurrently on different nodes to improve performance. But providing consistent performance in such an environment is a challenge due to runtime variability: owing to various internal and external factors, some nodes running these tasks perform poorly, delaying the execution of the whole job. Because of the inherent complexity of runtime variability, preventive measures against such stragglers have proved inadequate, and the problem continues to affect compute workloads even after the measures are taken. Several researchers have therefore proposed dynamic straggler identification approaches based on historical log analysis. This paper analyzes the relationships between several parameters obtained during job execution that aid in formulating and detecting stragglers. Using data analysis, we developed a straggler identification approach and labeled the generated dataset. To achieve high performance using statistical features of historical resource usage, the proposed approach trains a distributed XGBoost classifier, which showed the highest accuracy of 88.57%. Furthermore, we empirically show that blacklisting the predicted stragglers leads to a significant reduction in the execution times of CPU-intensive, I/O-intensive, and mixed applications.
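A minimal sketch of such a classifier follows, using synthetic stand-in features; the paper trains a distributed XGBoost model on real log-derived statistics, and the feature set and labeling rule below are assumptions for illustration.

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical resource-usage logs: one row per
    # task with hypothetical features (e.g. CPU use, I/O wait, memory,
    # progress rate); y = 1 marks a task labeled as a straggler.
    rng = np.random.default_rng(0)
    X = rng.random((5000, 4))
    y = (X[:, 1] + 0.3 * rng.random(5000) > 0.9).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    model = xgb.XGBClassifier(n_estimators=200, max_depth=6,
                              learning_rate=0.1)
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))

Tasks the trained model flags would then be candidates for blacklisting or speculative re-execution.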
The prospect of a single internal market in 1992 has created a new spirit of adventure in the European Community. It represents a great challenge in all fields of trade and industry, especially information technology, and offers a unique opportunity to cut costs, expand markets, and exploit cooperation across the EC.
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and run up to seven times faster than their CPU counterparts.
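The stream-kernel style Brook adds to C has the same elementwise shape as this NumPy sketch of SAXPY; this is an analogy only, since Brook kernels are compiled to GPU shader programs rather than executed as array expressions.

    import numpy as np

    def saxpy(a, x, y):
        """SAXPY as a data-parallel kernel: one multiply-add per element,
        applied uniformly across the whole stream (here, a NumPy array)."""
        return a * x + y

    x = np.arange(1_000_000, dtype=np.float32)
    y = np.ones_like(x)
    print(saxpy(2.0, x, y)[:4])   # [1. 3. 5. 7.]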
ISBN (print): 9781467345651; 9780769549033
Analysis of neural signals such as the electroencephalogram (EEG) is one of the key technologies for detecting and diagnosing various brain disorders. As neural signals are non-stationary and non-linear in nature, it was almost impossible to understand their true physical dynamics until the recent advent of the Ensemble Empirical Mode Decomposition (EEMD) algorithm. Neural signal processing with EEMD is highly compute-intensive due to the high complexity of the EEMD algorithm. It is also data-intensive because (1) EEG signals comprise massive data sets and (2) EEMD must run a large number of trials to ensure precision. The MapReduce programming model is a promising parallel computing paradigm for data-intensive computing. To increase the efficiency and performance of neural signal analysis, this research develops parallel EEMD neural signal processing with MapReduce. In this paper, we implement parallel EEMD with Hadoop in a modern cyberinfrastructure. Test results and performance evaluation show that parallel EEMD can significantly improve the performance of neural signal processing.
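The trial structure that makes EEMD a natural fit for MapReduce can be sketched as follows; `emd` is a placeholder for an empirical mode decomposition routine, assumed to return a fixed-shape array of intrinsic mode functions (IMFs) for every trial.

    import numpy as np

    def eemd_map(signal, trial_seed, noise_std, emd):
        """Map task: one EEMD trial = decompose the signal plus one
        white-noise realization.  Trials are independent, so the map
        phase is embarrassingly parallel."""
        rng = np.random.default_rng(trial_seed)
        noisy = signal + rng.normal(0.0, noise_std, signal.shape)
        return emd(noisy)   # placeholder EMD routine -> array of IMFs

    def eemd_reduce(imf_sets):
        """Reduce: ensemble-average the IMFs across all trials
        (assumes every trial yields IMFs of the same shape)."""
        return np.mean(np.stack(imf_sets), axis=0)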
ISBN (print): 9780769546766
The training procedure of Hidden Markov Model (HMM) based speech recognition is often very time consuming because of its high computational complexity. New parallel hardware such as the GPU provides multi-threaded processing and very high floating-point capability. We take advantage of the GPU to accelerate a popular HMM-based speech recognition package, HTK. Based on the sequential code of HTK, we design "paraTraining", a parallel training model for HTK, and develop several optimizations to improve its performance on the GPU: unrolling the nested loops and using "reduction add" to maximize the number of threads per block; using the GPU's warp mechanism to reduce synchronization latency; and building different thread indices to address data efficiently. Experimental results show that a speedup of about 20x can be achieved without loss in accuracy. We also discuss a multi-GPU implementation of our method, which achieved around a twofold speedup over the single-GPU version.
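"Reduction add" here refers to the standard pairwise tree reduction; the following sketch shows the access pattern, with each inner-loop iteration standing in for one GPU thread (the paper's version runs across the threads of a CUDA block).

    def reduction_add(values):
        """Pairwise tree reduction: halve the array each step so the
        additions within a step are independent and can run in parallel."""
        values = list(values)
        while len(values) > 1:
            half = (len(values) + 1) // 2
            for i in range(len(values) - half):
                values[i] += values[i + half]   # one add per 'thread'
            values = values[:half]
        return values[0]

    assert reduction_add(range(10)) == sum(range(10))

Replacing a serial accumulation loop with this pattern is what lets the training kernels keep many more threads busy per block.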