Search results - Inner Mongolia University Library

SCALABLE AND OPTIMAL SPEED-UP PARALLEL ALGORITHMS FOR TEMPLATE MATCHING ON ARRAYS WITH RECONFIGURABLE OPTICAL BUSES

International Journal of Foundations of Computer Science, 2003, Vol. 14, No. 1, pp. 79-98

Authors: Chin-Hsiung Wu; Shi-Jinn Horng. Affiliations: Department of Information Management, Chinese Naval Academy, Kaohsiung, Taiwan, R.O.C.; Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

The computational model on which the algorithms are developed is the array with reconfigurable optical buses (AROB). It integrates the advantages of both optical transmission and electronic computation. The main contributions of this paper are in designing several optimal and/or optimal speed-up template matching algorithms with varying degrees of parallelism on the AROB model. For an N × N digitized image and an M × M template, when the domains of the image and the template are O(log N)-bit integers, we first design several basic operations for window broadcasting and rotation. Then, based on these basic operations, three efficient and scalable algorithms for template matching are derived using various numbers of processors on a two-dimensional (2-D) or 3-D AROB. For 1 ≤ r ≤ N, 1 ≤ p ≤ M ≤ q ≤ N, one runs in time using r × r processors, another runs in , (resp. ) time using pN × pN / log M (resp. pN × pN × log N) processors, and the other runs in (resp. ) time using pq × pq / log M (or pq × pqN × log N) processors, respectively. The latter two algorithms can be tuned to run in O(1) time on a 2-D AROB. To the best of our knowledge, there are no algorithms which can reach this time complexity for this problem on a 2-D array architecture.
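For orientation only, the problem this abstract addresses can be sketched as a plain sequential search: slide the M × M template over every position of the N × N image and score each window. The abstract does not say which similarity measure the paper uses, so the sum of squared differences below is an assumption, and the function name `match_template` is hypothetical. This is not the AROB algorithm; it is the kind of sequential baseline a parallel version would speed up.

```python
def match_template(image, template):
    """Return (row, col, score) of the best-matching window.

    Assumption: similarity is the sum of squared differences (SSD),
    where a lower score means a better match.
    """
    rows, cols = len(image), len(image[0])
    m = len(template)  # template is assumed to be m x m
    best = None
    for r in range(rows - m + 1):
        for c in range(cols - m + 1):
            # Score the m x m window whose top-left corner is (r, c).
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(m)
                for j in range(m)
            )
            # Keep the first window with the strictly smallest score.
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
template = [[5, 6], [7, 8]]
result = match_template(image, template)  # exact match at row 1, column 1
```

Each window is scored independently of the others, which is what makes the problem a natural candidate for the varying degrees of parallelism the abstract describes.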

Keywords: template matching; image processing; parallel algorithms; pipelined optical bus systems


Parallelization and comparison of local convergent algorithms for solving the Inverse Additive Singular Value Problem


WSEAS Transactions on Mathematics, 2006, Vol. 5, No. 1, pp. 81-88

Authors: Flores-Becerra, Georgina; Garcia, Victor M.; Vidal, Antonio M. Affiliations: Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain; Departamento de Sistemas y Computación, Instituto Tecnológico de Puebla, Av. Tecnológico 420, Colonia Maravillas, C.P. 72220, Puebla, Mexico

This paper is devoted to the design and evaluation of a parallel version of the algorithm MIII, proposed first by Chu in [7], for the solution of the Inverse Additive Singular Value Problem (IASVP). This new algorithm has shown good experimental performance, confirming the theoretical performance predicted and showing an acceptable scalability. It has been compared with the MI parallel algorithm, described in [10]. Both parallel algorithms decrease the sequential execution time for solving the IASVP and have similar parallel execution times, but, in most cases, MIII is more accurate than MI.

Keywords: parallel algorithms


Realizing Multioperations for Step Cached MP-SOCs


IEEE International Symposium on System-on-Chip

Author: Martti Forsell. Affiliation: Platform Architectures Team, Oulu, Finland

Recent advances in shared memory multiprocessor system-on-chip (MP-SOC) architectures include using special step caches to efficiently implement concurrent read concurrent write memory access. Unfortunately, the existing step cache techniques do not support multioperations, which can be used to speed up the execution of a number of parallel algorithms by a logarithmic factor. This paper proposes an architectural technique for implementing multioperations on step cached MP-SOCs even if the associativity of the caches is limited. The technique is based on simple active memory units, faster memory modules, and small processor-level memory blocks called scratchpads. The performance and area requirements of the proposed technique were evaluated on the parametrical MP-SOC framework. According to the evaluation, the technique implements multioperations efficiently and provides a speed-up of 4.8 - 7.2 with respect to baseline step cached systems and a speed-up of 3.7 - 5.0 with respect to existing non-step cached systems, with only a minor silicon area overhead.

Keywords: Read-write memory; Yarn; Parallel algorithms; Memory architecture; Multiprocessing systems; Silicon; Message passing; System-on-a-chip; Programming profession; Hardware


Performance Evaluation of Different Kohonen Network Parallelization Techniques


International Conference on Parallel Computing in Electrical Engineering (PARELEC)

Authors: J. Kwiatkowski; M. Pawlik; U. Markowska-Kaczmar; D. Konieczny. Affiliation: Institute of Applied Informatics, Wroclaw University of Technology, Wroclaw, Poland

Kohonen feature maps are commonly employed to process large input data, but they become effective only after a time-consuming learning process. Tests have shown that a sequential program solving a typical problem spends more than 95 percent of its time localizing the winners. The aim of the paper is to present and compare different ways of parallelizing the algorithm. We compare two classes of parallel implementations: network parallelization and learning set parallelization. The experiments use two kinds of evaluation: standard evaluation based on metrics such as speedup and efficiency, and an approximation method based on the granularity concept.
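The winner search that dominates the sequential run time can be illustrated with a small sketch. It assumes that "network parallelization" means splitting the neurons among workers, each of which finds a local winner before a final reduction. The abstract does not give the paper's actual scheme, so the partitioning, the squared Euclidean distance, and the names `local_winner` and `find_winner` are all hypothetical. The chunks are processed one after another in a single process here; in a parallel setting each chunk would go to its own worker.

```python
def squared_distance(weights, sample):
    """Squared Euclidean distance between a neuron's weights and a sample."""
    return sum((w - x) ** 2 for w, x in zip(weights, sample))


def local_winner(neurons, start, stop, sample):
    """Best (distance, index) pair among neurons[start:stop]."""
    return min(
        (squared_distance(neurons[i], sample), i) for i in range(start, stop)
    )


def find_winner(neurons, sample, workers):
    """Split the neurons into contiguous chunks and reduce the local winners.

    Assumption: each chunk stands in for the share of one worker.
    """
    chunk = -(-len(neurons) // workers)  # ceiling division
    partial = [
        local_winner(neurons, lo, min(lo + chunk, len(neurons)), sample)
        for lo in range(0, len(neurons), chunk)
    ]
    # Reduction step: the global winner is the best of the local winners.
    return min(partial)[1]


neurons = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.9], [0.8, 0.1]]
winner = find_winner(neurons, [0.9, 0.2], workers=2)  # index of closest neuron
```

Only one (distance, index) pair per worker has to be exchanged in the reduction, so the communication cost stays small compared with the distance computations.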

Keywords: Performance evaluation; Parallel processing; Parallel algorithms; Sequential analysis; Approximation methods; Computer networks; Concurrent computing; Hardware; Power engineering computing; Informatics


Divide and Conquer Strategies for MLP Training


International Joint Conference on Neural Networks (IJCNN)

Authors: S. Bhagat; D. Deodhare. Affiliations: Department of Computer Science, Rutgers University, NJ, USA; Centre for Artificial Intelligence and Robotics, Bangalore, India

Over time, neural networks have proven to be extremely powerful tools for data exploration with the capability to discover previously unknown dependencies and relationships in the data sets. However, the sheer volume of available data and its dimensionality make data exploration a challenge. Employing neural network training paradigms in such domains can prove to be prohibitively expensive. An algorithm, originally proposed for supervised on-line learning, has been improvised upon to make it suitable for deployment in large volume, high-dimensional domains. The basic strategy is to divide the data into manageable subsets or blocks and maintain multiple copies of a neural network with each copy training on a different block. A method to combine the results has been defined in such a way that convergence towards stationary points of the global error function can be guaranteed. A parallel algorithm has been implemented on a Linux-based cluster. Experimental results on popular benchmarks have been included to endorse the efficacy of our implementation.

Keywords: Neural networks; Management training; Convergence; Space technology; Lagrangian functions; Parallel algorithms; Explosives; Computer networks; Gradient methods; Computer science


Systolic Array Based Adaptive Beamformer Modeling in SystemC Environment


NASA/ESA Conference on Adaptive Hardware and Systems (AHS)

Authors: O. Tamer; A. Ozkurt. Affiliation: Elektrik Elektronik Muhendisligi Bolumu, Dokuz Eylul Universitesi, Izmir, Turkey

Optimal weight extraction for beamforming algorithms based on systolic structures has been the subject of much research since the well-known article by Gentleman and Kung (1981) on recursive least squares systolic arrays. Systolic algorithms are parallel and fully pipelined structures; this feature improves the performance of both the beamforming algorithms and the system. SystemC is a system design language that was recently accepted by the IEEE as a standard. SystemC has the advantage of allowing the hardware and software components to be designed together, so that the design and simulation of large systems become easier. This work simulates the minimum variance distortionless response (MVDR) beamformer proposed by Tang, Liu, and Tretter (1994) in the SystemC environment and evaluates its performance.

Keywords: Systolic arrays; Signal processing algorithms; Array signal processing; Hardware; Least squares methods; Process design; Parallel algorithms; Adaptive arrays; Covariance matrix; Parallel processing


Toward reliable and efficient message passing software through formal analysis


International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: G. Gopalakrishnan; R. Kirby. Affiliation: School of Computing, University of Utah, Salt Lake, UT, USA

The quest for high performance drives parallel scientific computing software design. Well over 60% of the high-performance computing (HPC) community writes programs using the MPI library; to gain performance, they are known to perform many manual optimizations. Even tools that accept high level descriptions often generate MPI code, due to its eminent portability. However, since the overall performance of a program does not usually port (due to variations in the target architecture, cluster size, etc.), manual changes to the code are inevitable in today's approaches to MPI programming and optimization. This, together with the vastness and evolving nature of the MPI standard, and the innate complexity of concurrent programming, introduces costly bugs. Our research addresses these challenges through specific efforts in the following broad areas: (i) high level expression of the parallel algorithm and compilation thereof into optimized MPI programs, (ii) optimizations of user-written detailed MPI programs through localized transformations such as barrier removal, (iii) formal modeling of complex communication standards, such as the MPI-2 standard and a facility for answering putative queries (this need arises when standard documents are impossibly difficult to manually study in order to answer questions that are not explicitly addressed in the standard), (iv) formal modeling of new (and hence relatively less well understood) features of communication libraries, such as the one-sided communication facility of MPI-2, and (v) formal modeling of intricate control algorithms in these libraries such as the progress engine for TCP and/or shared memory in MPICH2 (a formal model can explicate commonalities, help formally verify, as well as help create better future implementations). Our research gains focus through numerous collaborations.

Keywords: Message passing; Communication standards; Scientific computing; Software design; High performance computing; Software libraries; Performance gain; Computer bugs; Parallel algorithms; Communication system control


Speeding Up Sequential Simulated Annealing by Parallelization


International Conference on Parallel Computing in Electrical Engineering (PARELEC)

Author: Z.J. Czech. Affiliation: Silesian University of Technology, Sosnowiec, Poland

A parallel simulated annealing algorithm for solving the vehicle routing problem with time windows (VRPTW) is considered. The VRPTW is an NP-hard bicriterion optimization problem in which both the number of vehicles and the total distance traveled by vehicles are minimized. The objective is to establish to what extent the computation time required to solve the VRPTW can be decreased by a number of co-operating parallel processes with no loss of solution quality. The quality of a solution is understood as its proximity to the optimum (or best known) solution. Furthermore, some factors are proposed that allow the VRPTW benchmarking tests to be ranked according to their difficulty.

Keywords: Simulated annealing; Vehicles; Routing; Computational modeling; Concurrent computing; Benchmark testing; Parallel algorithms; Logistics; Procurement; Raw materials


Hierarchically tiled arrays for parallelism and locality


International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Jia Guo; G. Bikshandi; D. Hoeflinger; G. Almasi; B. Fraguela; M.J. Garzaran; D. Padua; C. von Praun. Affiliations: University of Illinois, Urbana-Champaign, USA; IBM Thomas J. Watson Research Center, Yorktown Heights, USA; Universidade da Coruña, Spain

ISBN: 9781424400546 (print)

Parallel programming is facilitated by constructs which, unlike the widely used SPMD paradigm, provide programmers with a global view of the code and data structures. These constructs could be compiler directives containing information about data and task distribution, language extensions specifically designed for parallel computation, or classes that encapsulate parallelism. In this paper, we describe a class developed at Illinois and its Matlab implementation. This class can be used to conveniently express both parallelism and locality. A C++ implementation is now underway. Its characteristics will be reported in a future paper. We have implemented most of the NAS benchmarks using our HTA Matlab extensions and found that HTAs enable the fast prototyping of parallel algorithms and produce programs that are easy to understand and maintain.

Keywords: Concurrent computing; Parallel processing; Tiles; Computer languages; Parallel programming; Distributed computing; Parallel algorithms; MATLAB; Programming profession; Yarn


Fine-Grain Parallelization of Recurrent Neural Networks Training


International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science

Author: Volodymyr Turchenko. Affiliation: Research Institute of Intelligent Computer Systems, Department of Information Computing Systems and Control, Faculty of Computer Information Technologies, Ternopil State Economic University, Ternopil, Ukraine

This paper presents an approach to developing a fine-grain parallel algorithm for artificial neural network training that parallelizes the computational operations of each elementary neuron. A back error propagation training algorithm is described, and the parallel section of the algorithm is developed. Experimental results for the parallel algorithm are given, analyzing parallelization speedup and efficiency on the Origin 300 parallel computer.

Keywords: Recurrent neural networks; Neurons; Neural networks; Concurrent computing; Parallel processing; Computer networks; Parallel algorithms; Hardware; Artificial neural networks; Computer errors

