Very Long Instruction Word (VLIW) architectures can exploit the fine-grained (instruction-level) parallelism typically found in sequential program code. A parallelizing compiler is used to restru...
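The abstract is cut off before it describes the compiler's restructuring, so the following is only a generic, hypothetical illustration of the underlying idea: greedy list scheduling that packs mutually independent operations into instruction words of a fixed issue width. It assumes unit latency for every operation and identical functional units; `pack_vliw` and the dependency-map format are invented for this example and are not taken from the paper.

```python
from typing import Dict, List, Set


def pack_vliw(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedy list scheduling (hypothetical sketch).

    deps maps each operation to the set of operations it depends on.
    Each cycle issues up to `width` operations whose predecessors all
    completed in earlier cycles (unit latency is assumed).
    """
    done: Set[str] = set()
    remaining = dict(deps)
    words: List[List[str]] = []
    while remaining:
        # Operations whose inputs are all available this cycle.
        ready = sorted(op for op, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        word = ready[:width]  # fill one long instruction word
        words.append(word)
        for op in word:
            del remaining[op]
        done.update(word)
    return words


deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": set(), "e": {"c"}}
print(pack_vliw(deps, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Widening the word to 4 slots lets `d` issue alongside `a` and `b`, but the chain `a → c → e` still needs three cycles: dependence height, not issue width, bounds the speedup.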
We propose 2DT, a new paradigm for programming tightly coupled multicomputer systems. 2DT programs are composed of local computations on linear data (columns) and global transformations on 2-dimensional combinations o...
In this paper, we present a comparative performance evaluation of hot spot effects on MIN-based and HR-based shared-memory architectures. Analytical models are described for understanding the differences between the networks and for evaluating hot spot performance on both architectures. The analytical comparisons indicate that HR-based architectures have the potential to handle the various contentions caused by hot spots more efficiently than MIN-based architectures. Although there is no analytical or experimental evidence that the tree saturation phenomenon occurs in non-blocking MIN architectures, remote accesses to both hot and cool memory modules are still considerably slowed down, and overall performance is significantly degraded. Intensive performance measurements on hot spots have been conducted on the BBN TC2000 (MIN-based) and the KSR1 (HR-based) machines. Experiments were also conducted on hot spots that arise in practice from synchronization lock and barrier algorithms. The experimental results support the analytical models and provide practical observations on, and an evaluation of, hot spots on the two types of architectures.
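The abstract does not give its analytical models, so as a point of reference here is the classic Pfister and Norton hot-spot bound, a well-known back-of-the-envelope model and not necessarily the one used in this paper. It assumes N processors that each issue r requests per cycle, a fraction h of which go to a single hot module while the rest are spread uniformly, and a module that serves at most one request per cycle. The hot module then receives r(1 + h(N - 1)) requests per cycle, which caps the per-processor rate. Function names are illustrative.

```python
def hot_module_load(n: int, r: float, h: float) -> float:
    """Requests per cycle arriving at the hot module.

    Each of n processors sends fraction h of its r requests to the hot
    module, plus its uniform share (1 - h) / n of the remaining traffic.
    """
    return n * r * h + n * r * (1 - h) / n


def max_per_processor_rate(n: int, h: float) -> float:
    """Largest request rate r before the hot module (1 request/cycle) saturates."""
    return 1.0 / (1 + h * (n - 1))


n, h = 100, 0.01
r_max = max_per_processor_rate(n, h)
print(f"per-processor rate cap: {r_max:.4f}")    # about 0.5025
print(f"aggregate throughput cap: {n * r_max:.2f}")  # about 50.25 of 100
```

Even 1% hot traffic roughly halves the aggregate bandwidth of a 100-processor system under these assumptions, which is why hot spots degrade accesses to cool modules as well once the shared network paths back up.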
ISBN (print): 0818642009
The apparent dichotomy between symbolic AI processing and distributed neural processing cannot be absolute, since neural networks that capture essential features of human intelligence will also model some of the symbolic processes of which humans are capable. Indeed, a primary goal of biological neural network research is to design systems that can self-organize intelligent symbolic processing capabilities. One such system is the ARTMAP family of neural networks. Most, if not all, of the purported dichotomies between traditional artificial intelligence and neural network research dissolve within these systems. Although ARTMAP systems are neural networks, they are also a type of self-organizing production system capable of hypothesis testing and memory search. They embody continuous and discrete, parallel and serial, and distributed and localized properties. Their symbols are compressed, often digital representations, yet they are formed and stabilized through a process of resonant binding that is distributed across the system. They are used to explain and predict data at both the psychological and the neurobiological levels, yet their unique combinations of computational properties are also rapidly finding their way into technology. They are capable of autonomously discovering rules about the environments to which they adapt, yet these rules are emergent properties of network dynamics rather than formal algorithmic statements.
The quadtree medial axis transform (QMAT) representation of a binary image is a very useful scheme for computer graphics and image processing applications. We present an efficient algorithm for computing the QMAT on the shared-memory EREW-PRAM model. For an n × n image, using n × n processors, we compute the QMAT in O(log n) time. Since image sensors provide image data as a two-dimensional array, the mesh-connected computer (MCC) is a popular architecture for image processing applications. Previously known parallel algorithms for the QMAT require O(log² n) and O(log n) time on a pyramid model, and simulating these two algorithms on an MCC takes O(n log n) time. In contrast, our algorithm can be executed on an MCC in O(n) time, which is optimal for that model because the diameter of an n × n mesh is Θ(n).
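The QMAT is built on the region quadtree of the image. The sketch below covers only that underlying decomposition, not the QMAT itself and not the paper's EREW-PRAM algorithm. It builds maximal uniform blocks bottom-up in log₂ n merge levels; within a level every 2 × 2 group is examined independently, which is the work a PRAM with one processor per block could perform in a single parallel step. It assumes n is a power of two, and `quadtree_blocks` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int, int]  # (row, col, size, color)


def quadtree_blocks(img: List[List[int]]) -> List[Block]:
    """Bottom-up region quadtree: return the maximal uniform square blocks."""
    n = len(img)
    assert n > 0 and n & (n - 1) == 0 and all(len(row) == n for row in img)
    # level[i][j] holds the color of the size x size block at (i*size, j*size),
    # or None if that block is mixed.
    level: List[List[Optional[int]]] = [[v for v in row] for row in img]
    size = 1
    blocks: List[Block] = []
    while size < n:
        m = len(level) // 2
        nxt: List[List[Optional[int]]] = [[None] * m for _ in range(m)]
        # Every (i, j) below is independent: one parallel step per level.
        for i in range(m):
            for j in range(m):
                kids = [level[2 * i + di][2 * j + dj] for di in (0, 1) for dj in (0, 1)]
                if kids[0] is not None and all(k == kids[0] for k in kids):
                    nxt[i][j] = kids[0]  # merge four equal children
                else:
                    # Uniform children that cannot merge are maximal leaves.
                    for di in (0, 1):
                        for dj in (0, 1):
                            color = level[2 * i + di][2 * j + dj]
                            if color is not None:
                                blocks.append(((2 * i + di) * size, (2 * j + dj) * size, size, color))
        level = nxt
        size *= 2
    if level[0][0] is not None:
        blocks.append((0, 0, n, level[0][0]))  # whole image is uniform
    return blocks


img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(quadtree_blocks(img))
```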
Nonlinear filters have been used in many signal processing applications, for example, to obtain optimum signal extraction or detection in the presence of random noise. The weighted median filter (WMF), of which the standard median filter is a special case, is a novel nonlinear technique designed for 2D image processing. A major advantage of the WMF is the flexibility of its design in dealing with a wide variety of properties. This paper describes a commonly used class of the WMF, W(4,4,1). As with most nonlinear methods, the technique is computationally demanding and requires a non-trivial number of "expensive" operations. A data parallel approach for efficient implementation of the WMF is described and implemented on two architecturally dissimilar supercomputers, the Convex C3840 and the Connection Machine CM-200. An analysis of the performance obtained on these two high-performance parallel platforms is presented.
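As a reminder of what the filter computes, the weighted median of a window is the ordinary median of the samples after each is replicated according to its integer weight. The sketch below applies a 3 × 3 WMF with a hypothetical center-weighted mask; it does not reproduce the W(4,4,1) class, whose weights the abstract does not spell out. Because each output pixel depends only on its own window, all pixels can be computed independently, which is what a data parallel implementation exploits.

```python
from typing import List, Sequence


def weighted_median(values: Sequence[int], weights: Sequence[int]) -> int:
    """Median of the samples after replicating each one `weight` times."""
    expanded = sorted(v for v, w in zip(values, weights) for _ in range(w))
    return expanded[len(expanded) // 2]  # odd total weight gives a unique median


def wmf_3x3(img: List[List[int]], weights: List[List[int]]) -> List[List[int]]:
    """3x3 weighted median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    flat_w = [weights[i][j] for i in range(3) for j in range(3)]
    # Each (r, c) is independent: a natural data parallel loop.
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
            out[r][c] = weighted_median(window, flat_w)
    return out


impulse = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
mild = [[1, 1, 1], [1, 3, 1], [1, 1, 1]]    # hypothetical weights
strong = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]  # hypothetical weights
print(wmf_3x3(impulse, mild)[1][1])    # 0: the impulse is removed
print(wmf_3x3(impulse, strong)[1][1])  # 255: a heavy center weight preserves it
```

With all weights equal to 1 the filter reduces to the standard median, matching the special case mentioned above; raising the center weight trades noise suppression for detail preservation.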
We present a set of primitive program schemes which, together with just two basic combining forms, provide a surprisingly expressive parallel programming language. The primitive program schemes (called tropes) take the ...
Barrier synchronization is a commonly used primitive in parallel processing, but it has traditionally been implemented only on hardware multiprocessors. With the growing interest in concurrent computing on general-purpose networks, it is worthwhile to investigate methods for implementing barriers in such environments. We present several algorithms for barrier synchronization on the widely prevalent multi-access bus network and derive analytical performance metrics for each proposed scheme, which are then compared against simulation results. Our findings indicate that, with some modifications, algorithms originally developed for dedicated interconnection networks perform fairly well on shared bus networks, and, interestingly, that the best performance is obtained with a dimensional exchange algorithm.
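A dimensional exchange (butterfly) barrier can be sketched as follows, assuming the number of processes p is a power of two: in round k, process i exchanges an arrival notice with partner i XOR 2^k, and after log₂ p rounds every process has transitively heard from every other, so all may proceed. This simulation only tracks who knows about whose arrival; it does not model the bus-specific modifications the paper describes, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def dimensional_exchange_rounds(p: int) -> List[List[Tuple[int, int]]]:
    """Partner pairs for each round: process i pairs with i ^ (1 << k)."""
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    d = p.bit_length() - 1  # log2(p) rounds
    return [[(i, i ^ (1 << k)) for i in range(p) if i < i ^ (1 << k)]
            for k in range(d)]


def simulate_barrier(p: int) -> List[Set[int]]:
    """Return, per process, the set of arrivals it knows of after all rounds."""
    known = [{i} for i in range(p)]
    for pairs in dimensional_exchange_rounds(p):
        # Pairs within a round are disjoint, so sequential updates here
        # are equivalent to the simultaneous exchanges of a real round.
        for a, b in pairs:
            merged = known[a] | known[b]
            known[a], known[b] = merged, set(merged)
    return known


rounds = dimensional_exchange_rounds(8)
print(len(rounds), rounds[0])  # 3 rounds; round 0 pairs neighbors differing in bit 0
print(all(s == set(range(8)) for s in simulate_barrier(8)))  # True
```

Each round involves p/2 pairwise exchanges, so the scheme sends p·log₂ p point-to-point messages in total; on a shared bus those messages contend for a single medium, which is why bus-oriented adaptations of network algorithms are needed.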
ISBN (print): 0780308670
Memory-Based Reasoning (MBR) has not been explored from the viewpoint of hardware implementation. This paper demonstrates the high robustness of MBR, which makes it suitable for hardware implementation using Wafer Scale Integration (WSI) technology, and proposes a design of WSI-MBR hardware. The robustness is evaluated with a newly developed WSI-MBR simulator on the English pronunciation reasoning task generally known as MBRTalk. The results show that defects or other fluctuations of device parameters have only minor impacts on the performance of the WSI-MBR. Moreover, it is found that, to obtain higher reasoning accuracy, the size of the MBR database is much more crucial than the computation resolution. The proposed WSI-MBR processor exploits these findings from the simulation: the most area-demanding circuits, that is, multipliers and adders, are designed as analog circuits. It is expected that 1.7 million processors can be integrated onto an 8-inch silicon wafer using 0.3 μm SRAM technology.
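At its core, MBR answers a query by retrieving the most similar stored example. The sketch below is a plain 1-nearest-neighbor lookup over letter windows using an unweighted overlap (mismatch-count) distance; the original MBRTalk work used a weighted metric, and the tiny database and phoneme labels here are hypothetical. Every distance computation is independent of the others, which is what makes MBR amenable to massively parallel hardware with one processing element per stored example.

```python
from typing import Sequence, Tuple

Example = Tuple[str, str]  # (letter window, label for the center letter)


def overlap_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def mbr_predict(db: Sequence[Example], query: str) -> str:
    """1-nearest-neighbor: return the label of the closest stored example.

    Ties are broken in favor of the example that appears first in db.
    """
    _, best_label = min(db, key=lambda ex: overlap_distance(ex[0], query))
    return best_label


# Hypothetical database: how the letter 'c' is pronounced in context.
db = [("cat", "k"), ("cit", "s"), ("cen", "s"), ("cot", "k")]
print(mbr_predict(db, "cel"))  # s  (closest to "cen")
print(mbr_predict(db, "cab"))  # k  (closest to "cat")
```

Growing `db` adds more candidate neighbors, while lowering the precision of `overlap_distance` only perturbs which near-tie wins, loosely mirroring the paper's finding that database size matters more than computation resolution.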
Data partitioning and mapping is one of the most important steps in writing a parallel program, especially a data parallel one. Recently, Fortran D and, subsequently, High Performance Fortran (HPF) have been proposed to allow users to specify data distributions and alignments for the arrays in their programs. The paper presents the design of the data partitioning module of the Fortran 90D compiler, which processes the alignment and distribution directives.
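Distribution directives ultimately define an owner mapping from global array indices to processors and local indices. The sketch below implements the standard HPF `BLOCK` and `CYCLIC(k)` mappings for a one-dimensional array, using 0-based indices for readability (HPF itself is 1-based) and ignoring `ALIGN`. It is an illustration of the directive semantics, not code from the Fortran 90D compiler; the function names are hypothetical.

```python
from typing import Tuple


def block_owner(i: int, n: int, p: int) -> Tuple[int, int]:
    """BLOCK: contiguous chunks of ceil(n / p) elements per processor.

    Returns (processor, local index) for global index i of an n-element array.
    """
    b = -(-n // p)  # integer ceiling of n / p
    return i // b, i % b


def cyclic_owner(i: int, p: int, k: int = 1) -> Tuple[int, int]:
    """CYCLIC(k): blocks of k elements dealt round-robin to p processors."""
    blk = i // k
    return blk % p, (blk // p) * k + i % k


n, p = 10, 4
print([block_owner(i, n, p)[0] for i in range(n)])  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print([cyclic_owner(i, p)[0] for i in range(n)])    # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

`BLOCK` keeps neighboring elements together, which suits stencil-style access, while `CYCLIC` balances load when work per element varies across the index range; a compiler's partitioning module uses mappings like these to decide which processor owns, and therefore computes, each array element.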