We give a result that implies an improvement by a factor of log log n in the hypercube bounds for the geometric problems of batched planar point location, trapezoidal decomposition, and polygon triangulation. The improvements are achieved through a better solution to the multisearch problem on a hypercube, a parallel search problem where the elements in the data structure S to be searched are totally ordered, but where it is not possible to compare in constant time any two given queries q and q'. Whereas the previous best solution to this problem took O(log n (log log n)^3) time on an n-processor hypercube, the solution given here takes O(log n (log log n)^2) time on an n-processor hypercube. The hypercube model for which we claim our bounds is the standard one: SIMD, with O(1) memory registers per processor, and with one-port communication. Each register can store O(log n) bits, so that a processor knows its ID.
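To make the setting concrete, here is a minimal sequential analogue of multisearch in Python; the function name and the use of a plain sorted list are illustrative, not the paper's hypercube construction. The difficulty the abstract describes is that two queries cannot be compared with each other, so queries must be routed through S rather than merged with it.

```python
from bisect import bisect_right

def batched_search(keys, queries):
    """Sequential analogue of multisearch: locate each query in a
    totally ordered structure S (here, a sorted list of keys).
    On a hypercube, two queries q and q' cannot be compared in
    constant time, so queries are routed through S instead of
    being sorted and merged with it."""
    return [bisect_right(keys, q) for q in queries]

# Example: locate 4 queries among 8 ordered elements.
print(batched_search([1, 3, 5, 7, 9, 11, 13, 15], [0, 6, 9, 20]))
# -> [0, 3, 5, 8]
```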
Authors: Rudd, R.E.; Broughton, J.Q. (Univ Oxford, Dept Mat, Oxford OX1 3PH, England; SFA Inc, Largo, MD 20774, USA; USN Res Lab, Complex Syst Theory Branch, Washington, DC 20375, USA; Yale Univ, Sch Management, New Haven, CT 06520, USA)
A strategic objective of computational materials physics is the accurate description of specific materials on length scales spanning the electronic to the macroscopic. We describe progress towards this goal by reviewing a seamless coupling of quantum to statistical to continuum mechanics, involving two models, implemented via parallel algorithms on supercomputers, for unifying finite elements (FE), molecular dynamics (MD) and semi-empirical tight-binding (TB). The first approach, FE/MD/TB Coupling of Length Scales (FE/MD/TB CLS), consists of a hybrid model in which simulations of the three scales are run concurrently with the minimal coupling that guarantees physical consistency. The second approach, Coarse-Grained Molecular Dynamics (CGMD), introduces an effective model, a scale-dependent generalization of finite elements which passes smoothly into molecular dynamics as the mesh is reduced to atomic spacing. These methodologies are illustrated and validated using the examples of crack propagation in silicon and the dynamics of micro-resonators. We also briefly review a number of other approaches to multiscale modeling.
An efficient parallel algorithm is presented to find a maximum weight independent set of a permutation graph which takes O(log n) time using O(n^2/log n) processors on an EREW PRAM, provided the graph has at most O(n) maximal independent sets. The previously best known parallel algorithm takes O(log^2 n) time and O(n^3/log n) processors on a CREW PRAM.
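For intuition, here is a sequential counterpart rather than the EREW PRAM algorithm of the abstract: independent sets in a permutation graph correspond to increasing subsequences of the defining permutation, so a maximum weight independent set reduces to a maximum-weight increasing subsequence, solvable in O(n log n) with a Fenwick tree over prefix maxima. Function names are illustrative.

```python
def mwis_permutation(pi, w):
    """Maximum weight independent set of the permutation graph of pi
    (values 1..n). Independent sets correspond to increasing
    subsequences of pi, so we compute a maximum-weight increasing
    subsequence with a Fenwick tree over prefix maxima."""
    n = len(pi)
    tree = [0.0] * (n + 1)           # prefix-max structure over values 1..n

    def update(i, val):
        while i <= n:
            tree[i] = max(tree[i], val)
            i += i & (-i)

    def query(i):                    # max over values 1..i
        best = 0.0
        while i > 0:
            best = max(best, tree[i])
            i -= i & (-i)
        return best

    best = 0.0
    for v, weight in zip(pi, w):
        cur = query(v - 1) + weight  # extend best subsequence ending below v
        update(v, cur)
        best = max(best, cur)
    return best

# Permutation with vertex weights; best increasing subsequence is (2, 3).
print(mwis_permutation([2, 4, 1, 3], [5.0, 1.0, 4.0, 3.0]))  # -> 8.0
```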
A vector processing method for one-dimensional discrete convolution is presented for a high-performance vector digital signal processor (DSP) with an on-chip vector cache and a vector shuffling unit. The method matches the basic structure of the algorithm to the hardware: instead of computing the output elements one at a time, it simultaneously accumulates partial sums for multiple output elements, with each coefficient of the convolution kernel extended into a vector by shuffling. The computing logic is concise and clear; the method avoids redundant memory accesses and repeated multiply-and-add operations, and it handles convolution kernels of arbitrary length. Experimental results on an FT-M7002-based platform show that the average speedups for single- and double-precision floating-point data reach 3.4 and 7.8, respectively, compared with the corresponding TMS320C66x library function in CCS.
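The reordering the abstract describes can be sketched in a few lines of NumPy; this is a minimal model of the idea (broadcast one kernel coefficient, accumulate into all outputs at once), not the FT-M7002 implementation.

```python
import numpy as np

def conv1d_accumulate(x, k):
    """Kernel-broadcast formulation of 1-D discrete convolution
    ('valid' region, correlation form; flip k for true convolution).
    Instead of computing each output element in turn, every kernel
    coefficient is splatted across a vector (the role of the DSP's
    shuffle unit) and accumulated into all outputs simultaneously."""
    n, m = len(x), len(k)
    out = np.zeros(n - m + 1)
    for j in range(m):               # one accumulation pass per kernel element
        out += k[j] * x[j:j + n - m + 1]
    return out

x = np.arange(10, dtype=float)
k = np.array([1.0, 0.5, 0.25])
assert np.allclose(conv1d_accumulate(x, k), np.correlate(x, k, mode="valid"))
```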
We show that certain multisplitting iterative methods based on overlapping blocks yield faster convergence than corresponding nonoverlapping block iterations, provided the coefficient matrix is an M-matrix. This result can be used to compare variants of the waveform relaxation algorithm for solving initial value problems. The methods under consideration use the same discretization technique, but are based on multisplittings with different overlaps. Numerical experiments on the Intel iPSC/860 hypercube are included.
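A minimal sketch of the overlap effect, assuming the standard weighted multisplitting iteration and a 1-D Laplacian test matrix (an M-matrix); the block choices and averaging weights here are illustrative, not the waveform-relaxation variants studied in the paper.

```python
import numpy as np

def multisplit_step(A, b, x, blocks):
    """One multisplitting sweep: each (possibly overlapping) block solves
    its local subsystem with the current x frozen outside the block, and
    components covered by several blocks are averaged (weights sum to 1)."""
    n = len(b)
    acc = np.zeros(n)
    cnt = np.zeros(n)
    for blk in blocks:
        rest = np.setdiff1d(np.arange(n), blk)
        rhs = b[blk] - A[np.ix_(blk, rest)] @ x[rest]
        acc[blk] += np.linalg.solve(A[np.ix_(blk, blk)], rhs)
        cnt[blk] += 1
    return acc / cnt

# 1-D Laplacian (an M-matrix): the overlapping splitting converges faster.
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
for blocks, label in [([np.arange(0, 10), np.arange(10, 20)], "disjoint"),
                      ([np.arange(0, 12), np.arange(8, 20)], "overlap ")]:
    x = np.zeros(n)
    for _ in range(50):
        x = multisplit_step(A, b, x, blocks)
    print(label, "residual:", np.linalg.norm(b - A @ x))
```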
Authors: Katti, R.S. (Dept. of Electr. Eng., North Dakota State Univ., Fargo, ND, USA)
Automatic detection and correction of errors in the residue number system involves the conversion of residue representations to integers and base extension. The residue number system is generally restricted to moduli that are pairwise relatively prime. In this paper we consider error detection and correction using a moduli set with common factors. A method to construct a moduli set that leads to simplified error detection and correction is presented. Error detection can now be performed by computing residues in parallel, and error correction no longer involves base extension. It is also shown that removing all restrictions on the moduli set leads to more complex error detection/correction algorithms.
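For background, here is a sketch of the conventional redundant-modulus scheme that the paper improves on, with pairwise coprime moduli and Chinese-remainder reconstruction; the moduli and test value are illustrative, and the paper's common-factor construction is not reproduced here.

```python
from math import prod

def to_rns(x, moduli):
    """Encode an integer as its vector of residues."""
    return [x % m for m in moduli]

def from_rns(res, moduli):
    """CRT reconstruction (requires pairwise coprime moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(res, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(.., -1, m): modular inverse
    return x % M

# Redundant-modulus error detection: legitimate values live in [0, prod(info));
# a corrupted residue digit pushes the CRT result outside that range.
info, redundant = [3, 5, 7], 11
moduli = info + [redundant]
x = 52
digits = to_rns(x, moduli)
assert from_rns(digits, moduli) == x
digits[1] = (digits[1] + 2) % 5        # inject a single-digit error
corrupted = from_rns(digits, moduli)
print("error detected:", corrupted >= prod(info))   # True
```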
Functional Algorithm Simulation is a methodology for predicting the computation and communication characteristics of parallel algorithms for a class of scientific problems, without actually performing the expensive numerical computations involved. In this paper, we use Functional Algorithm Simulation to study the parallel Fast Multipole Method (FMM), which solves the N-body problem. Functional Algorithm Simulation provides us with useful information regarding communication patterns in the algorithm, the variation of available parallelism during different algorithmic phases, and upper bounds on available speedups for different problem sizes. Furthermore, it allows us to predict the performance of the FMM on message-passing multiprocessors with topologies such as cliques, hypercubes, rings, and multirings, over a wider range of problem sizes and numbers of processors than would be feasible by direct simulation. Our simulations show that an implementation of the FMM on low-cost, scalable ring or multiring architectures can attain satisfactory performance.
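One ingredient of such a prediction, an upper bound on attainable speedup derived from the task graph alone, can be sketched as follows; the phase names and operation costs are invented for illustration and are not measurements from the paper.

```python
from functools import lru_cache

# Toy functional simulation: walk a task DAG counting operations per phase
# instead of executing them, then bound the attainable speedup by
# total work / critical-path length (a Brent-style bound).
dag = {                        # task -> (operation count, dependencies)
    "build_tree":    (10,  []),
    "upward_pass":   (40,  ["build_tree"]),
    "interactions":  (120, ["upward_pass"]),
    "downward_pass": (40,  ["interactions"]),
    "direct_sums":   (80,  ["build_tree"]),
    "combine":       (5,   ["downward_pass", "direct_sums"]),
}

@lru_cache(maxsize=None)
def finish(task):
    """Earliest finishing time of a task with unlimited processors."""
    cost, deps = dag[task]
    return cost + max((finish(d) for d in deps), default=0)

work = sum(cost for cost, _ in dag.values())
depth = finish("combine")
print(f"work={work}, critical path={depth}, speedup bound={work / depth:.2f}")
```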
In this paper we present several new results in the theory of homogeneous multiprocessor scheduling. We start with some assumptions about the behavior of tasks, with associated precedence constraints, as processor power is applied. We assume that as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. We also assume that the number of processors which can be assigned to a task is a continuous variable, with a view to exploiting continuous mathematics. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, so as to minimize the finishing time. These assumptions allow us to recast the optimal scheduling problem in a form which can be addressed by optimal control theory. Various theorems can be proven which characterize the optimal scheduling solution. Most importantly, for the special case where the speedup function of each task is p^alpha, where p is the amount of processing power applied to the task, we can directly solve our equations for the optimal solution. In this case, for task graphs formed from parallel and series connections, the solution can be derived by inspection. The solution can also be shown to be the shortest path from the initial to the final state, as measured by an l^{1/alpha} distance metric, subject to obstacle constraints imposed by the precedence constraints.
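The p^alpha case admits a short worked derivation for a parallel block; the symbols w_i (task work) and p (total processing power) follow the abstract's model, and the closed form below is a standard consequence of equalizing the finishing times of parallel tasks.

```latex
% Parallel composition under the speedup model t_i(p_i) = w_i / p_i^\alpha.
% Minimize the common finishing time subject to \sum_i p_i = p:
% equalizing w_i / p_i^\alpha across tasks gives p_i \propto w_i^{1/\alpha}.
\[
  p_i = p \, \frac{w_i^{1/\alpha}}{\sum_j w_j^{1/\alpha}},
  \qquad
  T_{\parallel} = \frac{w_i}{p_i^{\alpha}}
               = \frac{\bigl(\sum_j w_j^{1/\alpha}\bigr)^{\alpha}}{p^{\alpha}} .
\]
% A parallel block thus behaves like a single task of effective work
% (\sum_j w_j^{1/\alpha})^{\alpha}: parallel works combine in the l^{1/\alpha}
% sense, while series composition simply adds works, which is the
% shortest-path interpretation referred to in the abstract.
```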
CONSPECTUS: The development of more efficient and more accurate ways to represent reactive potential energy surfaces is a requirement for extending the simulation of large systems to more complex systems, longer-time dynamical processes, and more complete statistical mechanical sampling. One way to treat large systems is by direct dynamics fragment methods. Another way is by fitting system-specific analytic potential energy functions with methods adapted to large systems. Here we consider both approaches. First we consider three fragment methods that allow a given monomer to appear in more than one fragment. The first two approaches are the electrostatically embedded many-body (EE-MB) expansion and the electrostatically embedded many-body expansion of the correlation energy (EE-MB-CE), which we have shown to yield quite accurate results even when one restricts the calculations to include only electrostatically embedded dimers. The third fragment method is the electrostatically embedded molecular tailoring approach (EE-MTA), which is more flexible than EE-MB and EE-MB-CE. We show that electrostatic embedding greatly improves the accuracy of these approaches compared with the original unembedded approaches. Quantum mechanical fragment methods share with combined quantum mechanical/molecular mechanical (QM/MM) methods the need to treat a quantum mechanical fragment in the presence of the rest of the system, which is especially challenging for those parts of the rest of the system that are close to the boundary of the quantum mechanical fragment. This is a delicate matter even for fragments that are not covalently bonded to the rest of the system, but it becomes even more difficult when the boundary of the quantum mechanical fragment cuts a bond. We have developed a suite of methods for more realistically treating interactions across such boundaries. These methods include redistributing and balancing the external partial atomic charges and the use of tuned fluorine link atoms.
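The pairwise-truncated expansion underlying EE-MB can be written compactly; the following is the standard form of the electrostatically embedded two-body truncation, with E^emb denoting a monomer or dimer energy computed in the field of point charges representing the remaining monomers.

```latex
% Electrostatically embedded many-body expansion, truncated at dimers:
% each monomer and dimer calculation is embedded in point charges that
% represent the rest of the system.
\[
  E \;\approx\; \sum_{i} E_i^{\mathrm{emb}}
      \;+\; \sum_{i<j} \Bigl( E_{ij}^{\mathrm{emb}}
            - E_i^{\mathrm{emb}} - E_j^{\mathrm{emb}} \Bigr)
\]
```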
In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information, and frequent pattern mining and association rule mining have been extensively studied for numerous real-life applications. In traditional mining algorithms, data are centralized and memory-resident; because of the large amount of data and the bandwidth and energy limitations of distributed databases, especially in this era of big data, these methods are not effective enough. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose a set of FP-growth-based algorithms that discover frequent patterns and provide fast and scalable service in distributed computing environments, together with a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were chosen for the experimental comparison. Experiments show that the proposed method has superior cost-effectiveness for processing massive datasets and performs well under various experimental conditions: on average it required only 33% of the execution time and 45% of the transmission cost of DistEclat, and 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
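As a sketch of the distribution idea (local counting, compact item-count transmission, central merge), consider the following; the function names and the two-node partition are illustrative, and this covers only the counting phase, not the full mining algorithm of the paper.

```python
from collections import Counter

def local_counts(partition):
    """Per-node pass: count item occurrences locally; only the compact
    (item, count) pairs travel over the network, not the transactions."""
    c = Counter()
    for txn in partition:
        c.update(set(txn))
    return c

def frequent_items(partitions, minsup):
    """Coordinator: merge per-node counters and filter by support.
    This mirrors the counting phase of a distributed FP-growth run;
    a full method would go on to mine projected databases per item group."""
    total = Counter()
    for part in partitions:
        total.update(local_counts(part))
    return {item: n for item, n in total.items() if n >= minsup}

partitions = [
    [["a", "b", "c"], ["a", "c"]],      # transactions on node 1
    [["a", "b"], ["b", "c"], ["a"]],    # transactions on node 2
]
print(frequent_items(partitions, minsup=3))  # -> {'a': 4, 'b': 3, 'c': 3}
```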