Consider the following operation on an arbitrary positive integer: if the number is even, divide it by two; if it is odd, triple it and add one. The Collatz conjecture asserts that, starting from any positive integer n, repeated iteration of this operation eventually produces the value 1. The main contribution of this paper is a hardware-software cooperative approach to verifying the Collatz conjecture. The key idea is to sieve the numbers n that produce 1 using a circuit implemented on an FPGA. Numbers that cannot be verified because of overflow are reported to the host PC, which verifies them in software using arbitrary-precision arithmetic. We have implemented 24 coprocessors on a Virtex-II family FPGA, the XC2V3000-4. The experimental results show that our hardware-software cooperative approach can verify 2.89 × 10^9 64-bit numbers per second.
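The division of labor described in this abstract can be illustrated with a small software sketch. It is not the authors' circuit: the fixed-width pass merely simulates a register of `bits` bits, and the function names, the `bits` parameter, and the `max_steps` cutoff are assumptions introduced for illustration.

```python
def collatz_step(n):
    """One application of the Collatz map to a positive integer."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def verify_fixed_width(n, bits=64):
    """Simulated hardware pass (assumed behavior, not the paper's circuit).

    Returns True if the trajectory of n reaches 1 without any value
    exceeding the register width, and False if an overflow occurs.
    """
    limit = 1 << bits
    while n != 1:
        n = collatz_step(n)
        if n >= limit:
            return False  # overflow: hand the number to the host
    return True


def verify_unbounded(n, max_steps=10**6):
    """Host-side pass using Python's arbitrary-precision integers.

    max_steps is a hypothetical safety cutoff so the sketch always terminates.
    """
    for _ in range(max_steps):
        if n == 1:
            return True
        n = collatz_step(n)
    return n == 1


def cooperative_verify(numbers, bits=64):
    """Sieve with the fixed-width pass, then recheck only the overflowed numbers."""
    overflowed = [n for n in numbers if not verify_fixed_width(n, bits)]
    return {n: verify_unbounded(n) for n in overflowed}
```

With a deliberately narrow register, for example `cooperative_verify([6, 27], bits=8)`, the number 6 is settled by the fixed-width pass, while 27 overflows (its trajectory peaks at 9232) and is rechecked with unbounded integers.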
This paper presents a neural network model optimized by the ant colony optimization algorithm (ACOA) for fault section diagnosis in the distribution systems of electric power systems. Simulation results show that it effectively improves the fault tolerance of fault section diagnosis and that it has better fault tolerance than the BP-NN and DGA-NN models. It must be pointed out that the degree of improvement depends on the spatial distribution of the samples; it is not a fundamental improvement but rather exploits the latent potential of the neural network.
We designed and implemented a software distributed shared memory (DSM) system, SCASH-MPI, by using MPI as the communication layer of the SCASH DSM. Because MPI is the most commonly available communication library, SCASH-MPI is highly portable and can use the high-speed networks of a variety of clusters. Existing software DSM systems, in contrast, usually use a dedicated communication layer, TCP, or UDP over Ethernet. SCASH-MPI also avoids the need for a large amount of pinned-down memory for shared memory use, which has limited the applications of the original SCASH. In SCASH-MPI, a thread is created to support remote memory communication using MPI. An experiment on a 4-node Itanium cluster showed that the Laplace Solver benchmark using SCASH-MPI achieves performance comparable to the original SCASH. Performance degradation is only 6.3% in the NPB BT benchmark Class B test. In SCASH-MPI, page transfer does not start until a page fault is detected. To hide the latency of page transmission, we implemented a prefetch function. The latency in BT Class B was reduced by 64% when the prefetch function was used.
ISBN: (print) 9781424400546
This paper presents a helper thread prefetching scheme that is designed to work on loosely-coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely-coupled processors have an advantage in that the application and helper threads do not contend for fine-grain resources, such as processor and L1 cache resources, which preserves the speed of the application. However, inter-processor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely-coupled system can be done effectively, we evaluate our prefetching in a standard, unmodified CMP system, and in an intelligent memory system where a simple processor in memory executes the helper thread. With the memory processor in DRAM, our scheme achieves an average speedup of 1.25 across nine memory-intensive applications. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33.
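The idea of bounding how far the helper thread may run ahead of the application thread can be sketched with a counting semaphore. This is only an illustration under assumptions: the abstract does not describe the paper's actual mechanism, and the function name, the `max_ahead` parameter, and the bookkeeping dictionary are hypothetical. Real prefetching is replaced here by simple iteration counters.

```python
import threading


def run_with_bounded_helper(num_iters, max_ahead):
    """Run a helper loop that may lead the application loop by at most max_ahead iterations."""
    slots = threading.Semaphore(max_ahead)  # remaining run-ahead budget
    lock = threading.Lock()                 # protects the shared counters
    state = {"helper": 0, "app": 0, "max_lead": 0}

    def helper():
        for _ in range(num_iters):
            slots.acquire()  # blocks once the helper is max_ahead iterations ahead
            with lock:
                state["helper"] += 1  # stands in for prefetching one iteration's data
                lead = state["helper"] - state["app"]
                state["max_lead"] = max(state["max_lead"], lead)

    def application():
        for _ in range(num_iters):
            with lock:
                state["app"] += 1  # stands in for executing one loop iteration
            slots.release()        # frees one unit of run-ahead budget

    threads = [threading.Thread(target=helper), threading.Thread(target=application)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

In this sketch the application thread never blocks, so only the helper is throttled. A helper that falls behind the application is not handled here; a real scheme would presumably need it to skip forward.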
Dependable distributed embedded systems (DDES) are being widely deployed in the automobile industry worldwide. These systems impose rigorous requirements on timing accuracy and reliability. Both hardware and software architecture have an important effect on system dependability. Adding or substituting more reliable hardware can increase system reliability and also achieve faster system response, but it increases manufacturing cost, whereas software can be a more cost-effective way to support dependable distributed embedded system development. The dependable distributed embedded system assessment platform (DDESAP), which is based on a vehicle control system, provides testing and assessment support for a variety of dependable automotive software. DDESAP evaluates the vehicle control hardware and operational environments. Software architectures such as time-triggered, event-triggered, and hybrid-triggered designs, together with other fault-tolerant mechanisms for dependable distributed embedded systems, were tested on DDESAP. A vehicle dynamics model, a motorway traffic model, and a driver model were developed for DDESAP. Simulations show that these models agree with manufacturer and empirical data. DDESAP enables the evaluation of novel software architectures for safety-critical automobile control systems, such as the fault-tolerant adaptive cruise control system (ACCS) presented.
A programmable Java distributed system, which utilises the free resources of a heterogeneous set of computers linked together by a network, has been developed. The system has been deployed on over 200 computers distributed over a number of locations, and has been used successfully to process bioinformatics, biomedical engineering, and cryptography applications. We present two bioinformatics applications: DSEARCH, which performs sensitive database searching, and DPRml, which performs distributed phylogeny reconstruction by maximum likelihood.
The research on Human pose estimation remains the most fundamental and challenging problem in computer vision. In this context, computer vision-based automobile safety-assisted driving technology has received comprehe...
The Tatami project is building a system to support software engineering over the Internet, exploiting recent advances in Web technology, interface design, and specification. Our effort to improve the usability of such systems led us into algebraic semiotics, while our effort to develop better formal methods for distributed concurrent systems led us into hidden algebra. We discuss the Tatami system design, especially user interface issues, and sketch an extension of algebraic semiotics for interface dynamics.
In this paper we present an adaptive version of our previously proposed quality equalizing (QE) load balancing strategy that attempts to maximize the performance of parallel branch-and-bound (B&B) by adapting to application and target computing system characteristics. Adaptive QE (AQE) incorporates the following salient adaptive features: (1) anticipatory quantitative and qualitative load balancing mechanisms; (2) regulation of load information exchange overhead; (3) deterministic load balancing in extended neighborhoods instead of just immediate neighborhoods as in non-adaptive QE; and (4) randomized global load balancing to fetch work from outside the extended neighborhood. AQE yields speedup improvements of up to 80%, and 15% on average, compared to QE for several real-world mixed-integer programming (MIP) problems, and near-ideal speedups for two of the largest problems in the MIPLIB benchmark suite on an IBM SP2 system.