The problem of finding aperiodic low auto-correlation binary sequences (LABS) presents a significant computational challenge, particularly as the sequence length increases. Such sequences have important applications i...
DRAM row buffer conflicts can increase memory access latency significantly. This paper presents a new page-allocation-based optimization that works seamlessly with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation, using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations, shows that our method can reduce row buffer miss rates by up to 76% (37.4% on average). This reduction translates into performance speedups of up to 15% (5% on average).
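To make the notion of a row buffer conflict concrete, the toy model below counts row buffer misses under an open-row policy. It is only an illustrative sketch, not the paper's simulator or its page-allocation method: the function name `row_buffer_miss_rate`, the address mapping (consecutive rows interleaved across banks), and the default `num_banks` and `row_size` values are all assumptions.

```python
def row_buffer_miss_rate(addresses, num_banks=8, row_size=8192):
    """Fraction of accesses that miss the open row of their bank.

    Assumed (hypothetical) mapping: physical memory is split into
    row_size-byte rows, and consecutive rows rotate across banks.
    """
    open_rows = {}  # bank index -> row currently held in that bank's row buffer
    misses = 0
    for addr in addresses:
        row_index = addr // row_size
        bank = row_index % num_banks
        row = row_index // num_banks
        if open_rows.get(bank) != row:
            # Either the bank is idle (cold miss) or another row is open (conflict).
            misses += 1
            open_rows[bank] = row
    return misses / len(addresses) if addresses else 0.0


def interleave(base_a, base_b, count, stride=64):
    """Alternate accesses between two arrays starting at base_a and base_b."""
    trace = []
    for i in range(count):
        trace.append(base_a + i * stride)
        trace.append(base_b + i * stride)
    return trace


# Two arrays placed in the same bank but different rows conflict on every access.
same_bank = interleave(0, 8 * 8192, count=4)
# Moving the second array by one row puts it in a different bank.
other_bank = interleave(0, 8192, count=4)
```

In this model the first placement misses on every access, while the second misses only on the two cold accesses. The only difference between the two traces is where the second array is placed in physical memory, which is the degree of freedom a page allocator controls.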
In this paper we examined how the population size affects the performance of the differential evolution algorithm. First, we tested the original differential evolution algorithm, and then the improved self-adaptive di...
In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods that rely on circular queues predominantly implemented using indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively in both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.
MSC Codes: 68T20. The problem of finding aperiodic low auto-correlation binary sequences (LABS) presents a significant computational challenge, particularly as the sequence length increases. Such sequences have important...
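For readers unfamiliar with LABS, the quantity being minimized can be sketched in a few lines. The code below uses the standard definitions (aperiodic autocorrelation C_k, energy E as the sum of C_k squared, merit factor N^2 / (2E)) and is not taken from the paper. The function names are illustrative, and the brute-force search is only practical for very short sequences.

```python
from itertools import product


def autocorrelations(seq):
    """Aperiodic autocorrelations C_k for k = 1 .. N-1 of a +1/-1 sequence."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(1, n)]


def energy(seq):
    """LABS energy: sum of squared off-peak autocorrelations (lower is better)."""
    return sum(c * c for c in autocorrelations(seq))


def merit_factor(seq):
    """Merit factor N^2 / (2E); infinite if the energy is zero."""
    e = energy(seq)
    n = len(seq)
    return float("inf") if e == 0 else n * n / (2 * e)


def brute_force_best(n):
    """Exhaustively search all 2^n sequences; feasible only for small n."""
    return min(product((1, -1), repeat=n), key=energy)


# The length-13 Barker sequence, a classic low-autocorrelation example.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
```

For `BARKER_13` the energy is 6 and the merit factor is 169/12, roughly 14.08. The exhaustive search doubles in cost with every added element, which is why the problem becomes hard as the sequence length increases.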
ISBN (print): 0897912780
In this paper we show how a model of parallel computation called Eduction (tagged demand-driven dataflow) can be implemented on a hypercube. The resulting implementation is called Hyperflow. In addition, we address the programmability of the hypercube through the use of the declarative language Lucid.
This paper presents the differential evolution with self-adaptation and local search for constrained multiobjective optimization algorithm (DECMOSA-SQP), which uses the self-adaptation mechanism from the DEMOwSA algorithm presented at CEC 2007 and an SQP local search. A constraint-handling mechanism is also incorporated in the new algorithm. An assessment of the algorithm on the test functions of the CEC 2009 special session and competition on constrained multiobjective optimization is presented. The functions comprise both unconstrained and constrained problems. The results are assessed using the IGD metric. Based on this metric, the algorithm's strengths and weaknesses are discussed.
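As a reminder of what the IGD (inverted generational distance) metric measures, here is a minimal sketch assuming the common definition: the mean Euclidean distance from each point of a reference Pareto front to its nearest point in the obtained approximation set. The function name `igd` and the tuple-based point representation are illustrative choices. The exact reference fronts and sampling used in the competition are not reproduced here.

```python
import math


def igd(reference_front, approximation):
    """Mean distance from each reference point to its nearest approximation point.

    Both arguments are non-empty sequences of equal-length objective vectors.
    Lower values mean the approximation is closer to, and better spread
    along, the reference front.
    """
    if not reference_front or not approximation:
        raise ValueError("both point sets must be non-empty")
    total = sum(
        min(math.dist(ref, point) for point in approximation)
        for ref in reference_front
    )
    return total / len(reference_front)
```

Because the average runs over the reference front, an approximation that covers only part of the front is penalized even if every one of its points lies exactly on the front.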
This paper presents a performance assessment of the differential evolution for multiobjective optimization with self-adaptation algorithm, which uses the self-adaptation mechanism from evolution strategies to adapt the F and CR parameters of candidate creation in DE. Results for several runs on the CEC 2007 special session test functions are presented and assessed with different performance metrics. Based on these metrics, the algorithm's strengths and weaknesses are discussed.
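The idea of self-adapting F and CR during candidate creation can be sketched as follows. This is an illustration in the spirit of evolution-strategy-style log-normal self-adaptation applied to DE/rand/1/bin, not necessarily the exact update rule used in the paper. The function name, the averaging over the four participating individuals, the learning rate `tau`, and the parameter bounds are all assumptions.

```python
import math
import random


def self_adaptive_candidate(population, controls, target, rng,
                            tau=0.3, f_bounds=(0.1, 0.9), cr_bounds=(0.0, 1.0)):
    """Create one DE/rand/1/bin trial vector with self-adapted F and CR.

    population: list of equal-length float vectors (at least 4 individuals).
    controls:   list of (F, CR) pairs, one per individual.
    Returns the trial vector and the (F, CR) pair it should inherit.
    """
    n = len(population)
    dim = len(population[target])
    r1, r2, r3 = rng.sample([i for i in range(n) if i != target], 3)

    # Average the control parameters of the participants, then perturb them
    # log-normally (assumed scheme) and clamp to the allowed ranges.
    members = (target, r1, r2, r3)
    f_mean = sum(controls[i][0] for i in members) / len(members)
    cr_mean = sum(controls[i][1] for i in members) / len(members)
    f = f_mean * math.exp(tau * rng.gauss(0.0, 1.0))
    cr = cr_mean * math.exp(tau * rng.gauss(0.0, 1.0))
    f = min(max(f, f_bounds[0]), f_bounds[1])
    cr = min(max(cr, cr_bounds[0]), cr_bounds[1])

    # Binomial crossover between the mutant and the target vector;
    # j_rand guarantees that at least one component comes from the mutant.
    j_rand = rng.randrange(dim)
    trial = []
    for j in range(dim):
        if rng.random() < cr or j == j_rand:
            mutant_j = population[r1][j] + f * (population[r2][j] - population[r3][j])
            trial.append(mutant_j)
        else:
            trial.append(population[target][j])
    return trial, (f, cr)
```

The returned (F, CR) pair travels with the trial vector, so if the trial survives selection its control parameters survive with it. Parameter settings that produce good candidates therefore propagate without manual tuning.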