This paper introduces the principles of a new global optimization strategy, the Imperialistic Strategy (IS), applied to the Continuous Global Optimization Problem (CGOP). Inspired by existing multi-population strategies, such as the Island Model (IM) approaches to parallel Evolutionary Algorithms (EA) and the Imperialistic Competitive Algorithm (ICA), the proposed IS method is considered an optimization strategy because it can integrate other well-known optimization methods, which in this context are regarded as sub-methods (although in other contexts they are prominent global optimization methods). Four optimization methods were implemented and tested as sub-methods: a Genetic Algorithm (GA) in a floating-point representation variant, Differential Evolution (DE), Quantum Particle Swarm Optimization (QPSO) and Artificial Bee Colony (ABC). The optimization performance of the proposed methods was compared on a test bed of 9 known multimodal optimization problems using an appropriate testing methodology. The higher success rates of the IS multi-population variants compared with those of the sub-methods run separately, combined with the increased computing efficiency attainable with parallel and distributed implementations, demonstrated that IS is a promising approach to CGOP.
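As a rough illustration of the multi-population idea the abstract builds on, the sketch below runs two different sub-methods on separate populations and periodically migrates the best individual around a ring, in the style of an island model. It is not the IS method itself, whose rules the abstract does not give. The function names, the sphere test function, the parameter values, and the Gaussian mutation step (standing in for GA, QPSO or ABC) are all assumptions made for the example.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): global minimum 0 at the origin."""
    return sum(v * v for v in x)


def de_step(pop, f, rng, F=0.5, CR=0.9):
    """One generation of DE/rand/1/bin with greedy (elitist) selection."""
    dim = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        others = [p for j, p in enumerate(pop) if j != i]
        a, b, c = rng.sample(others, 3)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
        trial = [
            a[k] + F * (b[k] - c[k]) if (rng.random() < CR or k == j_rand) else target[k]
            for k in range(dim)
        ]
        new_pop.append(trial if f(trial) <= f(target) else target)
    return new_pop


def gaussian_step(pop, f, rng, sigma=0.1):
    """Stand-in sub-method: keep a Gaussian perturbation only if it is no worse."""
    new_pop = []
    for x in pop:
        y = [v + rng.gauss(0.0, sigma) for v in x]
        new_pop.append(y if f(y) <= f(x) else x)
    return new_pop


def multi_population_search(f, sub_methods, dim=5, pop_size=10,
                            generations=60, migrate_every=10, seed=0):
    """Evolve one population per sub-method, with periodic ring migration."""
    rng = random.Random(seed)
    pops = [
        [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
        for _ in sub_methods
    ]
    initial_best = min(f(x) for pop in pops for x in pop)
    for g in range(1, generations + 1):
        pops = [step(pop, f, rng) for step, pop in zip(sub_methods, pops)]
        if g % migrate_every == 0:
            # Snapshot every population's best before any replacement happens.
            bests = [min(pop, key=f) for pop in pops]
            for i, best in enumerate(bests):
                dest = pops[(i + 1) % len(pops)]
                worst = max(range(len(dest)), key=lambda k: f(dest[k]))
                dest[worst] = list(best)  # best of i replaces worst of i+1
    final_best = min(f(x) for pop in pops for x in pop)
    return initial_best, final_best


if __name__ == "__main__":
    before, after = multi_population_search(sphere, [de_step, gaussian_step])
    print(f"best before: {before:.4f}, best after: {after:.6f}")
```

Because both steps are elitist and migration only copies individuals, the best objective value found never gets worse from one generation to the next. Each population's loop iteration is independent between migrations, which is the part a parallel or distributed implementation would run concurrently.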
ISBN: (print) 9781450326568
Today, almost all computer architectures are parallel and heterogeneous: a combination of multiple CPUs, GPUs and specialized processors. This creates a challenging problem for application developers who want to develop high-performance programs without the effort required to use low-level, architecture-specific parallel programming models (e.g. OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Domain-specific languages (DSLs) are a promising solution to this problem because they allow high-level, application-specific abstractions with implicit parallelism to be mapped directly to low-level, architecture-specific programming models, providing both high programmer productivity and high execution *** this talk I will describe an approach to building high-performance DSLs, which is based on DSL embedding in a general-purpose programming language, metaprogramming and a DSL infrastructure called Delite. I will describe how we transform DSL programs into efficient first-order low-level code using domain-specific optimization, parallelism and locality optimization with parallel patterns, and architecture-specific code generation. All optimizations and transformations are implemented in Delite, an extensible DSL compiler infrastructure that significantly reduces the effort required to develop new DSLs. Delite DSLs for machine learning, data querying, graph analysis, and scientific computing all achieve performance competitive with manually parallelized C++ code.
ISBN: (print) 9781450326568
This talk has two parts. The first part will discuss possible directions for computer architecture research, including architecture as infrastructure, energy first, the impact of new technologies, and cross-layer opportunities. This part is based on a 2012 Computing Community Consortium (CCC) whitepaper effort led by Hill, as well as other recent National Academy and ISAT studies. See: http://***/ccc/docs/init/***. The second part of the talk will discuss one or more examples of the cross-layer research advocated in the first part. For example, our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory: up to 50% of execution time is wasted. Via small changes to the operating system (Linux) and hardware (x86-64 MMU), this work reduces the execution time these workloads waste to less than 0.5%. The key idea is to map part of a process's linear virtual address space with a new incarnation of segmentation, while providing compatibility by mapping the rest of the virtual address space with paging.
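The split between a segment-mapped region and a paged remainder can be pictured with a small address-translation model. This is a minimal sketch under stated assumptions, not the mechanism from the talk: the `base`, `limit` and `offset` parameters, the 4 KiB page size, and the dictionary used as a page table are all hypothetical choices made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the paged part of the address space


def translate(va, base, limit, offset, page_table):
    """Translate a virtual address to a physical address.

    Addresses in [base, limit) take the segment path: one range check and
    one addition, with no page-table lookup. All other addresses fall back
    to a conventional page-table lookup (modelled here as a dict that maps
    virtual page numbers to physical frame numbers).
    """
    if base <= va < limit:
        return va + offset
    vpn, page_offset = divmod(va, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault: virtual page {vpn:#x} is not mapped")
    return page_table[vpn] * PAGE_SIZE + page_offset


if __name__ == "__main__":
    seg = dict(base=0x10000000, limit=0x20000000, offset=0x40000000)
    pages = {1: 7}  # virtual page 1 -> physical frame 7
    print(hex(translate(0x10000010, page_table=pages, **seg)))  # segment path
    print(hex(translate(0x1234, page_table=pages, **seg)))      # paged path
```

In this model the segment path costs the same regardless of how large the mapped region is, whereas the paged path needs one table entry per page, which is the kind of overhead the abstract attributes to page-based virtual memory on large workloads.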
ISBN: (print) 9781450319225
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell's HLE mechanism to achieve a similar effect to our lock elision scheme entirely in hardware.
ISBN: (print) 9781450311601
The proceedings contain 57 papers. The topics discussed include: scalable framework for mapping streaming applications onto multi-GPU systems; efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors; extending a C-like language for portable SIMD programming; DOJ: dynamically parallelizing object-oriented programs; GPU-based NFA implementation for memory efficient high speed regular expression matching; concurrent tries with efficient non-blocking snapshots; deterministic parallel random-number generation for dynamic-multithreading platforms; algorithm-based fault tolerance for dense matrix factorizations; revisiting the combining synchronization technique; FlexBFS: a parallelism-aware implementation of breadth-first search on GPU; optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA; and the boat hull model: adapting the roofline model to enable performance prediction for parallel computing.
In the sciences, it is common to use the so-called "big operator" notation to express the iteration of a binary operator (the reducer) over a collection of values. Such a notation typically assumes that the ...
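The "big operator" idea of iterating a binary reducer over a collection can be illustrated with a plain fold. This is only a sketch of the general notation, not of the paper's construction: the helper name `big` and the explicit `identity` argument (used as the result for an empty collection) are assumptions made for the example.

```python
from functools import reduce
import operator


def big(reducer, identity, values):
    """Fold `values` with the binary `reducer`, starting from `identity`."""
    return reduce(reducer, values, identity)


if __name__ == "__main__":
    print(big(operator.add, 0, range(1, 5)))  # sum of 1..4
    print(big(operator.mul, 1, range(1, 5)))  # product of 1..4
    print(big(max, float("-inf"), [3, 9, 2]))  # maximum of the collection
```

`functools.reduce` applies the reducer strictly left to right, so this version gives a well-defined result even for reducers that are not associative; evaluating the same expression in a different grouping or in parallel would require additional properties of the reducer.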
Chase and Lev's concurrent deque is a key data structure in shared-memory parallel programming and plays an essential role in work-stealing schedulers. We provide the first correctness proof of an optimized implem...