ISBN (print): 9783642233975
Tuning numerical libraries has become more difficult over time as systems have grown more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable: the tuning process is performed entirely at install time and takes less than one hour and ten minutes on five of the seven platforms. We achieve an average performance of 97% to 100% of the optimum, depending on the platform. This work provides a basis for autotuning the PLASMA library and for enabling easy performance portability across hardware systems.
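The abstract does not spell out the search procedure or the pruning properties. As a rough illustration only, the Python sketch below shows a generic two-phase empirical search: every candidate configuration is benchmarked cheaply, candidates scoring well below the best are pruned, and only the survivors are re-measured. The `(nb, ib)` tile-size pair, the `keep_ratio` threshold, the two measurement callbacks and the made-up performance model in the demo are all assumptions for illustration, not the paper's pruning rules or PLASMA's interface.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def empirical_tune(
    candidates: Iterable[Hashable],
    quick_measure: Callable[[Hashable], float],
    full_measure: Callable[[Hashable], float],
    keep_ratio: float = 0.9,
) -> Tuple[Hashable, List[Hashable]]:
    """Two-phase empirical search (hypothetical sketch, not the paper's method).

    Scores are performance figures (e.g. Gflop/s): higher is better.
    Phase 1 benchmarks every candidate cheaply; phase 2 re-benchmarks only
    candidates whose quick score is within keep_ratio of the best one.
    """
    cands = list(candidates)
    if not cands:
        raise ValueError("no candidates to tune")
    quick: Dict[Hashable, float] = {c: quick_measure(c) for c in cands}
    threshold = keep_ratio * max(quick.values())
    # Pruning step: discard clearly inferior configurations early.
    survivors = [c for c in cands if quick[c] >= threshold]
    full = {c: full_measure(c) for c in survivors}
    best = max(full, key=lambda c: full[c])
    return best, survivors


if __name__ == "__main__":
    # Hypothetical (nb, ib) grid with a made-up performance model standing in
    # for real timings of a tiled QR factorization.
    grid = [(nb, ib) for nb in (64, 128, 192, 256) for ib in (16, 32)]
    model = {64: 50.0, 128: 80.0, 192: 90.0, 256: 85.0}
    score = lambda c: model[c[0]] + (1.0 if c[1] == 32 else 0.0)
    best, kept = empirical_tune(grid, score, score)
    print(best, len(kept), "of", len(grid), "candidates fully measured")
    # prints: (192, 32) 4 of 8 candidates fully measured
```

In a real setting, `quick_measure` might time the factorization on a small matrix and `full_measure` on the target size; both are hypothetical callbacks here.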
ISBN (print): 3642131352
The proceedings contain 95 papers. The topics discussed include: scalable co-clustering algorithms; parallel prefix computation in the recursive dual-net; a two-phase differential synchronization algorithm for remote files; query optimization over parallel relational data warehouses in distributed environments by simultaneous fragmentation and allocation; a high efficient on-chip interconnection network in SIMD CMPs; design of a slot assignment scheme for link error distribution on wireless grid networks; dynamic resource tuning for flexible core chip multiprocessors; a grid based system for closure computation and online service; parallel domain decomposition methods for high-order finite element solutions of the Helmholtz problem; frequencies; quick forwarding of queries to relevant peers in a hierarchical P2P file search system; cluster-fault-tolerant routing in burnt pancake graphs; and edge-bipancyclicity of all conditionally faulty hypercubes.
ISBN (print): 3642131182
The proceedings contain 95 papers. The topics discussed include: scalable co-clustering algorithms; parallel prefix computation in the recursive dual-net; a two-phase differential synchronization algorithm for remote files; query optimization over parallel relational data warehouses in distributed environments by simultaneous fragmentation and allocation; a high efficient on-chip interconnection network in SIMD CMPs; design of a slot assignment scheme for link error distribution on wireless grid networks; dynamic resource tuning for flexible core chip multiprocessors; a grid based system for closure computation and online service; parallel domain decomposition methods for high-order finite element solutions of the Helmholtz problem; frequencies; quick forwarding of queries to relevant peers in a hierarchical P2P file search system; cluster-fault-tolerant routing in burnt pancake graphs; and edge-bipancyclicity of all conditionally faulty hypercubes.
This paper deals with the problem of microphone array speech enhancement using a hybrid Generalized Sidelobe Canceller (GSC) and Super-Directive Beamformer (SDB). While GSC is one of the most practical architectures f...
In this paper, an N-CSK (N parallel Codes Shift Keying) using modified pseudo-orthogonal M-sequence sets (MPOMS) for applying the parallel combinatory spread-spectrum (PC/SS) communication system to the optical commun...
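The abstract is truncated and does not describe how the MPOMS sets are built. To illustrate only the underlying building blocks, the sketch below generates a maximal-length (M-)sequence with a linear feedback shift register and uses cyclic shifts of it as the code set for plain code shift keying: each symbol selects one shifted code, and the receiver picks the code with the highest bipolar correlation. The tap choice (primitive polynomial x^4 + x^3 + 1), the shift spacing and all function names are assumptions; the modified pseudo-orthogonal sets and the parallel combinatory scheme of the paper are not reproduced.

```python
from typing import List, Sequence


def m_sequence(n: int, taps: Sequence[int]) -> List[int]:
    """One period (2**n - 1 chips) of a binary LFSR sequence.

    Recurrence: s[k+n] = XOR of s[k+t] for t in taps. The sequence has
    maximal length only if the taps correspond to a primitive polynomial.
    """
    state = [1] + [0] * (n - 1)  # fixed nonzero seed; other seeds give shifts
    out = []
    for _ in range(2 ** n - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def csk_encode(symbol: int, base: List[int], n_codes: int) -> List[int]:
    """Map a symbol in [0, n_codes) to a cyclic shift of the base sequence."""
    shift = symbol * (len(base) // n_codes)
    return base[shift:] + base[:shift]


def csk_decode(chips: List[int], base: List[int], n_codes: int) -> int:
    """Return the symbol whose code has the largest bipolar correlation."""
    def bipolar(seq: List[int]) -> List[int]:
        return [1 if b else -1 for b in seq]

    rx = bipolar(chips)
    scores = [
        sum(a * b for a, b in zip(rx, bipolar(csk_encode(m, base, n_codes))))
        for m in range(n_codes)
    ]
    return max(range(n_codes), key=lambda m: scores[m])


if __name__ == "__main__":
    base = m_sequence(4, taps=(0, 3))  # x^4 + x^3 + 1, period 15
    print(base)  # [1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    print([csk_decode(csk_encode(s, base, 4), base, 4) for s in range(4)])  # [0, 1, 2, 3]
```

Decoding works here because the periodic autocorrelation of a bipolar M-sequence is the full length at zero shift and -1 at every other shift, so distinct cyclic shifts are nearly orthogonal.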
We consider the fast acquisition problem in magnetic resonance imaging (MRI). Often, fast acquisition is achieved using parallel imaging (pMRI) techniques. It has been shown recently that compressed sensing (CS), whic...
Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task i...
Spam continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) based techniques, have been proposed for spam classification. However, SVM training is a computationally intensive p...
We propose the creation and use of a multilingual parallel news corpus annotated with opinion towards entities, produced by projecting sentiment annotation from one language to several others. The objective is to save annotation time for development and evaluation purposes, and to guarantee the comparability of opinion mining evaluation results across languages. By creating this resource, we answer the question of whether sentiment is consistently preserved in translation, so that projection is actually a viable option. We describe our approach to multilingual sentiment analysis and report its performance on the 7 languages of the parallel corpus.
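The abstract does not describe the projection procedure itself. As a minimal sketch of annotation projection over a sentence-aligned parallel corpus, assuming entity-level labels on the source side and a per-language dictionary of entity names (both hypothetical input formats, not the authors' resource), each source label below is copied to the aligned target sentence only when the entity's target-language name occurs there as a whole word.

```python
import re
from typing import Dict, List, Tuple

Label = Tuple[str, str]  # (entity id, polarity): hypothetical format


def project_sentiment(
    source_labels: List[List[Label]],
    target_sentences: List[str],
    target_names: Dict[str, str],
) -> List[List[Label]]:
    """Copy entity sentiment labels to sentence-aligned target sentences.

    A label is kept only if the entity's target-language name occurs in
    the aligned sentence as a whole word (case-insensitive).
    """
    projected = []
    for labels, sentence in zip(source_labels, target_sentences):
        kept = []
        for entity, polarity in labels:
            name = target_names.get(entity)
            pattern = r"\b" + re.escape(name) + r"\b" if name else None
            if pattern and re.search(pattern, sentence, re.IGNORECASE):
                kept.append((entity, polarity))
        projected.append(kept)
    return projected


if __name__ == "__main__":
    src = [[("EU", "negative")], [("UN", "positive"), ("NATO", "neutral")]]
    es = ["La UE critica el acuerdo.", "La ONU apoya la misión."]
    names_es = {"EU": "UE", "UN": "ONU", "NATO": "OTAN"}
    print(project_sentiment(src, es, names_es))
    # [[('EU', 'negative')], [('UN', 'positive')]]
```

Whole-word matching avoids false hits such as "UE" inside "que"; labels whose entity is not found in the translation are dropped rather than guessed.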
ISBN (print): 9780769546001
The pre-computation of data cubes is critical for improving the response time of OLAP (On-Line Analytical Processing) systems. To meet the demand for higher performance created by growing data sizes, parallel solutions for data cube construction are becoming increasingly important. This paper presents two parallel methods for data cube construction based on an extendible multidimensional array, which can be extended dynamically along any dimension without relocating any existing data. We have implemented and evaluated our core-based parallel data cube construction methods on shared-memory multiprocessors. Given the performance limit, the methods achieve close-to-linear speedup while keeping the load balanced. Our experiments also indicate that our parallel methods can scale better for higher-dimensional data cube construction.
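A data cube consists of one aggregate (cuboid) for every subset of the d dimensions, i.e. 2^d group-bys. The sketch below distributes those cuboids over a pool of workers. It is a simplification under stated assumptions: it stores each cuboid in a plain dictionary instead of the paper's extendible multidimensional array, aggregates every cuboid directly from the base rows with SUM, and uses threads, which show the task decomposition but in CPython will not yield CPU speedup because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Tuple[str, ...], float]  # (dimension values, measure)
Cuboid = Dict[Tuple[str, ...], float]


def compute_cuboid(rows: Sequence[Row], dims: Tuple[int, ...]) -> Cuboid:
    """SUM the measure grouped by the dimensions listed in `dims`."""
    out: Cuboid = {}
    for key, value in rows:
        group = tuple(key[d] for d in dims)
        out[group] = out.get(group, 0.0) + value
    return out


def build_cube(
    rows: Sequence[Row], n_dims: int, workers: int = 4
) -> Dict[Tuple[int, ...], Cuboid]:
    """Compute all 2**n_dims cuboids, one pool task per cuboid."""
    cuboids: List[Tuple[int, ...]] = [
        dims for k in range(n_dims + 1) for dims in combinations(range(n_dims), k)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda dims: compute_cuboid(rows, dims), cuboids))
    return dict(zip(cuboids, results))


if __name__ == "__main__":
    sales = [(("2023", "north"), 10.0), (("2023", "south"), 5.0), (("2024", "north"), 7.0)]
    cube = build_cube(sales, n_dims=2)
    print(len(cube))   # 4
    print(cube[()])    # {(): 22.0}
    print(cube[(0,)])  # {('2023',): 15.0, ('2024',): 7.0}
```

Cuboids differ in cost, so handing out one task per cuboid does not by itself give the load balance the abstract reports; practical cube algorithms also derive smaller cuboids from already-computed parents rather than rescanning the base rows.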