ISBN: 9781509052219 (print)
Mrs [1] is a lightweight Python-based MapReduce implementation designed to make MapReduce programs easy to write and quick to run, which makes it particularly useful for research and academia. Iterative algorithms, such as those frequently found in machine learning, are a common class of algorithms that would benefit from Mrs; however, iterative algorithms typically perform poorly in the MapReduce framework, which means potentially poor performance in Mrs as well. Therefore, we propose four modifications to the original Mrs with the intent of improving its ability to run iterative algorithms. First, we use direct task-to-task communication for most iterations and only occasionally write to a distributed file system to preserve fault tolerance. Second, we combine the reduce and map tasks that span successive iterations to eliminate unnecessary communication and scheduling latency. Third, we propose a generator-callback programming model to allow for greater flexibility in the scheduling of tasks. Finally, because some iterative algorithms are naturally expressed in terms of asynchronous message passing, we propose a fully asynchronous variant of MapReduce. We then demonstrate the enhanced performance of Mrs in the context of two iterative applications: particle swarm optimization (PSO) and expectation maximization (EM).
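The second modification above (combining the reduce task of one iteration with the map task of the next) can be illustrated with a minimal sketch in plain Python. This is not the Mrs API: `map_fn`, `reduce_fn`, `shuffle`, `reducemap_fn`, `run_separate`, and `run_fused` are hypothetical names, and the toy map and reduce steps are assumptions chosen only to make the example checkable. The sketch shows that fusing the two steps leaves the result unchanged while scheduling one task phase per iteration instead of two.

```python
from collections import defaultdict


def map_fn(key, value):
    # Hypothetical map step: route each value to bucket key % 2, adding one.
    yield key % 2, value + 1


def reduce_fn(key, values):
    # Hypothetical reduce step: sum all values that share a key.
    yield key, sum(values)


def shuffle(pairs):
    # Group values by key, standing in for the framework's shuffle phase.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_separate(data, iterations):
    # Baseline: every iteration schedules a map phase and then a reduce phase.
    pairs = list(data)
    for _ in range(iterations):
        mapped = [out for k, v in pairs for out in map_fn(k, v)]
        pairs = [out for k, vs in shuffle(mapped).items()
                 for out in reduce_fn(k, vs)]
    return sorted(pairs)


def reducemap_fn(key, values):
    # Fused task: reduce for iteration i, then map for iteration i + 1,
    # without handing the intermediate result back to a scheduler.
    for k, v in reduce_fn(key, values):
        yield from map_fn(k, v)


def run_fused(data, iterations):
    # One initial map, (iterations - 1) fused reduce-map phases, one final reduce.
    mapped = [out for k, v in data for out in map_fn(k, v)]
    for _ in range(iterations - 1):
        mapped = [out for k, vs in shuffle(mapped).items()
                  for out in reducemap_fn(k, vs)]
    return sorted(out for k, vs in shuffle(mapped).items()
                  for out in reduce_fn(k, vs))


data = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(run_separate(data, 3))
print(run_fused(data, 3))
```

Both functions produce the same key-value pairs for the same input; in a distributed setting the fused variant would save one round of task scheduling and data exchange per iteration, which is the latency the abstract refers to.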
ISBN: 9781509052219 (print)
The use of Python as a high-level productivity language on top of high-performance libraries written in C++ requires efficient, highly functional, and easy-to-use cross-language bindings. C++ was standardized in 1998 and up until 2011 it saw only one minor revision. Since then, the pace of revisions has increased considerably, with many improvements made to expressing semantic intent in interface definitions. For automatic Python-C++ bindings generators it is both the worst of times, as parsers need to keep up, and the best of times, as important information such as object ownership and thread safety can now be expressed. We present cppyy, which uses Cling, the Clang/LLVM-based C++ interpreter, to automatically generate Python-C++ bindings for PyPy. Cling provides dynamic access to a modern C++ parser and PyPy brings a full toolbox of dynamic optimizations for high performance. The use of Cling for parsing provides up-to-date C++ support now and in the foreseeable future. We show that with PyPy the overhead of calls to C++ functions from Python can be reduced by an order of magnitude compared to the equivalent in CPython, making it sufficiently low to be unmeasurable for all but the shortest C++ functions. Similarly, the overhead of access to data in C++ is reduced by two orders of magnitude compared to access from CPython. Our approach requires no intermediate language, and more Pythonic presentations of the C++ libraries can be written in Python itself, with little performance cost due to inlining by PyPy. This allows for future dynamic optimizations to be fully transparent.
ISBN: 9781450343527 (print)
The proceedings contain 6 papers. The topics discussed include: minimising the execution of unknown bag-of-task jobs with deadlines on the cloud; experiences with performing MapReduce analysis of scientific data on HPC platforms; rethinking high-performance computing platforms: challenges, opportunities and recommendations; efficient and scalable workflows for genomic analyses; persistent data staging services for data-intensive in-situ scientific workflows; and SIDI: a scalable in-memory density-based index for spatial databases.
Regression testing of HPC systems is of crucial importance when it comes to ensuring the quality of service offered to the end users. At the same time, it poses a great challenge to the systems and application engineers...