Energy consumption optimization of HPC applications inherently requires measurements for reference and comparison. However, most of today's systems lack the necessary hardware support for power or energy measurements. Furthermore, in-band data availability is preferred for specific optimization techniques such as auto-tuning. For this reason, we present in-band energy consumption models for the IBM POWER7 processor based on hardware counters. We demonstrate that linear regression is a suitable means for modeling energy consumption, and we rely on already available, high-level benchmarks for training instead of self-written or hand-tuned micro-kernels. We compare modeling efforts for different instruction mixes caused by two compilers (GCC and IBM XL) as well as various multi-threading usage scenarios, and validate across our training benchmarks and two real-world applications. Results show mean errors of approximately 1% and overall max errors of 5.3% for GCC.
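The counter-based modeling idea in this abstract can be illustrated with a minimal sketch. It fits a linear model that maps hardware counter rates to power by ordinary least squares, assuming noiseless synthetic data. The three counters, the coefficient values, and the helper names (`fit_power_model`, `predict_power`, `solve_linear_system`) are hypothetical and are not taken from the paper, which does not list its counters or fitting procedure here.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_power_model(counter_rows, power):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(r) for r in counter_rows]  # prepend intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict_power(coeffs, counter_row):
    """Evaluate intercept + sum(coefficient * counter rate)."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], counter_row))

# Hypothetical training set: 50 samples of 3 normalized counter rates.
rng = random.Random(0)
true_coeffs = [50.0, 20.0, 5.0, 12.0]  # assumed values, illustration only
samples = [[rng.random() for _ in range(3)] for _ in range(50)]
power = [predict_power(true_coeffs, s) for s in samples]

coeffs = fit_power_model(samples, power)
mean_rel_error = sum(
    abs(predict_power(coeffs, s) - p) / p for s, p in zip(samples, power)
) / len(samples)
```

On real measurements the fit would not be exact, and the mean relative error computed at the end is the kind of figure the abstract reports as roughly 1%.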
Corner detection is an extremely important technique in image recognition and is widely employed in various applications. With the widespread use of mobile devices, image recognition technique...
Cloud computing services enable applications by providing virtualized resources that can be dynamically allocated to virtual clusters. Nowadays IT companies and business companies make use of cloud environment...
ISBN: 9781509066070 (print)
With the advent of big data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques, which can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. Contribution: In this work, we propose a novel RAM-disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that, without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16,000+ cores.
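The partition-and-page idea can be sketched on a single node as follows. This is only an illustration under stated assumptions: vertices are split into fixed-size ranges, each range is pickled to disk, and an LRU cache keeps a bounded number of partitions in RAM. The `PagedGraph` class, its parameters, and the LRU policy are hypothetical; the abstract does not describe the paper's actual partitioning or eviction scheme, and the distributed case is not covered.

```python
import os
import pickle
import tempfile
from collections import OrderedDict, deque

class PagedGraph:
    """Adjacency lists split into vertex-range partitions stored on disk.

    At most `max_resident` partitions are held in RAM; the least recently
    used one is evicted when a new partition has to be loaded.
    """

    def __init__(self, adjacency, part_size, max_resident, directory):
        self.part_size = part_size
        self.max_resident = max_resident
        self.directory = directory
        self.resident = OrderedDict()  # partition id -> {vertex: neighbors}
        self.loads = 0                 # number of partition loads from disk
        n = len(adjacency)
        for start in range(0, n, part_size):
            stop = min(start + part_size, n)
            part = {v: adjacency[v] for v in range(start, stop)}
            with open(self._path(start // part_size), "wb") as f:
                pickle.dump(part, f)

    def _path(self, pid):
        return os.path.join(self.directory, f"part_{pid}.pkl")

    def neighbors(self, v):
        pid = v // self.part_size
        if pid in self.resident:
            self.resident.move_to_end(pid)         # mark as recently used
        else:
            if len(self.resident) >= self.max_resident:
                self.resident.popitem(last=False)  # evict LRU partition
            with open(self._path(pid), "rb") as f:
                self.resident[pid] = pickle.load(f)
            self.loads += 1
        return self.resident[pid][v]

def bfs_levels(graph, source):
    """Plain BFS; it only needs `graph.neighbors`, so it is unchanged
    whether the graph is fully in memory or paged from disk."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph.neighbors(u):
            if w not in levels:
                levels[w] = levels[u] + 1
                queue.append(w)
    return levels

# Small directed test graph: vertex v points to v+1 and v+2 (mod 8).
adjacency = [[(v + 1) % 8, (v + 2) % 8] for v in range(8)]
with tempfile.TemporaryDirectory() as tmp:
    paged = PagedGraph(adjacency, part_size=2, max_resident=2, directory=tmp)
    levels = bfs_levels(paged, 0)
    resident_count = len(paged.resident)
```

Because the traversal touches the graph only through `neighbors`, the algorithm itself needs no changes, which mirrors the claim in the abstract.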
ISBN: 9783319018621 (print)
The proceedings contain 51 papers. The special focus in this conference is ADBIS Short Contributions, Special Session on Big Data: New Trends and Applications, The Second International Workshop on GPUs in Databases, The Second International Workshop on Ontologies Meet Advanced Information Systems, The First International Workshop on Social Business Intelligence: Integrating Social Content in Decision Making, and Doctoral Consortium. The topics include: New trends in databases and information systems; New ontological alignment system based on a non-monotonic description logic; Spatiotemporal co-occurrence rules; An efficient spatial access method for highly redundant point data; Labeling association rule clustering through a genetic algorithm approach; Time series queries processing with GPU support; Rule-based multi-dialect infrastructure for conceptual problem solving over heterogeneous distributed information resources; Distributed processing of XPath queries using MapReduce; A query language for workflow instance data; When too similar is bad; Viable systems model based information flows; On materializing paths for faster recursive querying; XSLTmark II - a simple, extensible and portable XSLT benchmark; ReMoSSA; DSD; Designing parallel relational data warehouses; Big data new frontiers; Extraction, sentiment analysis and visualization of massive public messages; Desidoo, a big-data application to join the online and real-world marketplaces; GraphDB - storing large graphs on secondary memory; Hadoop on a low-budget general purpose HPC cluster in academia; and Discovering contextual association rules in relational databases.
In this paper, an adaptive architecture for dynamic management and allocation of on-chip FPGA Block Random Access Memory (BRAM) resources is presented. This facilitates the dynamic sharing of valuable and scarce on-ch...
ISBN: 9781479938018 (print)
Bit-reproducibility has many advantages in the context of high-performance computing. Besides simplifying and making more accurate the process of debugging and testing the code, it can allow the deployment of applications on heterogeneous systems, maintaining the consistency of the computations. In this work we analyze the basic operations performed by scientific applications and identify the possible sources of non-reproducibility. In particular, we consider the tasks of evaluating transcendental functions and performing reductions using non-associative operators. We present a set of techniques to achieve reproducibility and we propose improvements over existing algorithms to perform reproducible computations in a portable way, at the same time obtaining good performance and accuracy. By applying these techniques to more complex tasks we show that bit-reproducibility can be achieved on a broad range of scientific applications.
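The non-associativity issue mentioned for reductions can be seen in a small example. The sketch below sums the same three floating-point values in two different orders with a plain accumulation loop, then repeats the reduction with `math.fsum` from the Python standard library, which returns a correctly rounded sum and is therefore independent of summation order. Using `math.fsum` is an assumption made for illustration; it is not the algorithm proposed in the paper, and the `naive_sum` helper is hypothetical.

```python
import math

def naive_sum(xs):
    """Left-to-right accumulation; the result depends on element order."""
    total = 0.0
    for x in xs:
        total += x
    return total

values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]   # same elements, different order

# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost in the first order.
naive_a = naive_sum(values)      # 0.0
naive_b = naive_sum(reordered)   # 1.0

# A correctly rounded sum gives the same bits for any ordering.
exact_a = math.fsum(values)      # 1.0
exact_b = math.fsum(reordered)   # 1.0
```

In a parallel reduction the element order typically depends on thread or process scheduling, which is why an order-independent summation matters for reproducibility.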
ISBN: 9781479969401 (print)
The potential advantages of optical CDMA (OCDMA) over other multiple-access techniques have attracted considerable interest over the past decade. All-optical implementations of OCDMA are often considered too complex to implement, especially for cost-sensitive applications. On the other hand, electronic realization of OCDMA encoders/decoders is more cost-effective, but its throughput is restricted by the electronic processing bottleneck. In this paper, we propose an electronic implementation of OCDMA transceivers that takes advantage of newly emerging high-speed FPGA transceiver technologies, which offer speeds of up to 28 Gb/s. Though the chip rate of the proposed OCDMA system can approach such high speeds, the encoder/decoder processing can operate at a much lower speed thanks to the parallel architecture that allows chips to be processed simultaneously. The proposed design benefits from its low resource utilization, which makes it suitable for smaller, affordable FPGA devices. This is in addition to the reconfigurability of FPGA devices, which further reduces the overall system cost.
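A rough software sketch of the encode/decode principle and of the clock-rate argument follows. It assumes on-off keying with a unipolar signature code and a correlation receiver, which is a common textbook OCDMA scheme; the abstract does not state which code family or receiver the paper uses. The signature code, the 64-chip parallel width, and all function names are hypothetical.

```python
def encode(bits, code):
    """On-off keying: a 1 bit sends the signature code, a 0 bit sends zeros."""
    chips = []
    for b in bits:
        chips.extend(code if b else [0] * len(code))
    return chips

def decode(chips, code):
    """Correlate each code-length window with the signature code.

    All chips of a window are combined in one step, mimicking hardware
    that processes a whole word of chips per logic clock cycle.
    """
    weight = sum(code)  # number of ones in the code = detection threshold
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        window = chips[i:i + n]
        correlation = sum(c & s for c, s in zip(window, code))
        bits.append(1 if correlation >= weight else 0)
    return bits

def logic_clock_ghz(chip_rate_gbps, parallel_width):
    """Logic clock needed when `parallel_width` chips are handled per cycle."""
    return chip_rate_gbps / parallel_width

code = [1, 0, 0, 1, 0, 0, 0, 1]  # hypothetical unipolar signature code
data = [1, 0, 1, 1, 0]
recovered = decode(encode(data, code), code)

# Assumed 64-chip-wide datapath at the 28 Gb/s rate quoted in the abstract.
clock = logic_clock_ghz(28.0, 64)
```

Under these assumptions the encoder/decoder logic would run at 437.5 MHz while the serial link carries 28 Gchip/s, which is the kind of rate reduction the abstract attributes to the parallel architecture.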
A vision chip is a high-speed and compact vision system that integrates an image sensor and parallel image processors on a single silicon die. Nowadays, high-speed vision chips with powerful recognition capabilities a...
A new tool and web portal are presented for deployment of High Performance Computing applications on distributed heterogeneous computing platforms. This tool relies on the decentralized environment P2PDC and the OMF and OML multithreaded control, instrumentation and measurement libraries. Deployment on PlanetLab of a numerical simulation application is studied. A first series of computational results is displayed and analyzed.