Adaptive Mesh Refinement (AMR) is a widely known technique for adapting the accuracy of a solution in critical areas of the problem domain instead of using regular or irregular but static meshes. MARE2DEM is a parallel application that employs AMR to model 2D electromagnetics in oil and gas exploration. The modeling consists of iteratively applying a data inversion based on a set of measurements collected and registered by a survey of an area of interest. MARE2DEM parallelizes the work by dividing the workload into a set of refinement groups that represent overlapping areas of the problem domain. Each refinement group can be computed independently of the others by a set of workers, which carry out the AMR on the meshes when necessary. The shape and compute performance of a refinement group depend directly on a set of user-defined parameters. In this article, we provide a method to estimate MARE2DEM performance for all possible values of the influencing parameters of the application for a given case study. Our relatively cheap method enables the geologist to configure MARE2DEM correctly and extract the best performance for a given cluster configuration. We detail how the method works and evaluate its effectiveness, successfully pinpointing the best values for creating refinement groups using a real case study from the Marlim field on the coast of Rio de Janeiro, Brazil. Although we demonstrate our evaluation with this scenario, our method works for any MARE2DEM input.
This paper presents a fast intra mode decision solution for the VVC standard using machine learning. The idea is to reorder the evaluation of modes performed by the Rate-Distortion Optimization (RDO) process according to the modes' occurrence rates. Based on the new evaluation order, three Decision Tree models were trained to skip the modes less likely to be chosen. The results show that the proposed solution achieves time savings of up to 15.57% with a coding efficiency degradation of only 0.41% on average. Compared with related works, the proposed solution shows competitive results.
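The reorder-then-skip idea can be illustrated with a minimal sketch. It is not the paper's implementation: the mode numbers, the `rd_cost` callback, and the `should_skip` predicate (a stand-in for the trained Decision Tree models) are all hypothetical names introduced here for illustration.

```python
from collections import Counter


def order_by_occurrence(chosen_modes_history, candidate_modes):
    """Sort candidate modes so the most frequently chosen ones come first.

    Ties are broken by mode number to keep the order deterministic.
    """
    counts = Counter(chosen_modes_history)
    return sorted(candidate_modes, key=lambda m: (-counts[m], m))


def evaluate_with_skip(ordered_modes, rd_cost, should_skip):
    """Evaluate modes in order and stop once should_skip says so.

    rd_cost(mode) returns the rate-distortion cost of a mode (assumed callback).
    should_skip(rank, best_cost) is a hypothetical stand-in for a trained
    classifier; the first mode is always evaluated.
    """
    best_mode, best_cost = None, float("inf")
    evaluated = []
    for rank, mode in enumerate(ordered_modes):
        if rank > 0 and should_skip(rank, best_cost):
            break  # remaining modes are the least likely ones; skip them
        cost = rd_cost(mode)
        evaluated.append(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, evaluated


# Toy data: modes previously selected by the encoder (hypothetical values).
history = [0, 0, 0, 1, 1, 50, 18]
ordered = order_by_occurrence(history, [18, 50, 1, 0])
costs = {0: 10.0, 1: 12.0, 18: 9.0, 50: 20.0}
best, evaluated = evaluate_with_skip(
    ordered, costs.__getitem__, lambda rank, best_cost: rank >= 2
)
```

In this toy run the skip rule stops after two modes, so mode 18 is never evaluated even though it has the lowest cost. This is the kind of coding-efficiency loss that the abstract reports as traded for encoding time.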
ISBN (digital): 9781728180588
ISBN (print): 9781728180595
Cryptography hardware design is a key challenge in advancing confidentiality in the prominent field of the Internet of Things (IoT). The rise of IoT embedded devices boosts the demand for power- and area-efficient cryptography hardware. The more robust the cryptography algorithm, the higher the hardware complexity, circuit area, and energy consumption. Asymmetric algorithms are a particular class widely employed in ultra-secure cryptosystems. The long time required to break the private key in asymmetric algorithms results from their high mathematical complexity. RSA is an asymmetric algorithm that performs successive modular multiplications to encrypt and decrypt information. Arithmetic operators are therefore the most significant contributors to circuit area and power dissipation. This work evaluates a design space exploration for power- and area-efficient VLSI hardware design of the Montgomery modular multiplier employed in the RSA algorithm.
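To show why RSA reduces to repeated modular multiplications, the following software sketch implements textbook Montgomery multiplication and uses it for square-and-multiply exponentiation. It is a functional model only, not the hardware architecture evaluated in the paper; the function names and the choice R = 2^k with k equal to the bit length of the modulus are assumptions made here. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_setup(n):
    """Precompute Montgomery constants for an odd positive modulus n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("modulus must be odd and positive")
    k = n.bit_length()              # R = 2**k, so R > n and gcd(R, n) == 1
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime == -1 (mod R)
    return k, r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n, for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # exact division by R
    return u - n if u >= n else u


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)


def mod_exp(base, exp, n):
    """Left-to-right square-and-multiply using only Montgomery products."""
    k, r, n_prime = montgomery_setup(n)
    r2 = (r * r) % n
    x_bar = redc((base % n) * r2, n, k, n_prime)  # base in Montgomery form
    acc = r % n                                   # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, k, n_prime)
    return redc(acc, n, k, n_prime)               # leave Montgomery form


# Toy RSA round trip with a small textbook key (not secure).
n, e, d = 3233, 17, 2753
message = 65
cipher = mod_exp(message, e, n)
recovered = mod_exp(cipher, d, n)
```

The reduction step replaces division by the modulus with masking and shifting by a power of two, which is why the operation maps well to hardware and why the multiplier dominates the area and power budget.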
ISBN (digital): 9781509066315
ISBN (print): 9781509066322
In this work, we propose a spatially adaptive HEVC intra mode pre-selection for equirectangular (ERP) 360 video coding. The proposed technique exploits the spatial characteristics of 360 video in the ERP projection to reduce the complexity of intra prediction mode selection. The number of intra modes evaluated in Rate-Distortion Optimization is reduced based on a score technique that is adaptive to the frame region being encoded. Results show that the proposed technique achieves a complexity reduction of 16.5% with low coding efficiency penalties.
ISBN (digital): 9781728199245
ISBN (print): 9781728199252
Software Transactional Memory (STM) is an alternative abstraction for synchronizing processes in parallel programming. One advantage is simplicity, since explicit locks can be replaced with atomic blocks. Regarding STM performance, many studies have focused on reducing the number of aborts. However, in current multicore architectures with complex memory hierarchies, it is also important to consider where the memory of a program is allocated and how it is accessed. This paper proposes the use of a technique called sharing-aware mapping, which maps an application's threads to cores based on their memory access behavior, to achieve better performance in STM systems. We introduce STMap, an online, low-overhead mechanism that detects the sharing behavior and performs the mapping directly inside the STM library by tracking and analyzing how threads perform STM operations. In experiments with the STAMP benchmark suite and synthetic benchmarks, STMap shows performance gains of up to 77% on a Xeon system (17.5% on average) and 85% on an Opteron system (9.1% on average), compared to the Linux scheduler.
ISBN (digital): 9781728189468
ISBN (print): 9781728189475
Software Transactional Memory (STM) is an abstraction for synchronizing accesses to shared resources. It simplifies parallel programming by replacing explicit locks and synchronization mechanisms with atomic blocks. A well-known approach to improving the performance of STM applications is to serialize transactions to avoid conflicts using schedulers and mapping algorithms. However, in current architectures with complex memory hierarchies, it is also important to consider where the memory of the program is allocated and how it is accessed. An important technique for improving memory locality is to map the threads and data of an application based on their memory access behavior. This technique is called sharing-aware mapping. In this paper, we introduce a method to detect sharing behavior directly inside the STM library by tracking and analyzing how threads perform STM operations. This information is then used to perform an optimized mapping of the application's threads to cores in order to improve the efficiency of STM operations. Experimental results with the STAMP benchmarks show performance gains of up to 9.7x (1.4x on average), and a reduction in the number of aborts of up to 8.5x, compared to the Linux scheduler.
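As a rough illustration of sharing-aware mapping in general, the sketch below builds a thread-to-thread sharing matrix from recorded accesses and greedily pairs the threads that share the most data, so that each pair could be placed on cores sharing a cache. This is an assumption-laden toy, not the detection or mapping mechanism of the paper: the `(thread_id, address)` trace format, the function names, and the greedy pairing heuristic are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_sharing_matrix(accesses, num_threads):
    """Count, for each pair of threads, how many distinct addresses both touched.

    accesses is an iterable of (thread_id, address) pairs, assumed to be
    recorded whenever a thread performs a transactional read or write.
    """
    touched = defaultdict(set)
    for tid, addr in accesses:
        touched[addr].add(tid)
    matrix = [[0] * num_threads for _ in range(num_threads)]
    for tids in touched.values():
        for a, b in combinations(sorted(tids), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def greedy_pairing(matrix):
    """Pair threads in decreasing order of sharing; each thread joins one pair."""
    n = len(matrix)
    candidates = sorted(
        ((matrix[a][b], a, b) for a, b in combinations(range(n), 2)),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    paired, pairs = set(), []
    for _, a, b in candidates:
        if a not in paired and b not in paired:
            pairs.append((a, b))
            paired.update((a, b))
    return pairs


# Toy trace: threads 0 and 2 share two addresses; 0, 1 and 3 share one.
trace = [(0, 0x10), (2, 0x10), (0, 0x20), (2, 0x20),
         (1, 0x30), (3, 0x30), (0, 0x30)]
matrix = build_sharing_matrix(trace, 4)
pairs = greedy_pairing(matrix)
```

Each resulting pair would then be pinned to neighboring cores by whatever affinity interface the platform provides. With an odd number of threads, one thread is simply left unpaired.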
This paper presents a hardware-friendly algorithm to maximize the throughput of the Discrete Cosine Transform (DCT) of High Efficiency Video Coding (HEVC), together with its hardware design. The Fast DCT (FCT) algorithm is based on the Cooley-Tukey algorithm for the Fast Fourier Transform (FFT), with the pre- and post-processing required to maintain compliance with HEVC. The resulting algorithm allows high throughput while maintaining low power dissipation. The designed hardware was synthesized for a 45-nm Nangate technology and reaches a throughput of 81.28 GSamples per second while consuming 12.33 mW. Such energy efficiency and throughput surpass all related works in the literature.
This paper compares the performance and stability of two Big Data processing tools: Apache Spark and the High Performance Analytics Toolkit (HPAT). The comparison was performed using two applications: a unidimensional vector sum and the K-means clustering algorithm. The experiments were performed in distributed and shared memory environments with different numbers and configurations of virtual machines. From the results, we conclude that HPAT performs better than Apache Spark in our case studies. We independently validated the results and potential presented by the HPAT developers. We also provide an analysis of both frameworks in the presence of failures.
Multi-Processor Systems-on-Chip (MPSoCs) demand high performance, low power, and high density; three-dimensional integrated circuits (3DIC) have therefore emerged as a solution for integrating these systems. To interconnect the layers of these systems with appropriate flexibility and scalability, a Network-on-Chip (NoC) is typically employed. In this paper, we discuss the scenario of 3D designs, covering the important issues surrounding this new concept. In line with the features discussed in this paper, we propose a hierarchical 3D topology that fits the reality of these designs well. Experimental results analyze different topologies and show the large area and power benefits of our proposal.