Multiprocessor Systems-on-Chip (MPSoCs) demand high performance, low power, and high density, and three-dimensional integrated circuits (3DIC) have therefore emerged as a way to integrate these systems. To interconnect the layers of these systems with adequate flexibility and scalability, a Network-on-Chip (NoC) is typically employed. In this paper, we discuss the scenario of 3D designs, covering the important issues raised by this new concept. Based on the features discussed in this paper, we propose a hierarchical 3D topology that fits the reality of these designs well. Experimental results compare different topologies and show the large area and power benefits of our proposal.
Adaptive Mesh Refinement (AMR) is a widely known technique for adapting the accuracy of a solution in critical areas of the problem domain instead of using regular, or irregular but static, meshes. MARE2DEM is a parallel application that employs the AMR technique to model 2D electromagnetics in oil and gas exploration. The modeling consists of iteratively applying a data inversion based on a set of measurements collected and recorded by a survey of an area of interest. MARE2DEM achieves parallelism by dividing the workload into a set of refinement groups that represent overlapping areas of the problem domain. Each refinement group can be computed independently of the others by a set of workers, which carry out the AMR on the meshes when necessary. The shape and compute performance of a refinement group depend directly on a set of user-defined parameters. In this article, we provide a method to estimate MARE2DEM performance for all possible values of the application's influencing parameters for a given case study. Our relatively cheap method enables the geologist to configure MARE2DEM correctly and extract the best performance for a given cluster configuration. We detail how the method works and successfully evaluate its effectiveness, pinpointing the best values for creating refinement groups using a real case study from the Marlim field off the coast of Rio de Janeiro, Brazil. Although we demonstrate our evaluation with this scenario, our method works for any MARE2DEM input.
This paper presents an object-oriented framework for task scheduling. The framework can be used in domains such as process-centred software engineering environments, workflow management systems, or project management systems. It was conceived based on both current methods for framework development and an existing architectural pattern for process managers. A prototype of the framework was developed in Java. The lessons learnt reflect the experience of extracting, from well-known applications, a framework that can be reused in practical domains. Thus, this work contributes not only a framework but also insights into applying novel techniques to framework development.
Sensor networks are being used in several emerging applications not even imagined some years ago due to advances in sensing, computing, and communication techniques. However, these advances also pose various challenge...
This paper presents a hardware-friendly algorithm to maximize the throughput of the Discrete Cosine Transform (DCT) of High Efficiency Video Coding (HEVC), together with its hardware design. The Fast DCT (FCT) algorithm is based on the Cooley-Tukey algorithm for the Fast Fourier Transform (FFT), with the pre- and post-processing required to maintain compliance with HEVC. The resulting algorithm allows high throughput while maintaining low power dissipation. The designed hardware was synthesized for a 45-nm Nangate technology and reaches a throughput of 81.28 GSamples per second while consuming 12.33 mW. This energy efficiency and throughput surpass all related works in the literature.
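The two reported figures can be combined into an energy-per-sample value, which makes the energy-efficiency claim easier to compare across designs. The short calculation below is only a back-of-the-envelope sketch derived from the numbers in the abstract; the variable names are illustrative, and the abstract itself does not report this derived figure.

```python
# Figures reported in the abstract above.
throughput_samples_per_s = 81.28e9  # 81.28 GSamples per second
power_w = 12.33e-3                  # 12.33 mW

# Energy per sample (joules) = power (watts) / throughput (samples per second).
energy_per_sample_j = power_w / throughput_samples_per_s
energy_per_sample_pj = energy_per_sample_j * 1e12  # convert J to pJ

print(f"{energy_per_sample_pj:.4f} pJ per sample")
```

Under these figures the design spends roughly 0.15 pJ per sample.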
ISBN: 9781467312073 (print)
The continuous shrinking of devices has introduced new challenges to integrated circuit design, chiefly the need to deal with variations in process parameters. This paper presents an evaluation of the effect of process variability on the current Ids of nanoscale devices, with the parameters varied individually and simultaneously, taking into account the correlation among them. The results show that the deviation from the mean value is quite significant, approximately 16%, for high performance models. The variation of L has the dominant effect on the overall variation of the device in high performance models, while the dominant effect in low power models is still due to Vth variations. Lastly, the effect of process parameter variations worsens with technology scaling, with a considerable increase in the deviation from the 22nm to the 16nm technology.
In parallel programs, the tasks of a given application must cooperate in order to accomplish the required computation. However, the communication time between tasks may differ depending on which cores they are executing on and how the memory hierarchy and interconnection are used. The problem is even more important in multi-core machines with NUMA characteristics, since remote accesses impose high overhead, making these machines more sensitive to thread and data mapping. In this context, process mapping is a technique that provides performance gains by improving the use of resources such as interconnections, main memory, and cache memory. The problem of finding the best mapping is considered NP-hard. Furthermore, in shared memory environments, there is the additional difficulty of finding the communication pattern, which is implicit and occurs through memory accesses. This work aims to provide a static mapping method for NUMA architectures that does not require any prior knowledge of the application. Different metrics were adopted, and a heuristic method based on the Edmonds matching algorithm was used to obtain the mapping. To evaluate our proposal, we use the NAS Parallel Benchmarks (NPB) and two modern multi-core NUMA machines. Results show performance gains of up to 75% compared to the native scheduler and memory allocator of the operating system.
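To illustrate the idea of matching-based mapping, the sketch below pairs threads so that the total communication between paired threads is maximal; each pair could then be placed on cores that share a cache or NUMA node. This is not the authors' method: the abstract does not say how the matching is applied, the communication matrix is hypothetical, and the exhaustive search used here stands in for the Edmonds matching algorithm, which solves the same problem in polynomial time.

```python
from typing import List, Tuple

Pairing = Tuple[int, List[Tuple[int, int]]]


def best_pairing(comm: List[List[int]]) -> Pairing:
    """Return (total weight, pairs) of a maximum-weight perfect matching.

    Exhaustive recursion: fine for a handful of threads, exponential in
    general. A real implementation would use Edmonds' algorithm instead.
    """
    if len(comm) % 2:
        raise ValueError("an even number of threads is required")

    def solve(remaining: Tuple[int, ...]) -> Pairing:
        if not remaining:
            return 0, []
        first, rest = remaining[0], remaining[1:]
        best_w, best_pairs = -1, []
        # Try every possible partner for the first unpaired thread.
        for i, partner in enumerate(rest):
            sub_w, sub_pairs = solve(rest[:i] + rest[i + 1:])
            w = comm[first][partner] + sub_w
            if w > best_w:
                best_w, best_pairs = w, [(first, partner)] + sub_pairs
        return best_w, best_pairs

    return solve(tuple(range(len(comm))))


# Hypothetical symmetric communication matrix for four threads:
# comm[i][j] is the amount of data exchanged between threads i and j.
comm = [
    [0, 9, 1, 2],
    [9, 0, 3, 1],
    [1, 3, 0, 8],
    [2, 1, 8, 0],
]

total, pairs = best_pairing(comm)
print(total, pairs)
```

With this matrix, threads 0 and 1 and threads 2 and 3 communicate most heavily, so they end up paired. On larger machines the same step could be repeated on groups of pairs to follow deeper levels of the memory hierarchy, which is again an assumption rather than something the abstract states.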