ISBN (print): 9780769543123
On multi-core Network-on-Chips (NoCs), memories are preferably distributed, and supporting Distributed Shared Memory (DSM) is essential for reusing the huge amount of legacy code and for easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, which degrades performance. We observe that, in parallel applications, different data have different properties (private or shared). For private data accesses, it is unnecessary to perform virtual-to-physical address translation. Even for the same datum, its property may change in different phases of program execution. Therefore, this paper focuses on decreasing the overhead of virtual-to-physical address translation, and hence improving system performance, by introducing a hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast physical memory accesses for private data while maintaining a single global virtual memory space for shared data. Based on the data properties of parallel applications, run-time partitioning supports changing the hybrid DSM organization during program execution. It ensures fast physical memory addressing for private data and conventional virtual memory addressing for shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of the hybrid DSM organization in order to analyze its performance. A real DSM-based multi-core NoC platform is also constructed. The experimental results on real applications show that the hybrid DSM organization with run-time partitioning demonstrates a performance advantage over its conventional DSM counterpart. The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of para
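The idea of skipping translation for private data can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `HybridDsmNode`, the single `boundary` address that separates private from shared data, the page size, and the dictionary-based page table are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class HybridDsmNode:
    """Toy model of one node's memory under a hybrid DSM organization.

    Assumption (not from the abstract): addresses below `boundary` are
    private and used directly as physical addresses; addresses at or above
    it are shared and go through a virtual-to-physical page table.
    """
    boundary: int
    page_table: dict = field(default_factory=dict)  # virtual page -> physical page
    page_size: int = 256
    translations: int = 0  # counts how often translation overhead is paid

    def access(self, addr: int) -> int:
        """Return the physical address used for `addr`."""
        if addr < self.boundary:
            return addr  # private data: physical addressing, no translation
        self.translations += 1  # shared data: pay the translation cost
        page, offset = divmod(addr, self.page_size)
        return self.page_table[page] * self.page_size + offset

    def repartition(self, new_boundary: int) -> None:
        """Run-time partitioning: move the private/shared boundary."""
        self.boundary = new_boundary


node = HybridDsmNode(boundary=512, page_table={2: 7, 3: 8})
private_phys = node.access(100)   # below the boundary, used as-is
shared_phys = node.access(600)    # virtual page 2 maps to physical page 7
node.repartition(1024)            # address 600 becomes private in a later phase
later_phys = node.access(600)     # no translation any more
```

In this sketch the `translations` counter stands in for the overhead the abstract describes: moving the boundary at run time changes which accesses pay it.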
The paper deals with the history of 3D graphics simulators. Several generations of space simulators that were used in training centers, including the Yu. A. Gagarin Russian State Scientific-Research and Test Centre of Cosmonaut Training (Moscow, Russia), are discussed. A number of US-Russia International Space Station programs use these cosmonaut training systems in order to increase the efficiency of crews' space activities and to ensure safe manned flights. We introduce a powerful PC-based solution for a Virtual Studio. Formal requirements for such a system are stated, and the implementation of its distributed architecture is briefly outlined.
On multi-core processors, memories are preferably distributed, and supporting Distributed Shared Memory (DSM) is essential for reusing the huge amount of legacy code and for easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, which degrades performance. We observe that, in parallel applications, different data have different properties (private or shared). For private data accesses, it is unnecessary to perform virtual-to-physical address translation. Even for the same datum, its property may change in different phases of program execution. Therefore, this paper focuses on decreasing the overhead of virtual-to-physical address translation, and hence improving system performance, by introducing a hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast physical memory accesses for private data while maintaining a single global virtual memory space for shared data. Based on the data properties of parallel applications, run-time partitioning supports changing the hybrid DSM organization during program execution. It ensures fast physical memory addressing for private data and conventional virtual memory addressing for shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of the hybrid DSM organization in order to analyze its performance. A real DSM-based multi-core platform is also constructed. The experimental results on real applications show that the hybrid DSM organization with run-time partitioning demonstrates a performance advantage over its conventional DSM counterpart. The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, netwo
In multicore Network-on-Chip, it is preferable to realize distributed but shared memory (DSM) in order to reuse the huge amount of legacy code and to ease programming. Within DSM systems, memory consistency is a critical issue since it affects not only performance but also the correctness of programs. In this paper, we investigate the scalability of the weak consistency model, which may be implemented using a transaction counter. The experimental results compare synchronization latencies for various network sizes, topologies and lock positions in the network. Average synchronization latency rises exponentially for mesh and torus topologies as the network size grows. However, torus improves the synchronization latency in comparison to mesh. For mesh topology networks, average synchronization latency is also slightly affected by the lock position with respect to the network center.
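A rough way to see why topology and lock position could matter is to compare the mean hop distance from every node to the lock node. The Python sketch below does only that; it assumes a square n-by-n network with dimension-ordered minimal routing, and it treats hop count as a crude proxy for latency. It does not model the paper's transaction counter, contention, or measured latencies, and the function names are hypothetical.

```python
def hops(a, b, n, torus):
    """Minimal hop count between nodes a and b on an n-by-n mesh or torus."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if torus:
        # Wrap-around links allow going the short way round each dimension.
        dx, dy = min(dx, n - dx), min(dy, n - dy)
    return dx + dy


def mean_hops_to_lock(n, lock, torus):
    """Average hop distance from every node (including the lock) to the lock."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    return sum(hops(v, lock, n, torus) for v in nodes) / len(nodes)


mesh_corner = mean_hops_to_lock(4, (0, 0), torus=False)
mesh_center = mean_hops_to_lock(4, (1, 1), torus=False)
torus_corner = mean_hops_to_lock(4, (0, 0), torus=True)
torus_center = mean_hops_to_lock(4, (1, 1), torus=True)
```

Under these assumptions a 4-by-4 mesh averages 3.0 hops to a corner lock and 2.0 hops to a lock near the center, while the torus averages 2.0 hops for either position, which is in line with the qualitative trends the abstract reports.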
ISBN (digital): 9783540929901
ISBN (print): 9783540929895
This book constitutes the refereed proceedings of the Fourth International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2009, held in Paphos, Cyprus, in January 2009. The 27 revised full papers presented together with 2 invited keynote papers were carefully reviewed and selected from 97 submissions. The papers are organized in topical sections on dynamic translation and optimisation, low level scheduling, parallelism and resource control, communication, mapping for CMPs, power, cache issues, and parallel embedded applications.
ISBN (digital): 9798350352030
ISBN (print): 9798350352047
With the rapid development of integrated circuit manufacturing processes, soft errors have emerged as a pivotal factor that influences circuit reliability. This paper investigates the rapid estimation of the impact of single event upsets (SEUs) on the logic behavior of flip-flops in a circuit using machine learning methods. A major challenge currently faced when applying machine learning methods to SEU evaluation is the absence of publicly available circuit datasets. Therefore, this paper employs the fault injection method to acquire circuit data such as soft error sensitivity. Subsequently, it models the gate-level netlist and integrates the netlist models with the acquired data to construct a dataset. Finally, a model based on a graph attention network (GAT) is developed, and leave-one-out cross-validation is used to evaluate its performance. Compared to neural network methods skilled at handling structured data, the experimental results indicate that the method proposed in this paper has better predictive performance. It achieves an average absolute error of 0.064, representing a 43.46% improvement over the baseline.
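The evaluation protocol named above, leave-one-out cross-validation scored by mean absolute error, can be sketched generically in Python. This is not the paper's GAT model or dataset: the helper `leave_one_out_mae`, the trivial mean predictor, and the three toy samples are hypothetical placeholders, and the abstract does not say whether the held-out unit is a flip-flop or a whole circuit.

```python
from statistics import mean


def leave_one_out_mae(samples, fit, predict):
    """Hold out each sample once, train on the rest, and average |error|."""
    errors = []
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]   # everything except sample i
        model = fit(train)
        errors.append(abs(predict(model, x) - y))
    return mean(errors)


def fit_mean(train):
    """Placeholder 'model': the mean label of the training samples."""
    return mean(y for _, y in train)


def predict_mean(model, x):
    """Placeholder prediction: ignore the input, return the stored mean."""
    return model


# Hypothetical (feature, sensitivity) pairs, invented only for this sketch.
toy_samples = [(0, 0.1), (1, 0.3), (2, 0.5)]
toy_mae = leave_one_out_mae(toy_samples, fit_mean, predict_mean)
```

Any real model can be plugged in through the `fit` and `predict` arguments; the structure of the loop stays the same.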