Machine translation (MT), with its broad potential use, has gained increasing attention from both researchers and software vendors. Generating high-quality translations, however, can make MT decoders highly computation-intensive. With their significant raw computing power, multi-core microprocessors have the potential to speed up MT software on desktop machines. However, retrofitting existing MT decoders is nontrivial: race conditions and atomicity issues are among the complications that make parallelization difficult. In this article, we show that, when parallelizing a state-of-the-art MT decoder, it is much easier to overcome such difficulties with a process-based parallelization method, called functional task parallelism, than with conventional thread-based methods. We achieve a 7.60-times speedup on an 8-core desktop machine while making significantly fewer changes to the original sequential code than a multithreaded version requires.
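To put the reported speedup in perspective, the short sketch below computes the parallel efficiency implied by a 7.60-times speedup on 8 cores (about 95%). It also estimates the serial fraction under Amdahl's law (roughly 0.75%). Amdahl's law is an assumption introduced here for illustration; the abstract does not say the authors use this model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction f implied by Amdahl's law (an assumed model).

    Amdahl's law: S = 1 / (f + (1 - f) / N)
    Solving for f: f = (N / S - 1) / (N - 1)
    """
    return (cores / speedup - 1.0) / (cores - 1.0)


# Figures taken from the abstract: 7.60x speedup on an 8-core machine.
efficiency = parallel_efficiency(7.60, 8)
serial_fraction = amdahl_serial_fraction(7.60, 8)

print(f"parallel efficiency: {efficiency:.1%}")
print(f"implied serial fraction (Amdahl): {serial_fraction:.2%}")
```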
With the further development and wide acceptance of cloud computing, many companies and colleges have decided to adopt it in their own data centers, an arrangement known as a private cloud. Since private clouds have som...
This paper introduces PartitionSim, a parallel simulator for future thousand-core processors with software-managed cache coherence. PartitionSim aims to improve the simulation performance of many-core architectures at a small cost in accuracy. To achieve this goal, we propose a novel technique: timing partition. Timing partition is based on the following observation: in a target system, interacting components communicate with each other and therefore force the simulation to synchronize, while non-interacting components do not communicate with each other and can be simulated asynchronously. The technique divides the target timing models into two groups: a non-interacting group and an interacting group. Non-interacting timing models are simulated by host threads that rarely synchronize with each other, which improves speed with little loss of accuracy, while interacting timing models are simulated by host threads that synchronize strictly with each other to preserve accuracy. Using PartitionSim, we have simulated a target composed of thousands of cores on a 16-core SMP machine. The evaluation results show that PartitionSim scales well, with near-linear speedup, and delivers considerable performance (up to 25 MIPS) at a small cost in accuracy (0.92% on average).
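The grouping step behind timing partition can be illustrated with a minimal sketch. It assumes the target is described by a list of timing-model names plus a list of communication links between them; any model that appears in a link is treated as interacting, and the rest as non-interacting. The function name, the data layout, and the example component names are all hypothetical, and this is not PartitionSim's actual implementation.

```python
from typing import Iterable, List, Tuple


def partition_timing_models(
    models: Iterable[str],
    links: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Split timing models into (interacting, non_interacting) groups.

    A model counts as interacting if it is an endpoint of at least one
    communication link (an assumption made for this illustration).
    """
    model_list = list(models)
    known = set(model_list)
    communicating = set()
    for src, dst in links:
        if src not in known or dst not in known:
            raise ValueError(f"link ({src!r}, {dst!r}) names an unknown model")
        communicating.update((src, dst))

    # Preserve the caller's ordering inside each group.
    interacting = [m for m in model_list if m in communicating]
    non_interacting = [m for m in model_list if m not in communicating]
    return interacting, non_interacting


# Hypothetical target: two cores that never talk to each other directly,
# plus a router and a memory controller that exchange messages.
interacting, non_interacting = partition_timing_models(
    ["core0", "core1", "router", "memory_ctrl"],
    [("router", "memory_ctrl")],
)
print("strictly synchronized:", interacting)
print("loosely synchronized:", non_interacting)
```

In a simulator built along these lines, the first group would be driven by host threads that synchronize strictly and the second by threads that synchronize only occasionally.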
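Three-dimensional integrated circuits (3D ICs) are an emerging chip integration technology in which multiple dies, fabricated in the same or different processes, are stacked and vertically integrated using through-silicon vias (TSVs). This integration gives chips notable advantages, including a smaller form factor, higher on-chip transistor density, more functional modules on a single chip, and better interconnect performance. However, 3D ICs also bring new challenges, such as the TSV electromigration effect. This paper proposes a design-for-reliability method that suppresses TSV electromigration. First, for TSV defects such as voids in the copper plating, bonding misalignment, and dust contamination of the bonding interface, we analyze the relationship between manufacturing defects and electromigration. We observe that manufacturing defects not only aggravate electromigration but also affect the TSV resistance. We then propose TSV-SAFE (TSV Self-healing architecture For Electro-migration), a design-for-reliability framework that suppresses electromigration. In the experiments, we build a simulation platform for a 3D chip composed of two circuit layers. The results show that, with the proposed technique, the mean time to failure (MTTF) of the TSVs increases by a factor of 70 on average, while the resulting hardware area overhead is no more than 1% of the total chip area.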
Li and Zhou propose an important concept for Petri nets: elementary siphons. They partition siphons into elementary and dependent ones. The controllability of the latter can be ensured by the former's proper contr...
With the prevalence of multi-core processors, embedded clusters increasingly deploy SMP nodes to gain more computing power. A crucial issue is that MPI inter-process communication has to reconcile high performance with embedded constraints. Moreover, because intra-node and inter-node communication run over different infrastructures, there is a large performance gap between them. In this paper, we design a virtual communication system called SMVN, which extends the shared-memory mechanism typically used within a node to the inter-node case. SMVN uses the HT inter-chip interconnect interface of Godson-3A SMP nodes to build a mesh topology. It is Ethernet compatible because it simulates the bottom layers of the TCP/IP protocol stack. With this design, the node interconnect no longer needs NICs, cables, or switches. Furthermore, we exploit a zero-copy scheme and other optimizations to improve performance. We port the MPICH2 library using its socket channel and formulate its process allocation. MPI latency and bandwidth tests show that the performance difference between the two levels is small. The inter-node bandwidth is 27.3 MB/s, which is more than twice the theoretical peak of 100 Mb Ethernet and reaches 84% of the intra-node performance.
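The bandwidth comparison can be checked with a few lines of arithmetic. The sketch below assumes 8 bits per byte and decimal megabytes, so 100 Mb/s Ethernet peaks at 12.5 MB/s. The intra-node bandwidth is not stated in the abstract; the value computed here is only what the 84% figure implies. Variable and function names are illustrative.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Convert Mb/s to MB/s, assuming decimal units and 8-bit bytes."""
    return mbit_per_s / BITS_PER_BYTE


# Figures taken from the abstract.
inter_node_mb_s = 27.3          # measured inter-node MPI bandwidth, MB/s
ethernet_peak_mbit_s = 100.0    # theoretical peak of 100 Mb Ethernet
inter_to_intra_ratio = 0.84     # inter-node reaches 84% of intra-node

ethernet_peak_mb_s = megabits_to_megabytes(ethernet_peak_mbit_s)
ratio_vs_ethernet = inter_node_mb_s / ethernet_peak_mb_s
# Implied, not reported: intra-node bandwidth derived from the 84% figure.
implied_intra_node_mb_s = inter_node_mb_s / inter_to_intra_ratio

print(f"Ethernet peak: {ethernet_peak_mb_s:.1f} MB/s")
print(f"inter-node vs Ethernet peak: {ratio_vs_ethernet:.2f}x")
print(f"implied intra-node bandwidth: {implied_intra_node_mb_s:.1f} MB/s")
```

Under these assumptions the inter-node bandwidth is about 2.18 times the Ethernet peak, consistent with the "more than twice" claim.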