In heterogeneous multi-core systems, performance differences among cores can affect lock synchronization: high-performance cores may have to wait for slower cores to finish executing a critical section. To make better use of the high-performance cores, critical section execution can be offloaded to them. Because combining synchronization can already transfer critical section execution to the combiner, this paper presents a core-aware combining approach for heterogeneous multi-core processors that accelerates critical section execution. In combining synchronization, one competing thread becomes the combiner and completes the pending requests of the other threads; on multi-core systems it typically performs better than conventional locks. To transfer critical section execution to a more efficient core, we implement two ideas, core efficiency-based selective lock ownership transfer and a dynamic helping quota, in four combining implementations. We evaluated these core-aware implementations with several micro-benchmarks and workloads on an aarch64 heterogeneous machine and an x86 asymmetric machine. The results show that the core-aware combining implementations accelerate critical section execution and achieve higher throughput than the original combining implementations. (c) 2022 Elsevier Inc. All rights reserved.
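The two ideas named above can be illustrated with a small sequential Python model. The first is a dynamic helping quota. The second is core efficiency-based selective lock ownership transfer. This is a hypothetical policy sketch, not the authors' implementation. Several parts of it are assumptions chosen for clarity:

- the per-core `efficiency` map, giving relative speed with 1.0 for the fastest core;
- the linear scaling of the quota;
- the rule of handing ownership to the oldest waiter on the most efficient core;
- the function names `helping_quota` and `combine_pass`.

```python
from collections import deque


def helping_quota(core, efficiency, base_quota):
    """Hypothetical dynamic quota: scale base_quota by the combiner core's
    efficiency relative to the fastest core; always serve at least one request."""
    fastest = max(efficiency.values())
    return max(1, round(base_quota * efficiency[core] / fastest))


def combine_pass(combiner_core, pending, efficiency, base_quota):
    """Sequential model of one combining pass (mutates `pending`).

    pending: deque of (thread_id, core, request) in arrival order; the
    combiner's own request is assumed to be at the front.
    Returns (served, next_combiner), where next_combiner is the thread id that
    receives lock ownership, or None when no requests remain.
    """
    quota = helping_quota(combiner_core, efficiency, base_quota)
    served = []
    while pending and len(served) < quota:
        served.append(pending.popleft())
    if not pending:
        return served, None
    # Hypothetical selective transfer: pick the oldest remaining waiter
    # among those running on the most efficient core.
    best = max(efficiency[core] for _, core, _ in pending)
    next_combiner = next(tid for tid, core, _ in pending if efficiency[core] == best)
    return served, next_combiner
```

Under this model, a combiner on a core half as fast as the fastest one serves half the base quota. It then passes the lock, preferably to a waiter on a faster core, so more of the remaining critical sections run on high-performance cores.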
ISBN (print): 9781450311601
Fine-grain thread synchronization has been shown, in several cases, to be outperformed by efficient implementations of the combining technique. In this technique a single thread, called the combiner, holds a coarse-grain lock. In addition to its own synchronization request, it serves the active requests announced by other threads while those threads wait by performing some form of spinning. Efficient implementations of this technique significantly reduce the cost of synchronization, so in many cases they perform much better than the most efficient finely synchronized algorithms. In this paper, we revisit the combining technique with the goal of discovering where its real performance power resides and whether or how ensuring some desired properties (e.g., fairness in serving requests) would impact performance. We do so by presenting two new implementations of this technique; the first (CC-Synch) addresses systems that support coherent caches, whereas the second (DSM-Synch) works better in cacheless NUMA machines. In comparison to previous such implementations, the new implementations (1) provide bounds on the number of remote memory references (RMRs) that they perform, (2) support a stronger notion of fairness, and (3) use simpler and less basic primitives than previous approaches. In all our experiments, the new implementations outperform by far all previous state-of-the-art combining-based and fine-grain synchronization algorithms. Our experimental analysis sheds light on the questions we aimed to answer. Several modern multi-core systems organize the cores into clusters and provide fast communication within the same cluster and much slower communication across clusters. We present a hierarchical version of CC-Synch, called H-Synch, which exploits the hierarchical communication nature of such systems to achieve better performance. Experiments show that H-Synch significantly outperforms previous state-of-the-art hierarchical approaches. We provide new implementations of co
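The queue-based combining idea behind CC-Synch can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation:

- CPython has no atomic swap, so a `threading.Lock` stands in for the SWAP primitive.
- Spinning yields with `time.sleep(0)`.
- The names `CCSynch`, `execute`, and `help_bound` are hypothetical.

Each thread swaps a fresh node into the shared tail, announces its request in the node it received, and spins. A thread whose node is released without being completed becomes the combiner. It serves at most `help_bound` queued requests, then hands the role to the next waiter. Because of the GIL, the sketch shows the protocol only, not any speedup.

```python
import threading
import time


class _Node:
    """One slot in the request queue (field names are illustrative)."""
    __slots__ = ("req", "ret", "wait", "completed", "next")

    def __init__(self):
        self.req = None
        self.ret = None
        self.wait = False       # True while the owner must keep spinning
        self.completed = False  # True once a combiner has applied req
        self.next = None


class CCSynch:
    """Sketch of CC-Synch-style combining around a sequential function."""

    def __init__(self, apply_fn, help_bound=64):
        self._apply = apply_fn              # sequential operation on shared state
        self._help_bound = help_bound       # max requests served per combining pass
        self._tail = _Node()                # dummy node: wait=False, so the first caller combines
        self._swap_lock = threading.Lock()  # stands in for an atomic SWAP
        self._local = threading.local()

    def _swap_tail(self, node):
        with self._swap_lock:
            prev, self._tail = self._tail, node
        return prev

    def execute(self, request):
        if not hasattr(self._local, "node"):
            self._local.node = _Node()
        next_node = self._local.node
        next_node.next = None
        next_node.wait = True
        next_node.completed = False
        cur = self._swap_tail(next_node)  # take over the previous tail node
        cur.req = request
        self._local.node = cur            # recycle it for this thread's next call
        cur.next = next_node              # linking announces the request
        while cur.wait:                   # spin until served or promoted
            time.sleep(0)
        if cur.completed:
            return cur.ret
        # Promoted: act as combiner, starting with our own request.
        tmp, served = cur, 0
        while tmp.next is not None and served < self._help_bound:
            served += 1
            tmp_next = tmp.next           # read before releasing tmp's owner
            tmp.ret = self._apply(tmp.req)
            tmp.completed = True
            tmp.wait = False
            tmp = tmp_next
        tmp.wait = False                  # whoever holds tmp becomes the next combiner
        return cur.ret
```

For example, constructing `CCSynch` with a function that increments a shared counter lets several threads update the counter through a single combiner at a time, without any other lock around the counter.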
ISBN (print): 9780769529479
Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process. They do so either by reducing the number of record comparisons or by reducing the number of attribute comparisons. Both reduce computational time, but very often they also lower the quality of the results. However, the real bottleneck of RL is post-processing, where experts must review the results and decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to find a new kind of semantic blocking method that increases the number of hits and reduces the amount of review effort.
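One way to read the foreign-key idea is as a blocking key taken from the related (parent) record rather than from the record itself. The Python sketch below is an illustration built on assumptions, not the paper's method. It assumes each record carries a foreign key into a parent table, for instance a person's organization. The normalized value of one parent attribute then serves as the block, and only records that share a block become candidate pairs for review. The function and field names (`semantic_candidate_pairs`, `org`, `name`) are hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so near-identical values share a block."""
    return " ".join(text.lower().split())


def semantic_candidate_pairs(recs_a, parents_a, recs_b, parents_b, fk, parent_attr):
    """Block records by an attribute of the parent row their foreign key points to.

    recs_*: lists of dicts with an "id" and a foreign-key field `fk`.
    parents_*: dicts mapping foreign-key values to parent rows (dicts).
    Returns candidate (id_a, id_b) pairs; records whose key does not resolve are skipped.
    """
    blocks = defaultdict(lambda: ([], []))  # block key -> (ids from A, ids from B)
    for side, recs, parents in ((0, recs_a, parents_a), (1, recs_b, parents_b)):
        for rec in recs:
            parent = parents.get(rec[fk])
            if parent is not None:
                blocks[normalize(parent[parent_attr])][side].append(rec["id"])
    pairs = []
    for ids_a, ids_b in blocks.values():
        pairs.extend((a, b) for a in ids_a for b in ids_b)
    return pairs
```

Records whose foreign key does not resolve are simply dropped here. A fuller pipeline would probably fall back to a conventional blocking key for them rather than lose potential links.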