Modern nonvolatile memories (NVMs) are widely recognized as energy-efficient replacements for classical memory/storage media such as SRAM, DRAM, and mechanical hard disks. Among the popular NVMs, skyrmion racetrack memory (SK-RM) is well known for its high storage density and its unique support for insert/delete operations. However, existing algorithms designed for classical media may suffer serious performance degradation on SK-RM because of its distinct characteristics. These algorithms should therefore be redesigned for the new memory model of SK-RM in order to fully exploit its potential. In particular, many existing algorithms access in-memory data in a random-hopping fashion, which generates many time-consuming shift operations on SK-RM. Eliminating unnecessary shift operations is therefore crucial for boosting the performance of these algorithms on SK-RM. In many modern applications, such as multimedia and data analysis, a common operation is to process two or more arrays/vectors of data to perform a computation, and an appropriate placement strategy for these arrays/vectors is critical for avoiding unnecessary shifts. This observation motivates us to propose a recursive back-to-back data placement scheme that effectively reduces the shift operations of SK-RM. As a case study of back-to-back placement, we consider sorting and propose a novel shift-limited sorting algorithm for SK-RM. Analytical studies show that shift-limited sort improves the time complexity of classical merge sort from O(dn lg n) to O(n lg n), where d is the bit distance between adjacent access ports on the nanotracks of SK-RM. Experimental studies further verify the efficacy of the proposed shift-limited sort, with encouraging results.
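To make the cost of random hopping concrete, the toy model below counts shifts for a sequence of bit accesses on one nanotrack whose access ports are spaced d bits apart. It is a simplified sketch, not the paper's back-to-back placement scheme: it assumes (hypothetically) an unbounded track with a port at every multiple of d, and that each access shifts the track by the minimum amount needed to bring the target bit under some port. The function name `count_shifts` is illustrative.

```python
import random


def count_shifts(accesses, d):
    """Count shift steps needed to serve `accesses` (bit positions) in order.

    Toy model (assumption): ports sit at every multiple of d on an unbounded
    track, and `offset` records how far the track has been shifted so far.
    Bit p is under a port when (p - offset) is a multiple of d.
    """
    offset = 0
    total = 0
    for p in accesses:
        r = (p - offset) % d                 # misalignment w.r.t. the port grid
        delta = r if r <= d - r else r - d   # shift the shorter way
        offset += delta
        total += abs(delta)
    return total


if __name__ == "__main__":
    n, d = 1024, 32
    sequential = list(range(n))
    hopping = random.Random(0).sample(range(n), n)
    print("sequential:", count_shifts(sequential, d))   # 1023 (n - 1)
    print("random hopping:", count_shifts(hopping, d))  # roughly n * d / 4
```

Under this model a sequential scan costs about one shift per access, while random hopping costs on the order of d/4 shifts per access, which illustrates why access order and data placement matter on SK-RM.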
Given a size-N input string X, a number of algorithms have been proposed that sort the suffixes of X into the output suffix array using induced sorting. Although the existing algorithms eSAIS, DSAIS, and fSAIS achieve remarkable time and space results for suffix sorting in external memory, there is still room for improvement. We propose a new algorithm, nSAIS, which reinvents the core inducing procedure of DSAIS with a new set of data structures to run faster and use less space. The suffix array is computed recursively, and on each recursion level the inducing procedure is performed block by block to facilitate sequential I/O. If X has a byte alphabet and N = O(M^2/B), where M and B are the sizes of internal memory and of an I/O block, respectively, nSAIS guarantees a workspace of less than N bytes besides the input and output while keeping the linear I/O volume O(N), which is the best known so far for external-memory inducing methods. Our experiments in typical settings show that our nSAIS program with 40-bit integers not only runs faster than the existing representative external-memory algorithms as N grows, but also consistently uses the least disk space, about 6.1 bytes on average. The techniques proposed in this study can be used to develop fast and succinct suffix sorters in external memory.
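Induced-sorting suffix sorters of the SA-IS family start by classifying every suffix as L-type or S-type and locating the leftmost-S (LMS) positions, which seed the inducing passes. The sketch below shows only this in-memory classification step, assuming a sentinel `"\0"` (smaller than every other character and absent from the text) is appended; it is not nSAIS's block-wise external-memory procedure, and `classify_suffixes` is a hypothetical helper.

```python
def classify_suffixes(text):
    """Return (types, lms) for text + sentinel, as used in SA-IS-style sorting.

    types[i] is 'S' if suffix i is smaller than suffix i + 1, else 'L'.
    Assumes the sentinel "\\0" is smaller than every character in text.
    """
    s = text + "\0"
    n = len(s)
    types = ["S"] * n  # the sentinel suffix is S-type by definition
    for i in range(n - 2, -1, -1):
        if s[i] < s[i + 1] or (s[i] == s[i + 1] and types[i + 1] == "S"):
            types[i] = "S"
        else:
            types[i] = "L"
    # An LMS position is an S-type position whose left neighbour is L-type.
    lms = [i for i in range(1, n) if types[i] == "S" and types[i - 1] == "L"]
    return "".join(types), lms


if __name__ == "__main__":
    print(classify_suffixes("banana"))  # ('LSLSLLS', [1, 3, 6])
```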
Emerging persistent memory (PM, also termed non-volatile memory) technologies promise large capacity, non-volatility, byte-addressability, and DRAM-comparable access latency. These features have inspired a host of PM-based storage systems and applications that store and access data directly in PM. Sorting is an important function in many systems, but how to optimize sorting for PM-based systems has not been systematically studied. In this paper, we conduct extensive experiments on many existing sorting methods, including both conventional sorting algorithms adapted for PM and recently proposed PM-friendly sorting techniques, on a hybrid platform with DRAM and real PM. The results indicate that each of these methods has drawbacks under some workloads, and some results are even counterintuitive compared with those reported in the original papers, which were obtained on DRAM-simulated platforms. To the best of our knowledge, this is the first systematic study of sorting on persistent memory. Based on our study, we summarize principles for selecting the optimal algorithm and propose an adaptive sorting engine, PMSort, that performs this selection and reduces failure-recovery overhead in PM. Extensive experiments show that PMSort selects the best sorting algorithm in a variety of cases.
We propose a new sweeping solution method for the three-dimensional (3D) discrete ordinates (Sn) equations of neutron transport on general hexahedral meshes, with particular focus on handling surface integrals and sweeping deadlocks. The main contributions of this paper are threefold. First, the surface integrals on non-planar cell faces are discretized with the effective face method, which addresses the reentrance problem without decomposing the cell faces. Second, based on the effective face method, we devise a new sorting algorithm that works for any hexahedral mesh. This algorithm avoids identifying dependent cycles and takes the physical characteristics of transport problems into account when choosing the lagged cells used to decouple sweeping deadlocks. Combining these advances, we propose a splitting sweeping iterative method for solving the Sn equations on general hexahedral meshes. Finally, we prove theoretically that this sweeping iterative method always converges and that the decoupling method has little effect on the iterative convergence rate. Numerical experiments on both cubic and spherical domains demonstrate the effectiveness of the proposed methods. The ideas presented in this paper also apply to polyhedral meshes and to two-dimensional polygonal meshes. (C) 2022 Elsevier Inc. All rights reserved.
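For a single discrete ordinate, a sweep order is a topological order of the cells along upwind-to-downwind dependencies; when every remaining cell waits on another (a deadlock), one cell can be lagged, that is, computed with incoming fluxes from the previous iteration. The sketch below illustrates this generic idea with Kahn's algorithm. The dependency graph is assumed to be given as (upwind, downwind) cell pairs, and the lagging rule (fewest unresolved upwind neighbours, ties broken by index) is a hypothetical placeholder, not the physics-based choice described in the paper.

```python
from collections import deque


def sweep_order(n_cells, edges):
    """Return (order, lagged) for one sweep direction.

    edges: (upwind, downwind) pairs. When no cell is ready, one remaining
    cell is lagged (hypothetical rule: fewest unresolved upwind neighbours).
    """
    downstream = [[] for _ in range(n_cells)]
    indeg = [0] * n_cells  # unresolved upwind neighbours per cell
    for u, v in edges:
        downstream[u].append(v)
        indeg[v] += 1

    done = [False] * n_cells
    ready = deque(c for c in range(n_cells) if indeg[c] == 0)
    order, lagged = [], []
    while len(order) < n_cells:
        if not ready:
            # Deadlock: break it by lagging one cell's upwind dependencies.
            cell = min((c for c in range(n_cells) if not done[c]),
                       key=lambda c: (indeg[c], c))
            lagged.append(cell)
            ready.append(cell)
        cell = ready.popleft()
        done[cell] = True
        order.append(cell)
        for v in downstream[cell]:
            if not done[v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return order, lagged


if __name__ == "__main__":
    print(sweep_order(3, [(0, 1), (1, 2), (2, 1)]))  # ([0, 1, 2], [1])
```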
This paper is devoted to the practical application of parallel sorting algorithms and parallel input/output methods to the problem of genome alignment. It considers different approaches to implementing such algorithms, taking into account the capabilities of high-performance systems. The main purpose of the work is to develop a genome sorting program that is significantly more efficient than its free-software analogues. The program is implemented for a supercomputer in C++ using OpenMP and OpenMPI. Owing to massively parallel data input and output, it runs up to 10 times faster than free-software analogues. The approaches to parallelizing data input/output and data processing considered in the paper can also be applied in other subject areas.
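A common pattern behind parallel sorting programs is to split the records into chunks, sort the chunks concurrently, and merge the sorted runs. The sketch below shows that pattern with Python's standard library; it is not the authors' C++/OpenMP/OpenMPI program, and the (chromosome, position) record layout is a hypothetical example. Threads keep the example portable, but because of CPython's global interpreter lock, real CPU speedups would require processes or native code.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(records, workers=4, key=None):
    """Split records into chunks, sort the chunks concurrently, then k-way merge.

    workers must be >= 1.
    """
    if not records:
        return []
    chunk = -(-len(records) // workers)  # ceiling division
    chunks = [records[i:i + chunk] for i in range(0, len(records), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda c: sorted(c, key=key), chunks))
    return list(heapq.merge(*runs, key=key))


if __name__ == "__main__":
    # Hypothetical aligned-read keys; plain string order of chromosome names
    # is used only for illustration.
    reads = [("chr2", 150), ("chr1", 900), ("chr1", 20), ("chr2", 5)]
    print(parallel_sort(reads, workers=2))
    # [('chr1', 20), ('chr1', 900), ('chr2', 5), ('chr2', 150)]
```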
ISBN (print): 9781665403665
Despite its high modularity, the cascaded H-bridge (CHB) converter requires an isolated DC source for each H-bridge cell, and the phase-shifting transformer used to supply these sources is its main drawback. New topologies with a reduced number of DC sources, such as the modular multilevel converter (MMC), have been introduced, but they require many semiconductor components. This paper proposes a new topology called cascaded flying cells (CFC), which provides high modularity and uses a single DC link per phase. A sorting algorithm is also proposed that keeps the voltages of the flying capacitors in the converter cells balanced at a constant switching frequency. Compared with several reduced-count multilevel converters, the CFC provides high modularity and better controllability of the capacitor voltages. Proper operation of a five-level CFC is verified by simulation in MATLAB/SIMULINK under both steady-state and dynamic conditions.
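Sorting-based voltage balancing, as commonly used in modular multilevel converters, ranks the capacitors by measured voltage and chooses which ones to insert into the current path according to the current direction. The sketch below shows that generic rule, not the CFC-specific algorithm of the paper; the function name, the sign convention (positive current charges an inserted capacitor), and the inputs are assumptions.

```python
def select_capacitors(voltages, n_insert, current):
    """Return indices of the capacitors to insert into the current path.

    Assumed convention: current > 0 charges inserted capacitors, so the
    lowest-voltage ones are chosen; otherwise (zero or negative current)
    the highest-voltage ones are chosen so that they discharge.
    """
    order = sorted(range(len(voltages)), key=lambda i: voltages[i])
    if current <= 0:
        order.reverse()
    return order[:n_insert]


if __name__ == "__main__":
    v = [1.00, 0.90, 1.10, 0.95]
    print(select_capacitors(v, 2, current=+5.0))  # [1, 3]
    print(select_capacitors(v, 2, current=-5.0))  # [2, 0]
```

Re-evaluating this selection once per switching period is what would tie it to a constant switching frequency in such a scheme.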
Manufacturing solderless wrapped connections with traceable and reproducible high quality requires an automated manufacturing process. However, a key challenge for robot-based wiring is defining the wiring sequence, because the close spacing of the array-oriented wrapposts impairs tool accessibility. In this research, we optimize the wiring sequence by treating it as a constraint satisfaction problem in which a solution must always be produced, even if not all constraints can be fulfilled. Experiments on a pilot plant for robot-based manufacturing of wrapped connections show that the developed approach significantly increases the percentage of connections that are wired automatically.
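One simple way to express "always return a sequence, even if the constraints cannot all be met" is to treat the constraints as soft and minimize the number that are violated. The brute-force sketch below does this for a handful of connections; it assumes (hypothetically) that every constraint is a precedence pair (a, b) meaning "wire a before b", and it is not the optimization approach used in the paper. Exhaustive search is practical only for very small instances.

```python
from itertools import permutations


def best_sequence(n_wires, before):
    """Return (sequence, violations) minimizing violated precedence pairs.

    `before` holds hypothetical soft constraints (a, b): wire a should be
    wired before wire b. A sequence is always returned, even when the
    constraints contradict each other.
    """
    def violations(seq):
        pos = {w: k for k, w in enumerate(seq)}
        return sum(1 for a, b in before if pos[a] > pos[b])

    best = min(permutations(range(n_wires)), key=violations)
    return list(best), violations(best)


if __name__ == "__main__":
    print(best_sequence(3, [(2, 0), (0, 1)]))          # ([2, 0, 1], 0)
    print(best_sequence(3, [(0, 1), (1, 2), (2, 0)]))  # ([0, 1, 2], 1)
```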
ISBN (print): 9781728166957
Sorting with real-number keys has time complexity O(n log n). This holds under the assumption that a comparison sort is used for all n samples. Here we propose using a counting sort with just n cells for the initial placement of the samples. Cells that receive groups of several samples are then resolved by a comparison sort. Surprisingly, even this part has time complexity proportional to n. Numerical experiments confirm this finding, show the influence of the computing environment (such as paging), and demonstrate higher speed than quicksort.
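The two-stage idea can be sketched as follows: a counting sort places each real key into one of n cells by linear interpolation between the minimum and maximum keys, and each cell holding more than one key is finished with a comparison sort. The linear key-to-cell mapping is an assumption for illustration, since the exact mapping is not given here. For roughly uniform keys the cells stay small, which keeps the expected cost of the second stage linear.

```python
import random


def cell_sort(keys):
    """Sort real keys: counting-sort placement into n cells, then a
    comparison sort inside each cell that received several keys."""
    n = len(keys)
    if n < 2:
        return list(keys)
    lo, hi = min(keys), max(keys)
    if lo == hi:
        return list(keys)
    scale = (n - 1) / (hi - lo)
    # Assumed linear mapping of a key to one of n cells (monotone in the key).
    cells = [min(int((x - lo) * scale), n - 1) for x in keys]

    # Counting sort: start[c] becomes the first output slot of cell c.
    start = [0] * (n + 1)
    for c in cells:
        start[c + 1] += 1
    for c in range(n):
        start[c + 1] += start[c]

    out = [None] * n
    fill = start[:]  # next free slot per cell
    for x, c in zip(keys, cells):
        out[fill[c]] = x
        fill[c] += 1

    # Resolve cells holding more than one key with a comparison sort.
    for c in range(n):
        if start[c + 1] - start[c] > 1:
            out[start[c]:start[c + 1]] = sorted(out[start[c]:start[c + 1]])
    return out


if __name__ == "__main__":
    print(cell_sort([0.5, -1.2, 3.3, 0.5, 2.0]))  # [-1.2, 0.5, 0.5, 2.0, 3.3]
    data = [random.random() for _ in range(10000)]
    assert cell_sort(data) == sorted(data)
```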
This study investigated the effects of respiratory motion, including unwanted breath holding, on the target volume and centroid position in four-dimensional computed tomography (4DCT) imaging. Cine 4DCT images were reconstructed with a time-based sorting algorithm, and helical 4DCT images were reconstructed with both the time-based sorting algorithm and an amplitude-based sorting algorithm. A spherical object 20 mm in diameter was moved according to several simulated respiratory motions with a motion period of 4.0 s and a maximum amplitude of 5 mm. The object was extracted automatically, and the target volume and centroid position in the craniocaudal direction were measured using a treatment planning system. When the respiratory motion included unwanted breath-holding times shorter than the breathing cycle, the root mean square errors (RMSEs) between the reference and imaged target volumes were 18.8%, 14.0%, and 5.5% for time-based images in cine mode, time-based images in helical mode, and amplitude-based images in helical mode, respectively. In helical mode, the RMSE between the reference and imaged centroid positions was reduced from 1.42 to 0.50 mm by changing the reconstruction method from time-based to amplitude-based sorting. When the respiratory motion included unwanted breath-holding times equal to the breathing cycle, the RMSEs between the reference and imaged target volumes were 19.1%, 24.3%, and 15.6% for time-based images in cine mode, time-based images in helical mode, and amplitude-based images in helical mode, respectively. In helical mode, the RMSE between the reference and imaged centroid positions was reduced from 1.61 to 0.83 mm by changing the reconstruction method from time-based to amplitude-based sorting. For respiratory motion including breath holds shorter than the breathing cycle, amplitude-based sorting improved the accuracy of the target volume and centroid position, particularly in helical 4DCT.
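The difference between the two reconstruction strategies can be illustrated by how each assigns respiratory samples to bins. In the simplified sketch below, time-based (phase) sorting bins a sample by its fractional position between consecutive respiratory peaks, while amplitude-based sorting bins it by its displacement between the minimum and maximum amplitudes. The function names, the ten bins, and the known peak times are assumptions, and real amplitude binning usually also separates inhalation from exhalation, which is omitted here.

```python
from bisect import bisect_right


def phase_bins(times, peak_times, n_bins=10):
    """Time-based (phase) sorting: bin by fraction of the current breathing cycle.

    Samples before the first peak or at/after the last one get None.
    """
    bins = []
    for t in times:
        k = bisect_right(peak_times, t) - 1
        if k < 0 or k + 1 >= len(peak_times):
            bins.append(None)
            continue
        frac = (t - peak_times[k]) / (peak_times[k + 1] - peak_times[k])
        bins.append(min(int(frac * n_bins), n_bins - 1))
    return bins


def amplitude_bins(amplitudes, n_bins=10):
    """Amplitude-based sorting: bin by displacement between min and max."""
    lo, hi = min(amplitudes), max(amplitudes)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    return [min(int((a - lo) / span * n_bins), n_bins - 1) for a in amplitudes]


if __name__ == "__main__":
    print(phase_bins([0.0, 1.0, 2.0, 3.0, 5.0], [0.0, 4.0, 8.0]))  # [0, 2, 5, 7, 2]
    print(amplitude_bins([0.0, 5.0, 2.5, 2.5]))                    # [0, 9, 5, 5]
```

In this model, samples taken during a breath hold share one amplitude and therefore one amplitude bin, whereas their phase bins keep advancing with time, which is consistent with amplitude-based sorting coping better with irregular breathing.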
ISBN (print): 9781467356138; 9781467356121
This paper describes a fast integer sorting algorithm, herein referred to as Bit-index sort, which does not use comparisons and is intended to sort partial permutations. Experimental results show that its execution time grows linearly with the input size. Bit-index sort uses a bit array to classify input sequences of distinct integers and exploits built-in bit functions of C compilers, supported by machine hardware, to retrieve the ordered output sequence. Results show that Bit-index sort outperforms quicksort and counting sort in execution time. A parallel version of Bit-index sort using two simultaneous threads is also presented; it obtains further speedups of up to 1.6 over the sequential version.
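The core idea can be sketched in a few lines: set one bit per input value in an array of 64-bit words, then scan the words and extract the set bits from lowest to highest. In C this extraction would typically use a compiler built-in such as GCC's `__builtin_ctzll`; the Python sketch below emulates it with `(w & -w).bit_length() - 1`. It assumes distinct non-negative integers and is not the authors' implementation.

```python
def bit_index_sort(values, word_bits=64):
    """Sort distinct non-negative integers with a bit array (no comparisons).

    Duplicate values would collapse into a single output entry.
    """
    if not values:
        return []
    words = [0] * (max(values) // word_bits + 1)
    for v in values:
        words[v // word_bits] |= 1 << (v % word_bits)  # mark v as present
    out = []
    for w_index, w in enumerate(words):
        base = w_index * word_bits
        while w:
            low = w & -w                             # isolate lowest set bit
            out.append(base + low.bit_length() - 1)  # its index (count trailing zeros)
            w ^= low                                 # clear it and continue
    return out


if __name__ == "__main__":
    print(bit_index_sort([5, 3, 9, 0, 64, 130]))  # [0, 3, 5, 9, 64, 130]
```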