A sequence of real numbers is called twisted if it can be produced from the sorted sequence by repeatedly reversing the order of consecutive subsequences. It is shown that twisted sequences constitute a class of expon...
详细信息
A sequence of real numbers is called twisted if it can be produced from the sorted sequence by repeatedly reversing the order of consecutive subsequences. It is shown that twisted sequences constitute a class of exponentially many members each of which can be recognized and sorted, by a simple on-line algorithm, in linear time.
This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks su...
详细信息
ISBN:
(纸本)9781467379526
This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time. The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting. Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm. Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.
Recently, several attempts have been made to extend the internal memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the developed programs for t...
详细信息
ISBN:
(数字)9783319238265
ISBN:
(纸本)9783319238265;9783319238258
Recently, several attempts have been made to extend the internal memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the developed programs for these algorithms achieve remarkable performance in terms of I/O complexity and speed, their designs are quite complex and their disk requirements remain rather heavy. Currently, the core algorithmic part of each of these programs consists of thousands of lines in C++, and the average peak disk requirement is over 20n bytes for an input string of size n < 2(40). We re-investigate the problem of induced sorting suffixes in external memory and propose a new algorithm SAIS-PQ (SAIS with Priority Queue) and its enhanced alternative SAIS-PQ+. Using the library STXXL, the core algorithmic parts of SAIS-PQ and SAIS-PQ+ are coded in around 800 and 1600 lines in C++, respectively. The time and space performance of these two programs are evaluated in comparison with eSAIS that is also implemented using STXXL. In our experiment, eSAIS runs the fastest for the input strings not larger than 16 GiB, but it is slower than SAIS-PQ+ for the only two input strings of 32 and 48.44 GiB. For the average peak disk requirements, eSAIS and SAIS-PQ+ are around 23n and 15n bytes, respectively.
Among numerous tasks that need to be solved, sorting is considered to be one of the most important. Since it is time consuming for large volumes of data, acceleration is greatly required for many practical application...
详细信息
ISBN:
(纸本)9781479920952;9781479920969
Among numerous tasks that need to be solved, sorting is considered to be one of the most important. Since it is time consuming for large volumes of data, acceleration is greatly required for many practical applications. It is also important to discover such methods that take advantage of the implementation platform (due to its uniqueness) and consider not only the number of the required operations, but also efficiency of their implementation in hardware circuits. This paper evaluates implementations of address-based data sorting algorithms in field-programmable gate array (FPGA) circuits. It is demonstrated that the proposed technique can be used efficiently both in low cost FPGAs as well as in advanced FPGAs, and the number of sorted items (with sizes of up to 32 bits) can reach 2(32).
作者:
Long, ZouZhang, ZhenrongGuangxi Univ
Sch Comp Elect & Informat Nanning 530004 Peoples R China Guangxi Univ
Guangxi Key Lab Multimedia Commun & Network Techn Cultivating Base Nanning 530004 Peoples R China Guangxi Univ
Guangxi Coll & Univ Key Lab Multimedia Commun & I Nanning 530004 Peoples R China
In recent years there has been a growing interest in Internet of Thing, Big Data and Mobile Internet. With the rapid growth of the amount of data in the embedded environment, using a traditional embedded processor is ...
详细信息
ISBN:
(纸本)9781538612309
In recent years there has been a growing interest in Internet of Thing, Big Data and Mobile Internet. With the rapid growth of the amount of data in the embedded environment, using a traditional embedded processor is hard to satisfy the requirements of big data processing. sorting is one of the fundamental operation in data processing and is also frequently used for search, filter, feature analysis and so on. It can contribute significantly to the overall execution time in a system. Existing techniques accelerate sorting using multiprocessor or GPUs, but it is impractical for embedded systems. In this paper, we propose a FPGA-based collaborative hardware sorting unit for embedded data processing. The waiting data is transferred from the embedded processor to our hardware sorting unit by the data bus, then a ordered sequence is given form hardware sorting unit and is returned to the embedded processor by data bus. The waiting data is processed in a pure hardware circuit, do not need anything software operation, so the calculating speed depends on the delay time of hardware circuit, which is faster than traditional iterative algorithm by software method. By the collaborative hardware unit, we can greatly reduce the processor load and improve the operation efficiency significantly. In addition, we can define the size of the data bit flexibly, we can also expend the scale of the unit by circuit topology.
Modular multilevel converters (MMCs) are made up of many submodules (SMs) connected in series. In order to avoid semiconductor over-voltages, SM capacitors should be kept within strict voltage limits. The capacitor ba...
详细信息
ISBN:
(纸本)9781509064267;9781509064250
Modular multilevel converters (MMCs) are made up of many submodules (SMs) connected in series. In order to avoid semiconductor over-voltages, SM capacitors should be kept within strict voltage limits. The capacitor balancing controller (CBC) is used to sort SM capacitor voltages prior to modulation. This paper investigates the variation in sorting complexity at steady state and transient conditions for a brute-force and past-position methodology based on the bubble sort algorithm. A focus is placed on the potential for 'worst-case' sorting, requiring increased computational effort. This paper forms part of ongoing research to analyze the hardware constraints on MMCs with very large numbers of SMs. Simulation results based on a PSCAD/EMTDC detailed equivalent model are used for analysis and conclusions.
sorting with real number keys has time complexity n log n. This holds under the assumption that for all n samples a comparison sort is used. Here we propose to use the counting sort with just n cells for initial place...
详细信息
ISBN:
(纸本)9781728166957
sorting with real number keys has time complexity n log n. This holds under the assumption that for all n samples a comparison sort is used. Here we propose to use the counting sort with just n cells for initial placement of samples. We resolve cases of groups of several samples placed into one cell by a comparison sort. Surprisingly, even this part has time complexity proportional to n. Numerical experiments confirm this finding and shows influence of the computing environment such as paging, and reflects a higher speed than the quicksort.
This paper is devoted to the practical application of parallel sorting algorithms and parallel input-output methods for the problem of genome alignment. The paper considers different approaches to the implementation o...
详细信息
This paper is devoted to the practical application of parallel sorting algorithms and parallel input-output methods for the problem of genome alignment. The paper considers different approaches to the implementation of such algorithms, taking into account the capabilities of high-performance systems. Main purpose of the work is to develop a genome sorting program, the efficiency of which significantly exceeds the efficiency of free software analogues. The genome sorting program is implemented for a supercomputer using the C++ language and the OpenMP and OpenMPI. The developed program demonstrates a significant increase in the speed of operation (up to 10 times) compared to free software analogues due to massive parallel data input and output. Different approaches for data input/output parallelization and data processing considered in the paper can be applied in other subject areas. (C) 2021 The Authors. Published by Elsevier B.V.
sorting is one of the most important computational tasks in data processing applications. Recent studies show that the FPGA-based hardware accelerators are more efficient than the general-purpose processors and GPUs. ...
详细信息
ISBN:
(纸本)9781728127880
sorting is one of the most important computational tasks in data processing applications. Recent studies show that the FPGA-based hardware accelerators are more efficient than the general-purpose processors and GPUs. By increasing the input records in the sorting network, the number of Compare-And-Swap (CAS) units would be increased, which in turn, will lead to increased resource consumption. In some applications, the number of available resources is limited. Thereby, it is necessary to optimize resource requirements while maintaining a sufficient level of performance. This paper presents a new sorting architecture that reduces the number of required resources compared to the state-of-the-art sorting architecture and achieves the desired performance using Unary processing. Results indicate that the proposed architecture increases throughput by 29.1% and reduces the number of LUTs by 42%, for sorting 8-input records, compared to other architecture.
We propose a new lossless compression method for medical images, based on a hierarchical sorting. Hierarchical sorting is a technique that achieves a high compression ratio by detecting the regions where image pattern...
详细信息
ISBN:
(纸本)0780372115
We propose a new lossless compression method for medical images, based on a hierarchical sorting. Hierarchical sorting is a technique that achieves a high compression ratio by detecting the regions where image patterns change abruptly, and by sorting pixel order by value to increase predictability. This method enables control of sorting accuracy along with size and complexity. As a result, we can reduce the sizes of the permutation tables and reuse the tables for other image regions. Comparison of this method through experiment reveals better performance for medical images generated by X-ray CT, MRI and large size CR . DR instruments. This technique applies a quad-tree division method to divide an image into blocks in order to support progressive decoding and fast preview of large images.
暂无评论