Electronic medical records (EMRs) are a rich data source for discovery research but are underutilized because of the difficulty of extracting highly accurate clinical data. We assessed whether a classification algorithm incorporating narrative EMR data (typed physician notes) classifies subjects with rheumatoid arthritis (RA) more accurately than an algorithm using codified EMR data alone.
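As a loose illustration of the comparison described above (not the study's actual model or features), a sketch might train one classifier on codified features alone and a second on codified plus note-derived features, then compare discrimination; the feature names, synthetic data, and scikit-learn pipeline below are assumptions for illustration only.

```python
# Hypothetical sketch: compare a classifier built on codified EMR features against one
# that also uses features extracted from physician notes. Feature names and data are
# illustrative assumptions, not the study's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
codified = rng.normal(size=(n, 4))    # e.g. diagnosis-code counts, lab values, medication codes
narrative = rng.normal(size=(n, 3))   # e.g. NLP-extracted mentions from typed notes
y = (codified[:, 0] + narrative[:, 0] + rng.normal(size=n) > 0).astype(int)  # synthetic RA labels

model = LogisticRegression(max_iter=1000)
auc_codified = cross_val_score(model, codified, y, cv=5, scoring="roc_auc").mean()
auc_combined = cross_val_score(model, np.hstack([codified, narrative]), y, cv=5, scoring="roc_auc").mean()
print(f"codified only AUC: {auc_codified:.3f}, codified + narrative AUC: {auc_combined:.3f}")
```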
Emerging persistent memory (PM, also termed non-volatile memory) technologies promise large capacity, non-volatility, byte-addressability and DRAM-comparable access latency. These features have inspired a host of PM-based storage systems and applications that store and access data directly in PM. Sorting is an important function for many systems, but how to optimize sorting for PM-based systems has not been systematically studied. In this paper, we conduct extensive experiments with many existing sorting methods, including both conventional sorting algorithms adapted for PM and recently proposed PM-friendly sorting techniques, on a hybrid DRAM and real PM platform. The results indicate that these sorting methods all have drawbacks for various workloads; some results are even counterintuitive compared with the behavior reported on DRAM-simulated platforms in the original papers. To the best of our knowledge, we are the first to perform a systematic study of sorting on persistent memory. Based on our study, we summarize principles for selecting the optimal algorithm and propose an adaptive sorting engine called PMSort that performs this selection and reduces failure-recovery overhead in PM. We conduct extensive experiments and show that PMSort can select the best sorting algorithm in a variety of cases.
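The abstract does not spell out PMSort's selection policy; purely as a sketch of what an adaptive sorting engine does in general, the code below picks a sorting routine from coarse workload characteristics. The thresholds, the sortedness test, and the candidate routines are illustrative assumptions, not the paper's design.

```python
# Hedged sketch of an adaptive sort selector: choose a routine from coarse workload
# characteristics. Thresholds and candidates are illustrative, not PMSort's actual policy.
import random

def nearly_sorted(data, sample=100):
    """Estimate sortedness from a sample of adjacent pairs."""
    if len(data) < 2:
        return True
    idx = random.sample(range(len(data) - 1), min(sample, len(data) - 1))
    inversions = sum(1 for i in idx if data[i] > data[i + 1])
    return inversions / len(idx) < 0.05

def adaptive_sort(data):
    if len(data) < 64:
        # insertion sort: cheap for tiny inputs
        for i in range(1, len(data)):
            key, j = data[i], i - 1
            while j >= 0 and data[j] > key:
                data[j + 1] = data[j]
                j -= 1
            data[j + 1] = key
        return data
    if nearly_sorted(data):
        data.sort()          # in-place sort exploits existing runs
        return data
    return sorted(data)      # stand-in for a PM-friendly out-of-place routine

print(adaptive_sort([5, 3, 8, 1, 9, 2]))
```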
Given a size-N input string X, a number of algorithms have been proposed to sort the suffixes of X into the output suffix array using inducing methods. While the existing algorithms eSAIS, DSAIS, and fSAIS achieved remarkable time and space results for suffix sorting in external memory, there is still potential for further improvement. We propose a new algorithm called nSAIS that reinvents the core inducing procedure of DSAIS with a new set of data structures to run faster and use less space. The suffix array is computed recursively, and the inducing procedure on each recursion level is performed block by block to facilitate sequential I/Os. If X has a byte alphabet and N = O(M²/B), where M and B are the sizes of internal memory and an I/O block, respectively, nSAIS guarantees a workspace of less than N bytes besides the input and output while keeping the linear I/O volume O(N), which is the best known so far for external-memory inducing methods. Our experiments on typical settings show that our program for nSAIS with 40-bit integers not only runs faster than the existing representative external-memory algorithms as N grows, but also always uses the least disk space, around 6.1N bytes on average. The techniques proposed in this study can be utilized to develop fast and succinct suffix sorters in external memory.
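For readers unfamiliar with the output being computed, the snippet below builds a suffix array naively by comparison-sorting all suffixes. This is only to make the definition concrete; inducing algorithms such as nSAIS compute the same array in linear time with far less memory and sequential I/O, which the naive approach does not attempt.

```python
# Minimal illustration of a suffix array: the starting positions of all suffixes of X
# in lexicographic order. Naive construction for clarity only.
def naive_suffix_array(x: str) -> list[int]:
    return sorted(range(len(x)), key=lambda i: x[i:])

x = "banana"
sa = naive_suffix_array(x)
print(sa)                      # [5, 3, 1, 0, 4, 2]
print([x[i:] for i in sa])     # ['a', 'ana', 'anana', 'banana', 'na', 'nana']
```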
MBGD is a workbench system for comparative analysis of completely sequenced microbial genomes. The central function of MBGD is to create an orthologous gene classification table using precomputed all-against-all similarity relationships among genes in multiple genomes. In MBGD, an automated classification algorithm has been implemented so that users can create their own classification table by specifying a set of organisms and parameters. This feature is especially useful when the user's interest is focused on some taxonomically related organisms. The created classification table is stored in the database and can be explored in combination with the data of individual genomes as well as the similarity relationships among genomes. Using these data, users can carry out comparative analyses from various points of view, such as phylogenetic pattern analysis, gene order comparison and detailed gene structure comparison. MBGD is accessible at http://***/.
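The all-against-all similarity table is the input to a clustering step; as a loose sketch of that general idea (not MBGD's actual ortholog classification procedure), the code below groups genes whose pairwise similarity exceeds a user-chosen cutoff using union-find. The gene identifiers, cutoff, and score scale are illustrative assumptions.

```python
# Hypothetical sketch: group genes into clusters from precomputed pairwise similarities
# using a similarity cutoff and union-find. MBGD's real algorithm is more elaborate.
from collections import defaultdict

def cluster_genes(similarities, cutoff=0.5):
    parent = {}
    def find(g):
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g
    def union(a, b):
        parent[find(a)] = find(b)

    for (g1, g2), score in similarities.items():
        find(g1); find(g2)                 # register both genes
        if score >= cutoff:
            union(g1, g2)

    clusters = defaultdict(set)
    for g in parent:
        clusters[find(g)].add(g)
    return list(clusters.values())

sims = {("ecoli:gyrA", "bsub:gyrA"): 0.82, ("ecoli:gyrA", "ecoli:parC"): 0.35}
print(cluster_genes(sims, cutoff=0.5))     # gyrA pair clustered; parC left on its own
```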
Two hybrid methods of distributive sort and quicksort are given. The first method sorts an array of records and the second one sorts a linearly linked list in a stable way. The expected running time of the methods is O(n) for n records and for a wide class of distributions of the keys (including all bounded densities with a compact support). For most other distributions the running time is O(n log n), and the worst case time is O(n²). The array version needs extra storage space for n records and approximately n/5 integers. In the linked list version only an array of n/5 pointers is needed. The observed running times of the algorithms compare favourably with those of other efficient bucket sort algorithms.
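A minimal sketch of the general idea behind the array variant (distribute keys into buckets by value, then finish each bucket with a comparison sort) is shown below. The bucket count of roughly n/5 follows the abstract; everything else is an illustrative assumption rather than the paper's exact procedure.

```python
# Hedged sketch of a distributive-sort/quicksort hybrid: distribute keys into ~n/5
# buckets by value, then sort each bucket. For nicely distributed keys the expected
# cost is O(n); skewed inputs degrade toward the comparison sort's cost.
import random

def hybrid_sort(keys):
    n = len(keys)
    if n < 2:
        return list(keys)
    lo, hi = min(keys), max(keys)
    if lo == hi:
        return list(keys)
    m = max(1, n // 5)                       # roughly n/5 buckets, as in the abstract
    buckets = [[] for _ in range(m)]
    for k in keys:
        idx = min(int((k - lo) / (hi - lo) * m), m - 1)
        buckets[idx].append(k)
    out = []
    for b in buckets:
        out.extend(sorted(b))                # stand-in for the quicksort finishing pass
    return out

data = [random.random() for _ in range(50)]
assert hybrid_sort(data) == sorted(data)
```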
Background: In addition to fatigue, pain is the most frequent persistent symptom in cancer survivors. Clear guidelines for both the diagnosis and treatment of pain in cancer survivors are lacking. Classification of pain is important as it may facilitate more specific targeting of treatment. In this paper we present an overview of nociceptive, neuropathic and central sensitization pain following cancer treatment, as well as the rationale, criteria and process for stratifying pain *** and methods: Recently, a clinical method for classifying any pain as either predominant central sensitization pain, neuropathic or nociceptive pain was developed, based on a large body of research evidence and international expert opinion. We, a team of 15 authors from 13 different centers, four countries and two continents, have applied this classification algorithm to the cancer survivor ***: The classification of pain following cancer treatment entails two steps: (1) examining the presence of neuropathic pain; and (2) using an algorithm for differentiating predominant nociceptive and central sensitization pain. Step 1 builds on the established criteria for neuropathic pain diagnosis, while Step 2 applies a recently developed clinical method for classifying any pain as either predominant central sensitization pain, neuropathic or nociceptive pain to the cancer survivor ***: The classification criteria allow the identification of central sensitization pain following cancer treatment. The recognition of central sensitization pain in practice is an important development in the integration of pain neuroscience into the clinic, and one that is relevant for people undergoing and following cancer treatment.
This paper is devoted to the practical application of parallel sorting algorithms and parallel input-output methods to the problem of genome alignment. The paper considers different approaches to the implementation of such algorithms, taking into account the capabilities of high-performance systems. The main purpose of the work is to develop a genome sorting program whose efficiency significantly exceeds that of freely available analogues. The genome sorting program is implemented for a supercomputer in C++ using OpenMP and OpenMPI. The developed program demonstrates a significant speedup (up to 10 times) over freely available analogues due to massively parallel data input and output. The different approaches to data input/output parallelization and data processing considered in the paper can be applied in other subject areas.
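The paper's implementation uses C++ with OpenMP and OpenMPI; purely as an illustration of the sort-chunks-in-parallel-then-merge pattern (and not the authors' code), a process-parallel sketch in Python could look like the following.

```python
# Illustrative sketch of parallel sorting: sort chunks in parallel processes, then merge.
# The described program uses C++ with OpenMP/OpenMPI; this only demonstrates the pattern.
from heapq import merge
from multiprocessing import Pool
import random

def parallel_sort(data, workers=4):
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with Pool(workers) as pool:
        sorted_parts = pool.map(sorted, parts)   # each worker sorts one chunk
    return list(merge(*sorted_parts))            # k-way merge of the sorted chunks

if __name__ == "__main__":
    keys = [random.randint(0, 10**6) for _ in range(100_000)]
    assert parallel_sort(keys) == sorted(keys)
```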
We introduce a new sorting device for permutations, which we call a popqueue. It consists of a special queue with the property that, any time one wants to extract elements from the queue, all the elements currently in the queue are poured into the output. We illustrate two distinct optimal algorithms, called Min and Cons, to sort a permutation using such a device, which also allow us to characterize sortable permutations in terms of pattern avoidance. We next investigate what happens when making two passes through a popqueue, showing that the set of sortable permutations is not a class for Min, whereas it is for Cons. In the latter case we also explicitly find the basis of the class of sortable permutations. Finally, we study preimages under Cons (by means of an equivalent version of the algorithm) and find a characterization of the set of preimages of a given permutation. We also give some enumerative results concerning the number of permutations having k preimages, for k = 1, 2, 3, and we conclude by observing that there exist permutations having k preimages for any value of k >= 0.
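The extraction behavior described above can be captured by a tiny data structure; the class below is only a sketch of the device itself, and it does not reproduce the Min or Cons sorting algorithms from the paper.

```python
# Sketch of the popqueue device: a FIFO queue whose only extraction operation pours
# *all* currently enqueued elements into the output at once.
from collections import deque

class Popqueue:
    def __init__(self):
        self._q = deque()

    def enqueue(self, x):
        self._q.append(x)

    def pop_all(self, output):
        while self._q:
            output.append(self._q.popleft())

pq, out = Popqueue(), []
for x in [2, 1]:
    pq.enqueue(x)
pq.pop_all(out)
print(out)   # [2, 1] -- elements leave together, in the order they entered
```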
A fully digital parallel Hopfield machine is described. Its speed has been achieved at the cost of representing the weights in unary form (i.e. the number of digits is the value and there is only one type of digit). The inner product 'weight by neuron state' ('integer by binary') is realized by a special circuit which adds all terms in parallel. It is based on a parallel binary sorting algorithm implemented by a combinational circuit composed of checkerboard-pattern regular blocks. The thresholding is performed by sensing the central bit of the unary-represented value of the inner product. As all hardware is composed of combinational circuits, the only synchronized process is the Hopfield iteration step. The machine includes learning: it contains storage for all input patterns, and the corresponding weights are computed all at once using a hard-wired Hebbian formula.
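The hardware specifics (unary weights, the sorting-based adder, thresholding on the central bit) don't translate directly to software, but the computation the machine performs, Hebbian weight learning and the synchronous Hopfield update step, can be sketched as follows. This is a generic software illustration under standard Hopfield assumptions, not a model of the described circuit.

```python
# Generic software sketch of the underlying computation: Hebbian weights from stored
# patterns and one synchronous Hopfield update step. The unary encoding and the
# sorting-network adder are hardware details not modeled here.
import numpy as np

def hebbian_weights(patterns):
    """patterns: array of shape (p, n) with entries in {-1, +1}."""
    p, n = patterns.shape
    w = patterns.T @ patterns / p
    np.fill_diagonal(w, 0.0)           # no self-connections
    return w

def hopfield_step(w, state):
    return np.where(w @ state >= 0, 1, -1)

patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1, 1, -1]])
w = hebbian_weights(patterns)
noisy = np.array([-1, 1, 1, 1, -1, -1, -1, -1])   # first pattern with bit 0 flipped
print(hopfield_step(w, noisy))                    # recovers [1, 1, 1, 1, -1, -1, -1, -1]
```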
This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an input string T[0, n-1] over an alphabet A[0, K-1]. The problem of sorting the suffixes of T is also known as constructing the suffix array (SA) for T. The theoretical memory usage of SACA-K is n log K + n log n + K log n bits. Moreover, we also have a practical implementation of SACA-K that uses n bytes + (n + 256) words and is suitable for strings over any alphabet up to full ASCII, where a word is log n bits. In our experiments, SACA-K outperforms SA-IS, which was previously the most time- and space-efficient linear-time SA construction algorithm (SACA). SACA-K is around 33% faster and uses a smaller deterministic workspace of K words, where the workspace is the space needed beyond the input string and the output SA. Given K = O(1), SACA-K runs in linear time and O(1) workspace. To the best of our knowledge, such a result is the first reported in the literature with practical source code publicly available.
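The stated theoretical bound is easy to evaluate for concrete parameters. As a quick worked example (the input size and alphabet below are arbitrary choices, not figures from the article), plugging n = 2^30 and K = 256 into n log K + n log n + K log n gives roughly 4.75 GiB: 1 GiB for n log K, 3.75 GiB for n log n, and a negligible K log n term.

```python
# Worked example of the stated bound n*log2(K) + n*log2(n) + K*log2(n) bits.
# The choices n = 2**30 and K = 256 are arbitrary illustrations, not values from the article.
import math

def saca_k_theoretical_bits(n: int, K: int) -> float:
    return n * math.log2(K) + n * math.log2(n) + K * math.log2(n)

n, K = 2**30, 256
bits = saca_k_theoretical_bits(n, K)
print(f"{bits / 8 / 2**30:.2f} GiB")   # about 4.75 GiB for a 1 GiB byte-alphabet input
```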