Matrices resulting from standard boundary element methods are dense and computationally expensive. To reduce the computation time, the matrix computation is done on a GPU. The parallel processing capability of the...
ISBN:
(Print) 9783642314995; 9783642315008
Due to the rapid development of the technology, next-generation sequencers can produce huge amounts of short DNA fragments covering a genomic sequence of an organism in a short time. There is a need for time-efficient algorithms which can assemble these fragments and reconstruct the examined DNA sequence. The previously proposed algorithm for de novo assembly, SR-ASM, produced high-quality results but required a lot of computation time. The proposed hybrid parallel programming strategy uses a two-level hierarchy: computations in threads (on a single node with many cores) and computations on different nodes in a cluster. Tests carried out on real Prochlorococcus marinus data from a Roche sequencer showed that the algorithm was sped up 20 times in comparison to the sequential approach while maintaining high accuracy and beating the results of other algorithms.
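The two-level hierarchy described above (processes across cluster nodes, threads on the cores of each node) is commonly expressed as hybrid MPI + OpenMP code. The following is a minimal sketch of that pattern under assumed names; process_fragment() and the static block partitioning are placeholders for illustration and are not the SR-ASM implementation.

/* Sketch of the two-level hybrid scheme: MPI ranks split the work across
 * cluster nodes, OpenMP threads use the cores within each node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* hypothetical per-fragment work item (overlap/alignment step) */
static void process_fragment(long id) { (void)id; }

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* request thread support so OpenMP threads may coexist with MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_fragments = 1000000;           /* assumed total workload */
    long begin = rank * n_fragments / size;     /* block owned by this node */
    long end   = (rank + 1) * n_fragments / size;

    /* level 2: threads on the cores of a single node */
    #pragma omp parallel for schedule(dynamic)
    for (long i = begin; i < end; ++i)
        process_fragment(i);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) printf("all fragments processed\n");
    MPI_Finalize();
    return 0;
}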
ISBN:
(Print) 9781467319751; 9781467319768
Cytogenetic biodosimetry is the definitive test for assessing exposure to ionizing radiation. It involves manual assessment of the frequency of dicentric chromosomes (DCs) on a microscope slide, which potentially contains hundreds of metaphase cells. We developed an algorithm that can automatically and accurately locate centromeres in DAPI-stained metaphase chromosomes and that will detect DCs. In this algorithm, a set of 200-250 metaphase cell images is ranked and sorted. The 50 top-ranked images are used in the triage DC assay (DCA). To meet the requirements of the DCA in a mass casualty event, we are accelerating our algorithm through parallelization. In this paper, we present our findings in accelerating our ranking and segmentation algorithms. Using data parallelization on a desktop system, the ranking module was up to 4-fold faster than the serial version, and the Gradient Vector Flow (GVF) module used in our segmentation algorithm was up to 8-fold faster. Large-scale data parallelization of the ranking module processed 18,694 samples in 11.40 hours. Task parallelization of image ranking with parallelized labeling on a desktop computer reduced processing time by 20% relative to the serial process, and the GVF module recoded with parallelized matrix inversion reduced the time by 70%. Overall, we estimate that the automated DCA will require around 1 min per sample on a 64-core computing system. Our long-term goal is to implement these algorithms on a high-performance computer cluster to assess radiation exposures for thousands of individuals in a few hours.
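The data-parallel ranking step lends itself to a simple shared-memory formulation: score every metaphase image independently, then sort and keep the top 50. The sketch below illustrates that pattern with OpenMP; score_image() and the score values are hypothetical placeholders, not the paper's ranking metric.

/* Data-parallel ranking sketch: each image is scored on a separate core,
 * then the images are sorted by score and the top 50 are selected. */
#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

#define N_IMAGES 250            /* typical set size quoted in the abstract */

typedef struct { int id; double score; } ranked_image;

static double score_image(int id) { return (double)(id % 7); } /* placeholder metric */

static int by_score_desc(const void *a, const void *b)
{
    double d = ((const ranked_image *)b)->score - ((const ranked_image *)a)->score;
    return (d > 0) - (d < 0);
}

int main(void)
{
    ranked_image imgs[N_IMAGES];

    /* data parallelism: every image is scored independently */
    #pragma omp parallel for
    for (int i = 0; i < N_IMAGES; ++i) {
        imgs[i].id = i;
        imgs[i].score = score_image(i);
    }

    qsort(imgs, N_IMAGES, sizeof(ranked_image), by_score_desc);

    /* the 50 top-ranked images would feed the triage DC assay */
    for (int i = 0; i < 50; ++i)
        printf("rank %2d: image %d (score %.2f)\n", i + 1, imgs[i].id, imgs[i].score);
    return 0;
}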
This paper presents a precision-oriented, example-based approach to word sense disambiguation (WSD) for a reading assistant system for Japanese learners. Our WSD classifier chooses a sense associated with the most sim...
ISBN:
(Print) 9783642328206
This paper presents a portable optimization for MPI communications, called PRAcTICaL-MPI (Portable Adaptive Compression Library - MPI). PRAcTICaL-MPI reduces the data volume exchanged among processes by using lossless compression and offers two main advantages. Firstly, it is independent of the MPI implementation and the application used. Secondly, it allows for turning the compression on and off and selecting the most appropriate compression algorithm at runtime, depending on the characteristics of each message and on network performance. We have validated PRAcTICaL-MPI with different MPI implementations and HPC clusters. The evaluation shows that by compressing MPI messages with the best algorithm, and only when it is worthwhile, we obtain a large reduction in the overall execution time for many of the scenarios considered.
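The runtime decision the library makes (compress only when it pays off, otherwise send the raw payload) can be sketched as a thin wrapper around MPI_Send. The threshold, the choice of zlib, and the send_maybe_compressed() wrapper below are assumptions for illustration only; they are not PRAcTICaL-MPI's actual API or algorithm-selection logic. Run with at least two ranks.

/* Sketch: compress an MPI message only when it is large enough and the
 * compressed form is actually smaller; otherwise send the plain payload. */
#include <mpi.h>
#include <zlib.h>
#include <stdlib.h>
#include <string.h>

#define COMPRESS_THRESHOLD 4096   /* assumed cut-off in bytes */

static int send_maybe_compressed(const void *buf, int count, int dest,
                                 int tag, MPI_Comm comm)
{
    if (count < COMPRESS_THRESHOLD)                       /* too small: send as-is */
        return MPI_Send(buf, count, MPI_BYTE, dest, tag, comm);

    uLongf clen = compressBound((uLong)count);
    Bytef *cbuf = malloc(clen);
    if (!cbuf || compress(cbuf, &clen, buf, (uLong)count) != Z_OK
              || clen >= (uLongf)count) {                 /* compression not worthwhile */
        free(cbuf);
        return MPI_Send(buf, count, MPI_BYTE, dest, tag, comm);
    }
    /* a real implementation would also signal "compressed" to the receiver,
     * e.g. via the tag or a small header; omitted here for brevity */
    int rc = MPI_Send(cbuf, (int)clen, MPI_BYTE, dest, tag, comm);
    free(cbuf);
    return rc;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char payload[8192];
    memset(payload, 'x', sizeof payload);                 /* highly compressible test data */

    if (rank == 0) {
        send_maybe_compressed(payload, (int)sizeof payload, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        char rbuf[8192];                                   /* oversized receive buffer; a real
                                                              receiver would decompress if flagged */
        MPI_Status st;
        MPI_Recv(rbuf, (int)sizeof rbuf, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
    }
    MPI_Finalize();
    return 0;
}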
Coupled heat and moisture transport in extremely heterogeneous materials like masonry still cannot be solved for large structures. Multi-scale methods with the macro and meso-scale levels are usually used. The bigge...
Finite element analysis for stress and deformation prediction has become routine in many industries. However, the analysis of complex three-dimensional geometries composed of millions of degrees of freedom is beyond t...
Computer systems with discrete GPUs are expected to become the standard methodology for high-speed encryption processing, but they consume large amounts of power and are inapplicable to embedded devices. T...
In this paper we consider large-scale distributed committee machines where no local data exchange is possible between neural network modules. Regularization neural networks are used for both the modules as well as the...
As the need for high-quality random number generators is constantly increasing, especially for cryptographic algorithms, the development of high-throughput randomness generators has to be combined with the development of high-performance statistical test suites. Unfortunately, the implementations of the most popular batteries of test suites are not focused on efficiency and high performance, do not benefit from the processing power offered by today's multi-core processors, and tend to become bottlenecks in the processing of large volumes of data generated by various random number generators. Hence there is a pressing need for highly efficient statistical tests, and our research efforts and results on improving and parallelizing the TestU01 test suite intend to fill this need. Experimental results show that the parallel version of TestU01 takes full advantage of the system's available processing power, reducing the execution time by up to 4 times on the tested multi-core systems.
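One straightforward way to exploit a multi-core system for a battery of statistical tests is to run the mutually independent tests in separate threads, since each test only reads generator output and writes its own result. The sketch below shows that dispatch pattern with OpenMP; run_test() is a hypothetical stand-in that only simulates per-test work and does not use the actual TestU01 API.

/* Sketch: dispatch independent statistical tests to separate threads so the
 * battery's wall-clock time approaches that of its longest single test. */
#include <omp.h>
#include <stdint.h>
#include <stdio.h>

#define N_TESTS 8

/* placeholder for one statistical test consuming generator output;
 * the LCG inside merely simulates work, it is not TestU01 code */
static double run_test(int test_id, uint64_t seed)
{
    uint64_t x = seed ^ (0x9E3779B97F4A7C15ULL * (uint64_t)(test_id + 1));
    double stat = 0.0;
    for (long i = 0; i < 1000000; ++i) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        stat += (double)(x >> 11) / 9007199254740992.0;   /* uniform in [0,1) */
    }
    return stat / 1000000.0;                               /* summary statistic */
}

int main(void)
{
    double results[N_TESTS];

    /* the tests are independent, so the loop parallelizes trivially */
    #pragma omp parallel for schedule(dynamic)
    for (int t = 0; t < N_TESTS; ++t)
        results[t] = run_test(t, 12345ULL);

    for (int t = 0; t < N_TESTS; ++t)
        printf("test %d: statistic = %.6f\n", t, results[t]);
    return 0;
}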