Finding the genome of new species remains one of the most crucial tasks in molecular biology. To that end, de novo sequence assembly feeds on the vast amount of data provided by Next-Generation Sequencing technology. Genome assemblers therefore demand substantial computational resources, and parallel implementations of these assemblers are readily available. This paper presents a comparison of three well-known de novo genome assemblers: Velvet, ABySS, and SOAPdenovo, all of which use de Bruijn graphs and have a parallel implementation. We based our analysis on parallel execution time, scalability, quality of assembly, and sensitivity to the choice of a critical parameter (k-mer size). We found that one of the tools clearly stands out, providing faster execution time and better output quality. Also, all assemblers are mildly sensitive to the choice of k-mer size, and all show limited scalability. We expect the findings of this paper to provide a guide for the development of new algorithms and tools for scalable parallel genome sequence assemblers.
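All three assemblers build on the de Bruijn graph formulation, in which reads are decomposed into overlapping k-mers. The following minimal sketch shows the data structure that the k-mer size parameter controls; it is an illustration only, not any of the assemblers' actual implementations:

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # each k-mer contributes an edge from its prefix to its suffix
            graph[kmer[:-1]].append(kmer[1:])
    return graph

g = de_bruijn_graph(["ACGTAC"], 3)
# k-mers ACG, CGT, GTA, TAC give edges AC->CG, CG->GT, GT->TA, TA->AC
```

Assembly then amounts to finding paths (ideally an Eulerian path) through this graph, which is why the choice of k directly affects both graph size and assembly quality.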
ISBN:
(Print) 9781538683859
Non-volatile memory (NVM) provides a scalable solution to replace DRAM as main memory. Because of the relatively high latency and low bandwidth of NVM (compared with DRAM), NVM is often paired with DRAM to build a heterogeneous main memory system (HMS). Deciding data placement on an NVM-based HMS is critical to enabling future NVM-based HPC. In this paper, we study task-parallel programs and introduce a runtime system to address the data placement problem on NVM-based HMS. Leveraging the semantics and execution mode of task-parallel programs, we efficiently characterize the memory access patterns of tasks and reduce data movement overhead. We also introduce a performance model to predict the performance of tasks under various data placements on the HMS. Evaluating with a set of HPC benchmarks, we show that our runtime system achieves higher performance than a conventional HMS-oblivious runtime (24% improvement on average) and two state-of-the-art HMS-aware solutions (16% and 11% improvement on average, respectively).
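As a rough illustration of the placement problem on an HMS, the sketch below ranks data objects by a predicted NVM slowdown per byte and fills the DRAM budget greedily. The penalty factors and the greedy policy are illustrative assumptions, not the paper's performance model:

```python
def place_data(objs, dram_capacity):
    """objs: list of (name, size_bytes, reads, writes).
    Greedy placement: objects with the highest assumed NVM slowdown
    per byte go to DRAM first; the rest stay on NVM."""
    R, W = 2.0, 4.0  # assumed NVM read/write latency relative to DRAM
    ranked = sorted(objs,
                    key=lambda o: (o[2] * R + o[3] * W) / o[1],
                    reverse=True)
    placement, used = {}, 0
    for name, size, reads, writes in ranked:
        if used + size <= dram_capacity:
            placement[name] = "DRAM"
            used += size
        else:
            placement[name] = "NVM"
    return placement
```

A real runtime would derive the access counts from the characterized task access patterns rather than take them as inputs, and would also weigh the cost of migrating data between the two memories.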
Parallel programming with tasks - task parallel programming - is a promising approach to simplifying multithreaded programming in the chip multiprocessor (CMP) era. Tasks describe independent units of work that can be assigned to threads at runtime in a way that is transparent to the programmer. Thus, the programmer can concentrate on identifying tasks and leave it to the runtime system to exploit the potential parallelism. Supporting the task abstraction on heterogeneous CMPs is more challenging than on conventional CMPs. In this article, we examine a lightweight task model and its implementation on the Cell processor, the most prominent heterogeneous CMP available today. Choosing a simple task model over a more complex one makes it possible to target fine-grained parallelism and still gain much in terms of programmability.
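The task abstraction described above can be caricatured in a few lines: the programmer only identifies independent tasks, and the runtime (here a stock thread pool standing in for the Cell runtime) assigns them to threads transparently. This is a toy sketch, not the article's Cell implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks, workers=4):
    """Run independent, argument-free tasks; which thread runs which
    task is decided by the runtime, invisibly to the programmer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: t(), tasks))

# the programmer's only job: express the work as independent tasks
squares = run_tasks([(lambda i=i: i * i) for i in range(4)])
```

A lightweight task model keeps per-task overhead small, which is what makes fine-grained tasks like these worthwhile in the first place.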
ISBN:
(Print) 9781538683859
A performance vs. practicality trade-off exists between user-level threading techniques. The community has settled mostly on a black-and-white perspective: fully fledged threads assume that suspension is imminent and incur overheads when suspension does not take place, while run-to-completion threads are more lightweight but less practical since they cannot suspend. Gray areas exist, however, whereby threads can start with minimal capabilities and be dynamically promoted to acquire additional capabilities when needed. This paper investigates the full spectrum of threading techniques from a performance vs. practicality trade-off perspective on modern multicore and many-core systems. Our results indicate that achieving the best trade-off depends highly on the suspension likelihood; dynamic promotion is more appropriate when suspension is unlikely and represents a solid replacement for run-to-completion, thanks to its lower programming constraints, while fully fledged threads remain the technique of choice when suspension likelihood is high.
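A minimal sketch of the dynamic-promotion idea, under the simplifying assumption that a task signals its need to suspend by being a generator: the fast path runs to completion with no scheduler involvement, and only suspendable tasks are promoted and driven step by step. This is illustrative only, not the paper's runtime:

```python
import inspect

def run(task, *args):
    """Start every task on the cheap run-to-completion path; promote it
    to a scheduler-managed, suspendable task only when it needs it."""
    result = task(*args)
    if inspect.isgenerator(result):
        # promotion path: the task can suspend at each yield point,
        # so the scheduler drives it one step at a time
        out = None
        for out in result:
            pass
        return out
    return result  # fast path: the task never suspended

assert run(lambda: 42) == 42  # plain task, run to completion
```

The cost asymmetry the paper measures corresponds to these two paths: the fast path pays nothing for suspension support it never uses, while the promoted path pays the full bookkeeping cost only when suspension is actually possible.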
Due to the end of Moore's law in clock scaling and of Dennard scaling, we are reaching crippling limits with our current von Neumann processor paradigms. Help is sought from both technology and architecture to innovate and engender new processing paradigms that can overcome those limitations and define the future of computing. New ideas and directions range from neuromorphic processors to analog computing, memristors, quantum computing, and the use of nanophotonics. This talk will examine a number of these emerging directions, including work by the community and by our group, and evaluate some of the associated implications for the future of computing.
ISBN:
(Print) 9781538655566; 9781538655559
Provides an abstract of the keynote presentation and may include a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.
ISBN:
(Print) 9781538655566; 9781538655559
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
ISBN:
(Print) 9781450344937
Task-based programming offers an elegant way to express units of computation and the dependencies among them, making it easier to distribute the computational load evenly across multiple cores. However, this separation of problem decomposition and parallelism requires a sufficiently large input problem to achieve satisfactory efficiency on a given number of cores. Unfortunately, finding a good match between input size and core count usually requires significant experimentation, which is expensive and sometimes even impractical. In this paper, we propose an automated empirical method for finding the isoefficiency function of a task-based program, binding efficiency, core count, and input size in one analytical expression. This allows the latter two to be adjusted according to given (realistic) efficiency objectives. Moreover, we find not only (i) the actual isoefficiency function but also (ii) the function one would obtain if the program execution were free of resource contention and (iii) an upper bound that could only be reached if the program were able to maintain its average parallelism throughout its execution. The differences between the three help explain low efficiency and, in particular, help differentiate between resource contention and structural conflicts related to task dependencies or scheduling. The insights gained can be used to co-design programs and shared system resources.
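To make the notion of an isoefficiency function concrete, the toy model below assumes a serial time T1 = n and a parallel time Tp = n/p + c*p (with c an assumed per-core overhead), and solves for the input size that sustains a target efficiency. The model is a hypothetical illustration, far simpler than the paper's empirical method:

```python
def efficiency(n, p, c=1.0):
    """Toy model: T1 = n, Tp = n/p + c*p; efficiency = T1 / (p * Tp)."""
    t1, tp = n, n / p + c * p
    return t1 / (p * tp)

def iso_input_size(p, target_e, c=1.0):
    """Isoefficiency: solve n / (n + c*p^2) = E for n,
    giving n = E / (1 - E) * c * p^2."""
    return target_e / (1.0 - target_e) * c * p * p

# to keep 80% efficiency on 8 cores, this model needs n = 256
n = iso_input_size(8, 0.8)
```

Here the required input size grows quadratically with the core count, which is exactly the kind of relationship the isoefficiency function makes explicit for a real program.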
ISBN:
(Print) 9781538655566; 9781538655559
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
ISBN:
(Print) 9781538623183; 9781538623176
DEPSO-Scout is a hybrid optimization algorithm combining Differential Evolution (DE), Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC). Solution convergence balances the exploration of PSO against the exploitation of DE, while the scout-bee property of ABC reduces the chance of settling on a suboptimal solution. DEPSO-Scout outperforms traditional DE, PSO, and ABC. However, in higher-dimensional search spaces, DEPSO-Scout maintains its accuracy while its search speed decreases significantly. Our experiments show that computational time varies with the complexity of the problem. To improve the time performance of DEPSO-Scout, parallelization becomes attractive. By modifying the DEPSO-Scout algorithm with a parallel approach, the speed of the algorithm is significantly improved while the correctness of the solutions is maintained. The experiments and an analysis of speedup and algorithmic efficiency are presented, and opportunities for further improving parallel DEPSO-Scout are discussed in the last section.
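As a sketch of how the DE and PSO ingredients might combine in a single particle update (the parameter values and the exact combination rule are illustrative assumptions, not DEPSO-Scout's actual formulas):

```python
import random

def hybrid_update(x, v, pbest, gbest, pop, w=0.7, c1=1.5, c2=1.5, f=0.5):
    """One toy DE/PSO hybrid step for a single particle.
    x, v: current position and velocity; pbest, gbest: personal and
    global bests; pop: positions of the rest of the swarm."""
    r1, r2 = random.random(), random.random()
    # PSO velocity update (exploration toward pbest and gbest)
    v = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
         for vi, xi, pb, gb in zip(v, x, pbest, gbest)]
    # DE-style mutation (exploitation): add a scaled difference
    # of two randomly chosen peers to the moved position
    a, b = random.sample(pop, 2)
    trial = [xi + vi + f * (ai - bi)
             for xi, vi, ai, bi in zip(x, v, a, b)]
    return trial, v
```

Because each particle's update depends only on the previous generation's swarm state, updates within a generation are independent of one another, which is what makes the fitness evaluations and position updates amenable to the parallelization the paper investigates.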