Nowadays, not only CPUs but also GPUs follow the trend toward multi-core processors. Parallel processing presents an opportunity and a challenge at the same time. To explicitly parallelize the software by ...
ISBN: (print) 9783642314995; 9783642315008
Progress in the development of the PetaScale implementation of the anelastic EULAG model combined with the warm-rain bulk and bin microphysics schemes, as well as its application to multiscale cloud modeling, is presented. A new three-dimensional (3D) model domain decomposition is implemented to increase model performance and scalability. We investigate the performance of the code on IBM BlueGene/L and Cray XT4/XE6 architectures. The scalability results show a significant improvement of the new domain decomposition over the previous 2D decomposition used as the standard in many geophysical fluid flow models.
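One commonly cited reason a 3D decomposition can scale better than a 2D one is that each subdomain exchanges a smaller halo relative to the work it owns. The sketch below only illustrates that geometric argument; it is not the EULAG implementation. The grid size, process counts, and the function name `halo_cells` are assumptions chosen for the example, and every subdomain is assumed to have two neighbours along each split dimension.

```python
def halo_cells(nx, ny, nz, px, py, pz):
    """Return (owned cells, one-cell-deep halo face cells) for one subdomain.

    The global nx*ny*nz grid is split over a px*py*pz process grid.
    Assumes the grid divides evenly and that each split dimension
    contributes two neighbouring faces (interior or periodic subdomain).
    """
    sx, sy, sz = nx // px, ny // py, nz // pz
    faces = 0
    if px > 1:
        faces += 2 * sy * sz  # two faces normal to x
    if py > 1:
        faces += 2 * sx * sz  # two faces normal to y
    if pz > 1:
        faces += 2 * sx * sy  # two faces normal to z
    return sx * sy * sz, faces


# Same grid and same number of processes (4096), two layouts.
owned_2d, halo_2d = halo_cells(256, 256, 256, 64, 64, 1)   # vertical columns
owned_3d, halo_3d = halo_cells(256, 256, 256, 16, 16, 16)  # cubes
print(owned_2d, halo_2d, halo_2d / owned_2d)
print(owned_3d, halo_3d, halo_3d / owned_3d)
```

With these illustrative numbers both layouts own the same number of cells per process, but the cubic subdomains exchange fewer halo cells per step. Real performance also depends on message counts, latency, and the solver, none of which this sketch models.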
Computer simulations with the first-principle (kinetic) model are essential for studying multi-scale processes in space plasma. We develop numerical schemes for Vlasov simulations for practical use on currently-existi...
ISBN: (print) 9783642314995; 9783642315008
We investigate parallel algorithms for the solution of the shallow-water equation in a space-time framework. For periodic solutions, the discretized problem can be written as a large cyclic non-linear system of equations. This system of equations is solved with a Newton iteration which uses two levels of preconditioned GMRES solvers. The parallel performance of this algorithm is illustrated with a number of numerical experiments.
ISBN: (print) 9781467323703; 9781467323727
Using passwords to verify a user's identity is the most widely deployed method for electronic authentication. When system administrators need to recover lost passwords or test accounts for easily guessable passwords, it can require millions of hash function and string comparison operations. These operations can be computationally expensive but are easily parallelizable because each password can be tested independently. Therefore, using high performance computing (HPC) can greatly reduce the time required to perform password recovery. Due to the high level of fine-grained parallelism of this type of problem, GPU computing using Compute Unified Device Architecture (CUDA) can be used to further improve performance. The scale of HPC can be further increased through the use of multiple GPUs, but this requires communication between the GPU devices and can reduce the overall performance due to increased communication latency. In this work a well-established HPC framework, Message Passing Interface (MPI), was used to minimize latency and handle the communication between the devices. This allowed for a coarse-grained division of the problem using MPI, where each device applies a fine-grained division of the problem using CUDA to perform the actual calculations. This paper describes three dictionary-based password recovery algorithms that use both MPI and CUDA. In this approach the hashed values of known words are computed and compared with hash values of unknown user passwords. The algorithms differed in GPU memory utilization and in how the data was divided and distributed among the MPI nodes and GPU devices. A divided dictionary algorithm split the dictionary of potential passwords over the GPUs and copied the password database to each GPU. A divided password database algorithm split the password database and copied the potential passwords. A minimal memory algorithm split the password database and sequentially processed individual passwords on the GPUs. The div
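The divided dictionary idea described above can be sketched on a single CPU as follows. This is a minimal illustration, not the paper's MPI/CUDA code: the workers run one after another here, SHA-256 is an assumed hash function (the abstract does not name one), and the function names `split_dictionary`, `recover`, and `divided_dictionary` are hypothetical.

```python
import hashlib


def sha256_hex(word):
    """Hash a candidate word (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def split_dictionary(words, n_workers):
    """Coarse-grained split: one strided slice of the dictionary per worker."""
    return [words[i::n_workers] for i in range(n_workers)]


def recover(chunk, password_db):
    """Work done by one worker: hash each candidate and look it up.

    password_db maps stored hash -> account name; in the divided
    dictionary scheme every worker holds a full copy of it.
    """
    found = {}
    for word in chunk:
        digest = sha256_hex(word)
        if digest in password_db:
            found[password_db[digest]] = word
    return found


def divided_dictionary(words, password_db, n_workers):
    """Merge per-worker results; the loop stands in for parallel devices."""
    results = {}
    for chunk in split_dictionary(words, n_workers):
        results.update(recover(chunk, password_db))
    return results


words = ["apple", "hunter2", "letmein", "zebra"]
password_db = {
    sha256_hex("hunter2"): "alice",
    sha256_hex("zebra"): "bob",
    sha256_hex("not-in-dictionary"): "carol",
}
print(divided_dictionary(words, password_db, n_workers=2))
```

Because each candidate is tested independently, the chunks need no communication until the results are merged, which is what makes the coarse-grained split attractive.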
ISBN: (print) 9782951740877
This paper describes the KNOWLEDGESTORE, a large-scale infrastructure for the combined storage and interlinking of multimedia resources and ontological knowledge. Information in the KNOWLEDGESTORE is organized around entities, such as persons, organizations and locations. The system makes it possible (i) to import background knowledge about entities, in the form of annotated RDF triples; (ii) to associate resources with entities by automatically recognizing, coreferring and linking mentions of named entities; and (iii) to derive new entities based on knowledge extracted from mentions. The KNOWLEDGESTORE builds on state-of-the-art technologies for language processing, including document tagging, named entity extraction and cross-document coreference. Its design provides for a tight integration of linguistic and semantic features, and eases the further processing of information by explicitly representing the contexts where knowledge and mentions are valid or relevant. We describe the system and report on the creation of a large-scale KNOWLEDGESTORE instance for storing and integrating multimedia content and background knowledge relevant to the Italian Trentino region.
ISBN: (print) 9782951740877
Translation studies rely more and more on corpus data to examine specificities of translated texts, which can be translated from different original languages and compared to original texts. In parallel, more and more multilingual corpora are becoming available for various natural language processing tasks. This paper questions the use of these multilingual corpora in translation studies and shows the methodological steps needed to obtain more reliably comparable sub-corpora that consist of original and directly translated text only. Various experiments are presented that show the advantage of directional sub-corpora.
ISBN: (print) 9780769548968; 9781467347259
We propose a fast algorithm based on the beamlet decomposition for real-time rendering of scenes in participating media with multiple scattering. First, the light source radiation is considered as composed by all particles in the media, and each particle's radiation is decomposed along different forward directions using the plane decomposition method. Then the multiple scattering radiation of one particle is calculated from the decomposed radiation of its adjacent particles and the light source. Finally, according to the multiple scattering radiation value of each particle, the radiation of the ray from the viewpoint is calculated using the ray marching method, which can be implemented on the graphics processing unit (GPU), so the rendering process is highly parallel. The experimental results show that the algorithm can achieve real-time rendering efficiency and enhance the practicality of multiple scattering.
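The final ray marching step can be illustrated with a generic accumulation loop. This is a minimal sketch under strong assumptions, not the paper's beamlet method: the medium is homogeneous, `source` is a hypothetical callback returning the scattered radiance available at a given distance along the ray, and `sigma_t` and `sigma_s` are assumed extinction and scattering coefficients.

```python
import math


def ray_march(source, sigma_t, sigma_s, length, steps):
    """Accumulate in-scattered radiance along a ray of the given length.

    Each segment contributes its scattered radiance, attenuated by the
    transmittance exp(-sigma_t * t) between the viewpoint and the
    segment midpoint t (homogeneous medium assumed).
    """
    dt = length / steps
    radiance = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt                       # midpoint of segment i
        transmittance = math.exp(-sigma_t * t)   # attenuation back to the eye
        radiance += transmittance * sigma_s * source(t) * dt
    return radiance


# Constant source radiance of 1.0, for which a closed form exists.
value = ray_march(lambda t: 1.0, sigma_t=1.0, sigma_s=0.5, length=2.0, steps=1000)
exact = 0.5 * (1.0 - math.exp(-2.0))
print(value, exact)
```

Each ray is independent of every other ray, which is why this step maps naturally onto one GPU thread per pixel.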
We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real...
ISBN: (print) 9783642314995; 9783642315008
Text matching with errors is a regular task in computational biology. We present an extension of the bit-parallel Wu-Manber algorithm [16] to combine several searches for a pattern into a collection of fixed-length words. We further present an OpenCL parallelization of a redundant index on massively parallel multicore processors, within a framework of searching for similarities with seed-based heuristics. We successfully implemented and ran our algorithms on GPU and multicore CPU. Some of the speedups obtained exceed 60x.
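For orientation, the classic bit-parallel Wu-Manber search that the abstract extends can be sketched as below. This is the textbook baseline with up to `k` edits (substitutions, insertions, deletions), not the authors' multi-search extension or their OpenCL code; the function name `wu_manber_search` is hypothetical, and `k` is assumed to be smaller than the pattern length.

```python
def wu_manber_search(text, pattern, k):
    """Return end indices in text where pattern matches with at most k edits.

    Bit i of R[d] is set when pattern[0..i] matches a suffix of the text
    read so far with at most d errors (Shift-And formulation).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # With d errors allowed, the first d pattern characters can be skipped.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        s = masks.get(ch, 0)
        prev_old = R[0]                      # R[d-1] before this character
        R[0] = ((R[0] << 1) | 1) & s         # exact matching
        for d in range(1, k + 1):
            cur_old = R[d]
            match = ((cur_old << 1) | 1) & s
            insertion = prev_old                           # extra text char
            subst_or_del = ((prev_old | R[d - 1]) << 1) | 1
            R[d] = (match | insertion | subst_or_del) & full
            prev_old = cur_old
        if R[k] & accept:
            ends.append(j)
    return ends


print(wu_manber_search("GATTACA", "ATTA", 0))
print(wu_manber_search("abxd", "abcd", 1))
```

The inner loop touches only a handful of machine words per text character when the pattern fits in a word, which is the property that makes the approach attractive for wide parallel hardware.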