Nowadays, not only CPUs but also GPUs follow the trend toward multi-core processors. Parallel processing presents not only an opportunity but also a challenge. To explicitly parallelize the software by ...
Computer simulations with the first-principle (kinetic) model are essential for studying multi-scale processes in space plasma. We develop numerical schemes for Vlasov simulations for practical use on currently-existi...
In this paper, we propose an implementation of a parallel two-dimensional fast Fourier transform (FFT) using Intel Advanced Vector Extensions (AVX) instructions on multi-core processors. The combination of vectorizati...
In this paper, a new parallel phase algorithm for parallel turbo decoders is proposed. The traditional sliding-window turbo algorithm exchanges extrinsic information phase by phase, which induces long decoding latency. Th...
ISBN: (print) 9781467323703; 9781467323727
Long-running HPC applications guard against node failures by writing checkpoints to parallel file systems. Writing these checkpoints on petascale-class machines has proven difficult, and the increased concurrency demands of exascale computing will exacerbate this problem. To meet checkpointing demands and sustain application-perceived throughput at exascale, multi-tiered hierarchical storage architectures involving solid-state burst buffers are being considered. In this paper, we describe the design and implementation of cento, a multi-level, content-addressable checkpoint file system for large-scale HPC systems. cento achieves in-flight checkpoint data reduction across all compute nodes through compression and elimination of duplicate blocks over a series of checkpoints. Through a detailed analysis of checkpoint dumps, we assess the benefits of data reduction for scientific applications that are representative of production workloads. We observe up to 40% data reduction within a limited sample of representative workloads. Finally, experiments on existing systems show a decrease in checkpoint commit latencies of 5 to 20%, reducing the load on the parallel file system.
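The data-reduction idea in this abstract (store each distinct block once, keyed by its content, and compress it) can be illustrated with a minimal sketch. This is not cento's implementation: the `BlockStore` class, the 4096-byte block size, SHA-256 as the content key, and zlib as the compressor are all assumptions chosen for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size; the abstract does not specify one


class BlockStore:
    """Toy content-addressable store: each unique block is compressed and kept once."""

    def __init__(self):
        # Maps a SHA-256 hex digest to the compressed bytes of that block.
        self.blocks = {}

    def put_checkpoint(self, data: bytes) -> list:
        """Split data into fixed-size blocks and return the list of block digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks not seen in any earlier checkpoint are stored.
                self.blocks[digest] = zlib.compress(block)
            recipe.append(digest)
        return recipe

    def get_checkpoint(self, recipe) -> bytes:
        """Rebuild a checkpoint from its list of block digests."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in recipe)

    def stored_bytes(self) -> int:
        """Total compressed bytes held by the store."""
        return sum(len(b) for b in self.blocks.values())
```

Writing two checkpoints that share a block stores the shared block only once; each checkpoint is then described by its list of digests and can be rebuilt with `get_checkpoint`.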
The model in this paper is abstracted from a key course of batch annealing process in the steel industry: heating or cooling. We mainly investigate scheduling parallel dedicated machines with a single server as well as...
ISBN: (print) 9781467317443
Large infrared touch screens suffer from long response times and interference from external light. To solve these problems, this paper proposes a method that divides the whole infrared touch screen into several separate scanning units and a master module. Each scanning unit completes touch scanning and touch information processing within its own range, and the scanning units work in parallel. The master module gets touch information from each scanning unit through a UART bus, processes it to obtain the final touch coordinates for the whole screen, and reports them to the host computer. Because the scanning units work in parallel rather than scanning the full screen sequentially, this method improves the response speed. By using a photodiode and a crystal resonator to constitute the received-signal processing circuit, this method resists light interference. Tests of response time and of resistance to light interference show that the method yields an infrared touch screen system with fast response and resistance to light interference.
ISBN: (print) 9781467323703; 9781467323727
Using passwords to verify a user's identity is the most widely deployed method for electronic authentication. When system administrators need to recover lost passwords or test accounts for easily guessable passwords, it can require millions of hash function and string comparison operations. These operations can be computationally expensive but are easily parallelizable because each password can be tested independently. Therefore, using high performance computing (HPC) can greatly reduce the time required to perform password recovery. Due to the high level of fine-grained parallelism of this type of problem, GPU computing using Compute Unified Device Architecture (CUDA) can be used to further improve performance. The scale of HPC can be further increased through the use of multiple GPUs, but this requires communication between the GPU devices and can reduce the overall performance due to increased communications latency. In this work, a well-established HPC framework, Message Passing Interface (MPI), was used to minimize the amount of latency and handle the communication between the devices. This allowed for a coarse-grained division of the problem using MPI, where each device applies a fine-grained division of the problem using CUDA to perform the actual calculations. This paper describes three dictionary-based password recovery algorithms that use both MPI and CUDA. In this approach, the hashed values of known words are computed and compared with hash values of unknown user passwords. The algorithms differed in GPU memory utilization and in how the data was divided and distributed among the MPI nodes and GPU devices. A divided dictionary algorithm split the dictionary of potential passwords over the GPUs and copied the password database to each GPU. A divided password database algorithm split the password database and copied the potential passwords. A minimal memory algorithm split the password database and sequentially processed individual passwords on the GPUs. The div
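The divided dictionary scheme described in this abstract can be sketched in a few lines. This is only an illustration of the partitioning idea, not the paper's code: MPI ranks and GPU devices are simulated by a plain loop over "workers", SHA-256 stands in for whatever hash the password database uses, and all function names are hypothetical.

```python
import hashlib


def split_dictionary(words, num_workers):
    """Deal words round-robin so every worker gets a near-equal share."""
    return [words[rank::num_workers] for rank in range(num_workers)]


def recover_on_worker(word_share, target_hashes):
    """Hash each candidate word and keep those whose digest is in the database."""
    found = {}
    for word in word_share:
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        if digest in target_hashes:
            found[digest] = word
    return found


def divided_dictionary_recovery(words, target_hashes, num_workers):
    """Split the dictionary across workers; every worker sees the full database."""
    targets = set(target_hashes)  # the copy each worker would receive
    recovered = {}
    # Sequential stand-in for the per-device work that would run in parallel.
    for share in split_dictionary(words, num_workers):
        recovered.update(recover_on_worker(share, targets))
    return recovered
```

Each worker tests only its own share of the dictionary against the full set of target hashes, so no communication is needed until the partial results are merged at the end.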
Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data parallel languages such as CUDA and OpenCL make it possible to t...
ISBN: (print) 9783642355059
The proceedings contain 38 papers. The topics discussed include: creativity, cognitive mechanisms, and logic; MicroPsi 2: the next generation of the MicroPsi framework; an extensible language interface for robot manipulation; noisy reasoners: errors of judgement in humans and AIs; a representation theorem for decisions about causal models; modular value iteration through regional decomposition; perception processing for general intelligence: bridging the symbolic/subsymbolic gap; on attention mechanisms for AGI architectures: a design proposal; a framework for representing action meaning in artificial systems via force dimensions; avoiding unintended AI behaviors; decision support for safe AI design; on measuring social intelligence: experiments on competition and cooperation; and toward tractable AGI: challenges for system identification in neural circuitry.