ISBN (print): 9781728116440
Software applications for biological network analysis rely on graphs to model structural interactions. A great part of them requires searching for subgraphs in a target graph or in collections of graphs. Even though very efficient algorithms have been defined to solve the subgraph isomorphism problem, the complexity of current real biological networks makes their sequential execution time prohibitive. On the other hand, parallel architectures, from multi-core to many-core, have become pervasive for dealing with the problem of data size. Nevertheless, the sequential nature of graph searching algorithms makes their implementation on parallel architectures very challenging. This paper presents three different parallel solutions for the graph searching problem. The first two target the exact search for multi-core CPUs and many-core GPUs, respectively. The third one targets the approximate search for GPUs, which handles node, edge, and node label mismatches. The paper shows how different techniques have been developed in all the solutions to reduce the search space complexity, and reports the performance of the proposed solutions on representative biological networks containing antiviral chemical compounds and protein interaction networks.
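The exact-search task this abstract addresses can be illustrated with a minimal backtracking matcher. This is a naive sketch of subgraph isomorphism on undirected graphs stored as adjacency dicts, not the paper's (unstated) algorithm or its search-space reduction techniques:

```python
def subgraph_match(pattern, target):
    """Naive backtracking subgraph isomorphism: map every pattern node to a
    distinct target node so that every pattern edge exists in the target.
    Graphs are adjacency dicts {node: set_of_neighbors} (undirected)."""
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        u = p_nodes[len(mapping)]          # next pattern node to assign
        for v in target:
            if v in mapping.values():       # enforce injectivity
                continue
            # every already-mapped neighbor of u must be adjacent to v
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                found = extend(mapping)
                if found:
                    return found
                del mapping[u]              # backtrack
        return None

    return extend({})
```

Real matchers (e.g. VF2-style algorithms) add ordering heuristics and pruning on degrees and labels; the exponential worst case of this loop is exactly why the paper's parallel CPU/GPU formulations matter.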
ISBN (print): 9789811365089; 9789811365072
With the development of satellite payloads and very large scale integrated (VLSI) circuit technology, spaceborne real-time synthetic aperture radar (SAR) imaging systems have become a solution for rapid response to hazards. By analyzing the algorithm pipeline flow and introducing a storage-computation model, a balanced and high-efficiency 2-D data access technique based on a cross-mapping data storage method has been achieved to suit the large-point processing of a real-time spaceborne SAR system. A prototype based on a NetFPGA-SUME board with a Xilinx XC7VX690T is given to verify the performance of the proposed design. Taking Stripmap SAR imaging of 16384 x 16384 raw data (5 m resolution, 25 km swath) as an example, imaging based on the chirp scaling algorithm takes 6.63 s, which is better than some other real-time processing methods.
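Cross-mapped 2-D storage for conflict-free row and column access is classically achieved by skewing the bank index. The sketch below shows one such textbook scheme as an assumption; the paper's exact mapping is not given in the abstract:

```python
def skewed_bank(row, col, num_banks):
    """Skewed (cross-mapped) storage: element (row, col) is placed in bank
    (row + col) mod num_banks. A full row sweep and a full column sweep
    then each touch every bank exactly once, so both range (row) and
    azimuth (column) accesses proceed without bank conflicts.
    This is a classic scheme, not necessarily the paper's mapping."""
    return (row + col) % num_banks
```

With a naive row-major layout, a column access would hit the same bank repeatedly; the skew is what makes the 2-D access "balanced" in the sense the abstract describes.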
This paper proposes a parameter filtering circular update algorithm, which provides an efficient solution for parallel training of deep neural networks as their scale continues to expand. The algorithm filters out a large number of redundant, unimportant parameters during deep neural network training by setting an importance threshold, and within each iteration the parameters of each node are synchronized through a circular update scheme. The parameter filtering circular update algorithm compresses the training parameters of the DNN and greatly reduces the overhead caused by communication between each node and the parameter server in synchronous parallel training. Experiments show that the algorithm can improve the parallel training speed of DNNs without losing prediction accuracy, and that GPU utilization on each computing node is higher than with the general synchronous training method.
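The threshold-based filtering step can be sketched in a few lines. This is a minimal illustration under assumptions: the importance measure is taken to be update magnitude, and filtered-out values are kept as a local residual so they are not permanently lost (a common practice in gradient sparsification; the abstract does not specify the paper's exact rule):

```python
def filter_updates(updates, threshold):
    """Split a node's parameter updates into a sparse part worth
    communicating (|u| >= threshold) and a local residual that is
    accumulated and retried in later iterations (assumed scheme)."""
    sent = [u if abs(u) >= threshold else 0.0 for u in updates]
    residual = [0.0 if abs(u) >= threshold else u for u in updates]
    return sent, residual
```

Only the `sent` part would travel around the ring of nodes in the circular update, which is where the communication savings over naive synchronous parameter-server exchange come from.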
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational pro...
ISBN (print): 9783030638290; 9783030638306
Learning algorithms are increasingly being applied to behavioral decision systems for unmanned vehicles, and in multi-source road environments they are one of the key technologies for solving the decision-making problem of driverless vehicles. This paper proposes a parallel network, called DF-PLSTM-FCN, which is composed of LSTM-FCN-variant and LSTM-FCN. As an end-to-end model, it jointly learns a mapping from the visual state and previous driving data of the vehicle to a specific behavior. Different from LSTM-FCN, LSTM-FCN-variant provides more discernible features for the current vehicle by introducing dual feature fusion. Furthermore, decision fusion is adopted to fuse the decisions made by LSTM-FCN-variant and LSTM-FCN. The parallel network structure with dual fusion on both features and decisions can exploit the two different networks to improve decision prediction without a significant increase in computation. Compared with other deep-learning-based models, our experiments present competitive results on the large-scale driving dataset BDDV.
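Decision-level fusion of two parallel networks is often just a weighted combination of their class-probability outputs. The sketch below shows that late-fusion idea under an assumed averaging rule; the abstract does not state DF-PLSTM-FCN's exact combination:

```python
def decision_fusion(probs_a, probs_b, weight=0.5):
    """Late (decision-level) fusion: combine the class probabilities of
    two branch networks by weighted averaging. `weight` is the trust
    placed in branch A (hypothetical rule for illustration)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(probs_a, probs_b)]
```

Because fusion happens on the small probability vectors rather than on feature maps, it adds almost no computation, which is consistent with the abstract's claim of improved prediction without a significant compute increase.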
The appearance analysis and counting of peripheral blood leukocytes can assist the diagnosis of blood diseases such as leukemia. Therefore, it is necessary to automatically extract leukocytes from blood smear images. ...
The Virtual Time Reversal algorithm has the advantages of low computational complexity and high accuracy at low signal-to-noise ratios, so it is easy to implement in hardware. The parallel design possible on an FPGA can greatly improve the speed of a direction finding system and satisfy real-time direction finding requirements. This paper adopts a Xilinx Virtex-7 FPGA, programmed in Verilog-HDL, to implement Passive DOA Estimation by Virtual Time Reversal (PVTR-DOA). First, according to the principle of the Virtual Time Reversal algorithm, a parallel processing scheme is proposed. The design builds the algorithm module on the Vivado software platform and uses the package IP tool to package it into a custom IP core. Then, the data to be tested in DDR3 is moved to the custom IP core through DMA for signal processing. The FPGA design uses a MicroBlaze core running C code developed on the SDK platform to control the embedded system. Finally, the effectiveness and real-time direction finding of the hardware design are verified by experimental results.
ISBN (print): 9781450362955
Auto-vectorization techniques have been adopted by compilers to exploit data-level parallelism in parallel processing for decades. However, since processor architectures have kept adding new features to improve vector/SIMD performance, legacy application binaries fail to fully exploit the new vector/SIMD capabilities of modern architectures. For example, legacy ARMv7 binaries cannot benefit from the ARMv8 SIMD double-precision capability, and legacy x86 binaries cannot enjoy the power of the AVX-512 extensions. In this paper, we study the fundamental issues involved in cross-ISA Dynamic Binary Translation (DBT) to convert non-vectorized loops to vector/SIMD form to achieve the greater computation throughput available in newer processor architectures. The key idea is to recover critical loop information from application binaries in order to carry out vectorization at runtime. Experimental results show that our approach achieves an average speedup of 1.42x over native ARMv7 runs across various benchmarks in an ARMv7-to-ARMv8 dynamic binary translation system.
The Hoeffding tree algorithm is a popular online decision tree algorithm capable of learning from huge data streams. The algorithm involves complex, time-consuming computations in the leaves of the tree for each data instance. These computations expose substantial parallelism that can be exploited in a field-programmable gate array to achieve speedup. This paper presents a hardware accelerator for the Hoeffding tree algorithm with an adaptive naive Bayes predictor in the leaves. The proposed system is capable of accelerating data streams with both nominal and numeric attributes using minimal hardware resources for huge datasets. It is implemented on a Xilinx VC707 board based on a Virtex-7 XC7VX485T field-programmable gate array. The implemented system is about 9x faster than StreamDM (C++), a well-known reference software implementation, on the standard forest cover type dataset.
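The leaf computation the accelerator speeds up is centered on the Hoeffding bound, which decides when enough stream instances have been seen to commit to a split. A minimal sketch of that bound (the standard formula, independent of this paper's hardware):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability at least 1 - delta, the true mean
    of a random variable with range `value_range` lies within epsilon of
    the empirical mean after n independent observations. A Hoeffding tree
    splits a leaf once the gain gap between the two best attributes
    exceeds this epsilon."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))
```

Evaluating this bound, plus the per-attribute gain statistics behind it, for every arriving instance is the per-leaf work that the paper maps onto parallel FPGA logic.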
ISBN (print): 9781450362955
In this paper, we investigate the performance of Parallel Discrete Event Simulation (PDES) on a cluster of many-core Intel KNL processors. Specifically, we analyze the impact of different Global Virtual Time (GVT) algorithms in this environment and contribute three significant results. First, we show that it is essential to isolate the thread performing MPI communications from the task of processing simulation events; otherwise the simulation is significantly imbalanced and performs poorly. This applies to both synchronous and asynchronous GVT algorithms. Second, we demonstrate that a synchronous GVT algorithm based on barrier synchronization is a better choice for communication-dominated models, while asynchronous GVT based on Mattern's algorithm performs better for computation-dominated scenarios. Third, we propose the Controlled Asynchronous GVT (CA-GVT) algorithm, which selectively adds synchronization to Mattern-style GVT based on simulation conditions. We demonstrate that CA-GVT outperforms both barrier and Mattern's GVT and achieves about an 8% performance improvement on mixed computation-communication models. This is a reasonable improvement for a simple modification to a GVT algorithm.
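The synchronous (barrier-based) GVT computation the abstract compares against reduces to a global minimum taken while all processes are stopped. A minimal sketch of that reduction, with simplified inputs rather than the paper's MPI implementation:

```python
def barrier_gvt(local_min_times, in_transit_timestamps):
    """Barrier-style GVT: once every process has paused at the barrier,
    GVT is the minimum over each LP's next unprocessed event time and the
    timestamps of all messages still in transit. No event with timestamp
    below this value can ever be rolled back, so state before GVT can be
    reclaimed (fossil collection)."""
    transit_min = min(in_transit_timestamps, default=float("inf"))
    return min(min(local_min_times), transit_min)
```

Mattern-style asynchronous GVT reaches the same value without stopping the simulation, using colored messages and vector counts to account for in-transit timestamps; CA-GVT, per the abstract, selectively reintroduces synchronization into that scheme.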