Search Results - Inner Mongolia University Library

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Li, Zhijia; Jiao, Li; Hu, Xiang. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

ISBN: (print) 9783319271392

Distributed computing technology has been widely used to solve complex problems appearing in parallel processing systems. Job scheduling is very important in many distributed computing systems, such as grid systems and high performance computers. Their performance is directly related to the efficiency of the distributed computing systems. Modeling them and analyzing their performance can provide quantitative performance metrics and predictions, which are helpful to guide capacity planning and scheduling optimization. In this paper, we study job scheduling systems widespread in high performance computing systems and propose a coloured Petri net method for analyzing their performance, which can be easily implemented in CPN software by potential users. We also propose an approximative modeling technique so as to reduce the model size. As a model-based performance analysis method, our method is low cost and highly flexible. Experimental results show that our method is feasible and can be applied to more complex and large-scale systems. © Springer International Publishing Switzerland 2015.

Keywords: parallel processing systems

Parallelization of enumerating tree-like chemical compounds by breadth-first search order

BMC Medical Genomics, 2015, Vol. 8, Issue 2, pp. 1-7

Authors: Hayashida, Morihiro; Jindalertudomdee, Jira; Zhao, Yang; Akutsu, Tatsuya. Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji, Kyoto 6110011, Japan

Enumeration of chemical compounds greatly assists designing and finding new drugs, and determining chemical structures from mass spectrometry. In our previous study, we developed efficient algorithms, BfsSimEnum and BfsMulEnum, for enumerating tree-like chemical compounds without and with multiple bonds, respectively. For many instances, our previously proposed algorithms were able to enumerate chemical structures faster than other existing methods. The latest processors consist of multiple processing cores and are able to execute many tasks at the same time. In this paper, we develop three parallelized algorithms, BfsEnumP1-3, by modifying BfsSimEnum in simple ways to further reduce execution time. BfsSimEnum constructs a family tree in which each vertex denotes a molecular tree. BfsEnumP1-3 divide the set of vertices at some given depth of the family tree into several subsets, each of which is assigned to a processor. For evaluation, we perform experiments on several instances, varying the division depth and the number of processors, and show that BfsEnumP1-3 are useful for reducing the execution time for enumeration of tree-like chemical compounds. In addition, we show that BfsEnumP3 achieves more than 80% parallelization efficiency using up to 11 processors, and reduces the execution time using 12 processors to about 1/10 of that of BfsSimEnum.

Keywords: Execution Time; Normal Form; Parallel Algorithm; Assignment Method; Cache Memory
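
The efficiency figures quoted in this abstract follow the usual definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of processors. The sketch below only illustrates that arithmetic. The function names are hypothetical and the timings are normalized placeholders built from the abstract's "about 1/10" figure, not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = serial execution time / parallel execution time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency = speedup / number of processors."""
    return speedup(t_serial, t_parallel) / processors


# Illustrative numbers only: the serial time is normalized to 1.0 and the
# 12-processor time is taken as "about 1/10" of it, as stated in the abstract.
t_serial = 1.0
t_parallel_12 = 0.1
eff_12 = efficiency(t_serial, t_parallel_12, 12)
print(f"speedup ~ {speedup(t_serial, t_parallel_12):.1f}x, efficiency ~ {eff_12:.0%}")
```

Under these assumed numbers, a tenfold reduction on 12 processors corresponds to an efficiency of roughly 83%, which is consistent with the "more than 80%" claim for up to 11 processors.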

Dynamic message processing and transactional memory in the actor model

15th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems, DAIS 2015, Held as Part of the 10th International Federated Conference on Distributed Computing Techniques, DisCoTec 2015

Authors: Hayduk, Yaroslav; Sobe, Anita; Felber, Pascal. University of Neuchatel, Switzerland

ISBN: (print) 9783319191287

With the trend of ever growing data centers and scaling core counts, simple programming models for efficient distributed and concurrent programming are required. One of the successful principles for scalable computing is the actor model, which is based on message passing. Actors are objects that hold local state that can only be modified by the exchange of messages. To avoid typical concurrency hazards, each actor processes messages sequentially. However, this limits the scalability of the model. We have shown in former work that concurrent message processing can be implemented with the help of transactional memory, ensuring sequential processing when required. This approach is advantageous in low-contention phases but does not scale in high-contention phases. In this paper we introduce a combination of dynamic resource allocation and non-transactional message processing to overcome this limitation. This allows for efficient resource utilization, as these two mechanisms can be handled in parallel. We show that we can substantially reduce the execution time of high-contention workloads in a micro-benchmark as well as in a real-world application. © IFIP International Federation for Information Processing 2015.

Keywords: Message passing
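
The baseline actor behaviour described in this abstract (private state, changed only through messages, one message at a time) can be sketched with a mailbox queue and a single worker thread. This is an illustration of the plain sequential model only. It is not the authors' transactional-memory or dynamic-allocation scheme, and `CounterActor` and its message tuples are hypothetical names invented for this example.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: private state, changed only by messages in its mailbox."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._count = 0  # local state, touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message) -> None:
        """Asynchronously enqueue a message; callers never touch the state."""
        self._mailbox.put(message)

    def _run(self) -> None:
        # Messages are handled strictly one at a time, so the state needs no lock.
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            kind, payload = message
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)  # reply through a caller-supplied queue

    def stop(self) -> None:
        self.send("stop")
        self._thread.join()


def producer(actor: CounterActor, n: int) -> None:
    """Send n increment messages to the actor."""
    for _ in range(n):
        actor.send(("add", 1))


actor = CounterActor()
senders = [threading.Thread(target=producer, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()

reply: queue.Queue = queue.Queue()
actor.send(("get", reply))  # enqueued after all 4000 "add" messages
total = reply.get()
actor.stop()
print(total)
```

Because the single worker thread drains the mailbox in order, concurrent senders cannot race on the counter. The same serialization is the scalability limit that the abstract sets out to relax.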

Optimal Performance Prediction of ADAS Algorithms on Embedded Parallel Architectures

IEEE International Conference on High Performance Computing and Communications (HPCC)

Authors: Romain Saussard; Boubker Bouzid; Marius Vasiliu; Roger Reynaud. Renault S.A.S, Guyancourt, France; Institut d'Electronique Fondamentale, Université Paris Sud, Orsay, France

ADAS (Advanced Driver Assistance Systems) algorithms increasingly use heavy image processing operations. To embed this type of algorithm, semiconductor companies offer many heterogeneous architectures. These SoCs (Systems on Chip) are composed of different processing units with different capabilities, often including a massively parallel computing unit. Due to the complexity of these SoCs, predicting whether a given algorithm can be executed in real time on a given architecture is not trivial. In fact, it is not a simple task for automotive industry actors to choose the most suitable heterogeneous SoC for a given application. Moreover, embedding complex algorithms on these systems remains difficult because of their heterogeneity: it is not easy to decide how to allocate parts of a given algorithm to the different computing units of a given SoC. To help the automotive industry embed algorithms on heterogeneous architectures, we propose a novel approach to predict the performance of image processing algorithms that is applicable to different types of computing units. Our methodology is able to predict a more or less wide interval of execution time, with a degree of confidence, using only a high-level description of the algorithm and a few characteristics of the computing units.

Keywords: Kernel; Computer architecture; Image processing; Graphics processing units; Prediction algorithms; Parallel processing; Computational modeling

Task-Based Parallel Sparse Matrix-Vector Multiplication (SpMVM) with GPI-2

10th International Conference on Large-Scale Scientific Computations (LSSC)

Authors: Stoyanov, Dimitar; Machado, Rui; Pfreundt, Franz-Josef. Fraunhofer ITWM, Kaiserslautern, Germany

ISBN: (print) 9783319265209; 9783319265193

We present a task-based implementation of SpMVM with the PGAS communication library GPI-2. This computational kernel is essential for the overall performance of Krylov subspace solvers, but its proper hybrid parallel design is still a challenge on hierarchical architectures consisting of multi- and many-core sockets and nodes. The GPI-2 library allows, by default and in a natural way, a task-based parallelization. Thus, our implementation is fully asynchronous and differs considerably from the standard hybrid approaches combining MPI and threads/OpenMP. Here we briefly describe the GPI-2 library and our implementation of the SpMVM routine, and then compare the performance of our Jacobi-preconditioned Richardson solver against the PETSc-Richardson solver, using a Poisson BVP in a unit cube as a benchmark test. The comparison employs two types of domain decomposition and demonstrates the preemptive performance and better scalability of our task-based implementation.

Keywords: GASPI; GPI-2; PGAS; Task-based hybrid parallelization; Sparse matrix-vector multiplication; Krylov subspace solvers; Performance
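
For readers unfamiliar with the kernel itself, the following is a minimal serial sketch of sparse matrix-vector multiplication. It assumes the matrix is stored in compressed sparse row (CSR) form, which the abstract does not specify, and it makes no GPI-2 calls. The function name `spmv_csr` and the small test matrix are illustrative only.

```python
from typing import List


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```

Each output row depends only on its own non-zeros and the input vector, so blocks of rows are natural units of work for a task-based scheme. How the authors actually partition and communicate is not described in the abstract.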

Parallel column subset selection of kernel matrix for scaling up support vector machines

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Wu, Jiangang; Feng, Chang; Gao, Peihuan; Liao, Shizhong. School of Computer Science and Technology, Tianjin University, Tianjin 300072, China

ISBN: (print) 9783319271361

The Nyström method and low-rank linearized Support Vector Machines (SVMs) are two widely used methods for scaling up kernel SVMs, both of which need to sample a subset of the columns of the kernel matrix to reduce its size. However, existing non-uniform sampling methods suffer from at least quadratic time complexity in the number of training data, limiting the scalability of kernel SVMs. In this paper, we propose a parallel sampling method called parallel column subset selection (PCSS) based on the divide-and-conquer strategy, which divides the kernel matrix into several small submatrices and then selects columns in parallel. We prove that PCSS has a (1+ϵ) relative-error upper bound with respect to the kernel matrix. Further, we present two approaches to scaling up kernel SVMs by combining PCSS with the Nyström method and low-rank linearized SVMs. The results of comparison experiments demonstrate the effectiveness, efficiency and scalability of our approaches. © Springer International Publishing Switzerland 2015.

Keywords: Support vector machines

A cyber physical system with GPU for CNC applications

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Chang, Jen-Chieh; Chien, Ting-Hsuan; Chang, Rong-Guey. Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 62102, Taiwan

ISBN: (print) 9783319271361

In this paper, we parallelize the collision detection of five-axis machining as an example to show how to execute CNC applications on a Graphics Processing Unit (GPU). We first design and implement an efficient collision detection tool, including the kinematics analyses for five-axis motions, the separating axis method for collision detection, and computer simulation for verification. The machine structure is modeled in STL format in CAD software. The input to the detection system is the G-code part program, which describes the tool motions that produce the part surface. The G-code is then partitioned and executed by our collision detection tool in parallel on the GPU. The system simulates the five-axis CNC motion for the tool trajectory and detects any collisions according to the input G-codes. The results show that our method significantly improves computational efficiency compared with the conventional detection method. © Springer International Publishing Switzerland 2015.

Keywords: Graphics processing unit
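
The separating axis method named in this abstract rests on one idea: two convex shapes are disjoint exactly when their projections onto some candidate axis do not overlap. The sketch below shows that test for 2D convex polygons as a deliberate simplification. The paper works with 3D machine geometry on a GPU, and the function names here are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _project(polygon: List[Point], axis: Point) -> Tuple[float, float]:
    """Return the (min, max) interval of the polygon projected onto axis."""
    dots = [px * axis[0] + py * axis[1] for px, py in polygon]
    return min(dots), max(dots)


def _edge_normals(polygon: List[Point]) -> List[Point]:
    """Candidate separating axes: one normal per polygon edge."""
    normals = []
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        normals.append((-(y2 - y1), x2 - x1))
    return normals


def convex_polygons_collide(a: List[Point], b: List[Point]) -> bool:
    """Convex polygons are disjoint iff some edge normal separates them."""
    for axis in _edge_normals(a) + _edge_normals(b):
        min_a, max_a = _project(a, axis)
        min_b, max_b = _project(b, axis)
        if max_a < min_b or max_b < min_a:
            return False  # found a separating axis, so no collision
    return True  # no separating axis exists (touching counts as a collision)


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
overlapping = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
far_away = [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)]
print(convex_polygons_collide(square, overlapping))  # True
print(convex_polygons_collide(square, far_away))     # False
```

Each pair test is independent of every other pair, which is one reason this kind of check can be spread across many GPU threads.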

Efficient Cryptology-Specific Instructions Generation with Algebra Primitives

International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC)

Authors: Lei Liu; Guijie Han; Zixin Zhou; Sikun Li. Department of Electronics, Officers College Of Chinese Armed Police Force, Cheng Du, China; School of Computer Science, National University of Defense Technology, Chang Sha, China

ISBN: (print) 9781467394741

This paper presents a novel approach for cryptology-specific instruction generation on a reconfigurable architecture named ASRA. ASRA tightly integrates a customized reconfigurable core with a very-long-instruction-word basic core. Both cores in ASRA can work in parallel. The methodology for cryptology-specific instruction generation can directly deploy algebraic operations as primitives for ASRA's custom function units (CFUs), and is able to eliminate a large portion of the design space exploration difficulty of conventional data-flow graph methods. Cryptology-specific instructions are exploited for block cipher and hash algorithms, which are kernel data processing tasks in security applications. An accelerator prototype of ASRA is then built on a Xilinx Kintex-7 FPGA chip. Experimental results show that our work achieves a high performance improvement and good flexibility.

Keywords: Cryptography; Hardware; Registers; Reconfigurable architectures; Fabrics; Algebra

Parallel implementation of dense optical flow computation on many-core processor

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Chen, Wenjie; Yu, Jin; Zhang, Weihua; Jiang, Linhua; Zhang, Guanhua; Chai, Zhilei. MoE Engineering Research Center for Software/Hardware Co-design Technology and Application, East China Normal University, Shanghai 200061, China; School of IoT Engineering, Jiangnan University, Wuxi 214122, China; Parallel Processing Institute, Fudan University, Shanghai 200433, China; Shanghai Key Lab of Modern Optical Systems, University of Shanghai for Science and Technology, Shanghai 200093, China

ISBN: (digital) 9783319271194

ISBN: (print) 9783319271187

Computation of optical flow is a fundamental step in computer vision applications. However, due to its high complexity, it is difficult to compute a high-accuracy optical flow field in real time. This paper proposes a parallel computing approach for fast computation of a high-accuracy optical flow field. It is specially designed for Tilera, a typical many-core processor with 36 tiles. By efficiently exploiting the advantages of the mesh architecture of Tilera, and by appropriately handling the parallelism inherent in the optical flow computation, the proposed implementation is able to significantly reduce the computation time while keeping power consumption low. Experiments show that, for a 640×480 image, the computation time is only 0.80 seconds per frame. It is 2.56 times faster than on a typical CPU, the i3-3240 (3.4 GHz), and the power consumption is as low as 1/6. Experimental results also show that the proposed parallel approach is highly scalable for variable requirements on computation speed and power consumption, since it can flexibly select a proper number of computing cores. © Springer International Publishing Switzerland 2015.

Keywords: parallel processing systems

Parallel Bloom filter on Xeon Phi many-core processors

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Ni, Sheng; Guo, Rentong; Liao, Xiaofei; Jin, Hai. Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

ISBN: (print) 9783319271217

Bloom filters are widely used in databases and networking. These filters facilitate efficient membership checking with a low false positive ratio. Parallel processing is one way to improve the throughput of a Bloom filter. Common many-core processors such as Xeon Phi can provide high parallelism. Thus, we build an iterative model to analyze memory access performance. This analysis suggests that the bottleneck in the traditional design is mainly caused by synchronization cost and memory latency on many-core platforms. Therefore, we propose a parallel Bloom filter (PBF), which is a lockless method involving input data preprocessing. This method reduces synchronization overhead and improves cache locality. We also implement and evaluate PBF on a Xeon Phi processor. Results show that the memory access performance is three times better than that of the counting Bloom filter. PBF provides improved scalability, and the speedup ratio can reach a maximum of 80.7x. © Springer International Publishing Switzerland 2015.

Keywords: Scalability
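
As background for this abstract, here is a minimal single-threaded Bloom filter showing the membership check it refers to: inserted keys are always reported present, while absent keys may occasionally yield a false positive. This is not the authors' lockless PBF design. The class name, the SHA-256 double-hashing scheme, and the sizes are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing); the hash choice is an assumption of this sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        """Set the k bits selected by the item's hashes."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """True if all k bits are set; may be a false positive, never a false negative."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
words = [f"key-{i}" for i in range(1000)]
for w in words:
    bf.add(w)
print(all(bf.might_contain(w) for w in words))  # inserted keys are always found
```

In a shared filter, concurrent `add` calls would race on the same bytes. That is the synchronization cost the abstract says PBF avoids through input preprocessing, although the abstract gives no further detail on how.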
