ISBN: (print) 9781450365239
Today the terms machine learning (ML) and Big Data are closely linked. This, together with the complexity of many ML algorithms, motivates a search for fast parallel computation methods. A further motivating factor is the need to deal with memory size limitations, especially on the moderately sized machines common in many ML applications. In addition, it is desirable to develop generally applicable methods rather than a different parallel approach for every ML algorithm. In this work, we apply a technique we call Software Alchemy (SA) to ML. We are particularly interested in ML for recommender systems, and explore the feasibility of SA in that context.
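The abstract does not spell out how Software Alchemy works. The sketch below assumes it refers to chunk averaging: split the rows into chunks, apply the same estimator to each chunk independently, and average the per-chunk estimates. The names `fit_slope` and `chunk_average` and the toy data are hypothetical, and threads are used only to show the structure; a process pool or a cluster would be needed for real speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_slope(chunk):
    """Least-squares slope of y on x (with intercept) for one chunk of (x, y) pairs."""
    xs = [x for x, _ in chunk]
    ys = [y for _, y in chunk]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in chunk)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def chunk_average(data, n_chunks, estimator):
    """Split data into contiguous chunks, estimate on each, and average the estimates."""
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks - 1)]
    chunks.append(data[(n_chunks - 1) * size:])  # last chunk takes the remainder
    # Each chunk is processed independently, so only one chunk needs to fit
    # in a worker's memory at a time.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        estimates = list(pool.map(estimator, chunks))
    return mean(estimates)


# Hypothetical toy data lying exactly on the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
slope = chunk_average(data, 4, fit_slope)
```

Because the same `estimator` is applied to every chunk, this wrapper does not depend on which ML algorithm sits inside it. With real data the rows would normally be assigned to chunks at random rather than contiguously.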
It is shown that any multivariate polynomial of degree d that can be computed sequentially in C steps can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)} $ processors.
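The simplest instance of this kind of depth reduction is a product of d factors, which takes d - 1 sequential multiplications but only about log2(d) rounds when independent multiplications run side by side. The sketch below illustrates that special case only; it is not the general construction behind the stated bound, and `tree_product` is a hypothetical helper name.

```python
def tree_product(factors):
    """Multiply a non-empty list of factors pairwise in rounds.

    Returns (product, number_of_rounds).
    """
    level = list(factors)
    rounds = 0
    while len(level) > 1:
        # All multiplications within one round are independent of each other,
        # so with enough processors a round costs one parallel step.
        nxt = [level[i] * level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried to the next round
        level = nxt
        rounds += 1
    return level[0], rounds


# x**16 for x = 3: 15 sequential multiplications, but only 4 parallel rounds.
value, depth = tree_product([3] * 16)
```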
ISBN: (print) 9781509048373
It has been shown that a newly proposed micro-modeling method for deriving a concise passive circuit of a large-scale EM problem is highly suitable for GPU parallel computation. However, because of the GPU's memory bandwidth limit, GPU utilization is far from peak performance: more than 97% of the processing time is occupied by frequent data transactions. This paper proposes an effective strategy for GPU acceleration of the micro-modeling algorithm that significantly reduces data transactions between the off-chip and on-chip memory of GPUs. A practical numerical example of a large-scale interconnection and packaging problem shows that the proposed strategy is effective, and that the parallel computation of the micro-modeling circuit on GPUs will be further accelerated by one order of magnitude if 4 or more iterative derivation processes can be conducted in one run.
ISBN: (print) 9781479941698
A change in a ship's characteristic length can change the appropriate research method. By establishing a dimensional analysis theory for ships' characteristic length based on the governing equation of the Green function, the paper obtains the calculation parameters of the Green function and the relational expression for vessels' characteristic length, and gives a statistical expression for ships' characteristic length. It also constructs a parallel algorithm for the Green function. The numerical results show that the algorithm achieves a high parallel computing ratio.
ISBN: (print) 9789881925107
A new method of finding approximate solutions of linear algebraic systems with ill-conditioned or singular matrices, using Schmidt orthogonalization, is presented. This method can be effectively used for arranging parallel computations for matrices of large size.
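The abstract gives no algorithmic detail, so the following is only a generic sketch of how Gram-Schmidt orthogonalization can yield an approximate solution when the matrix is singular; it should not be read as the paper's method. It orthogonalizes the columns of A, skips columns that are numerically dependent on earlier ones, and solves the resulting triangular system. The function name `gram_schmidt_solve`, the tolerance `tol`, and the choice to set skipped unknowns to zero are all assumptions.

```python
from math import sqrt


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_schmidt_solve(A, b, tol=1e-10):
    """Least-squares solution of A x = b via modified Gram-Schmidt.

    Columns that are numerically dependent on earlier ones are skipped,
    and the corresponding unknowns are set to zero (an assumption).
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q, kept, R = [], [], []  # R[c] holds column c of the upper-triangular factor
    for j, col in enumerate(cols):
        v = col[:]
        coeffs = []
        for q in Q:
            r = dot(q, v)
            coeffs.append(r)
            v = [vi - r * qi for vi, qi in zip(v, q)]  # remove the component along q
        norm = sqrt(dot(v, v))
        if norm <= tol * max(1.0, sqrt(dot(col, col))):
            continue  # column adds no new direction: skip it
        Q.append([vi / norm for vi in v])
        kept.append(j)
        R.append(coeffs + [norm])
    # Back substitution on R y = Q^T b.
    k = len(kept)
    rhs = [dot(q, b) for q in Q]
    y = [0.0] * k
    for i in reversed(range(k)):
        s = rhs[i] - sum(R[c][i] * y[c] for c in range(i + 1, k))
        y[i] = s / R[i][i]
    x = [0.0] * n
    for idx, j in enumerate(kept):
        x[j] = y[idx]
    return x


# Singular 2x2 system: the second column is twice the first.
x = gram_schmidt_solve([[1.0, 2.0], [2.0, 4.0]], [3.0, 6.0])
```

For large matrices the dominant cost is the inner products and vector updates, which are the natural units to distribute across workers.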
ISBN: (print) 9781467388566; 9781467388559
In a big data environment, data loss is a crucial issue that is likely to occur because of high network traffic, transmission delay and limited bandwidth. This problem can be addressed by adopting data compression schemes. These schemes fall into two types based on their behavior: lossless compression and lossy compression. With lossy compression, the reconstructed output is not the same as the input. With lossless compression, the reconstructed output is the same as the input data, so the network overhead may increase. The existing fixed and variable length coding techniques have high robustness but poor efficiency. The efficiency problem can be solved by using the proposed scheme, called the "Data compression and parallel computation research model". The proposed model uses a more sophisticated coding technique for data compression and increases efficiency while reducing delay. Simulation results show that, compared with existing models, the proposed data compression and parallel computation research model has a better signal-to-noise ratio, higher efficiency and lower delay.
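The lossless versus lossy distinction can be illustrated with a small sketch. It uses Python's standard `zlib` module for the lossless case and a simple quantization step as a stand-in for lossy coding; neither is the coding technique of the proposed model, and the payload and sample values are made up for illustration.

```python
import zlib

# Lossless: the decompressed bytes are identical to the input.
payload = b"sensor=42;status=ok;" * 500  # hypothetical, highly repetitive payload
compressed = zlib.compress(payload, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)
ratio = len(compressed) / len(payload)   # fraction of the original size

# Lossy (stand-in): quantizing to multiples of 10 discards information,
# so the original samples cannot be recovered from the quantized ones.
samples = [3, 14, 15, 92, 65]
quantized = [s // 10 * 10 for s in samples]
```

Here `restored == payload` holds, while `quantized` differs from `samples` and the lost low-order digits cannot be reconstructed.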
ISBN: (print) 9783319445243; 9783319445236
In this paper, we discuss computations of optimal pairings over some pairing-friendly curves and a symmetric pairing over supersingular curves via elliptic nets. We show that optimal pairings can be computed more efficiently if we use twists of elliptic curves, and we give formulae for computing optimal pairings via elliptic nets of these twist curves. Furthermore, we propose parallel algorithms for these pairings and estimate the costs of these algorithms under certain reasonable assumptions.
ISBN: (print) 9781665415033
Integrity checking is indispensable in the current technological age. One of the most popular algorithms for integrity checking is SHA-256. To achieve high performance, many applications implement SHA-256 in hardware. However, the processing rate of SHA-256 is often low because of the large number of computations. In addition, data must pass through many loop iterations to generate a hash, which requires transferring data multiple times between the accelerator and off-chip memory if local memory is not used. In this paper, an ALU combining fully parallel computation and pipeline layers is proposed to increase the SHA-256 processing rate. Moreover, local memory is attached near the ALU to reduce off-chip memory access during the computing iterations. To achieve a high hash rate, we design a SoC-based multicore SHA-256 accelerator. As a result, our proposed accelerator improves throughput by more than 40% and achieves 2x higher hardware efficiency compared with the state-of-the-art design.
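As a software-side illustration of integrity checking with SHA-256 (not the proposed hardware design), the sketch below feeds a message to Python's standard `hashlib` in 64-byte pieces, matching the 512-bit block size on which SHA-256 operates. The helper name `sha256_hex` and the sample messages are hypothetical.

```python
import hashlib


def sha256_hex(data, block_size=64):
    """Hash data incrementally, feeding one 64-byte (512-bit) block at a time."""
    h = hashlib.sha256()
    for i in range(0, len(data), block_size):
        h.update(data[i:i + block_size])
    return h.hexdigest()


message = b"abc"
digest = sha256_hex(message)

# Integrity check: any change to the message changes the digest.
tampered_digest = sha256_hex(b"abd")
is_intact = sha256_hex(message) == digest
```

Feeding the input block by block gives the same digest as hashing it in one call. Each block passes through the same compression loop, which is the loop a hardware accelerator would keep in local memory.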
ISBN: (print) 9781424425020
Parallel computation is an effective technology for improving the execution performance of computer programs. In this paper, a new parallel machining region planning method is presented to generate optimal tool paths for 5-axis sculptured surface machining. Building on existing machining region planning methods, the improved method adopts a data decomposition mode and OpenMP to implement the parallel computing strategy. By dividing the part surface into two or more areas, this method can generate tool paths with higher performance and shorter total length, which can reduce the cost of 5-axis machining. A computer implementation and examples are presented to demonstrate the validity of the new method.
ISBN: (print) 9798350377521; 9798350377514
With the advancement of technology and the spread of multi-core systems, the need for parallelization arises and interest in programming models is growing. At the same time, new distributed computing models have been proposed, in fierce competition to obtain the highest possible performance. The Drop Computing paradigm proposes decentralized computing over ad-hoc opportunistic networks of mobile and Edge devices. In this respect, the Drop Computing model aims not only to achieve a minimum turnaround time but also to optimize other characteristics related to mobile devices, such as limited resources and opportunistic communication. It is therefore necessary to define a new programming model, called DroMPI, that extends the capabilities of current parallel and distributed programming models based on the Drop Computing paradigm. The solution aims to develop a library that takes advantage of hardware capabilities in the interest of the Drop Computing paradigm and also provides programmers with a high-level programming interface. The library's features will be based on the Message Passing Interface (MPI) standard, which will be responsible for inter-node parallelization. The name of the library, DroMPI, is a blend of Drop Computing and MPI. The implementation of the model will be responsible for managing communication between nodes and for providing an Application Programming Interface (API) for the development of parallel applications in the Drop Computing paradigm.