ISBN: (print) 9783030050511; 9783030050504
To improve the intelligent image recognition abilities of edge devices, this paper introduces POWER, a framework based on parallel optimization. With an FPGA (Field-Programmable Gate Array) as its hardware module, POWER provides good extensibility and flexible customization for developing intelligent firmware suited to different types of edge devices in various scenarios. Through a case study, we design and implement a firmware prototype that follows the POWER specification and explore the performance improvement obtained from parallel optimization. Our experimental results show that the prototype performs well and is applicable to substation inspection robots, which indirectly validates the effectiveness of the POWER framework for designing intelligent firmware modules for edge devices.
ISBN: (print) 9783030050573; 9783030050566
Parallel memory modules can be used to increase memory bandwidth and feed a processor with the data access patterns it requires. A parallel storage mechanism organized and managed across multiple storage modules suits image and video applications. Previously studied data storage schemes achieve continuous conflict-free access by rows, columns, or blocks. However, they do not satisfy sliding-window applications in video and image processing algorithms (including convolutional neural networks, sub-pixel difference, 2D filtering, etc.), which need conflict-free access with strides during computation and which place different demands on horizontal and vertical strides in their computing sub-processes. This paper presents a storage scheme that supports conflict-free row access and non-aligned block-with-stride access modes beginning at any address. Theoretical proofs and experiments verify the correctness of the module address (the number of the module to which an address is mapped). In hardware design, the typical case showed no path violation and a small area overhead. The scheme is suitable for CNN applications, where it can improve the performance of convolution algorithms.
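The notion of a module address can be illustrated with a classical linear skewing scheme. The sketch below is not the scheme proposed in the paper, whose mapping function the abstract does not give. The function names, the module count `M = 4`, and the `skew` parameter are all assumptions chosen for illustration. It shows why a single simple mapping struggles to serve rows, columns, and blocks at once.

```python
def module_of(row, col, num_modules, skew=1):
    """Module number for element (row, col) under a linear skewing scheme.

    Illustrative mapping only; the paper's own mapping is not reproduced here.
    """
    return (row * skew + col) % num_modules


def is_conflict_free(addresses, num_modules, skew=1):
    """True if no two addresses of one parallel access map to the same module."""
    modules = [module_of(r, c, num_modules, skew) for r, c in addresses]
    return len(set(modules)) == len(modules)


M = 4  # hypothetical number of memory modules

# Four consecutive elements of row 2, starting at an unaligned column.
row_access = [(2, c) for c in range(1, 1 + M)]
# Four consecutive elements of column 3.
col_access = [(r, 3) for r in range(M)]
# A 2x2 block anchored at (0, 0).
block_access = [(r, c) for r in range(2) for c in range(2)]

for skew in (1, 2):
    print(
        f"skew={skew}:",
        "row", is_conflict_free(row_access, M, skew),
        "column", is_conflict_free(col_access, M, skew),
        "block", is_conflict_free(block_access, M, skew),
    )
```

With `skew=1` the row and column accesses are conflict-free but the 2x2 block is not; with `skew=2` the block becomes conflict-free but the column access collides. This trade-off is the kind of limitation that motivates more elaborate storage schemes.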
ISBN: (print) 9783030050511; 9783030050504
Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications in many fields. However, computing SimRank is costly in both time and space, so traditional computing methods fail to handle graph data of ever-growing size. This paper proposes a parallel multi-level solution for all-pair SimRank similarity computation on large graphs. We first partition the target graph using the idea of modularity maximization and obtain a collapsed graph based on the resulting blocks. Then we compute the similarities between vertices inside each block as well as the similarities between blocks. Finally, we integrate these two types of similarities and calculate the approximate SimRank similarities between all vertex pairs. The method is implemented on the Spark platform and improves time efficiency while maintaining effectiveness compared to SimRank.
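For reference, the baseline that such approximations are measured against is the standard SimRank fixed-point iteration. The sketch below is a minimal single-machine version of that baseline, not the multi-level Spark method described in the abstract. The decay factor 0.8, the iteration count, and the toy graph are assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive all-pair SimRank by fixed-point iteration.

    in_neighbors maps each vertex to the list of vertices that point to it.
    Time and memory grow quadratically with the number of vertices, which is
    the cost the abstract refers to.
    """
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A vertex without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph with edges u -> a and u -> b.
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(graph)
print(scores[("a", "b")])
```

Here `a` and `b` share their only in-neighbor, so their score equals the decay factor, while `u` has no in-neighbors and is similar only to itself.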
As a classical method of image segmentation in mathematical morphology, the watershed transform has been applied successfully in fields such as remote sensing image processing, biomedical and computer vision appli...
ISBN: (print) 9783319271613; 9783319271606
Hierarchical identity-based cryptography is an efficient technology for addressing security issues in cloud storage. However, the inherent key escrow problem hinders the widespread adoption of this cryptosystem in practice. To address the key escrow problem, this paper proposes an escrow-free hierarchical identity-based signature model, in which a user signs messages with a user-selected secret and a PKG signing factor in addition to the private key. To prove full security, we formulate three security games with respect to our signature model. We instantiate the escrow-free model as a specific scheme based on the SHER-IBS scheme and prove that our scheme is secure against adaptive chosen ID and message attacks.
ISBN: (print) 9783030050634; 9783030050627
The one-sided communication mechanism of the Message Passing Interface (MPI) has been extended by remote memory access (RMA) in several respects, including interface, language, and compiler. Coarray Fortran (CAF), an emerging syntactic extension of Fortran for one-sided communication, is freely supported by the open-source and widely used GNU Fortran compiler, which relies on MPI-3 as the transport layer. In this paper, we present the potential of RMA to benefit the communication patterns in Cannon's algorithm. EVENTS, a safer implementation of atomics for synchronizing different processes in CAF, are also introduced via the classic Fast Fourier Transform (FFT). In addition, we study the performance of one-sided communication with different compilers. In our tests, one-sided communication outperforms two-sided communication only when the data size is large enough (in particular, for inter-node transfers). CAF is slightly faster than the simple one-sided routines in MPI-3 without compiler optimization. EVENTS can improve the performance of parallel applications by avoiding idle time.
ISBN: (print) 9783319499567; 9783319499550
Task-based programming provides programmers with an intuitive abstraction to express parallelism, and runtimes with the flexibility to adapt the schedule and load balancing to the hardware. Although many profiling tools have been developed to understand these characteristics, the interplay between task scheduling and data reuse in the cache hierarchy has not been explored. These interactions are particularly intriguing because of the flexibility task-based runtimes have in scheduling tasks, which may allow them to improve cache behavior. This work presents StatTask, a novel statistical cache model that can predict cache behavior for arbitrary task schedules and cache sizes from a single execution, without programmer annotations. StatTask enables fast and accurate modeling of data locality in task-based applications for the first time. We demonstrate the potential of this new analysis for scheduling by examining applications from the BOTS benchmark suite and identifying several important opportunities for reuse-aware scheduling.
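The idea of predicting cache behavior for several cache sizes from one trace, and of comparing task schedules by their data reuse, can be illustrated with classical LRU stack distances. The sketch below is an exact, unsampled analysis and is not StatTask's statistical model. The task names, their data footprints, and the two schedules are hypothetical.

```python
def stack_distances(trace):
    """LRU stack distance of each access (None for a first-time access)."""
    stack = []  # most recently used address sits at the end
    distances = []
    for addr in trace:
        if addr in stack:
            # Distinct addresses touched since the previous access to addr.
            distances.append(len(stack) - 1 - stack.index(addr))
            stack.remove(addr)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def miss_ratio(trace, cache_lines):
    """Miss ratio of a fully associative LRU cache holding cache_lines lines."""
    distances = stack_distances(trace)
    misses = sum(1 for d in distances if d is None or d >= cache_lines)
    return misses / len(distances)


# Hypothetical tasks and the cache lines each one touches.
footprint = {"T1": ["A", "B"], "T2": ["C", "D"], "T3": ["A", "B"]}


def trace_of(schedule):
    """Concatenate the accesses of the tasks in scheduled order."""
    return [addr for task in schedule for addr in footprint[task]]


reuse_first = trace_of(["T1", "T3", "T2"])  # tasks sharing data run back to back
reuse_last = trace_of(["T1", "T2", "T3"])   # an unrelated task runs in between

for lines in (2, 4):
    print(lines, miss_ratio(reuse_first, lines), miss_ratio(reuse_last, lines))
```

With a 2-line cache, the schedule that runs the two data-sharing tasks back to back misses on 4 of 6 accesses, while the other schedule misses on all 6. With 4 lines both schedules miss only on the 4 first-time accesses, so the benefit of reuse-aware ordering depends on the cache size.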
ISBN: (digital) 9783319111940
ISBN: (print) 9783319111940; 9783319111933
Conventional software speculative parallel models face challenges from the increasing number of processor cores and the diversification of applications. Speculation accuracy is one of the key factors in the performance of a software speculative parallel model. In this paper, we propose a novel value prediction mechanism named Inter-thread Fetching Value Prediction (IFVP). It allows a speculative thread to speculatively read the values of conflicting variables from another speculative thread. This method can markedly reduce the misspeculation rate when parallelizing a loop with cross-iteration dependencies. We show that IFVP improves speculation accuracy by about 19.1% on average and performance by about 37.1% on average, compared with conventional models without value prediction.
ISBN: (print) 9783319495835; 9783319495828
In the last few years, research on the energy efficiency of hardware and software components has increased significantly for both centralized and parallel platforms. In data centers, DBMSs are among the major energy consumers, since large amounts of data are processed by complex queries running daily. Having green nodes is a precondition for designing an energy-aware parallel database cluster. Most existing DBMSs focus on high performance during the query optimization phase and usually ignore the energy consumption of queries. In this paper, we propose a methodology, supported by a tool called EnerQuery, that makes the nodes of parallel database clusters save energy when optimizing queries. To show its effectiveness, we implement our proposal on top of the PostgreSQL DBMS query optimizer. A mathematical cost model based on a machine learning technique is defined and used to estimate the energy consumption of SQL queries.