The main aim of this work is to show how GPGPUs can be used to speed up certain image processing methods. The algorithm described in this paper detects nuclei in hematoxylin-eosin (HE) stained colon tissue sample images. It consists of a Gaussian blur, an RGB-to-HSV color space conversion, a fixed-threshold binarization, an ultimate erosion procedure, and a local maximum search. Since the images retrieved from digital slides are large (up to a few hundred megapixels) and require significant storage space, GPGPUs are needed to keep the processing time reasonable. The CUDA software development kit was used to develop the algorithms for NVIDIA GPUs. This work focuses on how to achieve coalesced global memory access when working with three-channel RGB images, and how to use the on-die shared memory efficiently. The complete test algorithm also included a linear connected component labeling step, which ran on the CPU. Through iterative optimization of the GPU code, we achieved a significant speedup in a well-defined test environment.
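Two of the per-pixel stages named above, the RGB-to-HSV conversion followed by a fixed-threshold binarization, and the final local maximum search, can be sketched on the CPU as follows. This is only an illustration, not the paper's CUDA implementation: the function names, the choice to threshold on the HSV value channel, and the threshold of 0.5 are all assumptions made for the example.

```python
import colorsys


def binarize_by_value(rgb_image, threshold=0.5):
    """Return a 0/1 mask that marks dark pixels (HSV value below threshold).

    rgb_image is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Thresholding on the value channel is an assumption made for this sketch.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            _, _, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(1 if v < threshold else 0)
        mask.append(mask_row)
    return mask


def local_maxima(grid):
    """Return (row, col) positions that are strictly greater than all 8 neighbors."""
    height, width = len(grid), len(grid[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            center = grid[y][x]
            if center <= 0:
                continue  # background cells cannot be peaks
            neighbors = [
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(center > n for n in neighbors):
                peaks.append((y, x))
    return peaks
```

Both functions treat every pixel independently of the others, which is the property that makes such stages natural candidates for one-thread-per-pixel GPU kernels.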
The main aim of this work is to show how GPGPUs can facilitate certain types of image processing methods. The software used in this paper detects a specific tissue component, the nuclei, in hematoxylin-eosin (HE) stained colon tissue sample images. Since pathologists work with a large number of high-resolution images, which require significant storage space, using GPGPUs is one feasible way to achieve reasonable processing time. The CUDA software development kit was used to develop the processing algorithms for NVIDIA GPUs. Our work focuses on how to achieve better performance with coalesced global memory access when working with three-channel RGB tissue images, and how to use the on-die shared memory efficiently.
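Why three-channel images make coalescing harder can be seen with a simplified address model. The sketch below counts how many aligned memory segments one warp touches when each thread loads a single channel of its own pixel, first from an interleaved (RGBRGB...) layout and then from a planar (RRR...GGG...BBB...) layout. The 128-byte segment size, the 32-thread warp, the 4-byte channel width, and all function names are assumptions of this model, not details taken from the paper, and real hardware caching is ignored.

```python
SEGMENT_BYTES = 128  # assumed size of one aligned memory transaction
WARP_SIZE = 32       # assumed number of threads issuing a load together


def segments_touched(addresses, segment_bytes=SEGMENT_BYTES):
    """Count the distinct aligned segments covered by a list of byte addresses."""
    return len({address // segment_bytes for address in addresses})


def interleaved_addresses(channel, bytes_per_channel=4, warp_size=WARP_SIZE):
    """Addresses read when thread t loads one channel of pixel t from RGBRGB... data."""
    return [(3 * t + channel) * bytes_per_channel for t in range(warp_size)]


def planar_addresses(channel, num_pixels, bytes_per_channel=4, warp_size=WARP_SIZE):
    """Addresses read when thread t loads one channel of pixel t from planar data."""
    base = channel * num_pixels * bytes_per_channel
    return [base + t * bytes_per_channel for t in range(warp_size)]


if __name__ == "__main__":
    for channel in range(3):
        strided = segments_touched(interleaved_addresses(channel))
        contiguous = segments_touched(planar_addresses(channel, num_pixels=1024))
        print(f"channel {channel}: interleaved={strided}, planar={contiguous}")
```

Under these assumptions, each interleaved channel load spans three segments while the planar load spans one, which is the kind of gap that layout changes or staging through shared memory aim to close.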
As the energy consumption of embedded multiprocessor systems becomes increasingly prominent, real-time energy-efficient scheduling, which reduces system energy consumption while meeting real-time constraints, has become an urgent problem. For a multiprocessor with independent DVFS and DPM on each processor, this paper proposes an energy-efficient real-time scheduling algorithm named LRE-DVFS-EACH. It is based on LRE-TL, an optimal real-time scheduling algorithm for sporadic tasks. LRE-DVFS-EACH uses the concept of the TL plane and the idea of fluid scheduling to dynamically scale the voltage and frequency of the processors at the start of each TL plane and at the release time of each sporadic task within a TL plane. Consequently, LRE-DVFS-EACH obtains a reasonable tradeoff between real-time constraints and energy saving. LRE-DVFS-EACH also adapts to workload changes caused by the dynamic release of sporadic tasks, which yields further energy savings. The experimental results show that, compared with existing algorithms, LRE-DVFS-EACH not only guarantees the optimal feasibility of sporadic tasks but also achieves greater energy savings in all cases, especially under high workloads.
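The link between fluid scheduling and frequency scaling can be illustrated with a much simpler calculation than the algorithm described above. For implicit-deadline tasks on identical processors that all run at the same normalized frequency f, a fluid schedule exists when the total utilization does not exceed m * f and no single task's utilization exceeds f. The sketch below picks the lowest frequency from a discrete set that satisfies both conditions. It is not LRE-DVFS-EACH: per-processor scaling, TL planes, and sporadic releases are not modeled, and the function name and parameters are hypothetical.

```python
def lowest_feasible_frequency(utilizations, num_processors, frequencies):
    """Return the lowest normalized frequency that keeps a fluid schedule feasible.

    utilizations are task utilizations measured at full speed (frequency 1.0).
    All processors are assumed to share one frequency, which is a simplification.
    Returns None when even the highest available frequency is insufficient.
    """
    tolerance = 1e-12  # guards against floating-point rounding in the sums
    total = sum(utilizations)
    peak = max(utilizations, default=0.0)
    for frequency in sorted(frequencies):
        fits_in_total = total <= num_processors * frequency + tolerance
        fits_per_task = peak <= frequency + tolerance
        if fits_in_total and fits_per_task:
            return frequency
    return None
```

For example, four tasks with utilizations 0.3, 0.2, 0.4, and 0.1 on two processors fit at half speed, while two tasks with utilizations 0.9 and 0.1 need full speed because the heavy task alone cannot finish at a lower frequency.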
ISBN: 9781424459889 (print)
Existing routing protocols for Wireless Mesh Networks (WMNs) are generally optimized with statistical link measures and do not address the intrinsic uncertainty of wireless links. We show evidence that, given the transient link uncertainties at the PHY and MAC layers, a pseudo-deterministic routing protocol that relies on average or historic statistics can hardly exploit the full potential of a multi-hop wireless mesh. We study optimal WMN routing using probing-based online anypath forwarding, with explicit consideration of transient link uncertainties. We show the underlying connection between WMN routing and the classic Canadian Traveller Problem (CTP) [1]. Inspired by a stochastic recoverable version of CTP (SRCTP), we develop a practical SRCTP-based online routing algorithm under link uncertainties. We study how dynamic next-hop selection can be done at low cost, and derive a systematic selection order that minimizes transmission delay. We conduct simulation studies to verify the effectiveness of the SRCTP algorithms under diverse network configurations. In particular, compared to deterministic routing, a reduction in end-to-end delay (51.15% to 73.02%) and an improvement in packet delivery ratio (99.76%) are observed.
In this paper, we introduce a generic model for the event matching problem in content-based publish/subscribe systems over structured P2P overlays. In this model, we claim that there are three methods (event-oriented, subscription-oriented, and hybrid) for making all matched (event, subscription) pairs meet in a system. By theoretically analyzing the inherent problems of both the event-oriented and subscription-oriented methods, we propose PEM (Popularity-based Event Matching), a variant of the hybrid method. PEM achieves a better trade-off between the event processing load and the subscription storage load of a system. PEM has been verified through both mathematical and simulation-based evaluation.
We present a high-performance and memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared to related work, together with an increase in clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
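The general idea of a block algorithm for matrix multiplication can be sketched in software as follows. This is a plain textbook tiling with block size S, written to show how the iteration space is cut into tiles. It does not reproduce the paper's transformations, its linear PE array, or its O(S) memory scheme, and the function name is hypothetical.

```python
def blocked_matmul(a, b, block_size):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time.

    Matrices are lists of rows. Tiles at the matrix edges may be smaller than
    block_size, so the dimensions need not be multiples of it.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block_size):          # tile row of the result
        for j0 in range(0, m, block_size):      # tile column of the result
            for k0 in range(0, k, block_size):  # tile along the shared dimension
                for i in range(i0, min(i0 + block_size, n)):
                    for kk in range(k0, min(k0 + block_size, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + block_size, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c
```

Each innermost group of three loops touches only one tile of each matrix, so the working set is bounded by the block size rather than by the matrix size.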
In this paper, we present an automatic synthesis framework that maps loop nests to processor arrays with local memories on FPGAs. First, an affine transformation approach is proposed to address the space-time mapping problem. Then, a data-driven architecture model is introduced; extracting this model from the transformed loop nests enables automatic generation of processor arrays. Techniques for memory allocation, communication generation, and control generation are presented. Synthesizable RTL code can be readily generated from the architecture model built with these techniques. A preliminary synthesis tool is implemented on top of PLUTO, an automatic polyhedral source-to-source transformation and parallelization framework.
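What a space-time mapping does can be shown on a small two-deep loop nest. In the sketch below, each iteration (i, j) is sent to a processor index by an allocation row and to a time step by a schedule row, both applied as affine (here linear) functions. The mapping is valid when no two iterations share a (processor, time) slot and every dependence advances time by at least one step. The rows are hand-picked for illustration and the helper names are hypothetical. How the framework or PLUTO actually derives its transformations is not shown.

```python
def apply_affine(row, point, offset=0):
    """Evaluate the affine function row . point + offset."""
    return sum(coefficient * x for coefficient, x in zip(row, point)) + offset


def space_time_map(iterations, allocation, schedule):
    """Map each iteration to a (processor, time) slot, rejecting collisions."""
    slots = {}
    for iteration in iterations:
        slot = (apply_affine(allocation, iteration), apply_affine(schedule, iteration))
        if slot in slots:
            raise ValueError(f"iterations {slots[slot]} and {iteration} collide at {slot}")
        slots[slot] = iteration
    return slots


def respects_dependences(schedule, dependences):
    """A dependence vector d is respected when the schedule advances time along d."""
    return all(apply_affine(schedule, d) >= 1 for d in dependences)
```

For a 3 x 3 iteration space with allocation (1, 0) and schedule (1, 1), iteration (i, j) runs on processor i at time i + j. An accumulation along j has dependence vector (0, 1), which this schedule respects.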