ISBN: (print) 9781479917631
This paper presents a new fast and scalable parallel union-find algorithm for image segmentation and its System-on-Chip (SoC) implementation in 65 nm CMOS technology, following the Application-Specific Integrated Circuit (ASIC) design flow. The algorithm labels all foreground and background pixels with the fewest possible pixel scans. This contrasts with classical labeling algorithms, which label only foreground (or only background) pixels in a single run. The new algorithm uses only two memory blocks. In the first memory block, it labels each image segment with the segment's seed; simultaneously, in the second memory block, it records the segment's size as a second label. This parallel labeling makes monitoring the image segments very fast and efficient. At an operating frequency of 350 MHz, with an estimated processing rate of 2100 frames/sec, a total chip area of 15950.5 μm² (off-chip memory), and very low power consumption of 0.3 mW, the SoC is an excellent candidate for mobile devices and real-time applications.
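The two-array idea described in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's parallel hardware algorithm: the function name `label_image`, the 4-connectivity rule, and the choice of the smallest pixel index as the seed are all assumptions made for illustration. One array holds the seed label of every pixel and a second array holds segment sizes, and pixels of both values (foreground and background) are labeled in the same pass.

```python
def label_image(img):
    """Label 4-connected segments of a binary image, for both pixel values.

    Hypothetical sketch: returns (seeds, sizes), two flat lists indexed by
    pixel position y * width + x.
    """
    h, w = len(img), len(img[0])
    parent = list(range(h * w))  # first array: seed label per pixel
    size = [1] * (h * w)         # second array: segment size, valid at roots

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if ra > rb:
            ra, rb = rb, ra      # assumption: smaller index becomes the seed
        parent[rb] = ra
        size[ra] += size[rb]

    # Single scan: merge each pixel with its right and lower neighbour
    # whenever they share the same value (foreground or background).
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and img[y][x] == img[y][x + 1]:
                union(i, i + 1)
            if y + 1 < h and img[y][x] == img[y + 1][x]:
                union(i, i + w)

    seeds = [find(i) for i in range(h * w)]
    sizes = [size[s] for s in seeds]
    return seeds, sizes
```

Calling `label_image` on a small binary image returns, for every pixel, the seed of its segment and the size of that segment, so a segment's size can be read directly from any of its pixels.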
ISBN: (print) 9781467308052
DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelizing DBSCAN is challenging because it exhibits an inherently sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy, which can easily cause an unbalanced workload and hence result in low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm for both shared and distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups of up to 25.97 using 40 cores on a shared-memory architecture, and speedups of up to 5,765 using 8,192 cores on a distributed-memory architecture.
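To make the role of the disjoint-set structure concrete, here is a minimal sequential sketch of DBSCAN built on union-find. It is not the PDSDBSCAN implementation: the function name `dsdbscan`, the brute-force neighbour search, and the convention that a point counts itself among its neighbours are assumptions made for illustration. Because clusters are formed by union operations rather than by expanding one cluster at a time, the points can be visited in any order, which is the property that a parallel version could exploit.

```python
import math

def dsdbscan(points, eps, min_pts):
    """Sequential disjoint-set DBSCAN sketch (hypothetical helper).

    Returns one label per point: the root index of its cluster, or -1
    for noise.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Brute-force eps-neighbourhoods; each point is its own neighbour.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    assigned = list(core)  # core points always belong to a cluster

    for i in range(n):
        if not core[i]:
            continue
        for j in neighbors[i]:
            if core[j]:
                union(i, j)          # merge trees of neighbouring core points
            elif not assigned[j]:
                assigned[j] = True   # border point joins the first cluster found
                union(i, j)

    return [find(i) if assigned[i] else -1 for i in range(n)]
```

Points that are neither core points nor within `eps` of a core point are never assigned and come back labeled -1, which corresponds to the noise points DBSCAN eliminates.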