Matching medical image data is a key factor for appropriate computer-aided diagnosis. Over the past several decades, many image-processing technologies have been developed and discussed. However, most of these methods are only of theoretical interest because their time complexity is too high for realistic handling of the huge volumes of existing medical images. This paper presents a parallel-processing model for matching large collections of MR images. A feature vector for an MR image is defined by professionals in the area of neuroscience. A matching algorithm is then developed based on comparing these feature vectors. The algorithm is shown to be well suited to parallel processing and provides acceptable results. Experiments show that the overhead of synchronizing the parallel processes is less significant than the improvement in overall efficiency.
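The feature-vector matching described above parallelizes naturally, since distances to disjoint chunks of the database can be computed independently. A minimal sketch (not the paper's actual algorithm; the vector contents and chunking strategy are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor
import math

def euclidean(a, b):
    """Distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _best_in_chunk(query, indexed_chunk):
    """Return (distance, index) of the closest vector in one chunk."""
    return min((euclidean(query, vec), idx) for idx, vec in indexed_chunk)

def parallel_match(query, database, workers=4):
    """Find the index of the database vector closest to `query`,
    searching chunks of the database concurrently."""
    indexed = list(enumerate(database))
    size = max(1, len(indexed) // workers)
    chunks = [indexed[i:i + size] for i in range(0, len(indexed), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda c: _best_in_chunk(query, c), chunks)
    # each worker's local minimum is merged in a single final reduction,
    # which is the only synchronization point
    return min(partial)[1]
```

The single final `min` over per-chunk results is the synchronization overhead the abstract refers to; it is cheap relative to the distance computations themselves.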
ISBN:
(Print) 1932415262
The simplex algorithm for linear programming has two major variants: the original, or standard, method and the revised method. Today, virtually all serious implementations are based on the revised method because it is much faster for sparse LPs, which are the most common. However, the standard method has advantages as well. First, it is effective for dense problems. While dense problems are uncommon in general, they occur frequently in important applications such as wavelet decomposition, digital filter design, text categorization, and image processing. Second, the standard method can be easily and effectively extended to a coarse-grained, distributed algorithm. We look at distributed linear programming optimized especially for loosely coupled workstations.
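For reference, the standard (tableau) method the abstract contrasts with the revised method works on a dense tableau and pivots in place. A minimal serial sketch for maximizing c^T x subject to Ax <= b, x >= 0, assuming b >= 0 so the slack basis is feasible (the distributed version in the paper partitions columns of this same tableau; that partitioning is not shown here):

```python
import numpy as np

def simplex_standard(c, A, b, tol=1e-9):
    """Dense standard-simplex tableau with Dantzig pivoting."""
    m, n = A.shape
    # tableau layout: [A | I | b], objective row [-c | 0 | 0]
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = -c
    basis = list(range(n, n + m))          # slacks start basic
    while True:
        col = int(np.argmin(T[-1, :-1]))   # most negative reduced cost
        if T[-1, col] >= -tol:
            break                          # optimal
        ratios = [T[i, -1] / T[i, col] if T[i, col] > tol else np.inf
                  for i in range(m)]
        row = int(np.argmin(ratios))
        if ratios[row] == np.inf:
            raise ValueError("unbounded LP")
        T[row] /= T[row, col]              # pivot: normalize pivot row,
        for i in range(m + 1):             # then eliminate the column
            if i != row:
                T[i] -= T[i, col] * T[row]
        basis[row] = col
    x = np.zeros(n)
    for i, bv in enumerate(basis):
        if bv < n:
            x[bv] = T[i, -1]
    return x, T[-1, -1]
```

Every pivot touches the whole dense tableau, which is exactly why the standard method distributes well: the column updates are independent and can be split across loosely coupled machines.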
ISBN:
(Print) 9781450397339
Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models such as image classification models, we find that they have relatively low scalability for sparse models such as natural language processing (NLP) models with highly sparse embedding tables. Most existing works overlook the sparsity of model parameters and thus suffer significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework that accelerates the communications of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach that optimizes the model computation procedure, relaxes the dependencies of embeddings, and schedules the sparse communication of each embedding row with a priority queue. We implemented a prototype of EmbRace on PyTorch and Horovod and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to 2.41x speedup over state-of-the-art distributed training baselines.
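The AlltoAll step for sparse embeddings amounts to bucketing each rank's needed rows by owner and exchanging the buckets. A toy single-process sketch (EmbRace's real implementation uses collective communication primitives; the modulo ownership rule and the simulated exchange below are illustrative assumptions):

```python
def bucket_rows(needed_rows, world_size):
    """Group requested embedding-row ids by owner rank (row % world_size),
    the bucketing step that precedes an AlltoAll exchange."""
    buckets = [[] for _ in range(world_size)]
    for r in sorted(set(needed_rows)):
        buckets[r % world_size].append(r)
    return buckets

def all_to_all(send_lists):
    """Simulated AlltoAll: rank dst receives, from every rank src,
    the bucket send_lists[src][dst]."""
    world = len(send_lists)
    return [[send_lists[src][dst] for src in range(world)]
            for dst in range(world)]
```

Only the rows a batch actually touches are exchanged, which is the source of the communication savings over dense AllReduce of the whole embedding table.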
Given an n x n binary image of white and black pixels, we present an optimal parallel algorithm for computing the distance transform and the nearest-feature transform under the Euclidean metric. The algorithm employs systolic computation to achieve O(n) running time on a linear array of n processors.
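As a reference for what the transform computes: each pixel receives the Euclidean distance to the nearest feature (black) pixel. A brute-force serial sketch, O(n^4) for clarity only, against which the paper's O(n)-time systolic array is the optimized alternative:

```python
import math

def euclidean_dt(image):
    """Exact Euclidean distance transform of a binary image
    (1 = feature pixel). Each output entry is the distance from
    that pixel to its nearest feature pixel."""
    rows, cols = len(image), len(image[0])
    feats = [(i, j) for i in range(rows) for j in range(cols) if image[i][j]]
    return [[min(math.hypot(i - fi, j - fj) for fi, fj in feats)
             for j in range(cols)] for i in range(rows)]
```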
The Discrete Wavelet Transform (DWT) is becoming a widely used tool in image processing and other data-analysis areas. A non-conventional variation of a spatio-temporal 3D DWT has been developed to analyze motion in time-sequential imagery. The computational complexity of this algorithm is Θ(n³), where n is the number of samples in each dimension of the input image sequence. Methods are needed to increase the speed of these computations for large data sets. Fortunately, wavelet decomposition is very amenable to parallelization. Coarse-grained parallel versions of this process have been designed and implemented on three different architectures: a distributed network of Sun SPARCstation 2 workstations; two Intel hypercubes (an iPSC/2 and an iPSC/860); and a Thinking Machines Corporation CM-5, a massively parallel SPMD machine. This non-conventional 3D wavelet decomposition is very well suited to coarse-grained implementation on parallel computers with proper load balancing. Close-to-linear speedup over serial implementations has been achieved on the distributed network, and near-linear speedup was obtained on the hypercubes and the CM-5 for a variety of image-processing applications.
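A separable 3D DWT applies a 1D analysis step along each axis in turn, producing eight subbands per level. A minimal sketch using the Haar wavelet (the paper's non-conventional spatio-temporal variant differs; the normalization and subband naming here are illustrative choices):

```python
import numpy as np

def haar_step(x, axis):
    """One Haar analysis step along one axis: pairwise averages
    (approximation) and differences (detail), each half-length."""
    lo_idx = range(0, x.shape[axis], 2)
    hi_idx = range(1, x.shape[axis], 2)
    a = np.take(x, lo_idx, axis=axis)
    b = np.take(x, hi_idx, axis=axis)
    return (a + b) / 2.0, (a - b) / 2.0

def dwt3d_level(vol):
    """One level of a separable 3D Haar DWT: filter along x, y, then t,
    yielding 8 subbands keyed by 'L'/'H' per axis (e.g. 'LLL', 'LLH')."""
    bands = {'': vol}
    for axis in range(3):
        split = {}
        for key, data in bands.items():
            lo, hi = haar_step(data, axis)
            split[key + 'L'] = lo
            split[key + 'H'] = hi
        bands = split
    return bands
```

Because each subband is computed independently once the axis filters run, the eight bands (and, coarser still, blocks of the input volume) can be handed to different processors, which is what makes the coarse-grained parallelization natural.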
ISBN:
(Print) 9781665435741
Domain adaptation for semantic segmentation is of vital significance since it enables effective knowledge transfer from a labeled source domain (i.e., synthetic data) to an unlabeled target domain (i.e., real images), where no effort is devoted to annotating target samples. Prior domain adaptation methods are mainly based on image-to-image translation models that minimize differences in image conditions between the source and target domains. However, there is no guarantee that feature representations of different classes in the target domain are well separated, resulting in poor discriminative representations. In this paper, we propose a unified learning pipeline, called Image Translation and Representation Alignment (ITRA), for domain-adaptive segmentation. Specifically, it first aligns an image in the source domain with a reference image in the target domain using an image style-transfer technique (e.g., CycleGAN); then a novel pixel-centroid triplet loss is designed to explicitly minimize the intra-class feature variance and maximize the inter-class feature margin. Once style transfer has been performed by the former step, the latter is easy to learn and further decreases the domain shift. Extensive experiments demonstrate that the proposed pipeline facilitates both image translation and representation alignment and significantly outperforms previous methods in both the GTA5 -> Cityscapes and SYNTHIA -> Cityscapes scenarios.
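One plausible reading of a pixel-centroid triplet loss: each pixel embedding is the anchor, its own class centroid is the positive, and the nearest other-class centroid is the negative. A toy NumPy sketch under that assumption (the paper's exact formulation, margins, and centroid update rule may differ):

```python
import numpy as np

def pixel_centroid_triplet_loss(features, labels, margin=1.0):
    """features: (N, D) pixel embeddings; labels: (N,) class ids.
    Pulls each pixel toward its class centroid (intra-class variance)
    and pushes it past the nearest other-class centroid by `margin`
    (inter-class margin)."""
    classes = np.unique(labels)
    centroids = {c: features[labels == c].mean(axis=0) for c in classes}
    total = 0.0
    for f, y in zip(features, labels):
        d_pos = np.linalg.norm(f - centroids[y])
        d_neg = min(np.linalg.norm(f - centroids[c])
                    for c in classes if c != y)
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(features)
```

When classes are already separated by more than the margin, every hinge term is zero, so the loss only acts where target-domain features are still entangled.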
ISBN:
(Print) 9781424414369
This paper proposes a parallel solution for retrieving images from distributed data sources using perceptual grouping of block-based visual patterns. Grouping visual patterns into an image model based on the generalized Hough transform is one of the most powerful techniques for image analysis. However, real-time applications of this method have been prohibitive because of the computational intensity of similarity searching over a large centralized image collection. A query object is decomposed into non-overlapping blocks, each of which is represented as a visual pattern obtained by detecting the line edge in the block with a moment-preserving edge detector. A voting scheme based on the generalized Hough transform provides an object-search method that is invariant to translation, rotation, and scaling of the image data. In this work, we describe a heterogeneous cluster-oriented CBIR implementation. First, the workload of performing an object search is analyzed; then a new load-balancing algorithm for the CBIR system is presented. Simulation results show that the proposed method performs well and opens a new way to design a cost-effective CBIR system.
ISBN:
(Print) 9798350363074; 9798350363081
Several parallel and distributed data mining algorithms have been proposed in the literature to perform large-scale data analysis, overcoming the bottleneck of traditional methods on a single machine. The master-worker approach greatly simplifies the synchronization of all nodes, since only the master is responsible for it; however, it also presents several problems for large-scale data-analysis tasks involving thousands or millions of nodes. This paper presents a hierarchical (or multi-level) master-worker framework for iterative parallel data-analysis algorithms that overcomes the scalability issues affecting classic master-worker solutions. Specifically, the framework is composed of multiple merger and worker nodes organized in a k-ary tree structure, in which the workers sit at the leaves and the mergers at the root and the internal nodes of the tree.
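The core idea, multi-level merging instead of a single master collecting everything, can be sketched in a few lines: leaf (worker) results are combined k at a time by merger nodes, level by level, so no node ever handles more than k messages per round. A minimal sketch (the framework's actual node placement, communication, and iteration loop are not shown; `merge` is an assumed combinable reduction):

```python
def hierarchical_reduce(worker_results, k=2, merge=sum):
    """Combine per-worker partial results through a k-ary merger tree.
    Each round, groups of up to k values are merged by one merger node;
    rounds repeat until a single root value remains."""
    level = list(worker_results)
    while len(level) > 1:
        level = [merge(level[i:i + k]) for i in range(0, len(level), k)]
    return level[0]
```

With a flat master-worker scheme the master receives len(worker_results) messages; here each merger receives at most k, and the tree has only log_k levels, which is where the scalability gain comes from.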
ISBN:
(Print) 9789898533388
Targeting TB-scale time-varying scientific datasets, this paper presents a novel static load-balancing scheme based on information entropy to improve the efficiency of a parallel adaptive volume rendering algorithm. An information-theoretic model is proposed first; the information entropy of each data patch is then computed and taken as a pre-estimate of the computational cost of ray sampling. According to these cost estimates, the data patches are distributed to the processing cores in a balanced way, reducing load imbalance in parallel rendering. Compared with existing methods such as random assignment and ray estimation, the proposed entropy-based load-balancing scheme achieves a rendering speedup of 1.23 to 2.84. Its speedup performance and view independence make it the best choice for interactive volume rendering.
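The scheme above has two steps: score each patch by the Shannon entropy of its value histogram, then assign patches to cores so the estimated loads stay even. A minimal sketch (the greedy least-loaded assignment below is one standard way to realize the balanced distribution; the paper's exact policy may differ):

```python
import math
from collections import Counter

def patch_entropy(patch):
    """Shannon entropy (bits) of a patch's value histogram, used as a
    view-independent pre-estimate of its ray-sampling cost."""
    counts = Counter(patch)
    n = len(patch)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def assign_patches(patches, num_cores):
    """Greedy static assignment: sort patches by estimated cost
    (largest first) and give each to the currently least-loaded core."""
    loads = [0.0] * num_cores
    assignment = [[] for _ in range(num_cores)]
    by_cost = sorted(((patch_entropy(p), i) for i, p in enumerate(patches)),
                     reverse=True)
    for cost, idx in by_cost:
        core = loads.index(min(loads))
        loads[core] += cost
        assignment[core].append(idx)
    return assignment, loads
```

Because entropy depends only on the data, not on the camera, the assignment can be computed once per time step, which is the view independence the abstract highlights.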
ISBN:
(Print) 0769516807
Different parallelization methods vary in their system requirements, programming styles, efficiency in exploiting parallelism, and the application characteristics they can handle. Different applications can exhibit totally different performance gains depending on the parallelization method used. This paper compares OpenMP, MPI, and Strings (a distributed shared memory system) for parallelizing a complicated tribology problem. The problem size and computing infrastructure are varied, and their impact on each parallelization method is studied. All of the methods studied exhibit good performance improvements, demonstrating the benefits of applying parallelization techniques to applications in this field.