Gaussian Mixture Models (GMMs) are widely used among scientists, e.g. in statistics toolkits and data mining procedures. To estimate the parameters of a GMM, Maximum Likelihood (ML) training is often used, more precisely the Expectation-Maximization (EM) algorithm. Nowadays, many tasks work with huge datasets, which makes the estimation process time-consuming (mainly for complex mixture models containing hundreds of components). The paper presents an efficient and robust implementation of the estimation of GMM statistics used in the EM algorithm on a GPU using NVIDIA's Compute Unified Device Architecture (CUDA). An augmentation of the standard CPU version that uses SSE instructions is also proposed. The time consumption of the presented methods is tested on a large dataset of real speech data from the NIST Speaker Recognition Evaluation (SRE) 2008. Estimation on the GPU proves to be more than 400 times faster than the standard CPU version and 130 times faster than the SSE version; a huge speed-up was thus achieved without any approximations in the estimation formulas. The proposed implementation was also compared with implementations developed by other departments around the world and proved to be the fastest (at least 5 times faster than the best implementation published recently).
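For orientation, the "GMM statistics used in the EM algorithm" are commonly the zeroth-, first- and second-order sufficient statistics accumulated in the E-step. The following is a minimal CPU sketch in NumPy, assuming diagonal covariances; the function name `gmm_statistics` and its argument names are hypothetical, and it does not reproduce the paper's CUDA or SSE implementation.

```python
import numpy as np

def gmm_statistics(X, weights, means, variances):
    """Accumulate E-step statistics for a diagonal-covariance GMM (assumed form).

    X: (N, D) data; weights: (M,); means, variances: (M, D).
    Returns (n, f, s, log_likelihood).
    """
    # Log Gaussian density of every sample under every component, shape (N, M).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )
    log_joint = np.log(weights)[None, :] + log_gauss

    # Log-sum-exp over components gives the per-sample log-likelihood.
    peak = log_joint.max(axis=1, keepdims=True)
    log_norm = peak + np.log(np.sum(np.exp(log_joint - peak), axis=1, keepdims=True))

    # Posterior probability of each component for each sample.
    gamma = np.exp(log_joint - log_norm)

    n = gamma.sum(axis=0)       # zeroth-order statistics, shape (M,)
    f = gamma.T @ X             # first-order statistics, shape (M, D)
    s = gamma.T @ (X ** 2)      # second-order statistics, shape (M, D)
    return n, f, s, float(log_norm.sum())
```

Under the same assumption, the standard M-step update follows directly from these statistics: new weights `n / N`, new means `f / n[:, None]`, and new variances `s / n[:, None] - means ** 2`.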
We show that the location and velocity of the harmonic oscillator with Lévy-stable noise are jointly Lévy-stable distributed. We give the associated spectral measure explicitly, exhibiting both the non-independence and the non-ellipticity of the location-velocity couple. We then propose measures of deviation from ellipticity.
Traditionally, image processing based on a Markov Random Field (MRF) is addressed on a 4-connected grid graph defined on the image. This structure is not computationally efficient. In our work, we develop a multiple-trees structure to approximate the 4-connected grid. A set of spanning trees is generated by a new algorithm: re-weighted random walk (RWRW). This structure effectively covers the original grid and guarantees a uniformly distributed occurrence of each edge. Exact maximum a posteriori (MAP) inference is performed on each tree structure by dynamic programming, and a median filter is chosen to merge the results together. As an important application, image denoising is used to validate our method. Experimentally, our algorithm provides better performance and higher computational efficiency than traditional methods (such as Loopy Belief Propagation) on a 4-connected MRF.
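To illustrate why exact MAP inference is tractable on a tree, here is a minimal min-sum dynamic-programming sketch for the simplest tree, a chain. It assumes the MAP problem is written as energy minimization with unary and pairwise costs; the name `chain_map` and the cost arrays are hypothetical, and the RWRW tree generation and median merging described above are not reproduced.

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact minimum-energy labeling of a chain by dynamic programming.

    unary: (T, K) cost of giving node t label k.
    pairwise: (K, K) cost of labels (i, j) on neighbouring nodes.
    Returns (labels, energy).
    """
    T, K = unary.shape
    cost = unary[0].copy()                 # best cost of each label at node 0
    back = np.zeros((T, K), dtype=int)     # best predecessor label per node/label

    # Forward pass: extend the best partial labelings one node at a time.
    for t in range(1, T):
        total = cost[:, None] + pairwise   # rows: previous label, cols: current
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + unary[t]

    # Backward pass: follow the stored predecessors from the best final label.
    labels = np.empty(T, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels, float(cost.min())
```

The cost is O(T K^2) instead of the K^T of exhaustive search; on a general tree the same recursion runs from the leaves to a root.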
ISBN (print): 9781424441211
We present a parallel implementation of a new deformable image registration algorithm using the Compute Unified Device Architecture (CUDA). The algorithm co-registers preoperative and intraoperative 3-dimensional magnetic resonance (MR) images of a deforming organ. It employs a linear elastic dynamic finite-element model of the deformation and distance measures such as mutual information and sum of squared differences to align volumetric image data sets. Computationally intensive elements of the method, such as interpolation, displacement and force calculation, are significantly accelerated using a Graphics Processing Unit (GPU). Experiments carried out with a realistic breast tissue phantom show a 37-fold speedup for the GPU-based implementation compared with an optimized CPU-based implementation in high-resolution MR image registration. The GPU implementation is capable of registering 512×512×136 image sets in just over 2 seconds, making it suitable for clinical applications requiring fast and accurate processing of medical images.
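The two distance measures named above can be sketched in a few lines of NumPy. This is only an illustration, assuming a histogram-based estimate of mutual information; the function names and the `bins` parameter are hypothetical and say nothing about how the paper computes these measures on the GPU.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared intensity differences between two equally shaped images."""
    return float(np.sum((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def mutual_information(a, b, bins=32):
    """Mutual information (in nats) estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()               # joint probability of intensity pairs
    px = pxy.sum(axis=1, keepdims=True)     # marginal of the first image
    py = pxy.sum(axis=0, keepdims=True)     # marginal of the second image
    nz = pxy > 0                            # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

SSD is minimized when the images align and assumes comparable intensities, whereas mutual information is maximized and tolerates different intensity mappings between the two images.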
ISBN (digital): 9783642254536
ISBN (print): 9783642254529
This 4-Volume-Set, CCIS 0251 - CCIS 0254, constitutes the refereed proceedings of the International Conference on Informatics Engineering and Information Science, ICIEIS 2011, held in Kuala Lumpur, Malaysia, in November 2011. The 210 revised full papers presented together with invited papers in the 4 volumes were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on e-learning, information security, software engineering, image processing, algorithms, artificial intelligence and soft computing, e-commerce, data mining, neural networks, social networks, grid computing, biometric technologies, networks, distributed and parallel computing, wireless networks, information and data management, web applications and software systems, multimedia, ad hoc networks, mobile computing, as well as miscellaneous topics in digital information and communications.
ISBN (digital): 9783642254628
ISBN (print): 9783642254611
This 4-Volume-Set, CCIS 0251 - CCIS 0254, constitutes the refereed proceedings of the International Conference on Informatics Engineering and Information Science, ICIEIS 2011, held in Kuala Lumpur, Malaysia, in November 2011. The 210 revised full papers presented together with invited papers in the 4 volumes were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on e-learning, information security, software engineering, image processing, algorithms, artificial intelligence and soft computing, e-commerce, data mining, neural networks, social networks, grid computing, biometric technologies, networks, distributed and parallel computing, wireless networks, information and data management, web applications and software systems, multimedia, ad hoc networks, mobile computing, as well as miscellaneous topics in digital information and communications.
The main aim of this work is to show how GPGPUs can be used to speed up certain image processing methods. The algorithm explained in this paper is used to detect nuclei in hematoxylin-eosin (HE) stained colon tissue sample images, and includes a Gaussian blur, an RGB-HSV color space conversion, a fixed binarization, an ultimate erosion procedure and a local maximum search. Since the images retrieved from the digital slides require significant storage space (up to a few hundred megapixels), using GPGPUs to speed up image processing operations is necessary to achieve a reasonable processing time. The CUDA software development kit was used to develop algorithms for GPUs made by NVIDIA. This work focuses on how to achieve coalesced global memory access when working with three-channel RGB images, and how to use the on-die shared memory efficiently. The exact test algorithm also included a linear connected component labeling, which ran on the CPU. With iterative optimization of the GPU code, we managed to achieve a significant speed-up in a well-defined test environment.
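Two of the per-pixel steps listed above, the RGB-HSV conversion and the fixed binarization, can be sketched on the CPU as follows. This is a minimal NumPy illustration using the standard HSV formulas with all channels scaled to [0, 1]; the function names, the choice of channel to threshold and the threshold value are assumptions, and the paper's CUDA kernels and memory-access optimizations are not reproduced.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Convert an (..., 3) uint8 RGB image to HSV with all channels in [0, 1]."""
    rgb = rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)                    # value: largest channel
    c = v - rgb.min(axis=-1)                # chroma: channel range
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    safe_c = np.where(c > 0, c, 1.0)        # avoid division by zero on gray pixels
    # Hue sector depends on which channel is the maximum.
    h = np.where(
        v == r, ((g - b) / safe_c) % 6.0,
        np.where(v == g, (b - r) / safe_c + 2.0, (r - g) / safe_c + 4.0),
    )
    h = np.where(c > 0, h / 6.0, 0.0)       # gray pixels get hue 0
    return np.stack([h, s, v], axis=-1)

def binarize(channel, threshold):
    """Fixed-threshold binarization; which channel and threshold to use is an assumption."""
    return channel >= threshold
```

Both steps are independent per pixel, which is what makes them natural candidates for one-thread-per-pixel GPU kernels.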
As the number of processors sharing a cache increases, conflict misses due to interference amongst competing processes have an increasing impact on the individual performance of processes. Cache partitioning is a method of allocating a cache between concurrently executing processes in order to counteract the effects of inter-process conflicts. However, cache partitioning methods commonly divide a shared cache into private partitions dedicated to a single processor, which can lead to underutilized portions of the cache when set accesses are non-uniform. Our proposed method complements these cache partitioning algorithms by creating an additional partition that is shared amongst all processors. Underutilized areas of the cache are identified by a monitoring circuit and used for the shared partition. Detection of underutilization is based on the number of unique set accesses for a given allocated way. For a 16-way set-associative cache, the implementation of our method requires 64 bytes of storage overhead per core in addition to that needed for the method that determines the sizes of the private partitions. For the tested system, our method improves performance over the traditional LRU policy for a number of selected benchmark sets by an average of 1.4% and up to 13.3% for a two-core system, and by an average of 1.4% and up to 7.8% for a four-core system. It improves the performance of a conventional cache partitioning method (Utility-Based Cache Partitioning) by an average of 0.1% and up to 0.5% for both two- and four-core systems.
With the rapid expansion of the Internet and distributed technology, multimedia security and digital rights management have received much attention in the literature. As a major method of protecting intellectual property rights, digital watermarking techniques have been widely studied and used, and many algorithms have been proposed for digital images. In this paper, a robust digital image watermarking scheme that applies common transform-domain methods and an evolutionary computation technique is presented. The experimental results show that the proposed scheme is robust to several common image-processing attacks and has good perceptual quality at the same time.
The importance of ballistic applications has recently been recognized due to increasing crime and terrorism threats and incidents around the world. Ballistic image analysis is one of the application areas that require an immediate, high-precision response from large databases. Here, the microscopic markings on the cartridge case of a bullet recovered from a crime scene are compared for similarity with those in images from ballistic databases, in order to find out whether it was fired from any of the firearms in the database. In this paper, we implement a MapReduce solution using Hadoop for ballistic image comparison, which is a highly data- and computation-intensive task. MapReduce, a programming model developed by Google, provides a scalable, flexible and QoS-guaranteed IT infrastructure, particularly for embarrassingly parallel, data-oriented computational tasks. Our results show that we can effectively utilize the computing resources and gain significant increases in performance. Furthermore, we share our experiences in programming and tuning a Hadoop cluster in the paper.
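The map/reduce structure of such a comparison task can be sketched in plain Python. This is an in-process illustration only: it does not use the Hadoop API, the similarity function (negative sum of squared differences over flattened images) is a placeholder assumption, and all names are hypothetical.

```python
from collections import defaultdict

def similarity(a, b):
    """Placeholder similarity (assumption): negative sum of squared differences."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def map_phase(query_id, query, record):
    """Map: score one database image against one query, keyed by the query."""
    image_id, image = record
    yield query_id, (similarity(query, image), image_id)

def reduce_phase(query_id, scored):
    """Reduce: keep the best-scoring database image for each query."""
    return query_id, max(scored)

def run_job(queries, database):
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)
    for query_id, query in queries.items():
        for record in database.items():
            for key, value in map_phase(query_id, query, record):
                groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Because every (query, database image) pair is scored independently in the map step, the work can be split across cluster nodes without coordination, which is what makes the task embarrassingly parallel.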