This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR) algorithm based on a Gaussian Mixture Model (GMM) on a multicore system. The high speed is achieved using a parallel implementation of the feature extraction and aggregation methods during training and testing *** memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR *** experimental results show speed-up improvements of around 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
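To put the reported figure in context, the short sketch below works out what a speed-up of about 3.2 on four cores implies. The efficiency ratio is plain arithmetic on the numbers in the abstract; the serial-fraction estimate assumes Amdahl's law, which the abstract itself does not invoke, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of the ideal linear speed-up that was actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction s implied by Amdahl's law (an assumption here):
    speedup = 1 / (s + (1 - s) / cores), solved for s."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Figures taken from the abstract: speed-up of about 3.2 on four cores.
efficiency = parallel_efficiency(3.2, 4)
serial = amdahl_serial_fraction(3.2, 4)
print(f"efficiency = {efficiency:.2f}, implied serial fraction = {serial:.3f}")
# prints: efficiency = 0.80, implied serial fraction = 0.083
```

Under that assumption, roughly 8% of the run time would be inherently sequential; the paper may attribute the gap to other causes.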
ISBN: 3540430431 (print)
OpenMP is emerging as a viable high-level programming model for shared memory parallel systems. Although it has also been implemented on ccNUMA architectures, it is hard to obtain high performance on such systems. In this paper, we discuss various ways in which OpenMP may be used on ccNUMA and NUMA architectures, and describe a programming style that can provide scalable high performance on such systems. We give an example of its use on the SGI Origin 2000, and on TreadMarks, a Software DSM system from Rice University. These results have encouraged us to work on a programming environment that provides general support for OpenMP application development and incorporates a system to translate standard loop-level parallel OpenMP code, with additional user input in the form of directives, into an equivalent OpenMP program relying on our alternative programming style. The equivalent program does not use constructs external to OpenMP.
Parallelism has become a way of life for many scientific programmers. A significant challenge in bringing the power of parallel machines to these programmers is providing them with a suite of software tools similar to the tools that sequential programmers currently utilize. Unfortunately, writing correct parallel programs remains a challenging task. In particular, automatic or semi-automatic testing tools for parallel programs are lacking. This paper takes a first step in developing an approach to providing all-uses coverage for parallel programs. A testing framework and theoretical foundations for structural testing are presented, including test data adequacy criteria and hierarchy, formulation and illustration of all-uses testing problems, classification of all-uses test cases for parallel programs, and both theoretical and empirical results with regard to what can be achieved with all-uses coverage for parallel programs. Copyright (C) 2003 John Wiley & Sons, Ltd.
OpenMP is emerging as a viable high-level programming model for shared memory parallel systems. It was conceived to enable easy, portable application development on this range of systems, and it has also been implemented on cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. Unfortunately, it is hard to obtain high performance on the latter architecture, particularly when large numbers of threads are involved. In this paper, we discuss the difficulties faced when writing OpenMP programs for ccNUMA systems, and explain how the vendors have attempted to overcome them. We focus on one such system, the SGI Origin 2000, and perform a variety of experiments designed to illustrate the impact of the vendor's efforts. We compare codes written in a standard, loop-level parallel style under OpenMP with alternative versions written in a Single Program Multiple Data (SPMD) fashion, also realized via OpenMP, and show that the latter consistently provides superior performance. A carefully chosen set of language extensions can help us translate programs from the former style to the latter (or to compile directly, but in a similar manner). Syntax for these extensions can be borrowed from HPF, and some aspects of HPF compiler technology can help the translation process. It is our expectation that an extended language, if well compiled, would improve the attractiveness of OpenMP as a language for high-performance computation on an important class of modern architectures. Copyright (C) 2002 John Wiley & Sons, Ltd.
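The contrast between the two styles compared above can be sketched as follows. This is a structural analogy written with Python threads, not OpenMP code and not the authors' implementation: `loop_level_sum` and `spmd_sum` are hypothetical names, and in standard CPython the global interpreter lock means the sketch shows only how work is divided, not any performance difference.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def loop_level_sum(data, workers=4):
    """Loop-level style: a runtime hands individual iterations to workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda x: x * x, data))


def spmd_sum(data, workers=4):
    """SPMD style: every thread runs the same body and derives its own
    contiguous block of the data from its rank."""
    partial = [0] * workers  # one private slot per thread, no sharing
    n = len(data)

    def body(rank):
        lo = rank * n // workers
        hi = (rank + 1) * n // workers
        partial[rank] = sum(x * x for x in data[lo:hi])

    threads = [threading.Thread(target=body, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


data = list(range(1000))
print(loop_level_sum(data) == spmd_sum(data))  # prints: True
```

In the SPMD version each thread repeatedly touches the same block of data, which is the property that tends to matter on ccNUMA machines where data placement relative to the thread affects access cost.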
ISBN: 9781450394338 (print)
Parallel and distributed computing (PDC) has become pervasive in all aspects of computing, and thus it is essential that students include parallelism and distribution in the computational thinking that they apply to problem solving, from the very beginning. Computer science education is still teaching to a 20th century model of algorithmic problem solving. Sequence, branch, and loop are taught in our early courses as the only organizing principles needed for algorithms, and we invest considerable time in showing how best to sequentially process large volumes of data. All computing devices that students use currently have multiple cores as well as a GPU in many cases. Most of their favorite applications use multiple cores and numbers of distributed processors. Often concurrency offers simpler solutions than sequential approaches. Industry is desperate for software engineers who think naturally in terms of exploiting these capabilities, rather than seeing them as an exotic upper-level topic that gets layered over a sequential solution. However, we are still teaching students to solve problems using sequential thinking. In this workshop we overview key PDC concepts and provide examples of how they may naturally be incorporated in early computing classes. We will introduce plugged and unplugged curriculum modules that have been successfully integrated in existing computing classes at multiple institutions. We will highlight the upcoming summer training workshop, for which we have funding to support attendance, as well as other CDER (Center for Parallel and Distributed Computing Curriculum Development and Educational Resources) activities.
ISBN: 9781450344937 (print)
Synchronization and data movement are the key impediments to an efficient parallel execution. To ensure that data shared by multiple threads remain consistent, the programmer must use synchronization (e.g., mutex locks) to serialize threads' accesses to data. This limits parallelism because it forces threads to sequentially access shared resources. Additionally, systems use cache coherence to ensure that processors always operate on the most up-to-date version of a value even in the presence of private caches. Coherence protocol implementations cause processors to serialize their accesses to shared data, further limiting parallelism and performance.
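The serializing effect of a mutex described above can be seen in a minimal sketch. It uses Python's `threading.Lock` purely for illustration; `count_with_lock` is a hypothetical helper, not code from the paper.

```python
import threading


def count_with_lock(threads=4, increments=10_000):
    """Increment one shared counter from several threads.

    The lock keeps the counter consistent, but only one thread at a time
    can be inside the critical section, so the updates run one after another.
    """
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # critical section: accesses are serialized here
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


print(count_with_lock())  # prints: 40000
```

The result is always correct, yet adding threads adds no throughput for the locked region, which is the limitation the abstract points to.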
ISBN: 9781450344937 (print)
In this work, we introduce and experimentally evaluate a new hybrid software-hardware Transactional Memory prototype based on Intel's Haswell TSX architecture. Our prototype extends the applicability of the existing hardware support for TM by interposing a hybrid fall-back layer before the sequential, big-lock fall-back path, used by standard TSX-supported solutions in order to guarantee progress. In our experimental evaluation we use SynQuake, a realistic game benchmark modeled after Quake. Our results show that our hybrid transactional system, which we call HythTM, is able to reduce the number of transactions that go to the sequential software layer, hence avoiding hardware transaction aborts and loss of parallelism. HythTM optimizes application throughput and scalability up to 5.05x, when compared to the hardware TM with sequential fall-back path.
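The layering described above, with a hybrid tier interposed before the big-lock path, follows a general retry-then-fall-back pattern that can be sketched as below. This is not HythTM and does not use real TSX instructions: `run_transaction`, `try_hardware`, `try_hybrid`, and the retry counts are all hypothetical stand-ins, and the two attempt functions are simulated callables that return `True` when they commit.

```python
import threading

# Stands in for the sequential, big-lock fall-back path.
_global_lock = threading.Lock()


def run_transaction(body, try_hardware, try_hybrid,
                    hw_retries=3, hybrid_retries=3):
    """Run `body` through the first tier that commits; return the tier name."""
    for _ in range(hw_retries):
        if try_hardware(body):      # simulated hardware transaction attempt
            return "hardware"
    for _ in range(hybrid_retries):
        if try_hybrid(body):        # simulated intermediate hybrid layer
            return "hybrid"
    with _global_lock:              # last resort: run fully serialized
        body()
    return "sequential"


# Usage: hardware always aborts, the hybrid tier commits on its second try.
log = []
attempts = {"hybrid": 0}


def always_abort(body):
    return False


def commit_on_second_try(body):
    attempts["hybrid"] += 1
    if attempts["hybrid"] < 2:
        return False
    body()
    return True


tier = run_transaction(lambda: log.append("done"),
                       always_abort, commit_on_second_try)
print(tier, log)  # prints: hybrid ['done']
```

The point of the middle tier in this sketch is that a transaction which keeps aborting in hardware still has a chance to commit without taking the global lock, which would otherwise stall every other thread.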