Search results - Inner Mongolia University Library

15th IEEE/ACIS International Conference on Computer and Information Science (ICIS)

Authors: Hirata, Hiroaki; Nunome, Atsushi; Shibayama, Kiyoshi (Kyoto Inst Technol, Fac Informat & Human Sci, Sakyo Ku, Kyoto 6068585, Japan)

ISBN: (print) 9781509008056

We have been developing a multiprocessor architecture that executes loop iterations speculatively in parallel. In this paper, we present speculative memory (SM), which enables large-scale speculation by supporting the speculative execution of iterations of arbitrary size and duration. With SM, a programmer can explicitly hint that the iterations of a certain loop should preferably be executed speculatively in parallel. SM manages multiple values (versions) of speculatively modified data. SM also features memory renaming and delayed execution of program code, which can be viewed as dynamic code migration. These can remove dependencies between loop iterations or reduce the occurrence of dependency hazards. Thus, SM can improve the success rate of speculation and, consequently, make it possible to extract more thread-level parallelism than before.

Keywords: thread-level speculation (TLS); speculative multithreading (SpMT); shared-memory multiprocessor; parallel architecture; multithreaded programming

Bug Finding Methods for Multithreaded Student Programming Projects

Author: Naciri, William Malik (Virginia Tech)

The fork-join framework project is one of the more challenging programming assignments in the computer science curriculum at Virginia Tech. Students in Computer Systems must manage a pool of threads to facilitate the shared execution of dynamically created tasks. This project is difficult because students must overcome the challenges of concurrent programming and conform to the project’s specific semantic requirements. When working on the project, many students received inconsistent test results and were left confused when debugging. The suggested debugging tool, Helgrind, is a general-purpose thread error detector. It is limited in its ability to help fix bugs because it lacks knowledge of the specific semantic requirements of the fork-join framework. Thus, there is a need for a special-purpose tool tailored for this project. We implemented Willgrind, a debugging tool that checks the behavior of fork-join frameworks implemented by students through dynamic program analysis. Using the Valgrind framework for instrumentation, checking statements are inserted into the code to detect deadlock, ordering violations, and semantic violations at run-time. Additionally, we extended Willgrind with happens-before based checking in WillgrindPlus. This tool checks for ordering violations that do not manifest themselves in a given execution but could in others. In a user study, we provided the tools to 85 students in the Spring 2017 semester and collected over 2,000 submissions. The results indicate that the tools are effective at identifying bugs and useful for fixing bugs. This research makes multithreaded programming easier for students and demonstrates that special-purpose debugging tools can be beneficial in computer science education.

Keywords: Binary Instrumentation; multithreaded programming; Debugging; Thesis

ThreadMentor: a pedagogical tool for multithreaded programming

Journal on Educational Resources in Computing, 2003, vol. 3, no. 1, pp. 1–es

Authors: Steve Carr; Jean Mayo; Ching-Kuang Shene (Michigan Technological University, Houghton)

ThreadMentor is a multiplatform pedagogical tool designed to ease the difficulty in teaching and learning multithreaded programming. It consists of a C++ class library and a visualization system. The class library supports many thread management functions and synchronization primitives in an object-oriented way, and the visualization system is activated automatically by a user program and shows the inner working of every thread and every synchronization primitive on-the-fly. Events can also be saved for playback. In this way, students will be able to visualize the dynamic behavior of a threaded program and the interaction among threads and synchronization primitives.

Keywords: multithreaded programming; synchronization; synchronization primitives; threads; visualization

An initial evaluation of the Tera Multithreaded Architecture and programming system using the C3I Parallel Benchmark Suite

Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Authors: Sharon Brunett; John Thornley; Marrq Ellenbecker (California Institute of Technology, Pasadena, California)

ISBN: (print) 9780897919845

The Tera Multithreaded Architecture (MTA) is a radical new architecture intended to revolutionize high-performance computing in both the scientific and commercial marketplaces. Each processor supports 128 threads in hardware. Extremely fast thread switching is used to mask latency in a uniform-access memory system without caching. It is claimed that these hardware characteristics allow compilers to easily transform sequential programs into efficient multithreaded programs for the Tera MTA. In this paper, we attempt to provide an objective initial evaluation of the performance of the Tera Multithreaded Architecture and programming system for general-purpose applications. The basis of our investigation is two programs from the C3I Parallel Benchmark Suite (C3IPBS). Both of these programs have previously been shown to have the potential for large-scale parallelization. We compare the performance of these programs on (i) a fast uniprocessor, (ii) two conventional shared-memory multiprocessors, and (iii) the first installed Tera MTA (at the San Diego Supercomputer Center). On these platforms, we compare the effectiveness of both automatic and manual parallelization.

Keywords: terrain masking; threat analysis; multithreaded programming; multiprocessor performance evaluation; shared-memory multiprocessors; HP exemplar; lightweight threads; parallel programming; automatic parallelizing compilers; C3I parallel benchmark suite; tera MTA; multithreaded architectures; fine-grained synchronization; Intel Pentium Pro; digital alpha

Parallelization of a bound-consistency enforcing procedure and its application in solving nonlinear systems

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2017, vol. 107, pp. 57-66

Author: Kubica, Bartlomiej Jacek (Warsaw Univ Life Sci, Fac Appl Informat & Math, Dept Appl Informat, Nowoursynowska 159, PL-02776 Warsaw, Poland)

This paper considers incorporating a bound-consistency enforcing procedure into an interval branch-and-prune method. A heuristic for deciding when to use the developed operator is proposed. As enforcing bound-consistency is much more time-consuming than applying other narrowing tools, we parallelize the procedure using Intel TBB. Several parallelization versions are considered. This is also a good opportunity for a case study of the performance of the various lock types implemented in the TBB package. Numerical results for typical benchmark problems are presented and analyzed. A specific lock version, suited to the application, is proposed. Performance on two architectures is considered: Intel Xeon and Intel Xeon Phi (MIC).

Keywords: Nonlinear equations systems; Interval computations; Bound-consistency; multithreaded programming; TBB; Readers-writer lock; Big reader lock; MIC

A continuation-based noninterruptible multithreading processor architecture

JOURNAL OF SUPERCOMPUTING, 2009, vol. 47, no. 2, pp. 228-252

Authors: Amamiya, Satoshi; Amamiya, Makoto; Hasegawa, Ryuzo; Fujita, Hiroshi (Kyushu Univ, Dept Intelligent Syst, Nishui Ku, Fukuoka 8190395, Japan)

The current trend in research on multithreading processors is toward chip multithreading (CMT), which exploits thread-level parallelism (TLP) and improves the performance of software built on traditional threading components, e.g., Pthread. There are commercially available processors that support simultaneous multithreading (SMT) on multicore processors. However, they are essentially based on the conventional sequential execution model and execute multiple threads in parallel under the control of an OS that handles interrupts. Moreover, few languages or programming techniques exist that utilize multicore processors effectively. We take another approach and develop a multithreading processor dedicated to TLP. Our processor, named Fuce, is based on continuation-based multithreading. A thread is defined as a block of sequentially ordered instructions that are executed without interruption. Every thread execution is triggered only by an event called a continuation. This paper first introduces the continuation-based multithread execution model and its processor architecture, then presents multithreaded programming techniques and the continuation-based multithreading language system CML. Finally, the performance of the Fuce processor is evaluated by means of clock-level software simulation.

Keywords: Multithreading; Parallel processing; Thread level parallelism; multithreaded programming; Processor architecture

Thread-safety in an MPI implementation: Requirements and analysis

PARALLEL COMPUTING, 2007, vol. 33, no. 9, pp. 595-604

Authors: Gropp, William; Thakur, Rajeev (Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439, USA)

The MPI-2 Standard has carefully specified the interaction between MPI and user-created threads. The goal of this specification is to allow users to write multithreaded MPI programs while also allowing MPI implementations to deliver high performance. However, a simple reading of the thread-safety specification does not reveal what its implications are for an implementation and what implementers must be aware (and careful) of. In this paper, we describe and analyze what the MPI Standard says about thread-safety and what it implies for an implementation. We classify the MPI functions based on their thread-safety requirements and discuss several issues to consider when implementing thread-safety in MPI. We use the example of generating new context ids (required for creating new communicators) to demonstrate how a simple solution for the single-threaded case does not naturally extend to the multithreaded case and how a naive thread-safe algorithm can be expensive. We then present an algorithm for generating context ids that works efficiently in both single-threaded and multithreaded cases.

Keywords: message-passing interface (MPI); thread-safety; MPI implementation; multithreaded programming

A 250-MHz single-chip multiprocessor for audio and video signal processing

IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2001, vol. 36, no. 11, pp. 1768-1774

Authors: Koyama, T; Inoue, K; Hanaki, H; Yasue, M; Iwata, E (Sony Corp, S&S Architecture Ctr, Tokyo 1410032, Japan; Sony Comp Sci Labs Inc, Tokyo 1410022, Japan)

A 250-MHz single-chip multiprocessor, which can implement multichannel decoding, encoding, and transcoding of various audio and video standards, was fabricated using 0.25-μm CMOS technology and consumes 2.38 W at 2.5 V. The multiprocessor integrates four processors and a 64-kB shared level-2 cache and exploits the coarse-grained parallelism inherent in audio and video signal processing with multithreaded programming. Three coprocessors and scratch-pad memory have been added to each processing element and perform subword parallel processing, background data transfer, and bitstream processing for audio and video signal processing. Useful skew and clock gating have been utilized to achieve high-speed operation and low power consumption. Consequently, the multiprocessor achieves MPEG2 (MP@HL) video decoding at 20 frames/s.

Keywords: bitstream processing; clock gating; coarse-grained parallelism; fine-grained parallelism; multiprocessor; multithreaded programming; signal processing; subword parallel processing; useful skew

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, vol. 33, no. 4, pp. 830-841

Authors: Korndorfer, Jonas H. Muller; Eleliemy, Ahmed; Mohammed, Ali; Ciorba, Florina M. (Univ Basel, Dept Math & Comp Sci, CH-4051 Basel, Switzerland; HPEs HPC AI EMEA Res Lab ERL, Basel, Switzerland)

Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that arises at multiple levels. OpenMP is the most widely used standard for expressing and exploiting the ever-increasing node-level parallelism. The scheduling options in OpenMP are insufficient to address the load imbalance that arises during the execution of multithreaded applications. The limited scheduling options in OpenMP also hinder research on novel scheduling techniques, which requires comparison with others from the literature. This work introduces LB4OMP, an open-source dynamic load balancing library that implements successful scheduling algorithms from the literature. LB4OMP is a research infrastructure designed to spur and support present and future scheduling research, for the benefit of multithreaded application performance. Through an extensive performance analysis campaign, we assess the effectiveness and demystify the performance of all loop scheduling techniques in the library. We show that, for numerous application-system pairs, the scheduling techniques in LB4OMP outperform the scheduling options in OpenMP. Node-level load balancing using LB4OMP leads to reduced cross-node load imbalance and to improved MPI+OpenMP application performance, which is critical for Exascale computing.

Keywords: Dynamic scheduling; Processor scheduling; Parallel processing; Optimal scheduling; Libraries; Standards; Load management; Hierarchical parallelism; dynamic load balancing; self-scheduling; runtime library; OpenMP; multithreaded programming; shared-memory systems
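To make the idea of loop self-scheduling concrete, the following is a minimal sketch of guided self-scheduling, one classic technique from the scheduling literature in which each idle thread takes a chunk equal to the remaining iterations divided by the number of threads (rounded up). It is not LB4OMP's API or implementation; the abstract does not state which algorithms the library contains, and the function name `guided_chunks` is hypothetical.

```python
import math


def guided_chunks(n_iterations, n_threads):
    """Split a loop of n_iterations into (start, size) chunks.

    Hypothetical helper: each chunk is ceil(remaining / n_threads),
    so chunks start large and shrink as the loop nears completion.
    """
    chunks = []
    start = 0
    remaining = n_iterations
    while remaining > 0:
        size = math.ceil(remaining / n_threads)
        chunks.append((start, size))
        start += size
        remaining -= size
    return chunks


if __name__ == "__main__":
    # Print the chunk sizes for a 100-iteration loop on 4 threads.
    print([size for _, size in guided_chunks(100, 4)])
```

Large early chunks keep scheduling overhead low, while the small late chunks give the scheduler room to even out load imbalance at the end of the loop.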

Performance assessment of multithreaded quicksort algorithm on simultaneous multithreaded architecture

JOURNAL OF SUPERCOMPUTING, 2013, vol. 66, no. 1, pp. 339-363

Author: Mahafzah, Basel A. (Univ Jordan, King Abdullah II Sch Informat Technol, Amman 11942, Jordan)

Sorting huge datasets has become essential in many computer applications, such as search engines, databases, and web-based applications, in order to improve search performance. Moreover, given the prevalence of commercial simultaneous multithreaded (SMT) architectures, parallel programming using multithreading has become necessary for efficiently using all available hardware resources for one application. In this paper, one of the efficient and fast sorting algorithms, Quicksort, is implemented as a parallel multithreaded algorithm on an SMT architecture, where virtual parallelization is achieved using the POSIX threads (Pthreads) library. The proposed algorithm is evaluated and compared with its sequential counterpart. The analytical and experimental results show that multithreading is a viable technique for implementing the parallel Quicksort algorithm efficiently on an SMT architecture: the parallel multithreaded Quicksort algorithm outperforms the sequential Quicksort algorithm in terms of various performance metrics, including time complexity and speedup.

Keywords: Quicksort; Sorting; Performance evaluation; multithreaded programming; Simultaneous multithreaded architecture
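The general shape of a multithreaded Quicksort can be sketched as follows. This is an illustration only, not the paper's algorithm: the paper uses the Pthreads library, whereas this sketch uses Python's `threading` module, and the function name `parallel_quicksort` and its `depth` parameter are assumptions. In CPython the global interpreter lock means the sketch shows the structure rather than a real speedup.

```python
import threading


def parallel_quicksort(items, depth=2):
    """Return a sorted copy of items.

    While depth > 0, the left partition is sorted in a new thread and
    the right partition in the current thread; below that, the
    recursion is purely sequential.
    """
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    if depth <= 0:
        return parallel_quicksort(left, 0) + middle + parallel_quicksort(right, 0)

    result = {}

    def sort_left():
        # Runs in the worker thread; stores its result for the parent.
        result["left"] = parallel_quicksort(left, depth - 1)

    worker = threading.Thread(target=sort_left)
    worker.start()
    sorted_right = parallel_quicksort(right, depth - 1)
    worker.join()  # wait for the left half before concatenating
    return result["left"] + middle + sorted_right
```

Limiting thread creation to the top levels of the recursion (here `depth=2`, so at most three extra threads) keeps the thread count bounded regardless of the input size.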
