The purpose of this research was to construct a computer-based adaptive test. Adaptive testing is a new evaluation strategy for computer-assisted learning and e-learning. It provides more efficient test administration and intelligent learning evaluation, and it is expected to estimate the learner's true ability more accurately while using fewer questions, each selected to suit the individual. Item response theory (IRT) is the main theoretical basis that makes adaptive tests feasible. Adaptive testing requires high-speed calculation to process the complicated IRT functions, which is fortunately a strength of computers.
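As a minimal sketch of how IRT can drive question selection, the code below picks the unused item that is most informative at the current ability estimate. The abstract does not say which IRT model or selection rule was used; the two-parameter logistic (2PL) model, the maximum-information rule, and the item bank `ITEM_BANK` are all assumptions made for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer at ability theta.

    a is the item discrimination, b the item difficulty (assumed model).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, items, administered):
    """Return the index of the unused item with maximum information at theta."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

# Hypothetical item bank: (discrimination a, difficulty b) pairs.
ITEM_BANK = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

if __name__ == "__main__":
    # For an average learner the item with difficulty 0 is chosen first.
    print(select_next_item(0.0, ITEM_BANK, set()))
```

A full adaptive test would re-estimate the ability after each response and repeat the selection until a stopping rule is met; that loop is omitted here.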
This paper reports a panel session organized to project what changes might occur in the near future to make parallel computers easier to program and use, and to explore how such computers could benefit many application areas. The following questions are discussed: (1) what type of applications will benefit on a widespread basis from improved performance; (2) whether serial and parallel programming will be integrated for greater reusability and portability; and (3) whether parallel computers will replace serial computers and, if so, what is needed so that concurrency can be handled more easily.
Exploiting clusters of workstations as a single computational resource is an attractive alternative to conventional multiprocessor technologies. However, the class of parallel applications that can benefit from clusters is restricted by their relatively high latency and low throughput, both consequences of conventional networking. LANs offer the best performance but also limit the scope for effective clustering to a single room or building. Another major difference remains: multiprocessors can reasonably be programmed with the "error-free" assumption, but applications cannot be run on distributed clusters without programming against the potential for remote faults. Emergent high-speed switched networks such as ATM have the potential to reduce latency and increase bandwidth in the distributed scenario, and therefore to extend the class of applications suitable for running on clusters. In addition, the virtual network capability of ATM removes some of the geographical constraints from clustering. But can ATM guarantee the type of application-level connection reliability that is taken for granted in multiprocessor environments? This paper reviews the capabilities of modern high-speed networks as exemplified by ATM and their relevance to parallel and distributed systems. In particular, it asks whether Quality of Service (QoS) can benefit parallel programming on distributed platforms.
Streaming applications often require a parallel Model of Computation (MoC) to specify their application behavior and to facilitate mapping onto Multi-Processor System-on-Chip (MPSoC) platforms. The varied performance requirements and resource budgets of embedded systems call for an efficient design space exploration (DSE) approach to select the best design from a design space consisting of a large number of design choices. However, existing DSE approaches explore a design space that includes only architecture and mapping alternatives for an initial application specification given by the application designer. In this article, we first show that a design often might not be optimal if alternative specifications of a given application are not taken into account. We further argue that the best alternative specification consists of only independent and load-balanced application tasks. Based on the Polyhedral Process Network (PPN) MoC, we present an approach to analyze and transform an initial PPN to an alternative one that contains only independent processes if possible. Finally, by prototyping real-life applications on both FPGA-based MPSoCs and desktop multi-core platforms, we demonstrate that mapping the alternative application specification results in a large performance gain compared to approaches in which alternative application specifications are not taken into account.
Exploiting thread-level parallelism (TLP) is a promising way to improve the performance of applications with the advent of general-purpose cost-effective uni-processor and shared-memory multiprocessor systems. In this paper, we describe the OpenMP implementation in the Intel® C++ and Fortran compilers for Intel platforms. We present our major design considerations and decisions in the Intel compiler for generating efficient multithreaded code guided by OpenMP directives and pragmas. We describe several transformation phases in the compiler for the OpenMP parallelization. In addition to compiler support, the OpenMP runtime library is a critical part of the Intel compiler. We present runtime techniques developed in the Intel OpenMP runtime library for exploiting thread-level parallelism as well as integrating the OpenMP support with other forms of threading, termed sibling parallelism. The performance results of a set of benchmarks show good speedups over the well-optimized serial code performance on Intel® Pentium- and Itanium-processor based systems.
Data parallelism is a powerful approach to parallel computation, particularly when it is used with complex data types. Categorical data types are extensions of abstract data types that structure computations in a way that is useful for parallel implementation. In particular, they decompose the search for good algorithms on a data type into subproblems; all homomorphisms can be implemented by a single recursive, and often parallel, schema; and they are equipped with an equational system that can be used for software development by transformation.
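To illustrate the idea of one recursive schema serving every homomorphism, here is a minimal sketch for the special case of lists. The abstract covers categorical data types in general and gives no code; the function `hom` and its parameters `f`, `op`, and `identity` are hypothetical names, and `op` is assumed to be associative with `identity` as its unit.

```python
import operator

def hom(f, op, identity, xs):
    """Evaluate the list homomorphism determined by f, op, and identity.

    It satisfies hom(xs + ys) == op(hom(xs), hom(ys)), so the list can be
    split anywhere and the two halves evaluated independently.
    """
    if not xs:
        return identity
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    # The two recursive calls share no data and could run in parallel.
    left = hom(f, op, identity, xs[:mid])
    right = hom(f, op, identity, xs[mid:])
    return op(left, right)

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(hom(lambda _: 1, operator.add, 0, data))     # length of the list
    print(hom(lambda x: x * x, operator.add, 0, data)) # sum of squares
    print(hom(lambda x: x, max, float("-inf"), data))  # maximum
```

Length, sum of squares, and maximum differ only in the arguments passed to the same schema, which is the point the abstract makes about homomorphisms.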
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implicit reduction operations, including those involving statements at multiple loop nesting levels and intermixed with conditional control flow. We introduce two new optimizations: factoring, which increases data locality for SUM and PRODUCT reductions, and index encoding, which enables a single global communication to accomplish both an extreme value reduction and an extreme value location reduction. We have implemented these techniques in the dHPF compiler for High Performance Fortran (HPF). We evaluate their effectiveness experimentally by compiling several reduction benchmarks with dHPF and two commercial HPF compilers, and comparing the performance of the generated code on an IBM SP2. Our results show that our recognition techniques are more powerful and that our index encoding and factoring optimizations can improve performance by a factor of two where they apply.
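The following sketch shows one way a single reduction can return both an extreme value and its location. The abstract does not describe the encoding dHPF uses, so the packing scheme here is an assumption: each integer value and its global index are combined into one key, and a single MAX over the keys stands in for the one global communication. The names `encode`, `decode`, and `maxloc_single_reduction` are hypothetical, and every block is assumed to be non-empty.

```python
def encode(value, index, n):
    """Pack an integer value and its global index (0 <= index < n) into one key.

    Larger values give larger keys; among equal values the smaller index wins.
    """
    return value * n + (n - 1 - index)

def decode(key, n):
    """Recover (value, index) from a key produced by encode."""
    return key // n, n - 1 - key % n

def maxloc_single_reduction(blocks):
    """Simulate a distributed MAX plus MAXLOC using one reduction over keys.

    Each inner list plays the role of the data owned by one processor.
    """
    n = sum(len(block) for block in blocks)
    local_keys = []
    offset = 0
    for block in blocks:
        # Local phase: each processor reduces its own elements to one key.
        local_keys.append(max(encode(v, offset + i, n)
                              for i, v in enumerate(block)))
        offset += len(block)
    # Global phase: the only communication is one MAX over the local keys.
    return decode(max(local_keys), n)

if __name__ == "__main__":
    print(maxloc_single_reduction([[3, 9, 2], [9, 1], [4]]))
```

Without the encoding, the value and its location would typically need two reductions or a reduction over pairs; packing them trades a slightly wider key for one fewer communication step.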
MICA (Mapped Interconnection-Cached Architecture) is a novel architecture combining large reconfigurable networks and small, fast, on-line routing crossbar switches. It offers a good match for parallel applications exhibiting switching locality. Switching locality means that the need to "switch" or route the information to or from each PE is limited to a small set of sources or destinations. A parallel programming paradigm is introduced that attempts to minimize the movement of information by reconfiguring the relative proximity of the PEs. We aim to complete most communication requests with only two levels of routing decisions among a small set of channels. Multi-hop routing is not used as often, resulting in better performance.
Due to the attractive properties of the wavelet transform, wavelet filter banks are frequently used in areas such as signal processing and communication systems. Furthermore, the increasing computational power of microprocessors has led to a leap in the use of techniques such as parallel processing, concurrent programming, and VHDL design. However, the inherently sequential tree structure of traditional wavelet theory does not merge efficiently with these techniques. This work presents an algorithm to generate uniform and non-uniform filter banks in a parallel structure. The algorithm generalizes the à trous and Mallat algorithms for parallelized filter bank design, which is efficient for parallel processing, concurrent programming, and VHDL design. The algorithm generates a set of parallelized perfect-reconstruction filter banks for an arbitrary number of end-nodes of a traditional tree structure. The algorithm encompasses both the decimated and the undecimated cases. Examples of image and speech signal applications are presented.
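As a rough illustration of turning a sequential tree into parallel branches, the sketch below flattens a dyadic, decimated analysis tree using the standard noble identity: filtering, downsampling by 2, then filtering with g is equivalent to filtering with the original filter convolved with g upsampled by 2, followed by downsampling by 4. This is not the paper's algorithm, which the abstract does not spell out; the function names and the unnormalized Haar filters in the demo are assumptions.

```python
def upsample(h, factor):
    """Insert factor - 1 zeros between taps (the 'holes' of the a trous scheme)."""
    out = [0.0] * ((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out

def convolve(a, b):
    """Full linear convolution of two finite impulse responses."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def equivalent_filters(lowpass, highpass, levels):
    """Flatten a dyadic analysis tree into independent parallel branches.

    Returns (filter, decimation) pairs: one detail branch per level,
    followed by the final approximation branch.
    """
    branches = []
    prefix = [1.0]  # cascade of upsampled lowpass filters applied so far
    for level in range(levels):
        factor = 2 ** level
        detail = convolve(prefix, upsample(highpass, factor))
        branches.append((detail, 2 * factor))
        prefix = convolve(prefix, upsample(lowpass, factor))
    branches.append((prefix, 2 ** levels))
    return branches

if __name__ == "__main__":
    # Unnormalized Haar filters, chosen only to keep the numbers readable.
    for taps, decimation in equivalent_filters([1.0, 1.0], [1.0, -1.0], 2):
        print(decimation, taps)
```

Each returned branch can filter the input signal on its own, with no dependence on the output of another branch, which is what makes the flattened form suitable for parallel processing.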
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical architectures. Using the NAS parallel benchmarks, we first illustrate the lack of portable performance on state-of-the-art scalable parallel systems despite the use of two portable programming models, MPI and OpenMP. Then we present a dynamic compilation and execution framework that provides the desired portability through the use of program slices. These slices are used to select the optimal program decomposition on each architecture. Currently, our framework uses a simple incremental algorithm, which effectively identifies single or multi-level program decompositions that maximize performance. This algorithm can be used as a rule of thumb for automatic multilevel parallelization. The effectiveness of the approach is demonstrated on the NAS benchmarks running on two architectural platforms.