ISBN: 0769517919 (print)
This paper addresses design methods for SoC-based HW/SW systems using reconfigurable architectures. The emphasis is on developing a method that enhances the reusability of HW and SW in the co-design process using proven languages such as ANSI-C and VHDL. We distinguish three abstraction layers for design modules consisting of both HW and SW. This approach benefits the reuse of both HW and SW sources, across different applications and on different devices. We use the reconfigurable SoC Atmel FPSLIC for experimental tests and obtain a significant reuse ratio.
ISBN: 0769517870 (print)
This article deals with one of the major problems in component-based software development: deriving the properties of a component system from the given properties of its components and the rules for their interaction. Whereas the properties of components can be analyzed and described by the software engineer responsible for component construction, many properties of an overall component system cannot be guaranteed in advance because components are composed late, for example by application engineers working for third parties. In the area of measurement, signal processing, and control in embedded systems, components encapsulating signal processing or signal adaptation algorithms can be modeled by a hierarchy of dataflow languages: synchronous (SDF), boolean-controlled (BDF), and dynamic (DDF) dataflow. If the application engineer responsible for component assembly restricts themselves to SDF components, the component system will be computationally analyzable; that is, it can be decided whether it is deadlock-free, the required amount of memory can be determined, and a cyclic schedule of component instances can be computed. If the application engineer uses only SDF and BDF components, the component system will still be deterministic. The objective of this paper is to describe a novel concept of a component framework for the aforementioned application area that automatically determines certain global properties of a component system during component assembly, whenever possible. The application engineer benefits from the integrated techniques without needing detailed knowledge of the underlying theory.
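The SDF analyzability claim rests on standard SDF theory. The per-edge balance equations fix how often each component must fire per period. A symbolic simulation of one period then either finds a cyclic schedule, bounding buffer memory along the way, or reveals a deadlock. The Python sketch below illustrates that textbook analysis under stated assumptions. It is not the paper's framework, and the edge-tuple encoding, the function names, and the example graph are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm  # Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    edges: list of (src, dst, prod, cons) tuples (hypothetical encoding):
    src produces `prod` tokens per firing, dst consumes `cons` per firing.
    Assumes a connected graph. Returns the smallest positive integer firing
    counts per period, or None if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates across edges in either direction
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph must be connected")
    if any(rate[s] * p != rate[d] * c for s, d, p, c in edges):
        return None  # no bounded-memory periodic schedule exists
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}


def periodic_schedule(actors, edges, delays=None):
    """Simulate one period by firing any enabled actor that still owes firings.

    Returns (firing order, peak tokens per edge for this particular schedule),
    or None if the graph is inconsistent or deadlocks.
    """
    q = repetition_vector(actors, edges)
    if q is None:
        return None
    tokens = list(delays) if delays else [0] * len(edges)  # initial tokens
    peak = list(tokens)
    fired = {a: 0 for a in actors}
    order = []
    progress = True
    while progress:
        progress = False
        for a in actors:
            if fired[a] == q[a]:
                continue
            # Fireable only if every input edge holds enough tokens.
            if all(tokens[i] >= c for i, (s, d, p, c) in enumerate(edges) if d == a):
                for i, (s, d, p, c) in enumerate(edges):
                    if d == a:
                        tokens[i] -= c
                    if s == a:
                        tokens[i] += p
                        peak[i] = max(peak[i], tokens[i])
                fired[a] += 1
                order.append(a)
                progress = True
    if any(fired[a] != q[a] for a in actors):
        return None  # some actor starved: deadlock
    return order, peak


# Hypothetical two-component chain: A emits 2 tokens per firing, B consumes 3.
print(repetition_vector(["A", "B"], [("A", "B", 2, 3)]))  # {'A': 3, 'B': 2}
print(periodic_schedule(["A", "B"], [("A", "B", 2, 3)]))  # (['A', 'A', 'B', 'A', 'B'], [4])
```

The peak token counts are the buffer sizes for the schedule found. A different firing order can need less memory, so this gives a bound for that schedule rather than a minimum.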
ISBN: 9781581133462 (print)
Current trends in high-performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. This unprecedented degree of parallelism will undoubtedly expose scalability limitations in existing applications, where scalability is the ability of a parallel algorithm on a parallel architecture to effectively utilize an increasing number of processors. Users will need precise and automated techniques for detecting the causes of limited scalability. This paper addresses this problem. First, we argue that users face numerous challenges in understanding application scalability: managing substantial amounts of experiment data, extracting useful trends from this data, and reconciling performance information with their application's design. Second, we propose a solution that automates this data analysis by applying fundamental statistical techniques to scalability experiment data. Finally, we evaluate our operational prototype on several applications and show that statistical techniques offer an effective strategy for assessing application scalability. In particular, we find that the non-parametric correlation of the number of tasks with the ratio of a communication operation's time to the overall communication time provides a reliable measure for identifying communication operations that scale poorly.
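The key finding can be illustrated with a small Python sketch. For each communication operation, it computes the operation's share of total communication time in each experiment. It then takes Spearman's rank correlation of that share against the task count. The abstract does not name the specific non-parametric statistic; Spearman's rho is one common choice and is an assumption here. The function names, the data layout, and the 0.8 flagging threshold are likewise hypothetical, not the prototype's.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant data; treated here as "no trend"
    return cov / sqrt(sxx * syy)


def poorly_scaling_ops(tasks, op_times, threshold=0.8):
    """tasks[k]: task count of experiment k.
    op_times[op][k]: time spent in communication operation op in experiment k.
    Flags operations whose share of total communication time rises with tasks.
    """
    totals = [sum(t[k] for t in op_times.values()) for k in range(len(tasks))]
    flagged = {}
    for op, times in op_times.items():
        shares = [t / total for t, total in zip(times, totals)]
        rho = spearman(tasks, shares)
        if rho >= threshold:
            flagged[op] = rho
    return flagged


# Hypothetical measurements from five runs.
tasks = [4, 8, 16, 32, 64]
op_times = {"allreduce": [1, 2, 4, 8, 16], "send": [10, 10, 10, 10, 10]}
print(poorly_scaling_ops(tasks, op_times))  # {'allreduce': 1.0}
```

Using ranks rather than raw values means the measure responds only to whether an operation's share keeps growing with the task count, not to how fast. This fits the goal of spotting poorly scaling operations without assuming a particular growth model.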
The lack of object-code compatibility in VLIW architectures severely limits their adoption as a general-purpose computing paradigm. Previous approaches include hardware and software techniques, both of which have drawbacks: hardware techniques add to the complexity of the architecture, whereas software techniques require multiple executables. This paper presents a technique called dynamic rescheduling, which applies software techniques dynamically through intervention by the operating system. Results demonstrating the viability of the technique are presented using the Illinois IMPACT compiler and the TINKER architectural framework.
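To illustrate what rescheduling code for a different VLIW generation involves, the Python sketch below re-packs a straight-line block into instruction bundles for a target machine with its own issue width and operation latencies. It uses a simple greedy list scheduler that prioritizes original program order. This is an assumed stand-in for the rescheduling step only. It models neither the operating-system intervention nor the IMPACT/TINKER tooling, and all names, op classes, and latencies are hypothetical.

```python
def reschedule(ops, deps, latency, width):
    """Greedy list scheduling of one straight-line block for a target VLIW.

    ops:     [(name, op_class), ...] in original program order.
    deps:    {name: [producer names]} (true data dependences only).
    latency: {op_class: result latency in cycles on the target machine}.
    width:   number of operations the target can issue per cycle.
    Returns one list of op names per cycle; an empty list is a NOP cycle.
    """
    position = {name: i for i, (name, _) in enumerate(ops)}
    for name, producers in deps.items():
        for p in producers:  # producers must precede consumers, so no cycles
            if position[p] >= position[name]:
                raise ValueError(f"{name} depends on later op {p}")
    op_class = dict(ops)
    ready_at = {}  # issued op -> first cycle its result may be consumed
    bundles, cycle = [], 0
    while len(ready_at) < len(ops):
        bundle = []
        for name, _ in ops:
            if name in ready_at or len(bundle) == width:
                continue
            if all(p in ready_at and ready_at[p] <= cycle for p in deps.get(name, [])):
                bundle.append(name)
        for name in bundle:  # commit after selection: no deps inside a bundle
            ready_at[name] = cycle + latency[op_class[name]]
        bundles.append(bundle)
        cycle += 1
    return bundles


# Hypothetical block scheduled for two machine generations.
block = [("a", "load"), ("b", "alu"), ("c", "load"), ("d", "alu")]
deps = {"b": ["a"], "d": ["b", "c"]}
old = reschedule(block, deps, {"load": 2, "alu": 1}, width=2)
new = reschedule(block, deps, {"load": 3, "alu": 1}, width=1)
print(old)  # [['a', 'c'], [], ['b'], ['d']]
print(new)  # [['a'], ['c'], [], ['b'], ['d']]
```

The same block yields different bundles, and different empty NOP cycles, on each generation. This is why VLIW object code scheduled for one generation cannot simply be reused on another.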
The practical implementation of high-performance instruction-level parallel architectures is constrained by the difficulty of building a large monolithic multi-ported register file (RF). One solution is to partition the RF into smaller RFs while keeping the total number of registers and ports the same. This paper applies RF partitioning to transport triggered architectures (TTAs), which are a type of VLIW architecture. One might expect partitioning to increase the number of executed cycles, because it constrains the number of ports per RF. It is shown that these performance losses are small; for example, partitioning an RF with 24 registers and four read and four write ports into four RFs with 6 registers and one read and one write port each gives a performance loss of only 5.8%. Partitioned RFs consume less area than monolithic RFs with the same number of ports and registers. Experiments show that if the area saved by partitioning is spent on extra registers, partitioning does not, on average, reduce performance; it may even yield a small performance gain.
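Why partitioning saves area can be seen with a common first-order model. In this model, each register-file bit cell needs one wordline and one bitline per port, so its area grows roughly quadratically with the port count. The Python sketch below applies that model to the 24-register example. The model itself, the `base` pitch constant, and the resulting numbers are illustrative assumptions, not figures from the paper.

```python
def rf_area(registers, ports, base=2.0):
    """Assumed first-order RF area model (arbitrary units, per bit of width).

    Each bit cell is taken as a square whose side is a port-independent pitch
    `base` plus one wire pitch per port (read and write ports counted alike),
    so area ~ registers * (base + ports) ** 2.
    """
    return registers * (base + ports) ** 2


def compare(base=2.0):
    monolithic = rf_area(24, 4 + 4, base)       # 24 regs, 4 read + 4 write ports
    partitioned = 4 * rf_area(6, 1 + 1, base)   # four RFs: 6 regs, 1R + 1W each
    # Registers per partition that fit in the monolithic RF's area budget.
    regs_per_rf = int(monolithic // (4 * rf_area(1, 1 + 1, base)))
    return partitioned / monolithic, regs_per_rf


ratio, regs = compare()
print(f"partitioned/monolithic area = {ratio:.2f}; budget allows {regs} regs per RF")
```

The absolute numbers depend strongly on the assumed `base`. Only the direction of the result should be read from it: at equal register and port totals, the partitioned organization is smaller, and the difference can be spent on extra registers.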
Multithreaded execution models attempt to combine aspects of dataflow-like execution with von Neumann execution. Their main objective is to mask the latency of inter-processor communication and remote memory accesses in large-scale multiprocessors. An important issue in the analysis and evaluation of multithreaded execution is the design and performance of the storage hierarchy. Because each thread executes sequentially, the locality of access within an executing thread can be exploited using registers and caches. At the inter-thread level, however, the locality of memory accesses and its effect on the cache are not yet well understood. A storage model that can exploit this locality is developed and evaluated. The results indicate that there is a large amount of inter-thread locality that can be exploited, and that an efficient storage system can be obtained by exploiting the characteristics of nonblocking threads.
ISBN: 0818608072 (print)
The following topics are dealt with: algorithms; robots; automata; computation theory; parallel computation; graph theory; distributed architectures; and cryptography. The proceedings contain 32 papers.
ISBN: 081860719X (print)
Rule-based systems appear capable of exploiting large amounts of parallelism, because each rule can be matched against the data memory in parallel. It is pointed out that in practice the speedup from parallelism is quite limited: less than 10-fold. The reasons for the small speedup are (1) the small number of rules relevant to each change to data memory; (2) the large variation in the processing required by the relevant rules; and (3) the small number of changes made to data memory between synchronization steps. To obtain even this limited 10-fold speedup, parallelism must be exploited at a very fine granularity. It is suggested that a suitable architecture for exploiting such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors. Using such a multiprocessor (with individual processors running at 2 MIPS), execution speeds of about 3800 rule firings/s are possible. This speed is significantly higher than that obtained by other proposed parallel implementations of rule-based systems.
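The three reasons for the small speedup can be made concrete with an idealized bound. The Python sketch below is an illustrative assumption, not the paper's measurement model. It treats each unit of match work in one synchronization step as an indivisible task. Under that assumption, the step can finish no faster than its longest task, nor faster than the total work divided by the processor count. Few relevant tasks, uneven task sizes, and few changes per step all lower the bound, and splitting the work more finely raises it.

```python
def step_speedup(costs, processors):
    """Upper bound on speedup for one synchronization step (idealized).

    costs: processing time of each independent task triggered in the step
           (e.g., the match work of each relevant rule for the changes made).
    Assumes zero scheduling/synchronization overhead and indivisible tasks,
    so parallel time >= max(total / processors, longest task).
    """
    total = sum(costs)
    parallel_time = max(total / processors, max(costs))
    return total / parallel_time


# Few relevant rules, one of them expensive: 32 processors barely help.
print(step_speedup([5, 1, 1, 1, 1, 1], processors=32))  # 2.0
# The same total work split into finer-grain tasks raises the bound.
print(step_speedup([1] * 10, processors=32))            # 10.0
```

Even with perfectly fine-grain, evenly sized tasks, the bound cannot exceed the number of tasks available in a step. This is consistent with the observation that few rule activations per change, and few changes per synchronization step, cap the achievable speedup.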