It took Transmeta engineers $100 million, five years of secret toil, and a little magic to create fast, low-power chips that turn into x86s in a microsecond. Transmeta Corporation's Crusoe chips look nothing like Intel's Pentium processors. They do not have even a single logic gate in common. They are smaller, consume between one-third and one-thirtieth of the power (depending on the application), and implement none of the same instructions in hardware. However, the Crusoe microprocessors can run the same software that runs on IBM PC-compatible personal computers with Pentium chips, for instance Microsoft Windows or versions of Unix, along with their applications. The paper describes the development of the Crusoe chips.
Firmware engineering strategies range from hand coding to the use of optimizing compilers. For fine-tuned microcode we often need the advantages of both ends of the spectrum.
NStrace is a bus-driven hardware trace facility developed for the PowerPC(R) family of superscalar RISC microprocessors. It uses a recording of activity on a target processor's bus to infer the sequence of instructions executed during the recording period. NStrace is distinguished from related approaches by its use of an architecture-level simulator to generate the instruction sequence from the bus recording. The generated trace represents the behavior of the processor as it executes at normal speed while interacting normally with its run-time environment. Furthermore, the architectural simulation can provide details of the processor state that are not generally available to other trace mechanisms. Generating bus-driven instruction traces has two main components: bus capture and trace generation. Bus capture is triggered by a call to a system program that puts a particular address on the bus. The program then establishes the initial state of the processor through a combination of writing out register values and invalidating caches. A logic analyzer records the bus activity, and a file of bus transactions is produced from this recording. Trace generation proceeds by driving a processor simulator with these bus transactions and recording the resulting sequence of instructions. The processor simulator is an extension of the one developed for the PowerPC Visual Simulator. We have successfully generated instruction traces for a mix of utility programs and real applications on several microprocessor platforms running several operating systems. The bus recording hardware has a capacity of two million transactions, yielding instruction traces on the order of one hundred million instructions long. This trace facility has been used in a number of studies covering a range of performance issues involving software, hardware, and their interactions.
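The core idea can be illustrated with a toy replay loop. An architecture-level simulator computes what it can from its own state and takes external data from the bus recording. This is a minimal sketch, not NStrace's simulator. The instruction set, the `BusTransaction` record, and `generate_trace` are hypothetical, and real bus capture also has to account for instruction fetches and cache behavior. The sketch shows how recorded bus reads (here, a device status word) decide which path the simulated program takes, and therefore which instructions appear in the trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusTransaction:
    addr: int   # address the processor placed on the bus
    value: int  # data returned over the bus


def generate_trace(program, bus_log, max_steps=10_000):
    """Replay `program` on a toy simulator, taking external reads from `bus_log`.

    Returns the list of executed program counters (the instruction trace).
    Hypothetical toy ISA:
      ("li", rd, imm)        rd <- imm
      ("addi", rd, rs, imm)  rd <- rs + imm
      ("load_ext", rd, addr) rd <- next recorded bus read, which must be at addr
      ("bnez", rs, target)   jump to target if rs != 0
      ("halt",)
    """
    regs, bus = {}, iter(bus_log)
    pc, trace = 0, []
    for _ in range(max_steps):
        instr = program[pc]
        trace.append(pc)
        op = instr[0]
        if op == "halt":
            return trace
        if op == "li":
            regs[instr[1]] = instr[2]
        elif op == "addi":
            regs[instr[1]] = regs.get(instr[2], 0) + instr[3]
        elif op == "load_ext":
            # The simulator cannot know external data; the bus recording supplies it.
            txn = next(bus, None)
            if txn is None or txn.addr != instr[2]:
                raise ValueError(f"bus recording does not match simulation at pc={pc}")
            regs[instr[1]] = txn.value
        elif op == "bnez":
            if regs.get(instr[1], 0) != 0:
                pc = instr[2]
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    raise RuntimeError("step limit reached")


# Example: poll a device status word until it reads zero.
program = [
    ("load_ext", "r1", 0x1000),  # 0: read status over the bus
    ("bnez", "r1", 0),           # 1: loop while status != 0
    ("li", "r2", 7),             # 2
    ("halt",),                   # 3
]
bus_log = [BusTransaction(0x1000, 1), BusTransaction(0x1000, 1), BusTransaction(0x1000, 0)]
print(generate_trace(program, bus_log))  # [0, 1, 0, 1, 0, 1, 2, 3]
```

With a different recording, for example a single read that returns 0, the same program yields the shorter trace `[0, 1, 2, 3]`. The trace is driven by what actually happened on the bus.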
RISC-type designs are evaluated and compared with non-RISC instruction-set extensions on a level playing field: similar compiler strategies, no compatibility considerations, and similar implementation constraints. Instruction-set evaluation is also considered. The data presented are based on five benchmark programs, which are also discussed.
The PowerPC is a new RISC architecture derived from IBM's POWER architecture. The changes made to POWER simplify implementations, increase clock rates, enable a higher degree of superscalar execution, extend the architecture to 64 bits, and add multiprocessor support. For compatibility with existing software, the developers retained POWER's basic instruction set, opcode assignments, and programming model.
Two MIPS cores based on the Montage architecture support the needs of embedded systems and consumer appliances. The flexibility of the architecture allows extensions to be implemented rapidly to meet demanding customer requirements.
A special multistack structure is described, together with an optimization technique that partitions, places, and wires the data-path macros in this structure. The technique takes into account the connectivity of all the chip logic (data path, control logic, chip drivers, on-chip memory). The overall objectives are:

- to fit the circuits within the chip boundary;
- to ensure wirability both inside the data path and between the stacks and the other circuits;
- to minimize wire lengths for wirability and timing.

A tool for automatic multistack optimization has been implemented and applied successfully to lay out high-density data-path chips.
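The sketch below makes the wirability and wire-length objectives concrete. It shows how the order in which data-path macros are packed into stacks changes an estimated wire length. It is not the tool described above. The greedy packing rule, the use of half-perimeter wirelength (HPWL) as the length estimate, and the macro and net names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macro:
    name: str
    width: float
    height: float


def pack_into_stacks(macros, max_height):
    """Greedy: fill a vertical stack in the given order until the next macro would
    exceed max_height (a stand-in for the chip boundary), then start a new stack."""
    stacks, current, used = [], [], 0.0
    for m in macros:
        if m.height > max_height:
            raise ValueError(f"{m.name} is taller than the boundary")
        if current and used + m.height > max_height:
            stacks.append(current)
            current, used = [], 0.0
        current.append(m)
        used += m.height
    if current:
        stacks.append(current)
    return stacks


def place(stacks):
    """Place stacks side by side from x = 0; return each macro's center point."""
    centers, x = {}, 0.0
    for stack in stacks:
        y = 0.0
        for m in stack:
            centers[m.name] = (x + m.width / 2, y + m.height / 2)
            y += m.height
        x += max(m.width for m in stack)
    return centers


def hpwl(nets, centers):
    """Half-perimeter wirelength: sum of bounding-box width + height per net."""
    total = 0.0
    for pins in nets:
        xs = [centers[p][0] for p in pins]
        ys = [centers[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


A, B, C, D = (Macro(n, 2.0, 1.0) for n in "ABCD")
nets = [("A", "C"), ("B", "D")]
for order in ([A, B, C, D], [A, C, B, D]):
    stacks = pack_into_stacks(order, max_height=2.0)
    print([m.name for m in order], hpwl(nets, place(stacks)))
# ['A', 'B', 'C', 'D'] 4.0
# ['A', 'C', 'B', 'D'] 2.0
```

Both orderings fit within the boundary, but placing connected macros in the same stack halves the estimated wire length. A real optimizer would search over such partitions while also accounting for connections to control logic, drivers, and memory.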
A description is given of the R3010 floating-point accelerator chip. The R3010 is a coprocessor based on advanced reduced-instruction-set-computer (RISC) architecture and VLSI design techniques, and it provides high-speed floating-point operation. The 75,000-transistor hard-wired chip executes four instructions in parallel. Its performance is compared with that of available floating-point processors, and its architecture is examined. The organization and implementation of the R3010 are discussed.
Direct-mapped caches are defined, and it is shown that trends toward larger cache sizes and faster hit times favor their use. The arguments are restricted initially to single-level caches in uniprocessors. They are then extended to two-level cache hierarchies. How and when these arguments for caches in uniprocessors apply to caches in multiprocessors are also discussed.
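A minimal sketch of a direct-mapped cache illustrates the definition. It assumes byte addresses, a hypothetical 4-line cache with 16-byte blocks, and no write handling. Because every block has exactly one candidate line, a lookup is a single tag comparison, which is what keeps hit times short. The price is conflict misses when two active blocks share an index.

```python
class DirectMappedCache:
    """Each memory block maps to exactly one cache line (block number mod lines)."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # one tag per line; None means invalid
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Return (index, tag) for a byte address."""
        block = addr // self.block_size
        return block % self.num_lines, block // self.num_lines

    def access(self, addr):
        """Return True on a hit. A single tag compare decides hit or miss."""
        index, tag = self.split(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # no replacement choice: the one candidate line is overwritten
        return False


cache = DirectMappedCache(num_lines=4, block_size=16)  # 64-byte cache
print([cache.access(a) for a in (0, 4, 16, 0)])  # [False, True, False, True]

cache = DirectMappedCache(num_lines=4, block_size=16)
print([cache.access(a) for a in (0, 64, 0, 64)])  # [False, False, False, False]
```

The second pattern alternates between addresses 0 and 64, which map to the same line. Every access misses even though the other three lines are unused.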
A self-timed microarchitecture called KeyRing is presented, and a method for implementing KeyRing circuits compatible with a timing-driven electronic design automation (EDA) flow is discussed. The KeyRing microarchitecture is derived from the AnARM, a low-power self-timed ARM processor based on ad hoc design principles. First, the unorthodox design style and circuit structures are revisited. A theoretical model that can support the design of generic circuits and the elaboration of EDA methods is then presented. Also addressed are the compatibility issues between KeyRing circuits and timing-driven EDA flows. The proposed method leverages relative timing constraints to translate the timing relations in a KeyRing circuit into a set of timing constraints that enable timing-driven synthesis and static timing analysis. Finally, two 32-bit RISC-V processors are presented. Called KeyV and based on KeyRing microarchitectures, they are synthesized in a 65 nm technology using the proposed EDA flow. Postsynthesis results demonstrate the effectiveness of the design methodology and allow comparisons with a synchronous alternative called SynV. Performance and power consumption evaluations show that KeyV's power efficiency lies between that of SynV with clock gating and that of SynV without clock gating.
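The method relies on relative timing constraints. The sketch below shows what such a constraint expresses: from a common point of divergence, one event must reach its destination before another, with a margin. It also shows how the constraint could be checked against minimum and maximum path delays. This is a minimal sketch, not the paper's flow, which feeds its constraints to timing-driven synthesis and static timing analysis. The signal names (`req`, `data_d`, `latch_en`), the delays, and `check` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeTimingConstraint:
    pod: str       # point of divergence shared by both paths
    early: str     # event that must arrive first (e.g. data at a latch input)
    late: str      # event that must arrive second (e.g. the latch enable)
    margin: float  # required separation in ns


def check(constraints, max_delay, min_delay):
    """Return (constraint, slack) pairs that fail under worst-case delays.

    max_delay / min_delay map (source, sink) to path delays in ns, assumed to come
    from static timing analysis. A constraint holds when the fastest 'late' path
    still arrives at least `margin` after the slowest 'early' path.
    """
    violations = []
    for c in constraints:
        slack = min_delay[(c.pod, c.late)] - max_delay[(c.pod, c.early)] - c.margin
        if slack < 0:
            violations.append((c, slack))
    return violations


rt = RelativeTimingConstraint(pod="req", early="data_d", late="latch_en", margin=0.1)
print(check([rt], {("req", "data_d"): 1.2}, {("req", "latch_en"): 1.5}))        # []
print(len(check([rt], {("req", "data_d"): 1.45}, {("req", "latch_en"): 1.5})))  # 1
```

Comparing the slowest early path with the fastest late path keeps the check conservative. A constraint that passes here holds for any delay within the given bounds.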