We have developed a 16-way multithreaded microprocessor called BlueSPARC. This in-order, high-throughput processor incorporates complex features such as privileged operations, memory management, and a non-blocking cac...
Generic instruction-based testing methods do not always give good fault coverage for complex units such as the forwarding unit. Hence it becomes important to carefully craft the tests that are best for different parts of...
An increasingly large portion of scheduler latency is derived from the monolithic content addressable memory (CAM) arrays accessed during instruction wakeup. The performance of the scheduler can be improved by decreasing the number of tag comparisons necessary to schedule instructions. Using detailed simulation-based analyses, we find that most instructions enter the window with at least one of their input operands already available. By putting these instructions into specialized windows with fewer tag comparators, load capacitance on the scheduler critical path can be reduced, with only very small effects on program throughput. For instructions with multiple unavailable operands, we introduce a last-tag speculation mechanism that eliminates all remaining tag comparators except those for the last arriving input operand. By combining these two tag-reduction schemes, we are able to construct dynamic schedulers with approximately one quarter of the tag comparators found in conventional designs. Conservative circuit-level timing analyses indicate that the optimized designs are 20-45% faster and require 10-20% less power, depending on instruction window size.
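The comparator savings described in this abstract can be illustrated with a small counting sketch. It is not the authors' simulator. The 64-entry window and the operand mix in `HYPOTHETICAL_WINDOW` are assumptions, chosen only so the arithmetic lands on the "approximately one quarter" figure. The function names are also hypothetical.

```python
# Counting sketch for tag-comparator reduction (illustrative only).
# Assumption: each instruction has at most two source operands, and a
# conventional window entry carries one tag comparator per source operand.

def comparators_needed(unavailable_operands: int, last_tag_speculation: bool) -> int:
    """Comparators an entry needs, given how many operands are still pending."""
    if unavailable_operands not in (0, 1, 2):
        raise ValueError("expected 0, 1, or 2 unavailable operands")
    if last_tag_speculation:
        # Only the operand predicted to arrive last is monitored.
        return min(unavailable_operands, 1)
    return unavailable_operands

def conventional_total(window):
    """Conventional design: two comparators per entry, regardless of need."""
    return 2 * len(window)

def reduced_total(window, last_tag_speculation=True):
    """Specialized windows: each entry gets only the comparators it needs."""
    return sum(comparators_needed(n, last_tag_speculation) for n in window)

# Hypothetical mix: unavailable-operand count for each of 64 entries.
HYPOTHETICAL_WINDOW = [0] * 32 + [1] * 24 + [2] * 8

conventional = conventional_total(HYPOTHETICAL_WINDOW)
reduced = reduced_total(HYPOTHETICAL_WINDOW)
ratio = reduced / conventional
print(conventional, reduced, ratio)
```

With this assumed mix, turning off last-tag speculation (`reduced_total(HYPOTHETICAL_WINDOW, False)`) leaves the two-operand entries with both comparators, so the total rises accordingly.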
Along with the increasing diversity of educational markup languages there is a strong need for flexible integration of such content into learning platforms. Therefore, we have identified four basic mechanisms: plain l...
Field programmable gate arrays (FPGAs) are widely used in building Systems-on-Programmable-Chips (SOPCs) since they contain plenty of reconfigurable heterogeneous resources providing the facility to implement various ...
Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex bas...
ISBN: 9791092279016 (print)
In many sports and medical applications, small wearable sensors are used to capture motion and physiological signals. Commonly, however, such sensors can only acquire raw data, which is then transmitted to a smartphone or other mobile computing device where processing and classification are carried out. This limits usability because another device has to be worn. Moreover, the high power consumption caused by continuous transmission is a major disadvantage. Therefore, we propose a smart sensor approach that alleviates these problems by carrying out the processing on the sensor itself. To house the processing schemes in the sensor node, we developed a new computer architecture utilizing an FPGA and an ASIC. To show the benefits of in-sensor processing, we chose two representative biosensor applications: a movement and fall detection system for the elderly and a swimming style recognition system for professional athletes. Compared to conventional approaches, the same classification rate can be achieved while saving space, power, weight and setup costs.
This paper discusses a control mechanism for highly parallel computation and its implementation. The author presents an architecture featuring a hexagonal processor array and implements the dependence-driven computation model. The machine implements the dependence graph connecting the single assignment block (SAB) nodes. An SAB is a function that assigns data to a group of variables only once. The dependence between SABs is implemented by an activation signal and a dynamic memory: an SAB is activated by the activation signal, data is sent through the dynamic memory, and the SABs are synchronized by the memory. Each SAB is described by a data-flow graph, which is mapped directly onto the array of processing elements. Three types of parallelism are implemented: (i) direct-mapped data flow within an SAB, (ii) pipelining between SABs, and (iii) parallel processing of SABs.
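The dependence-driven model in this abstract can be sketched in software as follows. This is a minimal illustration, not the paper's hardware design. `SingleAssignmentMemory`, `run`, and the three example blocks are hypothetical names. A dictionary stands in for the dynamic memory, and a ready queue stands in for the activation signals.

```python
# Software sketch of dependence-driven execution of single assignment
# blocks (SABs). Illustrative only; the paper describes a hardware array.

class SingleAssignmentMemory:
    """Stand-in for the dynamic memory: each variable is written once."""
    def __init__(self):
        self._cells = {}

    def write(self, name, value):
        if name in self._cells:
            raise ValueError(f"variable {name!r} already assigned")
        self._cells[name] = value

    def read(self, name):
        return self._cells[name]

def run(sabs, deps):
    """Run SABs in dependence order.

    sabs: dict mapping SAB name -> function taking the shared memory
    deps: dict mapping SAB name -> set of SAB names it depends on
    """
    memory = SingleAssignmentMemory()
    pending = {name: set(deps.get(name, ())) for name in sabs}
    ready = [name for name, preds in pending.items() if not preds]
    order = []
    while ready:
        name = ready.pop(0)
        sabs[name](memory)          # the SAB assigns its variables once
        order.append(name)
        for other, preds in pending.items():
            if name in preds:
                preds.discard(name)
                if not preds:       # all inputs produced: "activation signal"
                    ready.append(other)
    return memory, order

# Hypothetical example: c depends on the outputs of a and b.
sabs = {
    "a": lambda m: m.write("x", 2),
    "b": lambda m: m.write("y", 3),
    "c": lambda m: m.write("z", m.read("x") + m.read("y")),
}
memory, order = run(sabs, {"c": {"a", "b"}})
print(order, memory.read("z"))
```

The sketch runs the blocks sequentially. In the architecture described, independent SABs such as `a` and `b` would execute in parallel.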
This paper proposes new network interface controller (NIC) designs that take advantage of integration with the host CPU to provide increased flexibility for operating system kernel-based performance *** believe that t...
This paper presents a novel statistical model to estimate the reliability and number of errors of hardware tasks running on partially reconfigurable FPGAs in harsh environments. The proposed model has been validated b...