Advances in FPGA technology allow highly complex systems to be designed using on-chip FPGA resources and intellectual property (IP) cores. Furthermore, multiprocessor systems can be built from hard-core or soft-core processors, widening the range of applications that can be implemented on an FPGA. In this paper we propose a symmetric multiprocessor architecture based on the MicroBlaze soft-core processor, together with the operating system support needed to run multithreaded applications. Four systems with different shared memory configurations have been implemented on an FPGA and tested with parallel applications to evaluate their performance.
ISBN: 9781424429325 (print)
Driven by the continued scaling of Moore's law, chip multiprocessors and systems-on-a-chip are expected to grow in core count from dozens today to hundreds in the near future. Scalability of on-chip interconnect topologies is critical to meeting these demands. In this work, we seek a better understanding of how network topologies scale with regard to cost, performance, and energy, considering the advantages and limitations afforded on a die. Our contributions are three-fold. First, we propose a new topology, called Multidrop Express Channels (MECS), that uses a one-to-many communication model enabling a high degree of connectivity in a bandwidth-efficient manner. In a 64-terminal network, MECS enjoys a 9% latency advantage over other topologies at low network loads, which extends to over 20% in a 256-terminal network. Second, we demonstrate that partitioning the available wires among multiple networks and channels enables new opportunities for trading off performance, area, and energy efficiency that depend on the partitioning scheme. Third, we introduce Generalized Express Cubes, a framework for expressing the space of on-chip interconnects, and demonstrate how existing and proposed topologies can be mapped to it.
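To give a rough feel for why a one-to-many topology can lower latency at low load, the sketch below compares average hop counts on a k x k router grid. It is an illustration under stated assumptions, not the paper's evaluation: `express_hops` assumes that a multidrop channel lets a router reach any other router in its row or column in a single hop, the grid size `k` is hypothetical, and hop count ignores wire delay, serialization, and contention, so it should not be read as a latency figure.

```python
from itertools import product


def average_hops(k, hops):
    """Average hop count over all ordered pairs of distinct routers in a k x k grid."""
    nodes = list(product(range(k), repeat=2))
    total = sum(hops(a, b) for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))


def mesh_hops(a, b):
    # Mesh with dimension-order routing: one hop per unit of distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def express_hops(a, b):
    # Assumed multidrop model: one hop per dimension in which the
    # source and destination coordinates differ (so at most two hops).
    return int(a[0] != b[0]) + int(a[1] != b[1])


if __name__ == "__main__":
    for k in (4, 8):  # hypothetical grid sizes
        print(k, average_hops(k, mesh_hops), average_hops(k, express_hops))
```

Under this model the mesh average grows with k while the multidrop average stays below two hops, which is consistent with the abstract's observation that the advantage widens in the larger network.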
ISBN: 9789896740023 (print)
Abnormalities in the oculomotor system are well-known clinical symptoms in patients with several neurodegenerative diseases. They include changes in the latency, peak velocity, and deviation of saccadic movements, which alter the waveform of the patient's response. These changes in waveform morphology suggest a higher degree of statistical independence in sick patients than in healthy individuals with respect to the response to a visual saccadic stimulus, modeled by means of digitally generated saccade waveforms. The electro-oculogram records of six patients diagnosed with ataxia SCA2 (a hereditary neurodegenerative disease) and six healthy control subjects were processed to extract saccades. We propose applying a blind source separation algorithm (independent component analysis) to find significant differences in the resulting estimates between healthy and sick subjects. The results support independent component analysis based techniques as an adequate tool for evaluating saccadic waveform changes in patients with ataxia SCA2.
User interface generation is still a complex issue. Manual creation is time-consuming because it combines the work of the application developer with that of the user interface designer. Many approaches therefore investigate automatic user interface generation. However, such approaches currently cannot take usability aspects into account, because the information needed to do so is missing. We therefore propose a tool intended to ease and enhance automatic user interface generation for services by introducing service annotations, which are attached to services and provide additional information for the generation process.
With increasing defect density, microprocessors, especially the embedded caches, will encounter more faults. Adding spare resources to replace defective components is a widely accepted method for yield enhancement. In...
In this paper, the texture feature "coarseness" is modelled by means of fuzzy sets, relating representative coarseness measures (our reference set) to the human perception of this type of feature. In our study, a wide variety of measures have been analyzed, defining unidimensional and bidimensional fuzzy sets for different combinations of measures. Human perception of fineness has been collected from polls filled in by human subjects, and their assessments have been aggregated by means of OWA operators. Using a combination of some measures as the reference set, the membership function of the fuzzy set is modelled as the function that best fits the collected data.
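The aggregation step can be illustrated with the standard definition of an ordered weighted averaging (OWA) operator: the weights are applied to the values after sorting them in descending order, not to particular subjects. The sketch below is a minimal illustration; the sample assessments and weight vectors are made up for the example and are not taken from the paper.

```python
def owa(values, weights):
    """Ordered weighted average: weights[i] multiplies the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # descending order
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    assessments = [0.2, 0.9, 0.5]  # hypothetical fineness ratings from three subjects
    print(owa(assessments, [1.0, 0.0, 0.0]))        # behaves like max
    print(owa(assessments, [0.0, 0.0, 1.0]))        # behaves like min
    print(owa(assessments, [1 / 3, 1 / 3, 1 / 3]))  # behaves like the arithmetic mean
```

Choosing the weight vector moves the operator between the minimum and the maximum, which is what makes it a flexible way to combine several subjects' assessments into one value.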
To keep pace with on-chip wire delay, processors will be partitioned further into a larger number of small banks. As the number of data cache banks grows, load routing latency will account for a considerable portion of program execution time. In a tiled processor such as T-Flex, we observed that the average routing latency of loads that hit in the data cache can be reduced by 72.1% when data is placed perfectly, at the tile where the load issues. To reduce the long routing latency of critical loads, this paper proposes localizing load execution at the issuing side. This method, however, incurs the overhead of data copies and the extra communication needed to maintain coherence and memory ordering. We explore the design space for localizing data access, aiming to maximize the benefit at the expense of a relatively small overhead. We observed that access frequency and load/store behavior vary widely across copies, with a large number of successive loads concentrated on a small number of data blocks. Our experiments show that, with suitable replication and copy invalidation strategies, the copying overhead can be kept under control while retaining a considerable performance gain.
This paper proposes an audio watermark embedding and extraction algorithm based on the fast Fourier transform (FFT). The algorithm exploits the characteristics of the human auditory system (HAS) and embeds the watermark information into the phase coefficients of the audio signal after the FFT. The experimental results show that the watermark generated with the proposed algorithm is inaudible and robust against noise and commonly used audio processing techniques.
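The general idea of hiding bits in Fourier phase can be sketched as follows. This is not the paper's algorithm, whose embedding rule the abstract does not give: the sketch assumes a simple rule that sets the phase of a bin to +π/2 for a 1 and -π/2 for a 0 while keeping its magnitude, uses a plain DFT from the standard library instead of an FFT for brevity, and makes no attempt at inaudibility. The names `embed`, `extract`, and `first_bin` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, returning the real part (valid for conjugate-symmetric spectra)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def embed(frame, bits, first_bin=1):
    """Write one bit per bin into the phase, preserving each bin's magnitude."""
    n = len(frame)
    if first_bin < 1 or first_bin + len(bits) > n // 2:
        raise ValueError("bits do not fit below the Nyquist bin")
    spectrum = dft(frame)
    for i, bit in enumerate(bits):
        k = first_bin + i
        phase = math.pi / 2 if bit else -math.pi / 2  # assumed embedding rule
        spectrum[k] = cmath.rect(abs(spectrum[k]), phase)
        spectrum[n - k] = spectrum[k].conjugate()  # keep the time signal real
    return idft_real(spectrum)


def extract(frame, n_bits, first_bin=1):
    """Read the bits back from the sign of each bin's phase."""
    spectrum = dft(frame)
    return [1 if cmath.phase(spectrum[first_bin + i]) > 0 else 0
            for i in range(n_bits)]


if __name__ == "__main__":
    # Hypothetical 32-sample test frame with energy in every low bin.
    frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) + 0.1 * t for t in range(32)]
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = embed(frame, bits)
    print(extract(marked, len(bits)) == bits)
```

Extraction relies on each carrier bin having non-negligible magnitude. A practical scheme would also choose bins and phase changes according to a psychoacoustic model, which this sketch leaves out.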
ISBN: 9781424446995 (print)
Inertial measurement units (IMUs) are gaining popularity in several application fields, such as navigation, body motion monitoring, and indoor positioning. Microelectromechanical (MEMS) accelerometers and gyroscopes are used for this purpose since they offer a reasonable price-performance trade-off. However, their output still presents several undesired characteristics that should be compensated through proper device calibration. Although accelerometers are quite easy to calibrate, gyroscopes need more complex systems and equipment to achieve an accurate calibration. This paper presents a novel calibration system that uses simple equipment (a bike wheel as a turntable), designed for an IMU that is composed of a triaxial accelerometer and a biaxial gyroscope and is used in a knee telerehabilitation system. MEMS accelerometers are usually more accurate and offer better performance than MEMS gyroscopes. Thus, accelerometer data is used to help calibrate the gyroscope by applying a novel, simple, yet accurate set of maneuvers.
Multimedia and some scientific applications have achieved good performance on the stream processor architecture by employing the stream programming model. To find ways of accelerating symmetric cryptography on stream processors, in this paper we implement and analyze cryptographic algorithms on different stream processors. Four cipher algorithms, RC5, AES, TWOFISH and 3DES, all in ECB mode, are implemented on three platforms: the SPI Storm SP16-G160 stream processor, the NVIDIA GeForce 9800GTX, and the Intel Core2 dual-core processor E7300. The architectural differences between the two stream processors and the characteristics of their programming models are described. Comparing the throughput of these applications, the 9800GTX shows a 4-30x performance improvement over the E7300, while the SP16 achieves the highest power efficiency, a 15-20x increase over the E7300 in Gops/Watt.