ISBN: 9781538604465 (print)
The paper presents the results of design space explorations for the implementation of the Smith-Waterman (S-W) algorithm, which performs DNA and protein sequence alignment. Both the design exploration studies and the FPGA implementations are obtained by developing a dynamic dataflow program that implements the algorithm and by direct high-level synthesis (HLS) to FPGA HDL. The main feature of the resulting implementation is a low-latency, pipelinable multistage processing element (PE), which provides a substantial decrease in resource utilization and an increase in computation throughput compared to state-of-the-art solutions. The implementation is also fully scalable and can be efficiently reconfigured according to the DNA sequence sizes and the performance requirements of the system architecture. It scales efficiently up to 250 MHz, obtaining 14746 alignments/s using a single S-W core with 4 PEs, and up to 31.8 million alignments/min using 36 S-W cores on the same FPGA, for sequences of 160 x 100 nucleotides.
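For orientation, the Smith-Waterman recurrence itself can be sketched in a few lines of software. This is a minimal illustration only, not the dataflow program or the HLS-generated design described in the abstract; the function name, the scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Scoring values and the linear gap penalty are illustrative assumptions.
    Only two rows of the score matrix are kept, since each cell depends on
    its left, upper and upper-left neighbours only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "AGT"))
```

Because every cell depends only on three neighbours, cells on the same anti-diagonal are independent of each other, which is the property that hardware implementations of this algorithm typically exploit for parallelism.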
ISBN: 9781538604465 (print)
In this paper, we present a new software tool, called HTGS Model-based Engine (HMBE), for the design and implementation of multicore signal processing applications. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), a recently introduced software tool for implementing scalable workflows for high-performance computing applications. HMBE integrates advanced design optimization techniques provided in HTGS with model-based approaches that are founded on dataflow principles. Such integration contributes to (a) making the application of HTGS more systematic and less time-consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process. We present HMBE with an emphasis on novel dynamic scheduling techniques that are developed as part of the tool, and demonstrate its utility through a case study involving an image stitching application for large-scale microscopy images.
ISBN: 0780371453 (print)
Enhancements to the C6000 architecture have improved performance, reduced code size, lowered power consumption, and increased compiler efficiency. In this work, benchmarks of DSP kernels and typical DSP applications are used to compare commercially available DSPs in terms of cycle count, power, and compiler efficiency.
ISBN: 0780393333 (print)
The increasing number of safety-critical commercial applications has generated a need for components with high levels of reliability. As CMOS process sizes continue to shrink, the reliability of ICs is negatively affected, since they become more sensitive to transient faults. New circuit designs must take this into consideration and incorporate adequate protection against the effects of transient faults. This paper presents a novel method for protecting the pipelined execution unit of an embedded processor. It is based on a self-configured architecture with hybrid redundancy that can mask single and multiple errors occurring in storage elements due to transient or permanent faults. This concept can be easily applied to any processing architecture of this nature with a high safety integrity level. Results from error-injection experiments are also reported; they show that this design can maintain uninterrupted, failure-free operation under single and double errors with a probability that exceeds 99.4%.
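The general idea of hybrid redundancy (majority voting over active modules, plus spares that replace a disagreeing module) can be illustrated in software. The sketch below shows the textbook scheme only and is not the self-configured architecture of the paper; the class name `HybridRedundantUnit`, the three-module core and the replace-on-first-disagreement policy are all assumptions made for the example.

```python
from collections import Counter


class HybridRedundantUnit:
    """Textbook hybrid redundancy: a voted core of modules plus spares.

    `modules` is a list of callables that should all compute the same
    function; the first `active` of them form the voted core and the
    rest are spares. All names here are hypothetical.
    """

    def __init__(self, modules, active=3):
        self.active = list(modules[:active])
        self.spares = list(modules[active:])

    def execute(self, x):
        outputs = [module(x) for module in self.active]
        voted, votes = Counter(outputs).most_common(1)[0]
        if votes <= len(self.active) // 2:
            raise RuntimeError("no majority: error cannot be masked")
        # Reconfigure: swap each disagreeing module for a spare, if any.
        for i, out in enumerate(outputs):
            if out != voted and self.spares:
                self.active[i] = self.spares.pop(0)
        return voted
```

With a three-module core, one wrong output per vote is masked; swapping the disagreeing module for a spare restores a fault-free core, so a later second error can be masked as well. Replacing a module after a single disagreement spends a spare even when the fault was transient, which a real design would likely handle more carefully.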
ISBN: 0780377958 (print)
In this paper, receiver synthesis for nonlinearly amplified orthogonal frequency division multiplexing (OFDM) signals is presented. An optimal maximum-likelihood (ML) receiver is proposed and its computational complexity is discussed. Further, a sub-optimal receiver suitable for OFDM signals with a large number of sub-carriers and high-order constellations is presented. The performance of the optimal and sub-optimal receivers for nonlinearly amplified m-QAM-OFDM signals is studied by means of simulation.
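Why the complexity of an optimal receiver matters can be seen from a brute-force sketch. Assuming additive white Gaussian noise, ML detection reduces to picking the candidate symbol vector whose modelled (nonlinearly amplified) waveform is closest to the received one. Everything specific below is an assumption for illustration and not taken from the paper: the QPSK alphabet, the soft-limiter amplifier model, the clipping level `clip`, and all function names.

```python
import cmath
import itertools

# Hypothetical 4-QAM (QPSK) alphabet used for every sub-carrier.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def idft(freq):
    """Inverse DFT: sub-carrier symbols -> time-domain OFDM samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def soft_limiter(samples, clip=0.6):
    """Assumed amplifier model: clip the envelope, keep the phase."""
    return [s if abs(s) <= clip else clip * s / abs(s) for s in samples]


def transmit(symbols, clip=0.6):
    """OFDM modulation followed by the assumed nonlinear amplifier."""
    return soft_limiter(idft(symbols), clip)


def ml_detect(received, n_subcarriers, clip=0.6):
    """Exhaustive ML search over all len(QPSK) ** n_subcarriers candidates."""
    best, best_dist = None, float("inf")
    for cand in itertools.product(QPSK, repeat=n_subcarriers):
        model = transmit(cand, clip)
        dist = sum(abs(r - m) ** 2 for r, m in zip(received, model))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best
```

The search visits M^N candidates for an M-point constellation on N sub-carriers (256 for QPSK on 4 sub-carriers), so exhaustive detection quickly becomes impractical as either number grows, which is the motivation the abstract gives for a sub-optimal receiver.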
ISBN: 0780364880 (print)
This paper explains our experience in applying "higher" levels of abstraction to DSP-based designs. Higher levels of abstraction allow us to describe digital radio designs in a more generic fashion, abstracted over carrier frequencies and data rates. As such, specific instances of the radios can be compiled from our generic descriptions. In this way, we obtain rapid reconfiguration within a wide range of design specifications.
ISBN: 0780338065 (print)
As telecommunication products become more complex, an increasing number of functions must be implemented on a single chip. To cope with this, a rapid trend towards programmable architectures as a base for digital signal processing (DSP) systems is occurring. The reason is that extremely complex algorithms and protocols must be implemented to use the available bandwidth economically in the next generation of wireless networks. Rapidly changing system requirements, design productivity, and intellectual property reuse are also promoting this trend.
ISBN: 0780393333 (print)
This paper presents a hardware/software partitioning methodology for improving performance in single-chip systems composed of a processor and reconfigurable logic. The reconfigurable logic is realized with Field Programmable Gate Array (FPGA) technology. Critical software parts are selected for acceleration on the reconfigurable logic. The method considers a generic hybrid System-on-Chip platform, which can model the majority of existing processor-FPGA systems. The partitioning method uses an automated kernel identification process at the basic-block level to detect critical software portions. Three different instances of the generic platform and two sets of benchmarks are used in the experiments. An analysis of five real-life applications showed that, on average, they spend 69% of their instruction count in 11% of their code. The extensive experimentation illustrates that, for systems composed of 32-bit processors, the speedup of five applications ranges from 1.3 to 3.7 relative to an all-software solution. For a platform composed of an 8-bit processor, the performance gains of eight DSP algorithms are considerably greater, with an average speedup of 28.
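Kernel identification at the basic-block level can be pictured as ranking blocks by their dynamic instruction count and selecting the hottest ones. The sketch below is a simple greedy illustration under assumed inputs, not the authors' identification process; the function name `identify_kernels`, the `(name, static_size, exec_count)` profile format and the `coverage` threshold are hypothetical.

```python
def identify_kernels(blocks, coverage=0.69):
    """Greedily pick the hottest basic blocks until `coverage` is reached.

    blocks: list of (name, static_size, exec_count) tuples, where
    static_size is the number of instructions in the block and
    exec_count is how often the block ran (hypothetical profile format).
    Returns (selected names, fraction of dynamic instruction count
    covered, fraction of static code size selected).
    """
    total_dyn = sum(size * count for _, size, count in blocks)
    total_static = sum(size for _, size, _ in blocks)
    # Hottest first: dynamic instructions = block size * executions.
    ranked = sorted(blocks, key=lambda b: b[1] * b[2], reverse=True)

    chosen, dyn, static = [], 0, 0
    for name, size, count in ranked:
        if dyn / total_dyn >= coverage:
            break
        chosen.append(name)
        dyn += size * count
        static += size
    return chosen, dyn / total_dyn, static / total_static


if __name__ == "__main__":
    # Made-up profile for illustration only.
    profile = [
        ("init", 40, 1),
        ("loop_body", 10, 1000),
        ("filter_tap", 5, 800),
        ("cleanup", 45, 1),
    ]
    print(identify_kernels(profile))
```

In the made-up profile, a single block holding 10% of the static code accounts for about 71% of the executed instructions, which mirrors the kind of skew the abstract reports (69% of the instruction count in 11% of the code).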
ISBN: 0780371453 (print)
In this paper, we investigate efficiencies that may be introduced into the fault-tolerant MRRNS system by restricting the data sample polynomials to be even. We refer to this as the Symmetrical MRRNS (SMRRNS) technique.
ISBN: 9781509033614 (print)
The coarse-grained reconfigurable design paradigm provides, across a wide range of design cases, effective support for the adaptability required in modern embedded systems. The Reconfigurable Platform Composer Tool (RPCT) project and its main outcome, the Multi-Dataflow Composer, aim to reduce the effort involved in the design, mapping, optimization and prototyping of coarse-grained reconfigurable systems.