ISBN: 9781424403127 (print)
The affective content of a video is defined as the expected amount and type of emotion contained in the video. Exploiting this affective content will broaden the range of possible applications. The dimensional approach to representing emotion can play an important role in the development of an affective video content analyzer. The three basic affect dimensions are defined as valence, arousal and control [1]. This paper presents a novel FPGA-based system for modeling the arousal content of a video based on user saliency and film grammar. The design is implemented on a Xilinx Virtex-II XC2V6000 on an RC300 board, and it runs 25 times faster than a Pentium 4-based PC at 3.4 GHz.
ISBN: 9781424419609 (print)
Rapid increases in transistor density, clock speeds and competition with custom ICs have escalated the demand for aggressive solutions to battle rising operating temperatures in programmable fabrics. In this work, we make several key contributions to temperature management in FPGAs. We develop a novel and robust simulation framework exploring adaptive techniques to reduce on-chip temperatures in the reconfigurable core. We implement a thermal-driven voltage scaling algorithm based on temperature and performance feedback. Our performance estimation model is an accurate empirical relation between delay, supply voltage and temperature with an average error of 9%. Our final results show significant temperature reductions of up to 13.37 °C, accompanied by the added benefit of power savings averaging 13.48%. Overheads are limited to an average reduction in worst-case operating frequency of 10.78% and a voltage swing of 0.61 V.
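The feedback idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the empirical delay relation or the controller details, so everything below is an assumption: the delay model is a generic alpha-power-law form with a linear temperature term, and the names `estimated_delay` and `next_voltage`, the thresholds, and the step size are hypothetical.

```python
def estimated_delay(vdd, temp_c, d0=1.0, vth=0.3, alpha=1.3, k_temp=0.001):
    """Hypothetical delay estimate (arbitrary units).

    This is a generic alpha-power-law style model with a linear
    temperature term. It is NOT the empirical relation from the paper,
    which the abstract does not state.
    """
    return d0 * vdd / (vdd - vth) ** alpha * (1.0 + k_temp * (temp_c - 25.0))


def next_voltage(vdd, temp_c, t_limit_c, max_delay,
                 v_min=0.9, v_max=1.5, step=0.05):
    """One step of a temperature- and performance-driven voltage controller.

    If the chip is hotter than the limit, try to lower the supply by one
    step, but only when the estimated delay still meets `max_delay`.
    If the chip is cool enough, raise the supply back towards `v_max`.
    All limits and the step size are illustrative assumptions.
    """
    if temp_c > t_limit_c:
        candidate = round(max(v_min, vdd - step), 3)
        if estimated_delay(candidate, temp_c) <= max_delay:
            return candidate
        return vdd  # scaling further would violate the delay budget
    return round(min(v_max, vdd + step), 3)
```

Calling `next_voltage` once per temperature sample gives a simple closed loop: the supply drops while the die is hot and the timing budget allows it, and recovers otherwise.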
ISBN: 9781424403127 (print)
This paper presents Archlog, a language and framework for designing multiprocessor architectures in the logic programming domain. Our goal is to enable application developers in areas such as machine learning and cognitive robotics to produce high-performance designs for reconfigurable devices, without detailed knowledge of hardware development. The Archlog framework provides a high level of abstraction, enabling rapid system generation while supporting high performance. In this paper we present the Archlog language and its library-based compilation framework, which makes use of a customisable logic programming processor. The system generates multiple designs, with different trade-offs in the use of reconfigurable logic and embedded memories. An implementation of a multiprocessor for the machine learning system Progol on a 40 MHz XC2V6000 FPGA is 10 times faster than a 2 GHz Pentium 4 processor.
ISBN: 9781479900046 (print)
Field-programmable gate arrays (FPGAs) promise a low-power, flexible alternative for implementing parallel applications. Compared to CPUs and GPUs, they suffer from slow development cycles due to the high complexity of application development and hardware incompatibilities. To address this, we propose a platform-independent methodology and a supporting framework targeting efficient run-time application mapping onto FPGAs. Experimental results show that the introduced solution performs placement and routing of multiple applications without any performance penalty compared to state-of-the-art tools. Scalability of the framework was verified by mapping up to 73 applications per minute when it is executed on an 8-core system.
ISBN: 9789090304281 (print)
The growth of sensor technology, communication systems and computation has led to vast quantities of data being available for relevant parties to utilise. Applications such as the monitoring and analysis of industrial equipment, smart surveillance, and fraud detection rely on the ‘real-time’ analysis of time-sensitive data gathered from distributed sources. A variety of processing tasks, such as filtering, aggregation, machine learning algorithms, or other transformations, must be carried out on this data in order to extract value from it. Centralised computation strategies are often deployed in these scenarios, with the majority of the data being forwarded through the network to a datacenter environment, typically because the leaves of the network lack the required computational or storage resources, and because data from other sources or historical data is required. This approach has also traditionally been viewed as more scalable, as resources can be augmented through the addition of extra compute hardware and cloud services.
ISBN: 9781424410590 (print)
In this paper, we propose a first step towards a time-predictable computer architecture for single-chip multiprocessing (CMP). CMP is the current trend in server and desktop systems. CMP is even considered for embedded real-time systems, where worst-case execution time (WCET) estimates are of primary importance. We attack the problem of WCET analysis for several processing units accessing a shared resource (the main memory) with support from the hardware. In this paper, we combine a time-predictable Java processor and a direct memory access (DMA) unit with a regular access pattern (VGA controller). We analyze and evaluate different arbitration schemes with respect to schedulability analysis and WCET analysis. We also implement the various combinations in an FPGA. An FPGA is the ideal platform to verify the different concepts and evaluate the results by running applications with industrial background in real hardware.
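To make the link between arbitration and WCET concrete, here is a minimal sketch of a worst-case latency bound for one possible scheme. The abstract does not name the arbitration schemes it evaluates, so round-robin arbitration with non-preemptive accesses is an assumption, as are the function name and the cycle counts in the usage note.

```python
def round_robin_worst_case_latency(access_cycles, master):
    """Upper bound, in cycles, on one memory access issued by `master`.

    Assumes (hypothetically) round-robin arbitration where each master
    is granted at most one non-preemptive access per round. In the worst
    case the request arrives just after the master's turn has passed, so
    every other master completes one full access first.

    access_cycles: dict mapping master name -> longest access in cycles.
    """
    if master not in access_cycles:
        raise KeyError(f"unknown master: {master}")
    # Worst-case waiting time: one maximal access by every other master.
    waiting = sum(c for name, c in access_cycles.items() if name != master)
    # Add the master's own access time to get the total latency bound.
    return waiting + access_cycles[master]
```

For example, with the illustrative values `{"cpu": 4, "vga_dma": 8}`, the bound for `"cpu"` is 12 cycles. A WCET tool could then charge this constant for every memory access of the processor instead of treating the shared memory as unpredictable.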
ISBN: 9781424419609 (print)
FPGAs have become complex, heterogeneous platforms targeting a multitude of different applications. How a design maps to them and consumes various FPGA resources can be difficult to predict, so designers are typically forced to run full synthesis on each iteration of the design. For complex designs that involve many iterations and optimizations, the run-time of synthesis can be quite prohibitive. In this paper, we describe a fast and accurate method of estimating the FPGA resources of any RTL-based design. We achieve run-times that are more than 60 times faster than synthesis, with estimates that are on average within 22% of the actual mapped slices across a large benchmark suite targeting three different FPGA families. This resource estimator tool is first provided in Xilinx PlanAhead 10.1.
ISBN: 9781424438914 (print)
The capacity of FPGAs has grown significantly, leading to increased complexity of designs targeting these chips. The traditional FPGA design methodology using HDLs is no longer sufficient, and new methodologies are being sought. An attractive possibility is to use streaming languages. Streaming languages group data into streams, which are processed by computational nodes called kernels. They are suitable for implementation in FPGAs because they expose parallelism, which can be exploited by implementing the application in FPGA logic. Designers can express their designs in a streaming language and target FPGAs without needing a detailed understanding of digital logic design. In this paper we show how the Brook streaming language can be used to simplify design for FPGAs, while providing reasonable performance compared to other methodologies. We show that the throughput of streaming applications can be increased through automatic kernel replication. Using our compiler, the FPGA designer can trade off FPGA area and performance by changing the amount of kernel replication. We describe the details of our compiler and present the performance and area of a set of benchmarks. We found that throughput scales well with increased replication for most applications.
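Kernel replication can be pictured with a small software analogy. This is only a sketch of the general idea in Python, not Brook code and not the compiler described in the abstract; the function name `replicate_kernel` and the round-robin distribution of stream elements are assumptions made for illustration.

```python
def replicate_kernel(kernel, stream, replicas):
    """Apply `kernel` to every element of `stream` using `replicas` lanes.

    Elements are dealt round-robin to the lanes, each lane applies its
    own copy of the kernel, and the outputs are interleaved back into
    the original order. In hardware the lanes would run in parallel;
    here they run one after another, so only the data flow is shown.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Split: lane i receives elements i, i + replicas, i + 2*replicas, ...
    lanes = [stream[i::replicas] for i in range(replicas)]
    # Process: each lane is an independent copy of the kernel.
    outputs = [[kernel(x) for x in lane] for lane in lanes]
    # Merge: put every lane's results back at their original positions.
    result = [None] * len(stream)
    for i, out in enumerate(outputs):
        result[i::replicas] = out
    return result
```

The result is independent of `replicas`, which is what allows the replication factor to be used purely as an area versus throughput knob for kernels that process elements independently.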
ISBN: 9781538685174 (digital)
ISBN: 9781538685174 (print)
Fast implementations of ECC in GF(p) generally use specialized prime fields and therefore depend on the structure of the prime. These implementations cannot be ported to generic curves, which do not support such prime structures. Generic curves are often used in crypto-applications such as pairings and post-quantum-secure supersingular isogeny-based key exchange. In those cases, modular multiplication is performed by a Montgomery multiplier, which is slower than modular multiplication using specialized primes. This work aims to reduce the speed gap between Montgomery multiplication and modular multiplication in specialized prime fields by presenting an efficient implementation of a Montgomery multiplier on FPGA using the redundant number system.
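For readers unfamiliar with Montgomery multiplication, the following is a minimal software sketch of the textbook algorithm with R = 2^k. It uses ordinary Python integers and does not model the redundant number system or the FPGA architecture of the paper; the function names and the small prime in the usage note are illustrative assumptions. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_setup(p, k):
    """Precompute constants for an odd modulus p < R, where R = 2**k."""
    r = 1 << k
    if p % 2 == 0 or p >= r:
        raise ValueError("p must be odd and smaller than 2**k")
    p_inv_neg = (-pow(p, -1, r)) % r   # -p^(-1) mod R
    r2 = (r * r) % p                   # R^2 mod p, used to enter Montgomery form
    return p_inv_neg, r2


def redc(t, p, k, p_inv_neg):
    """Montgomery reduction: return t * R^(-1) mod p for 0 <= t < p * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * p_inv_neg) & mask  # m = (t mod R) * (-p^(-1)) mod R
    u = (t + m * p) >> k                 # t + m*p is divisible by R
    return u - p if u >= p else u        # single conditional subtraction


def to_mont(a, p, k, p_inv_neg, r2):
    """Convert a (0 <= a < p) to Montgomery form a * R mod p."""
    return redc(a * r2, p, k, p_inv_neg)


def mont_mul(a_bar, b_bar, p, k, p_inv_neg):
    """Multiply two values in Montgomery form; result stays in that form."""
    return redc(a_bar * b_bar, p, k, p_inv_neg)


def from_mont(a_bar, p, k, p_inv_neg):
    """Convert a value from Montgomery form back to a normal residue."""
    return redc(a_bar, p, k, p_inv_neg)
```

A typical use converts both operands with `to_mont`, chains any number of `mont_mul` calls, and converts back once with `from_mont`, for instance with `p = 101` and `k = 8`. The reduction needs only shifts, masks and multiplications and no division by p, which is why the method works for generic primes without special structure.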
ISBN: 9781538685174 (digital)
ISBN: 9781538685174 (print)
FPGAs take advantage of 2.5D stacking technology to manufacture large-capacity, high-performance heterogeneous devices at reasonable costs. EDA tools need to be aware of and exploit the physical characteristics of such devices, for example the reduced connection count between SLRs, the infrequency of SLL channel occurrence in the fabric, and the aspect ratios of individual SLRs. We implement a partition-driven placer to explore various EDA options to take advantage of architectural features in 2.5D FPGAs. We improve the routability of designs by optimizing the placer for discrete SLL channels and reduced connection counts. We propose a cut schedule for the partitioner that orients the placement with awareness of the aspect ratio of SLRs to improve track demands within each SLR.