Multimedia applications and embedded platforms are both becoming very complex in order to improve user experience. Thus, multimedia developers need high-level methods to automate time-consuming and error-prone tasks. Dynamic dataflow modeling is attractive for describing complex applications, such as video codecs, at a high level of abstraction. This paper presents a dataflow-based design approach to implement video codecs on embedded multi-core platforms. First, we introduce a custom architecture model to design low-power multi-core chips based on distributed memory and Transport-Triggered Architecture processor cores. Then, we describe software synthesis techniques to improve dynamic dataflow implementations. This methodology has been implemented in open-source tools and demonstrated on video decoders based on the MPEG-4 Visual standard and the new High Efficiency Video Coding standard. The simulations achieve real-time decoding (40 FPS) of high-definition (720p) MPEG-4 Visual video sequences on a custom multi-core platform clocked at 1 GHz, which is an improvement of more than 100% over previously proposed implementations.
ISBN: 9781467380799 (print)
Programming models which specify an application as a network of independent computational elements have emerged as a promising paradigm for programming streaming applications. The antagonism between expressivity and analysability has led to a number of different such programming models, which provide different degrees of freedom to the programmer. One example is Kahn process networks (KPNs), which, due to certain restrictions in communication, can guarantee determinacy (their results are independent of timing by construction). On the other hand, certain dataflow models, such as the CAL Actor Language, allow non-determinacy and thus higher expressivity, but at the price of static analysability and thus a potentially less efficient implementation. In many cases, however, non-determinacy is not required (or not even desired), and relying on KPNs for the implementation seems advantageous. In this paper, we propose an algorithm for classifying dataflow actors (i.e. computational elements) as KPN-compatible or potentially not. For KPN-compatible dataflow actors, we propose an automatic KPN translation method based on this algorithm. In experiments, we show that more than 75% of all mature actors of a standard multimedia benchmark suite can be classified as KPN-compatible and that their execution time can be reduced by up to 1.97x using our proposed translation technique. Finally, in a manual classification effort, we validate these results and list different classes of KPN incompatibility.
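The determinacy property mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's classification or translation algorithm; it only shows the KPN communication restriction (unbounded FIFO channels, blocking reads, no way to test a channel for emptiness). The process names, the `_END` sentinel, and the network topology are assumptions made for illustration.

```python
import queue
import threading

_END = object()  # end-of-stream sentinel (an assumption of this sketch)

def source(out_ch, values):
    """Write a finite sequence of tokens to a channel, then close it."""
    for v in values:
        out_ch.put(v)
    out_ch.put(_END)

def scale(in_ch, out_ch, factor):
    """Multiply every incoming token by a constant factor."""
    while True:
        tok = in_ch.get()  # blocking read: the only way to observe a channel
        if tok is _END:
            out_ch.put(_END)
            return
        out_ch.put(tok * factor)

def add(in_a, in_b, out_ch):
    """Add tokens pairwise, always reading the channels in the same fixed order."""
    while True:
        a = in_a.get()
        b = in_b.get()
        if a is _END or b is _END:
            out_ch.put(_END)
            return
        out_ch.put(a + b)

def run_network(xs, ys):
    """Compute 10 * xs[i] + ys[i] with four concurrently running processes."""
    c1, c2, c3, c4 = (queue.Queue() for _ in range(4))  # unbounded FIFOs
    procs = [
        threading.Thread(target=source, args=(c1, xs)),
        threading.Thread(target=source, args=(c2, ys)),
        threading.Thread(target=scale, args=(c1, c3, 10)),
        threading.Thread(target=add, args=(c3, c2, c4)),
    ]
    for p in procs:
        p.start()
    result = []
    while True:
        tok = c4.get()
        if tok is _END:
            break
        result.append(tok)
    for p in procs:
        p.join()
    return result
```

Because no process can poll a channel or choose between inputs based on arrival time, the output sequence does not depend on how the threads are scheduled. An actor that selects whichever input happens to have data available first would break this guarantee, which is the kind of behaviour a KPN-compatibility check has to rule out.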
ISBN: 9781467370257 (print)
Big Data Analytics in particular and Data Science in general have become key disciplines in the last decade. The convergence of Information Technology, Statistics and Mathematics to explore and extract information from Big Data has challenged the way many industries used to operate, shifting the decision-making process in many organizations. A new breed of Big Data platforms has appeared to fulfill the need to process data that is large, complex, variable and rapidly generated. The author describes the experience in this field of a company that provides Big Data analytics as its core business.
We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about inter-task communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumptions about the structure of programs or the layout of data in memory. Experimental results, based on the OpenStream language, show that the locality of accesses to main memory in scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63x compared to a state-of-the-art work-stealing scheduler.
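As a rough illustration of combining a static memory-hierarchy description with runtime communication information, the sketch below picks the NUMA node that minimizes the distance-weighted volume of input data for one task. This is a simplified heuristic assumed for illustration, not the algorithm from the paper; the function name `pick_node`, the distance matrix, and the byte counts are all hypothetical.

```python
def pick_node(input_bytes_by_node, distance):
    """Return the NUMA node with the lowest distance-weighted input cost.

    input_bytes_by_node: {node_id: bytes of the task's input data stored there}
                         (runtime information, hypothetical in this sketch)
    distance:            square matrix, distance[a][b] = relative cost of node a
                         accessing memory on node b (static description)
    """
    def cost(node):
        return sum(nbytes * distance[node][src]
                   for src, nbytes in input_bytes_by_node.items())

    # min() returns the lowest-numbered node when several nodes tie.
    return min(range(len(distance)), key=cost)


# Two-node machine: local access costs 10, remote access costs 20.
distance = [[10, 20],
            [20, 10]]
# Most of the task's input data lives on node 1.
chosen = pick_node({0: 4096, 1: 65536}, distance)
```

With these numbers the task is placed on node 1, next to the bulk of its inputs. A real scheduler also has to balance load across nodes and decide where to allocate the task's output buffers, which this sketch ignores.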
ISBN: 9781479970889 (print)
Domain experts from many fields use LabVIEW as a graphical system design tool to implement DSP algorithms on myriad target architectures. In this paper, we introduce the latest LabVIEW FPGA compiler, which enables domain experts with minimal hardware knowledge to quickly implement, deploy, and verify their domain-specific applications on FPGA hardware. We present two compiler techniques that we use to 1) extract extra parallelism from a user's application to take advantage of the parallel hardware resources of the FPGA and 2) minimize memory-access traffic, which is often a bottleneck that restricts overall FPGA performance. Finally, our approach provides the user with a simple constraint-driven experience to maximize their development efficiency. We use two case studies in two different domains, a 3GPP Turbo decoder and a Smith-Waterman algorithm, to show the benefits our tool provides to users.
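For readers unfamiliar with the second case study, a plain software reference for Smith-Waterman local alignment scoring is sketched below. It says nothing about how the LabVIEW FPGA compiler implements it; the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, h[i][j])
    return best
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on the same anti-diagonal can be computed independently. That data dependence pattern is what makes the algorithm a natural candidate for parallel hardware.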
Modern embedded systems show a clear trend towards the use of Multiprocessor System-on-Chip (MPSoC) architectures in order to handle performance and power consumption constraints. However, the design and validation of dedicated MPSoCs is an extremely hard and expensive task due to their complexity. Thus, the development of automated design processes is of the highest importance to satisfy the time-to-market pressure of embedded systems. This paper proposes an automated co-design flow based on the high-level language-based approach of the Reconfigurable Video Coding framework. The designer provides the application description in the RVC-CAL dataflow language, after which the presented co-design flow automatically generates a network of heterogeneous processors that can be synthesized on FPGA chips. The synthesized processors are Very Long Instruction Word-style processors. Such a methodology permits the rapid design of a many-core signal processing system which can take advantage of all levels of parallelism. The toolchain functionality has been demonstrated by synthesizing an MPEG-4 Simple Profile video decoder for two different FPGA boards. The decoder is realized as 18 processors that decode QCIF resolution video at 45 frames per second at a 50 MHz FPGA clock frequency. The results show that the given application can take advantage of every level of parallelism. (C) 2013 Elsevier B.V. All rights reserved.
This paper demonstrates that it is possible to produce automatic, reconfigurable, and portable implementations of multimedia decoders on a range of platforms with the help of the MPEG Reconfigurable Video Coding (RVC) standard. MPEG RVC is a new formalism standardized by the MPEG consortium and used to specify multimedia decoders. It produces visual representations of decoder reference software, with the help of graphs that connect several coding tools from MPEG standards. The approach developed in this paper draws on Dataflow Process Networks to produce a Minimal and Canonical Representation (MCR) of MPEG RVC specifications. The MCR makes it possible to form automatic and reconfigurable implementations of decoders which can match any actual platform. The contribution is demonstrated on one case study where a generic decoder needs to process multimedia content with the help of the RVC specification of the decoder required to process it. The overall approach is tested on two decoders from MPEG, namely MPEG-4 part 2 Simple Profile and MPEG-4 part 10 Constrained Baseline Profile. The results validate the following benefits of the MCR of decoders: compact representation, low overhead induced by its compilation, reconfiguration and multi-core abilities. (C) 2013 Elsevier B.V. All rights reserved.
The recent MPEG Reconfigurable Media Coding (RMC) standard aims at defining media processing specifications (e.g. video codecs) in a form that abstracts from the implementation platform, but at the same time is an appropriate starting point for implementation on specific targets. To this end, the RMC framework has standardized both an asynchronous dataflow model of computation and an associated specification language. Together, they provide the formalism and the theoretical foundation for multimedia specifications. Even though these specifications are abstract and platform-independent, the new approach of developing implementations from such initial specifications presents obvious advantages over the approaches based on classical sequential specifications. The advantages appear particularly appealing when targeting the current and emerging homogeneous and heterogeneous manycore or multicore processing platforms. These highly parallel computing machines are gradually replacing single-core processors, particularly when the system design aims at reducing power dissipation or at increasing throughput. However, a straightforward mapping of an abstract dataflow specification onto a concurrent and heterogeneous platform often does not produce an efficient result. Before an abstract specification can be translated into an efficient implementation in software and hardware, the dataflow networks need to be partitioned and then mapped to individual processing elements. Moreover, system performance requirements need to be accounted for in the design optimization process. This paper discusses the state of the art of the combinatorial problems that need to be faced at this design space exploration step. Some recent developments and experimental results for image and video coding applications are illustrated. Both well-known and novel heuristics for problems such as mapping, scheduling and buffer minimization are investigated in the specific context of exploring the design space of dat
We study various operations for splitting, partitioning, projecting and merging streams of data. These operations are motivated by their use in dataflow programming and stream processing languages. We use the framework of stream calculus and stream circuits for defining and proving properties of such operations using behavioural differential equations and coinduction proof principles. As a featured example we give proofs of results, observed by Moessner, from elementary number theory using our framework. We study the invariance of certain well patterned classes of streams, namely rational and algebraic streams, under splitting and merging. Finally we show that stream circuits extended with gates for dyadic split and merge are expressive enough to realise some non-rational algebraic streams, thereby going beyond ordinary stream circuits. (C) 2013 Published by Elsevier B.V.
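The dyadic split and merge operations discussed in the abstract above can be sketched on lazy, possibly infinite streams using Python iterators. The names `even`, `odd`, and `zip_merge` are chosen here for illustration and are not taken from the paper; the sketch shows only the operational behaviour, not the coinductive proofs.

```python
from itertools import count, islice, tee

def even(s):
    """Elements at positions 0, 2, 4, ... of stream s."""
    return islice(s, 0, None, 2)

def odd(s):
    """Elements at positions 1, 3, 5, ... of stream s."""
    return islice(s, 1, None, 2)

def split(s):
    """Dyadic split: return the pair (even(s), odd(s)) as independent streams."""
    s1, s2 = tee(s)
    return even(s1), odd(s2)

def zip_merge(s, t):
    """Dyadic merge: interleave two streams as s0, t0, s1, t1, ..."""
    for a, b in zip(s, t):
        yield a
        yield b

def take(n, s):
    """First n elements of a stream, as a list."""
    return list(islice(s, n))

# Splitting the naturals and merging the halves gives back the naturals.
evens, odds = split(count())
roundtrip = take(8, zip_merge(evens, odds))
```

The round trip reflects the identity that merging the even and odd substreams of a stream reconstructs the original stream, one of the basic laws such a framework would be expected to prove.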
ISBN: 9781467362658 (print)
In a live programming environment, the state of the running program is available during the editing process. An ideal live programming system should be able to harness the live program to offer improved abilities for code creation and manipulation. We introduce Circa, a language and platform designed to address this need. We argue in favor of a dataflow-based model of computation, and we show how this format enables useful methods of code inspection and manipulation. We present a framework based on the backpropagation algorithm that allows users to manipulate their program by expressing a desire against the program's result. We discuss how these code editing abilities can combine to produce a highly effective environment.