检索结果-内蒙古大学图书馆

23rd International Symposium on Quality Electronic Design (ISQED)

作者： Vaeztourshizi, Marzieh Pedram, Massoud Univ Southern Calif Dept Elect Comp Engn Los Angeles CA 90007 USA

ISBN: (纸本)9781665494663

In this paper, we present an error estimation and propagation technique which targets high-level design abstraction through considering data-flow graph (DFG) representation of approximate circuits. The proposed technique is utilized for pruning different combinations of exact versus approximate realizations of various operations in the DFG for a design space exploration (DSE) framework The technique relies on the output error estimation of the arithmetic modules of a high-level library by dividing the input range to intervals and then considering different combinations of these intervals (cluster-based estimation of output error). Additionally, the estimation considers the error of the operands. The error for each combination is stored in a look up table (LUT). The efficacy of the proposed method is assessed for two image processing applications. Simulation results show that the framework can efficiently generate the Pareto frontier in the trade-off space of accuracy versus energy efficiency for the two applications.

关键词： Approximate Circuits Error Estimation Pareto Frontier data-flow graphs Look up Tables Error Metrics

来源：评论

学校读者我要写书评

暂无评论

GraphCL: A Framework for Execution of data-flow graphs on Multi-Device Platforms 30

GraphCL: A Framework for Execution of Data-Flow Graphs on Mu...

引用

30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

作者： Moren, Konrad Goehringer, Diana Fraunhofer IOSB Dept Object Recognit Ettlingen Germany Tech Univ Dresden Chair Adapt Dynam Syst Dresden Germany

ISBN: (纸本)9781665469586

This article introduces GraphCL, an automated system for seamlessly mapping multi-kernel applications to multiple computing devices. GraphCL, consists of a C ++ API and a runtime that abstracts and simplifies the execution of multi-kernel applications on heterogeneous platforms across multiple devices. The GraphCL approach has three steps. First, the application designer provides a kernel graph. In the second phase, GraphCL computes the execution schedule. After the schedule has been computed, the runtime uses the execution schedule to enqueue in parallel the processing for all system processors. GraphCL takes the kernel dependencies and the processor performance differences into account during the schedule calculation process. By deciding on the schedule, GraphCL transparently manages the order of execution and data transfers for each processor. On two asymmetric workstations, GraphCL achieves an average acceleration of 1.8x compared to the fastest device. GraphCL achieves also for the set of multi-kernel benchmarks an average 24.5% energy reduction compared to the lazy partition heuristic. that uses all the system processors without considering their power usage.

关键词： data-flow graphs GPGPU OpenCI

来源：评论

学校读者我要写书评

暂无评论

EGAN: A Framework for Exploring the Accuracy vs. Energy Efficiency Trade-off in Hardware Implementation of Error Resilient Applications 21

EGAN: A Framework for Exploring the Accuracy vs. Energy Effi...

引用

21st International Symposium on Quality Electronic Design (ISQED)

作者： Vaeztourshizi, Marzieh Kamal, Mehdi Pedram, Massoud Univ Southern Calif Ming Hsieh Dept Elect Engn Los Angeles CA 90007 USA Univ Tehran Sch Elect & Comp Engn Tehran Iran

ISBN: (纸本)9781728142074

In this paper, we propose a framework, which is called EGAN, for exploring the trade-off between accuracy and energy efficiency in hardware implementation of error resilient applications. EGAN automatically extracts the Pareto frontier (PF) of approximate implementations of an error resilient application based on the data flow graph (DFG) of the application as well as the accuracy and energy consumption of the available approximate/exact components. The framework explores different implementation configurations heuristically to find the best energy efficient implementation of the input application under various output accuracies. The proposed framework, which works by generating some random configurations, clustering them and suggesting some neighboring configurations, reduces the search space considerably. As a result, EGAN achieves a significant reduction in the number of explored configurations compared to the exhaustive (exact) approach while achieving near-optimal results. The efficacy of the proposed framework is assessed using three DSP applications consisting of Sobel edge detector, Finite Inverse Response (FIR) filter and Discrete Cosine Transform (DCT). The studies show that in the worst-case (DCT application with 42 components) EGAN takes 89 hours to extract the PF whereas the exact approach takes 5 million years.

关键词： Approximate High-Level Synthesis Pareto Frontier data-flow graphs Clustering DSP Applications

来源：评论

学校读者我要写书评

暂无评论

Fogflow - Computation Organization for Heterogeneous Fog Computing Environments 19th

FogFlow - Computation Organization for Heterogeneous Fog Com...

引用

19th Annual International Conference on Computational Science (ICCS)

作者： Sendorek, Joanna Szydlo, Tomasz Windak, Mateusz Brzoza-Woch, Robert AGH Univ Sci & Technol Dept Comp Sci Krakow Poland

ISBN: (纸本)9783030227449;9783030227432

With the arising amounts of devices and data that Internet of Things systems are processing nowadays, solutions for computational applications are in high demand. Many concepts targeting more efficient data processing are arising and among them edge and fog computing are the ones gaining significant interest since they reduce cloud load. In consequence Internet of Things systems are becoming more and more diverse in terms of architecture. In this paper we present Fogflow model and execution environment allowing for organization of data-flow applications to be run on the heterogeneous environments. We propose unified interface for data-flow creation, graph model and we evaluate our concept in the use case of production line model that mimic real-world factory scenario.

关键词： IoT Fog computing Stream processing data-flow graphs

来源：评论

学校读者我要写书评

暂无评论

Optimal Functional-Unit Assignment for Heterogeneous Systems Under Timing Constraint

引用

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2017年第9期28卷 2567-2580页

作者： Jiang, Weiwen Sha, Edwin Hsing-Mean Chen, Xianzhang Yang, Lei Zhou, Lei Zhuge, Qingfeng Chongqing Univ Coll Comp Sci Chongqing 400044 Peoples R China

In high-level synthesis for real-time systems, it typically employs heterogeneous functional-unit types to achieve high-performance and low-cost designs. In the design phase, it is critical to determine which functional-unit type to be mapped for each operation in a given application such that the total cost is minimized while the deadline can be met. For a path or tree structured application, existing approaches can obtain the minimum-cost assignment, called "optimal assignment", under which the resultant system satisfies a given timing constraint. However, it is still an open question whether there exist efficient algorithms to obtain the optimal assignment for the directed acyclic graph (DAG), or more generally, the data-flow graph with cycles (cyclic DFG). For DAGs, by analyzing the property of the problem, this paper designs an efficient algorithm to obtain the optimal assignments. For cyclic DFGs, we approach this problem with the combination of retiming technique to thoroughly explore the design space. We formulate a Mixed Integer Linear Programming (MILP) model to give the optimal solution. But because of the high degree of its time complexity, we devise a practical algorithm to obtain near-optimal solutions within a minute. Experimental results show the effectiveness of our algorithms. Specifically, compared with existing techniques, we can achieve 25.70 and 30.23 percent reductions in total cost on DAGs and cyclic DFGs, respectively.

关键词： Real-time heterogeneous system data-flow graphs functional-unit assignment retiming optimal algorithms

来源：评论

学校读者我要写书评

暂无评论

An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

引用

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2009年第8期20卷 1158-1172页

作者： Vydyanathan, Naga Krishnamoorthy, Sriram Sabin, Gerald M. Catalyurek, Umit V. Kurc, Tahsin Sadayappan, Ponnuswamy Saltz, Joel H. Siemens Corp Technol Bangalore 560100 Karnataka India Pacific NW Natl Lab Richland WA 99352 USA RNET Technol Inc Dayton OH 45459 USA Ohio State Univ Dept Biomed Informat Columbus OH 43210 USA Ohio State Univ Dept Comp Sci & Engn Columbus OH 43210 USA

Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining these two ( also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan as compared to Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.

关键词： Processor allocation scheduling mixed parallelism data-flow graphs locality-conscious scheduling

来源：评论

学校读者我要写书评

暂无评论

Disproving the perfect-rate property of data-flow graphs unfolded by the least common multiple of the number of loop registers

引用

IEEE TRANSACTIONS ON COMPUTERS 2003年第5期52卷 688-688页

作者： Wolz, F Kolla, R Univ Wurzburg Dept Comp Engn D-97070 Wurzburg Germany

In the paper "Static Rate-Optimal Scheduling of Iterative data-flow Programs via Optimum Unfolding" [1], the authors stated a perfect-rate property on unfolded data-flow graphs (DFGs) This will be disproven ... 详细信息

关键词： data-flow graphs data-flow programs perfect-rate

来源：评论

学校读者我要写书评

暂无评论

Graph-based code selection techniques for embedded processors

引用

ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS 2000年第4期5卷 794-814页

作者： Leupers, R Bashford, S Univ Dortmund Dept Comp Sci 12 Dortmund Germany

Code selection is an important task in code generation for programmable processors, where the goal is to find an efficient mapping of machine-independent intermediate code to processor-specific machine instructions. Traditional approaches to code selection are based on tree parsing, which enables fast and optimal code selection for intermediate code given as a set of data-flow trees. While this approach is generally useful in compilers for general-purpose processors, it may lead to poor code quality in the case of embedded processors. The reason is that the special architectural features of embedded processors require performing code selection on data-flow graphs, which are a more general representation of intermediate code. In this paper, we present data-flow graph-based code selection techniques for two architectural families of embedded processors: media processors with support for SIMD instructions and fixed-point DSPs with irregular data paths. Both techniques exploit the fact that, in the area of embedded systems, high code quality is a much more important goal than high compilation speed. We demonstrate that certain architectural features can only be utilized by graph-based code selection, while in other cases this approach leads to a significant increase in code quality as compared to tree-based code selection.

关键词： code selection data-flow graphs SIMD instructions irregular data paths embedded processors

来源：评论

学校读者我要写书评

暂无评论

Scheduling data-flow graphs via retiming and unfolding

引用

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1997年第12期8卷 1259-1267页

作者： Chao, LF Sha, EHM Iowa State Univ Dept Elect & Comp Engn Ames IA 50011 USA Univ Notre Dame Dept Comp Sci & Engn Notre Dame IN 46556 USA

Loop scheduling is an important problem in parallel processing. The retiming technique reorganizes an iteration;the unfolding technique schedules several iterations together. We combine these two techniques to obtain a static schedule with a reduced average computation time per iteration. We first prove that the order of retiming and unfolding is immaterial for scheduling a data-flow graph (DFG). From this nice property, we present a polynomial-time algorithm on the original DFG, before unfolding, to find the minimum-rate static schedule for a given unfolding factor. For the case of a unit-time DFG, efficient checking and retiming algorithms are presented.

关键词： data-flow graphs loop parallelization parallel processing retiming scheduling unfolding

来源：评论

学校读者我要写书评

暂无评论

INCORPORATING INPUT-OUTPUT OPERATIONS INTO DYNAMIC data-flow graphs

引用

PARALLEL COMPUTING 1995年第8期21卷 1285-1311页

作者： EVRIPIDOU, P GAUDIOT, JL UNIV CYPRUS DEPT COMP SCI NICOSIA 134 CYPRUS UNIV SO CALIF DEPT ELECT ENGN SYST LOS ANGELES CA 90089 USA

Driven by the 'side-effect' environment of sequential von Neumann computing, Input/Output operations have evolved as state operations on shared files. In parallel programs, if multiple instances of an I/O-performing process execute concurrently, either the user or the system must synchronize any accesses to shared files. data-flow principles of execution provide an elegant way to ensure at runtime that instructions can be executed asynchronously in a parallel environment. However, while the conventional von Neumann model of interpretation inherited a rigid ordering of instructions, it is the very asynchronous character of the data-flow model of execution which introduces conflicts when 'state' tasks (such as I/O operations) must share common data objects. The scheme presented in this paper, Logical Serialization with Distributed File Pointers (LS-DFP), introduces the two basic I/O operations read and write into the dynamic data-flow graph. However, sequencing I/O operations on the same file based on the availability of data as in 'conventional' data-flow is not possible because the name of the file becomes available simultaneously to ail operations at program initiation. To impose an order, we sequentialize, (logically serialize-LS), the operations according to their lexicographical ordering. Furthermore, several optimizations are introduced that allow the distributed execution of these I/O operations with the use of Distributed File Pointers (DFP). Thus, the LS-DFP scheme can utilize the full level of parallelism of the dynamic data-flow principles of execution.

关键词： data-flow graphs PARALLEL I/O OPERATIONS DISTRIBUTED FILE POINTERS I/O SYNCHRONIZATION PROBLEM INPUT ACCESS GRAPH (IAG)

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：