ISBN: 9781728123226 (print)
The Internet of Things requires the development of ultra-low-power platforms embedding actuators, sensors, and signal processors. To limit the power consumption of such systems, nonuniform sampling schemes are very promising solutions, especially when coupled with event-driven circuitry. This strategy reduces the amount of sampled data, the system activity, and therefore the power consumption. Moreover, High-Level Synthesis (HLS) helps designers rapidly develop ultra-low-power platforms. In this article, we present a comparison of the uniform and nonuniform schemes in terms of activity and power consumption. This evaluation is performed on Finite Impulse Response (FIR) filters in three different flavors: a synchronous filter using a classical sampling scheme and two asynchronous filters implementing a nonuniform sampling scheme, one designed manually and the other generated by HLS. The filters have been designed in 350 nm and 40 nm CMOS. The manually designed filter provides an area reduction of 8% and, depending on the signal and the application, consumes from 6.6 to 43 times less energy than the synchronous version. The filter produced by HLS exhibits an area reduction of 12% and consumes from 3.3 to 28 times less energy.
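To make the data-reduction argument concrete, the following minimal sketch compares a uniform sampler with a level-crossing sampler, which is one common nonuniform, event-driven scheme. The abstract does not state which scheme the filters use, so level crossing is an assumption here, as are the `burst` test signal, the quantization `step`, and all function names.

```python
import math

def uniform_samples(signal, n):
    """Sample `signal` (a function of t in [0, 1)) at n evenly spaced instants."""
    return [(k / n, signal(k / n)) for k in range(n)]

def level_crossing_samples(signal, n, step):
    """Keep a sample only when the signal moves to a different quantization level.

    The signal is scanned on the same grid as the uniform sampler, but a
    sample is emitted only when floor(value / step) changes (an "event").
    """
    samples = []
    last_level = None
    for k in range(n):
        t = k / n
        value = signal(t)
        level = math.floor(value / step)
        if level != last_level:
            samples.append((t, value))
            last_level = level
    return samples

def burst(t):
    # Hypothetical sparse signal: idle except for a short burst of activity.
    return math.sin(2 * math.pi * 20 * t) if 0.4 <= t < 0.5 else 0.0

uniform = uniform_samples(burst, 1000)
events = level_crossing_samples(burst, 1000, step=0.25)
print(len(uniform), len(events))
```

For a signal that is idle most of the time, the event-driven sampler produces samples only during the burst, which is the kind of activity reduction the abstract attributes to nonuniform sampling.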
ISBN: 9781728188676 (digital)
ISBN: 9781728188683 (print)
Image processing is a steadily growing area with a range of applications including cryptography, medicine, video surveillance, remote sensing, and many more. Implementing sophisticated algorithms that process large amounts of data in software slows the response, and that is where hardware implementation comes into the picture. Field Programmable Gate Arrays (FPGAs) are becoming popular due to their low latency, connectivity, parallel computing, and flexibility. The unique architecture of the FPGA has made it possible to apply the technology in many applications to obtain better and faster results. This paper aims to provide a comprehensive survey of hardware implementations of image processing algorithms that improve efficiency using FPGAs. The widely used Xilinx PYNQ is also presented in this paper, as it plays a major role in reducing development time.
ISBN: 9781538695333 (print)
Hadoop is an open-source implementation of MapReduce. The performance of Hadoop is affected by the communication overhead incurred when large datasets are transmitted to the computing node. In a heterogeneous cluster, if a map task needs to process data that is not present on the local disk, a data transmission overhead occurs. To overcome this issue, data prefetching for heterogeneous Hadoop clusters is proposed for resource optimization. Data prefetching fetches the input data in advance from the remote node to a particular computing node. Hence, data transmission occurs in parallel with data processing and the job execution time is reduced. Different MapReduce jobs are used to conduct the experiment. The results demonstrate that the time taken to transmit the data is reduced. The job execution time is reduced by 15% for input data sizes greater than or equal to 2 GB, and a performance improvement of 25% is obtained for a 64 MB block size.
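The benefit of overlapping transmission with processing can be illustrated with a simple timing model. This is only a sketch under stated assumptions: a single prefetch slot, fixed per-block transfer and compute times, and hypothetical numbers that are not taken from the paper's experiments.

```python
def serial_time(transfer, compute):
    """Total time when each block is fetched and then processed, one after another."""
    return sum(transfer) + sum(compute)

def prefetch_time(transfer, compute):
    """Total time when block i+1 is fetched while block i is being processed.

    Assumes one prefetch slot: the fetch of block i+1 starts together with
    the processing of block i, and block i+1 can start only when both finish.
    """
    total = transfer[0]  # the first fetch cannot be overlapped with anything
    for i in range(len(compute)):
        next_fetch = transfer[i + 1] if i + 1 < len(transfer) else 0
        total += max(compute[i], next_fetch)
    return total

# Hypothetical per-block times in seconds for four remote input blocks.
transfer = [4, 4, 4, 4]
compute = [6, 6, 6, 6]
print(serial_time(transfer, compute), prefetch_time(transfer, compute))  # 40 28
```

In this model, prefetching hides every transfer except the first whenever a block takes at least as long to process as the next one takes to fetch; when transfers dominate, the gain shrinks accordingly.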
ISBN: 9781728144993 (print)
The Internet of Things (IoT) will grow seamlessly with advancements in data and communication technologies, leading to the deployment of trillions of end devices. Its applications range from simple home automation to very large-scale industrial automation systems. The trend is leading towards huge data generation requiring high processing power. In the near future, computing resources might not be sufficient to handle this dynamic, enormous data production. As the technology advances, the microcontrollers or Systems-on-Chip (SoCs) used for IoT end devices are becoming cheaper and more powerful. Hence, there is a need to make effective use of the huge number of underutilized IoT devices of the future by allocating additional microtasks to them in parallel, which would address the upcoming needs of this technological trend.
ISBN: 9781728136134 (print)
Computer games are complex, performance-critical graphical applications that require specialized GPU hardware. For this reason, GPU drivers often include many heuristics to help optimize throughput. Recently, however, new APIs have emerged that sacrifice many heuristics for lower-level hardware control and more predictable driver behavior. This shifts the burden of many optimizations from GPU driver developers to game programmers, but also provides numerous opportunities to exploit application-specific knowledge. This paper examines different opportunities for specializing GPU code and reducing redundant data transfers. Static analysis of commercial games shows that 5-18% of GPU code is specializable by pruning dead data elements or moving portions to different graphics pipeline stages. In some games, up to 97% of the programs' data inputs of a particular type, namely uniform variables, are unused, as well as up to 62% of those in the GPU-internal vertex-fragment interface. This shows potential for improving memory usage and communication overheads. In some test scenarios, removing dead uniform data can lead to 6x performance improvements. We also explore the upper limits of specialization if all dynamic inputs are constant at run-time. For instance, if uniform inputs are constant, up to 44% of instructions can be eliminated in some games, with a further 14% becoming constant-foldable at compile time. Analysis of run-time traces reveals that 48-91% of uniform inputs are constant in real games, so values close to the upper limit may be achieved in practice.
The paper concerns the use of global application states monitoring in distributed programs for advanced graph partitioning optimization. Two strategies for the control design of advanced parallel/distributed graph par...
ISBN: 9781728155951 (digital)
ISBN: 9781728155968 (print)
Blind Source Separation (BSS) is the extraction of source signals from mixed data with little or no knowledge about the source signals. It is used in many fields such as military, medical, and industrial applications. Among the many challenges that must be overcome to apply BSS to a real system, implementation restrictions are the most difficult. Software implementations require a significant amount of overhead to make them applicable to real-time systems and have a lower maximum speed of operation than hardware implementations. When speed is desired, the algorithm should be implemented on specialized hardware such as a Field Programmable Gate Array (FPGA), which allows the user to take advantage of parallel computational abilities. This paper describes the methodology for implementing real-time DSP applications on FPGA and the concept of hardware-software co-simulation of the Second Order Blind Identification (SOBI) algorithm using MATLAB Simulink and Xilinx System Generator (XSG). Efficient architectures are implemented on the ZYBO Z7 Zynq-7020 FPGA and their performance is reported.
ISBN: 9781728116440 (print)
Parallel applications of the same domain can present similar patterns of behavior and characteristics. Characterizing common application behaviors can help in understanding performance aspects in real-world scenarios. One way to better understand and evaluate applications' characteristics is to use customizable/parametric benchmarks that enable users to represent important characteristics at run-time. We observed that parameterization techniques should be better exploited in the available benchmarks, especially in the stream processing domain. For instance, although widely used, the stream processing benchmarks available in PARSEC do not support the simulation and evaluation of relevant and modern characteristics. Therefore, our goal is to identify the stream parallelism characteristics present in PARSEC. We also implemented ready-to-use parameterization support and evaluated the application behaviors considering relevant performance metrics for stream parallelism (service time, throughput, latency). We chose Dedup as our case study. The experimental results show performance improvements with our parameterization support for Dedup. Moreover, this support, which is simple to use, increases the customization space for benchmark users. In the future, our solution can potentially be explored on different parallel architectures and parallel programming frameworks.
ISBN: 9781538695333 (print)
Complex processor system design is partitioned into smaller subsystems. These subsystems consist of circuits such as adders, subtractors, multipliers, and functional units. Adders are the elementary building blocks of these arithmetic subsystems. The Arithmetic and Logic Unit (ALU) is an essential component of a CPU. Processing speed in a CPU depends on the time taken to process data. In the critical path of arithmetic subsystems, the main components are adders. The important metrics for measuring the quality of adders are the critical-path delay, the number of logic levels, and the area. Improving speed without compromising power is of great concern. Thus, there is a need for an energy-efficient, low-power, high-performance architecture. In this paper, a modified approach is presented to address this issue. The design exploits the advantages of the parallel prefix architecture (PPA) for low-power applications. A hardware description language (HDL) is used to describe the functionality of the system. To implement this adder effectively, we use the "Hybrid Variable Latency" technique. The performance parameters of the adder are reported using the Cadence RTL Compiler tool in 45 nm technology, and the functionality of the design is validated using Vivado HLS. In comparison with the existing techniques, the proposed scheme is relatively faster, improving delay by 7.19% and 15.63%, respectively.
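As background on parallel prefix addition, the sketch below models a Kogge-Stone network at the bit level. Kogge-Stone is one well-known parallel prefix topology, chosen here as an assumption; the abstract does not name the topology used, and the "Hybrid Variable Latency" technique is not modelled. The function name and default width are illustrative.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers with a Kogge-Stone parallel prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    # Per-bit generate and propagate signals, least significant bit first.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]  # the sum bits need the original propagate signals

    # About log2(width) prefix stages; stage k merges spans 2**k bits apart.
    dist = 1
    while dist < width:
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2

    # After the prefix stages, g[i] is the carry out of bit position i.
    carries = [0] + g[:-1]
    total = sum((half_sum[i] ^ carries[i]) << i for i in range(width))
    return total, g[-1]

print(kogge_stone_add(200, 100))  # (44, 1), since 300 = 256 + 44
```

All carries are available after a logarithmic number of stages instead of rippling through every bit, which is why the number of logic levels is a key quality metric for such adders.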
ISBN: 9781728185897 (digital)
ISBN: 9781728185903 (print)
Massive multiple-input multiple-output (MIMO) is a novel communication technique in which tens of user terminals are served by hundreds of antennas at the base station. It is a promising solution for upcoming 5G communication and has been adopted by standards such as 3GPP LTE and IEEE 802.11n due to its higher spectral and energy efficiency. However, the prohibitive computational complexity of data detection, low hardware efficiency, and high hardware complexity have been obstacles to the wide application of MIMO. In this paper, a comprehensive summary of algorithm designs and VLSI hardware architectures that address these problems is presented. These designs include using the innovative truncated Neumann series, the Gauss-Seidel method, and so on to lower computational complexity, and using iterative computation units, pipelined designs, and so on to improve the hardware architecture. These methods have improved the performance of MIMO in different respects and to different extents. Based on the summary, some improvements to the mentioned methods and predictions of probable trends in the development of the technique are articulated. All these methods and future developments should focus on finding better ways to circumvent matrix inversion and to minimize the deterioration in other aspects that this brings.
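The idea of circumventing matrix inversion can be sketched with the Gauss-Seidel method, which solves A x = b iteratively without ever forming the inverse of A. This is a simplified illustration under stated assumptions: the matrix is real-valued and strictly diagonally dominant, a small hypothetical 3x3 system stands in for the much larger complex-valued matrix of an actual detector, and the iteration count is arbitrary.

```python
def gauss_seidel(A, b, iterations=25):
    """Approximate the solution of A x = b without forming the inverse of A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of x for the off-diagonal terms.
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x

# Hypothetical diagonally dominant system whose exact solution is [1, 1, 1].
A = [[10.0, 1.0, 2.0],
     [1.0, 12.0, 3.0],
     [2.0, 3.0, 15.0]]
b = [13.0, 16.0, 20.0]
x = gauss_seidel(A, b)
print([round(v, 6) for v in x])
```

Each sweep needs only multiply-accumulate operations and one division per unknown, which maps more naturally onto iterative and pipelined hardware units than an explicit inversion.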