The following topics are dealt with: computer architecture; high-performance computing; parallel and distributed algorithms, routing, and communication; application-specific architectures and reconfigurable systems; grid, cluster, pervasive, and heterogeneous computing; languages, compilers, and tools; processor microarchitectures; operating systems; processor and cache memory architectures, benchmarking, and performance analysis; fault-tolerant systems; and load balancing.
ISBN (digital): 9798331522124
ISBN (print): 9798331522131
This paper investigates the impact of approximate subtractors in the hardware architecture of the Sum of Absolute Differences (SAD) computation within the Test Zone Search (TZS) algorithm, commonly used in current video encoders such as those for Versatile Video Coding (VVC). Four state-of-the-art imprecise subtractors $(AppS, AXSC1, AXSC2, AXSC3)$ were analyzed across various video resolutions to assess their influence on computational effort, energy consumption, and coding efficiency. The results show that subtractors like $AppS_{4}$ and $AXSC1_{4}$ provide significant reductions in energy consumption with minimal impact on video coding quality. These findings are especially relevant for low-power devices and embedded systems, where energy efficiency is critical. The use of imprecise subtractors offers a promising tradeoff between coding efficiency and energy savings, making them a viable solution for high-performance video encoders.
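To make the idea of an imprecise subtractor inside SAD concrete, the following is a minimal sketch. It does not reproduce the AppS or AXSC designs from the abstract; `truncated_sub` is a hypothetical stand-in that simply ignores the `k` least significant bits of each operand, and the pixel values are invented for illustration.

```python
def exact_sub(a: int, b: int) -> int:
    """Reference (exact) subtractor."""
    return a - b


def truncated_sub(a: int, b: int, k: int = 4) -> int:
    """Hypothetical approximate subtractor: drops the k low bits of each operand.

    This is an assumption for illustration, not one of the designs in the paper.
    """
    mask = ~((1 << k) - 1)  # clears the k least significant bits
    return (a & mask) - (b & mask)


def sad(block_a, block_b, sub=exact_sub) -> int:
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(abs(sub(a, b)) for a, b in zip(block_a, block_b))


# Illustrative 8-bit pixel values (made up for this example).
current = [120, 64, 33, 200]
candidate = [118, 70, 40, 180]

exact = sad(current, candidate)
approx = sad(current, candidate, sub=lambda a, b: truncated_sub(a, b, k=2))
print(exact, approx)
```

Because motion estimation only needs SAD to rank candidate blocks, a small error in the absolute value may leave the selected candidate unchanged, which is one plausible reason such subtractors can have limited impact on coding efficiency.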
ISBN (digital): 9798331522124
ISBN (print): 9798331522131
Rotating machines, such as motors and pumps, are of crucial importance for industrial operations, but are prone to failure due to their increasing complexity. Condition-based monitoring and early fault diagnosis, especially through vibration analysis, are essential to avoid costly downtime. Although cloud computing is widely used for machine condition monitoring, it can be inefficient due to the high data transfer and resource requirements. Edge computing offers a solution by processing the data locally on the devices, reducing latency, bandwidth usage, and energy consumption. This paper presents an IoT sensor node for vibration monitoring of electric motors and compares the efficiency of local feature extraction (edge processing) with transmitting raw data to a server for remote feature extraction (cloud processing). We show that local feature extraction leads to 65.7% lower energy consumption and 37% faster execution time than cloud processing.
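The edge-versus-cloud comparison above can be illustrated with a small sketch of local feature extraction. The abstract does not list the features the node computes; the mean, RMS, peak, and crest factor used here are common vibration features chosen as an assumption, and the signal, sampling rate, and 4-byte value size are likewise hypothetical.

```python
import math


def extract_features(samples):
    """Compute a few common time-domain vibration features (assumed set)."""
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    return {"mean": mean, "rms": rms, "peak": peak, "crest_factor": peak / rms}


# Synthetic window: 1 s of a 50 Hz sine sampled at 1 kHz (illustrative only).
fs = 1000
window = [math.sin(2 * math.pi * 50 * t / fs) for t in range(fs)]

features = extract_features(window)

# Payload comparison, assuming every value is sent as a 4-byte float.
raw_bytes = len(window) * 4        # cloud processing: transmit the raw window
feature_bytes = len(features) * 4  # edge processing: transmit features only
print(features, raw_bytes, feature_bytes)
```

Under these assumptions the node transmits 16 bytes per window instead of 4000, which shows why edge processing can reduce radio activity; the energy and time figures in the abstract come from the authors' measurements, not from this sketch.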
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center. The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors or the Institute of Electrical and Electronics Engineers, Inc.
ISBN (print): 9798400710735
Power consumption has become a limiting factor in all areas of computing. Hence, making the most of the available power budget is paramount. To use the available budget most efficiently, techniques like dynamic voltage and frequency scaling and idle states can be used. This work analyzes the instructions UMWAIT, TPAUSE, and MWAITX on three different systems. We analyze their instruction latencies, power consumption, and dependencies on core frequencies. To do so, we introduce benchmarks to gather performance and power parameters, which can be used for future software optimizations. Key findings include: The expected sleep duration passed to UMWAIT and TPAUSE can influence the depth of the user idle state. The actual sleep duration of TPAUSE increases stepwise with an increasing expected sleep duration. Requesting a deeper idle state leads to an additional sleep duration, which increases with a lower core frequency. The core frequency influences the instruction latency of TPAUSE, where a low frequency can lead to an irregular performance pattern. The latency of TPAUSE, UMWAIT, and MWAITX is most often higher than requested on the evaluated systems. Core power consumption can be reduced by approximately 20% to 70% compared to the usage of PAUSE. The latency for waking a core in user idle reflects the underlying hardware architecture, ranging from tens of nanoseconds (desktop architecture with shallow idle states) to hundreds of nanoseconds (server architecture with deep idle states) at nominal frequencies.
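The comparison of requested and actual wait durations can be sketched as a simple measurement loop. Python cannot issue UMWAIT, TPAUSE, or MWAITX directly, so this sketch substitutes a busy-wait loop (closer in spirit to a PAUSE-style spin) as the wait primitive; the function names and the 50 µs request are hypothetical, and the numbers it produces say nothing about the instructions studied in the paper.

```python
import statistics
import time


def spin_wait(duration_ns: int) -> int:
    """Busy-wait for at least duration_ns and return the actual elapsed time.

    Stand-in for a hardware wait instruction; an assumption for illustration.
    """
    start = time.perf_counter_ns()
    deadline = start + duration_ns
    while time.perf_counter_ns() < deadline:
        pass
    return time.perf_counter_ns() - start


def measure_overshoot(requested_ns: int, repeats: int = 100) -> dict:
    """Repeat the wait and summarize how far it exceeds the request."""
    actual = [spin_wait(requested_ns) for _ in range(repeats)]
    return {
        "requested_ns": requested_ns,
        "median_actual_ns": statistics.median(actual),
        "min_overshoot_ns": min(actual) - requested_ns,
    }


result = measure_overshoot(50_000, repeats=20)
print(result)
```

A real benchmark of this kind would replace `spin_wait` with the instruction under test, pin the thread to a core, and fix the core frequency, since the abstract reports that frequency affects the observed latency.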
ISBN (digital): 9798331522124
ISBN (print): 9798331522131
Convolutional Neural Networks (CNNs) are widely used for optical character recognition of vehicle license plates in automatic license plate recognition (ALPR) systems. However, their high computational complexity makes it challenging to meet the time and cost requirements of specific ALPR applications. This work aimed to develop a CNN architecture and select a hardware acceleration technique to create a low-cost optical character recognition (OCR) system capable of real-time vehicle identification. We designed the CNN architecture with accuracy and simplicity in mind, and we chose the hardware acceleration technique based on silicon cost and performance. Our 8-bit quantized CNN achieved an accuracy of 97.11%, and the accelerator resulted in a latency of 4.21 ms and a throughput of 598 FPS. The solution offers accuracy and performance comparable to related work, using less than 20% of the hardware resources.
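As a rough illustration of what 8-bit quantization involves, the sketch below applies symmetric per-tensor quantization to a handful of weights. The abstract does not state which quantization scheme the authors used, so the scheme, the function names, and the weight values here are all assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to the range [-127, 127] (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the 8-bit integers back to approximate floating-point values."""
    return [v * scale for v in q]


# Illustrative weights (made up for this example).
weights = [0.42, -1.27, 0.003, 0.9]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

With rounding to the nearest level, the reconstruction error per weight is bounded by half the scale, which is why a well-chosen scale can keep accuracy close to the floating-point model while allowing cheaper integer arithmetic in hardware.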
ISBN (digital): 9798331522124
ISBN (print): 9798331522131
Lossy video compression introduces visual artifacts that degrade video quality, and deep neural networks (DNNs) are effective at enhancing the degraded video. However, conventional DNN-based methods often focus on a single video compression standard, limiting their deployment in multiple cases. To overcome this issue, this study introduces a multi-domain video quality enhancement architecture based on the Spatio-Temporal Deformable Fusion (STDF) technique. This method enables the model to enhance videos compressed with multiple codecs, maintaining reliable performance across standards. After training, the proposed architecture was tested with videos compressed by the High Efficiency Video Coding (HEVC) encoder, the Versatile Video Coding (VVC) encoder, the VP9 codec, and the AOMedia Video 1 (AV1) codec. Results show an average Peak Signal-to-Noise Ratio (PSNR) improvement between 0.228 dB and 0.787 dB.
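The PSNR improvement reported above is the difference between the PSNR of the enhanced video and that of the compressed video, both measured against the original. A minimal sketch of that calculation follows, assuming 8-bit samples and using invented pixel values; it is not the authors' evaluation code.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Illustrative 8-bit samples (made up for this example).
reference = [52, 55, 61, 59]
compressed = [50, 56, 60, 62]
enhanced = [51, 55, 61, 60]

# Improvement attributable to the enhancement step, in dB.
gain_db = psnr(reference, enhanced) - psnr(reference, compressed)
print(round(gain_db, 3))
```

In practice the value is computed per frame (often on the luma channel) and averaged over a test sequence, so gains on real content are far smaller than in this toy example.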