Data movement between the memory subsystem and the processor is a crippling performance and energy bottleneck for data-intensive applications. Near-Memory Processing (NMP) is a promising solution to alleviate this bottleneck. The introduction of 3D-stacked memories and, more importantly, hybrid memory systems enables the long-desired NMP capability. This work explores the feasibility and efficacy of NMP on a hybrid memory system for a given set of applications. In this paper, we first redefine a set of NMP-centric performance metrics in order to analyze the efficacy of a given processing unit. Leveraging the proposed metrics, we characterize various sets of applications to assess the suitability of a processing unit in terms of performance. Specifically, we motivate the efficiency of NMP subsystems in processing memory-intensive applications when 3D-NVM technologies are employed.
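The abstract does not reproduce the paper's metric definitions, but a common proxy for "data-movement-bound" is arithmetic intensity (operations per byte of memory traffic). The sketch below, with an illustrative threshold and hypothetical function names, shows how such a metric flags memory-intensive kernels as NMP candidates:

```python
def arithmetic_intensity(ops, bytes_moved):
    """Operations per byte of memory traffic; low values indicate a memory-bound kernel."""
    return ops / bytes_moved

def prefers_nmp(ops, bytes_moved, threshold=1.0):
    """Hypothetical decision rule: kernels below the intensity threshold are
    dominated by data movement and may benefit from near-memory processing."""
    return arithmetic_intensity(ops, bytes_moved) < threshold

# A streaming triad a[i] = b[i] + s * c[i] does 2 FLOPs per 24 bytes moved
# (three 8-byte accesses), so it is strongly memory-bound.
print(prefers_nmp(2, 24))    # True
print(prefers_nmp(100, 8))   # compute-bound kernel -> False
```

The threshold value is an assumption for illustration; real roofline-style analyses derive it from the machine's peak compute-to-bandwidth ratio.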
In distributed stream processing, data skew and dynamics can result in an imbalanced load distribution across downstream tasks and degrade system throughput. An efficient data distribution method is urgently needed...
ISBN (print): 9783030692438
The concept of stream data processing is becoming central in most business sectors, where companies try to improve operational efficiency by deriving valuable information from unstructured, yet continuously generated, high-volume raw data within an expected time span. A modern streamlined data processing platform is required to execute analytical pipelines over a continuous flow of data items that may arrive at a high rate. In most cases, the platform is also expected to adapt dynamically to the characteristics of incoming traffic and the ever-changing condition of the underlying computational resources, while fulfilling the tight latency constraints imposed by end-users. Apache Storm has emerged as an important open-source technology for performing stream processing with very tight latency constraints over a cluster of computing nodes. To increase overall resource utilization, however, the service provider might be tempted to use a consolidation strategy to pack as many applications as possible into a (cloud-centric) cluster with a limited number of worker nodes. Collocated applications, however, can compete with each other for resource capacity in a shared platform, which may lead to severe performance degradation across all running applications. The main objective of this work is to develop an elastic solution in a modern stream processing ecosystem that addresses the shared-resource contention problem among collocated applications. We propose a mechanism, based on the design principles of Model Predictive Control theory, for coping with extreme conditions in which the collocated analytical applications have different quality-of-service (QoS) levels while shared-resource interference is a key performance-limiting factor. Experimental results confirm that the proposed controller can successfully enhance the p-99 latency of high-priority applications by 67% compared to the default round-robin scheduling.
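The controller above is evaluated on p-99 (99th-percentile) tail latency. As a minimal, self-contained sketch (function names and sample data are illustrative, not from the paper), this is how a p-99 figure and a relative improvement such as the reported 67% are computed from latency samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def improvement(before, after):
    """Relative reduction, e.g. 0.67 for a 67% lower tail latency."""
    return (before - after) / before

# Synthetic data: two slow outliers dominate the tail of 100 samples.
baseline = [10] * 98 + [300, 400]
tuned    = [10] * 98 + [99, 120]
b99 = percentile(baseline, 99)   # 300
t99 = percentile(tuned, 99)      # 99
print(improvement(b99, t99))     # 0.67
```

The example illustrates why tail percentiles, not averages, are the right lens for QoS: both workloads have nearly identical mean latency, but very different p-99.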
ISBN (print): 9783030390815
Partitioning computational load over different processing elements is a crucial issue in parallel computing. This is particularly relevant in the parallel execution of structured grid computational models, such as Cellular Automata (CA), where the domain space is partitioned into regions assigned to the parallel computing nodes. In this work, we present a dynamic load balancing technique that provides performance improvements for structured grid model execution on distributed memory architectures. First tests, implemented using MPI, have shown the effectiveness of the proposed technique in significantly reducing execution times with respect to non-balanced parallel versions.
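The paper's MPI implementation is not reproduced in the abstract, but the core idea of measurement-driven rebalancing can be sketched in a few lines: after each step, repartition the grid rows so that slower nodes receive proportionally fewer rows. The function name and proportional policy below are illustrative assumptions:

```python
def rebalance(rows, times):
    """Repartition `rows` grid rows over nodes in proportion to measured speed.

    `times` holds the previous step's execution time per node; a node that
    ran slower gets proportionally fewer rows for the next step.
    """
    speeds = [1.0 / t for t in times]
    total = sum(speeds)
    shares = [int(rows * s / total) for s in speeds]
    shares[-1] += rows - sum(shares)   # give rounding leftovers to the last node
    return shares

# Node 1 was twice as slow as nodes 0 and 2, so it gets half as many rows.
print(rebalance(100, [1.0, 2.0, 1.0]))   # [40, 20, 40]
```

In a real CA code the new shares would drive halo-exchange boundaries between neighboring MPI ranks; here only the partition-sizing step is shown.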
ISBN (print): 9783030483401; 9783030483395
Blockchain technologies can safely and neutrally store and process transaction data (including smart contracts) on the chain. Based on smart contracts, a special type of application, known as Decentralized Applications (DApps), can be deployed. Among existing blockchain platforms, Ethereum is the most popular one adopted for DApp development. The performance constraints of Ethereum dramatically impact the usability of DApps. In this paper, we experimentally evaluate the performance of Ethereum with three types of tests: 1) account balance query latency, 2) block generation time, and 3) end-to-end transaction acceptance latency. The results show that the end-to-end transaction time in Ethereum is unstable. Consequently, applications with low-latency constraints and high-frequency transaction requirements are not ready to be deployed unless off-chain transaction methods are considered.
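All three measurements reduce to timing a request/response round trip against a node. The harness below is a self-contained sketch with a stubbed client standing in for a real Ethereum endpoint (an actual setup would issue JSON-RPC calls, e.g. via web3.py's balance query); the stub and its delay are assumptions for illustration:

```python
import time

def measure_latency(call, repeats=5):
    """Time a client call `repeats` times; return per-call latencies in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies

# Stub standing in for a node query such as an account-balance lookup.
def fake_balance_query():
    time.sleep(0.01)   # simulate network + node processing delay
    return 10**18      # 1 ether in wei

samples = measure_latency(fake_balance_query)
print(len(samples), min(samples) > 0)   # 5 True
```

For end-to-end transaction acceptance latency the timed call would instead submit a signed transaction and block until its receipt appears in a mined block, which is exactly where the abstract reports instability.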
ISBN (print): 9783030774103; 9783030774110
Cultural Computing is closely related to Digital Art Interactive Exhibitions. Based on Cultural Computing, and by means of the dynamic display of digital art, culture can be recognized, inherited and innovated. This article explores the new concept of the "Cinema Chain Mode of Digital Art Exhibition". Building on the development of Cultural Computing, it proposes that digital art exhibitions will become a new distributed cinema-chain structure with networked, dynamic and chained development; that they will have flat, interactive, entertaining, commercial and other characteristics; and that, through distributed data sharing, they will form a deeper cultural computing function. Super-subjective space art intervention can open up the cinema-style development of exhibitions in the future. The cultural computing information spread of the "Cinema Chain Mode of Digital Art Exhibition" needs to transform the single "transmitters" and "recipients" of traditional exhibitions into "information spread stakeholders" in the context of information space. In this spread process, it is necessary to strengthen the relevance of theme and content, encourage the participation of stakeholders, improve the dissemination of information, and strengthen the appeal of the venue and the influence of the exhibition. Based on cultural computing, the innovation model of the "Cinema Chain Mode of Digital Art Exhibition" can be divided into four layers: a cultural resource layer, an information integration and processing layer, an information media combination layer, and an exhibition and communication space layer. In this paper, digital art exhibitions such as "TeamLab: The World of Water Partitions in Oil Tank" and the "2020 HUELEAD Show" are taken as the main research cases to analyze the artistic characteristics, cultural connotations and immersive experience of the cultural computing behind them, so as to provide a reference for relevant practice.
ISBN (print): 9781665405881
Network-on-chip (NoC) has been introduced as a novel communication interconnection structure to overcome the limitations of traditional structures in terms of bandwidth, latency, and scalability. Quick feasibility assessment of various NoC structures demands high-abstraction-level simulation tools. However, existing NoC simulators are either too slow for the functional simulation of real-life applications or not flexible enough to allow NoC-based design-space exploration. Furthermore, the ability to run application tasks while observing data-dependent traffic is often missing. Simulation tools modeling NoC interconnection structures must therefore cope with various NoC parameters and allow easy modeling of applications. In addition, these tools should be cycle-accurate and provide flexibility as well as configuration and customization capabilities. This paper addresses the requirements of a NoC simulator that enables the rapid design and verification of NoC-based systems, and it illustrates the design experience of using a flexible cycle-accurate NoC simulator to implement and verify emergent applications.
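As a minimal sketch of the kind of first-order estimate a high-abstraction NoC model produces (not the paper's simulator; function names and per-hop delays are illustrative assumptions), the zero-load latency of a packet in a 2D mesh under dimension-ordered XY routing is just a hop-count calculation:

```python
def xy_hops(src, dst):
    """Hop count between routers in a 2D mesh under XY (dimension-ordered) routing."""
    (sx, sy), (dx, dy) = src, dst
    return abs(dx - sx) + abs(dy - sy)

def zero_load_latency(src, dst, router_delay=1, link_delay=1):
    """Zero-load packet latency in cycles: one router + one link traversal per hop,
    plus the ejection router at the destination."""
    hops = xy_hops(src, dst)
    return hops * (router_delay + link_delay) + router_delay

print(zero_load_latency((0, 0), (3, 2)))   # 5 hops -> 11 cycles
```

A cycle-accurate simulator refines this estimate with contention, buffering and data-dependent traffic, which is precisely what the abstract argues analytical shortcuts cannot capture.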
With the development of high-performance embedded applications, object detection algorithms are starting to be deployed on embedded devices with fewer resources, and the classification and location information they provide is the basis for further decisions in some high-performance embedded applications, so real-time operation and low power consumption are especially needed. In this paper, we propose an FPGA-based neural network accelerator for object detection, which improves the speed, throughput and power efficiency of inference with little loss of accuracy. The accelerator balances the speed difference between computation and memory access. It is scalable, can be configured for a CNN of any size, and is general enough to support the acceleration of YOLOv2, YOLOv3 and their Tiny versions. We deploy the Tiny YOLOv3 network on the FPGA-based accelerator, CPU and GPU platforms, and compare them in terms of speed, throughput and power efficiency. For the Tiny YOLOv3 network, the accelerator delivers an FPS of 1.61 and a throughput of 8.95 GOP/s, which is 23.81x, 4.11x and 3.00x better than the Intel Core i5-10210U CPU with the O0, O1 and O3 compilation optimization options, respectively. With 2.11 W of power, the accelerator delivers a power efficiency of 4.24 GOP/s/W, which is 141.33x, 28.27x and 21.20x better than the Intel Core i5-10210U CPU with the O0, O1 and O3 options, respectively.
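The power-efficiency figure follows directly from throughput divided by power draw; a quick check of the abstract's own numbers (the helper name is illustrative):

```python
def power_efficiency(gops, watts):
    """Throughput per watt, in GOP/s/W."""
    return gops / watts

# Figures reported in the abstract: 8.95 GOP/s of throughput at 2.11 W.
print(round(power_efficiency(8.95, 2.11), 2))   # 4.24
```

This confirms the 4.24 GOP/s/W value is internally consistent with the reported throughput and power measurements.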
High Performance Computing (HPC) and, more generally, Parallel and Distributed Computing (PDC) are ubiquitous. Every computing device, from a smartphone to a supercomputer, relies on parallel processing. Compute clusters of multicore and manycore processors (CPUs and GPUs) are routinely used in many subdomains of computer science, such as data science, parallel machine learning and high performance computing. It is therefore important for every computing professional (and especially every programmer) to understand how parallelism and distributed computing affect problem solving. It is essential for educators to impart a range of PDC and HPC skills and knowledge at multiple levels within the curricula of Computer Science (CS), Computer Engineering (CE), and related disciplines such as computational data science. The software industry and research laboratories require people with these skills, more so now than ever, and therefore engage in extensive on-the-job training. Additionally, rapid changes in hardware platforms, languages, and programming environments increasingly challenge educators to decide what to teach and how to teach it, in order to prepare students for careers that are increasingly likely to involve PDC and HPC. EduHiPC aims to provide a forum that brings together academia, industry, government, and non-profit organizations (especially from India, its vicinity, and Asia) for exploring and exchanging experiences and ideas about the inclusion of high-performance, parallel, and distributed computing in the undergraduate and graduate curricula of Computer Science, Computer Engineering, Computational Science, Computational Engineering, and computational courses for STEM, business, and other non-STEM disciplines.
Genomics and bioinformatics have grown into an independent field and are currently an area of active research. In this work, we first discuss the idea of the genome, its importance and its wide-ranging applications in healt...