Processing-in-memory (PIM) is a promising approach to the well-known data movement challenge, performing in-situ computations near the data. Leveraging PIM features can substantially boost the energy efficiency of applications. Early studies mainly focus on improving the programmability of computation offloading on PIM architectures; they lack a comprehensive analysis of computation locality and hence fail to accelerate a wide variety of applications. In this paper, we present a general-purpose instruction-level offloading technique for near-DRAM PIM architectures, namely IOTPIM, to exploit PIM features comprehensively. IOTPIM is novel in two technical advances: 1) a new instruction offloading policy that fully considers the locality of the whole on-chip cache hierarchy, and 2) an offloading performance benefit prediction model that directly predicts the offloading benefit of an instruction from the characteristics of the input dataset, preserving low analysis overheads. The evaluation demonstrates that IOTPIM can be applied to accelerate a wide variety of applications, including graph processing, machine learning, and image processing. IOTPIM outperforms state-of-the-art PIM offloading techniques by 1.28×-1.51× while ensuring offloading accuracy as high as 91.89% on average.
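To make the offloading policy concrete, here is a minimal sketch of a locality-aware offload decision in the spirit of the abstract; the cost model, latencies, and hit-rate inputs are all illustrative assumptions, not IOTPIM's actual formulation.

# Minimal sketch of a locality-aware offload decision in the spirit of
# IOTPIM's policy. All names, latencies, and the cost model itself are
# illustrative assumptions, not the paper's actual formulation.

def host_cost(accesses, hit_l1, hit_l2, hit_llc,
              lat_l1=4, lat_l2=12, lat_llc=40, lat_dram=200):
    """Expected host-side cycles for a memory instruction, weighting
    each level of the on-chip cache hierarchy by its hit probability."""
    miss_l1 = 1.0 - hit_l1
    miss_l2 = 1.0 - hit_l2
    miss_llc = 1.0 - hit_llc
    per_access = (hit_l1 * lat_l1
                  + miss_l1 * hit_l2 * lat_l2
                  + miss_l1 * miss_l2 * hit_llc * lat_llc
                  + miss_l1 * miss_l2 * miss_llc * lat_dram)
    return accesses * per_access

def pim_cost(accesses, lat_pim=60, launch_overhead=500):
    """Expected cycles if the instruction is offloaded near DRAM."""
    return launch_overhead + accesses * lat_pim

def should_offload(accesses, hit_l1, hit_l2, hit_llc):
    """Offload only when the predicted benefit is positive."""
    return pim_cost(accesses) < host_cost(accesses, hit_l1, hit_l2, hit_llc)

# A cache-friendly instruction stays on the host; a cache-averse one moves.
print(should_offload(10_000, hit_l1=0.95, hit_l2=0.8, hit_llc=0.7))  # False
print(should_offload(10_000, hit_l1=0.05, hit_l2=0.1, hit_llc=0.1))  # True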
ISBN:
(Print) 9783030856656; 9783030856649
The Lattice Boltzmann method (LBM) is a promising approach to solving Computational Fluid Dynamics (CFD) problems; however, its memory-bound nature limits the performance of nearly all LBM algorithms on modern computer architectures. This paper introduces novel sequential and parallel 3D memory-aware LBM algorithms that optimize memory access performance. The new algorithms combine single-copy distribution, single sweep, the swap algorithm, prism traversal, and the merging of two temporal time steps. We also design a parallel methodology to guarantee thread safety and reduce synchronizations in the parallel LBM algorithm. Finally, we evaluate their performance on three high-end manycore systems and demonstrate that our new 3D memory-aware LBM algorithms outperform the state-of-the-art Palabos software (which implements the Fuse Swap Prism LBM solver) by up to 89%.
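As background on one ingredient named above, the following sketch illustrates the swap idea that lets LBM stream with a single copy of the distribution array; a 1D toy lattice is assumed here, and the paper's algorithms add prism traversal and temporal merging on top of this.

# Minimal sketch of swap-based streaming, which keeps a single copy of
# the distribution array (no temporary second lattice). A 1D lattice
# with directions {-1, +1} is used purely for illustration; the full
# swap algorithm pairs this with a local revert of populations during
# collision so values land in the correct direction slots.
import numpy as np

NX = 8
# f[x, d]: d=0 holds the population moving left, d=1 moving right.
f = np.arange(NX * 2, dtype=float).reshape(NX, 2)

def swap_stream(f):
    """In-place streaming: swapping a cell's rightward population with
    the right neighbor's leftward population advances both one step."""
    nx = f.shape[0]
    for x in range(nx - 1):
        f[x, 1], f[x + 1, 0] = f[x + 1, 0], f[x, 1]
    return f

swap_stream(f)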
ISBN:
(Digital) 9798350373554
ISBN:
(Print) 9798350373561
Big data workflows have emerged as a powerful paradigm that enables researchers and practitioners to run complex multi-step computational processes in the cloud to gain insight into their large datasets. To create a workflow, a user logs on to specialized software, called a Big Data Workflow Management System, or simply BDW system, to select various components, or tasks, and connect them into a workflow. The workflow is then mapped onto a set of distributed compute resources, such as Virtual Machines (VMs), and storage resources, such as S3 buckets and EBS volumes. It is then executed, with different branches and tasks of the workflow running in parallel on different nodes. During execution, the BDW system captures provenance: the history of data derivation that describes the data processing steps that yielded each output result. Workflow management, including workflow composition and schedule refinement, is a challenging problem, further exacerbated by the growing number and heterogeneity of workflow tasks and cloud resources, as well as by the growing size and complexity of workflow structures. Few efforts have been made to leverage provenance for facilitating workflow composition and schedule refinement. To address these issues, we 1) produce a comprehensive conceptual model for big data workflow provenance that captures the complexity and heterogeneity of cloud-based workflow execution, 2) propose a scalable Cassandra database schema for provenance-aware workflow composition and schedule refinement, 3) outline a four-step provenance-based schedule refinement process for balancing workflow execution time and cost, and 4) present a scalable and highly available microservices-based reference architecture for big data workflow management in the cloud. The proposed loosely coupled architecture ensures superior scalability, as well as operational and technological independence of each module within the BDW system.
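For illustration only, a provenance record and a Cassandra-style table keyed for per-workflow queries might look like the sketch below; the fields and the CQL are assumptions, not the paper's actual schema.

# A minimal sketch of what a provenance record for cloud workflow
# execution might capture, following the abstract's description. The
# fields and the CQL table below are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TaskProvenance:
    workflow_id: str          # which workflow this task belongs to
    task_id: str              # the task (workflow component) executed
    vm_type: str              # e.g. the VM flavor the task ran on
    inputs: list = field(default_factory=list)    # input data artifacts
    outputs: list = field(default_factory=list)   # derived data artifacts
    started: datetime = None
    finished: datetime = None
    cost_usd: float = 0.0     # billed cost, usable for schedule refinement

# Hypothetical Cassandra table: the partition key groups one workflow's
# tasks, and the clustering columns order them by start time so
# provenance traversal and refinement queries stay on one partition.
CQL_SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance.task_runs (
    workflow_id text,
    started     timestamp,
    task_id     text,
    vm_type     text,
    inputs      list<text>,
    outputs     list<text>,
    cost_usd    double,
    PRIMARY KEY ((workflow_id), started, task_id)
);
"""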
ISBN:
(Print) 9781728163444
With the growth of renewable energy, grid-connected inverters have seen increasing use as the grid interface. However, the large-scale connection of inverters also introduces resonance and stability problems into distributed systems. To address this issue, this paper first establishes a multi-parallel equivalent small-signal model based on the structure of the multi-inverter grid-connected system and derives the expression for the output current of each inverter. According to the phase characteristics of the inverter output in different frequency bands, output current expressions are further obtained for synchronized and interleaved control periods. On this basis, the influences of penetration level, control gain, and delay on harmonics and resonance are analyzed. Finally, a platform of four 30 kW inverters is set up, and a "resonance stability margin" method is proposed to verify the harmonic resonance characteristics of the inverter output current under high penetration.
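As standard background (not taken from the paper), the LCL-filter resonance frequency and its shift as more inverters are paralleled can be sketched as follows; the component values are made up, and the N-times grid-inductance scaling follows the common equivalent model of identical paralleled inverters.

# Background sketch only: resonance frequency of one inverter's LCL
# filter, with the grid inductance seen by each inverter magnified by
# the number of paralleled units, so resonance drifts lower as
# penetration grows.
import math

def lcl_resonance_hz(L1, L2, Cf, Lg=0.0, n_parallel=1):
    """f_res = (1/2pi) * sqrt((L1 + L2eff) / (L1 * L2eff * Cf)),
    with L2eff = L2 + N * Lg in the common equivalent model."""
    L2_eff = L2 + n_parallel * Lg
    w_res = math.sqrt((L1 + L2_eff) / (L1 * L2_eff * Cf))
    return w_res / (2 * math.pi)

# Illustrative values only: 2 mH / 1 mH / 10 uF filter, 0.5 mH grid.
for n in (1, 4, 8):
    f = lcl_resonance_hz(2e-3, 1e-3, 10e-6, Lg=0.5e-3, n_parallel=n)
    print(f"N={n}: {f:.0f} Hz")   # ~1719, ~1453, ~1332 Hz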
In the last few years, deep-learning models are becoming crucial for numerous scientific and industrial applications. Due to the growth and complexity of deep neural networks, researchers have been investigating techn...
Private Set Intersection (PSI) is one of the most important functions in secure multiparty computation (MPC). PSI protocols have become a practical cryptographic primitive, and many privacy-preserving applications are built on them, such as computing advertising conversion and distributed computation. Private Set Intersection Cardinality (PSI-CA) is a useful variant of PSI. PSI and PSI-CA allow several parties, each holding a private set, to jointly compute the intersection and its cardinality, respectively, without leaking any additional information. Most PSI protocols today focus on the two-party setting, whereas in multiparty settings parties can share more valuable information, making such protocols more desirable. On the other hand, with the advent of cloud computing, delegating computation to an untrusted server has become an interesting problem, yet most existing delegated PSI protocols cannot efficiently scale to many clients. To solve these problems, this paper proposes MDPPC, an efficient protocol that supports scalable multiparty delegated PSI and PSI-CA operations. Security analysis shows that MDPPC is secure against semi-honest adversaries and tolerates any number of colluding clients. For 15 parties with a set size of 2^20 on the server side and 2^16 on the client side, MDPPC costs only 81 seconds for PSI and 80 seconds for PSI-CA. The experimental results show that MDPPC has high scalability.
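To fix ideas, the sketch below states the functionality that PSI and PSI-CA compute, written as plain, non-private set operations; a protocol such as MDPPC reaches the same outputs without revealing any party's inputs to the other parties or to the delegated server.

# Functionality sketch only: what multiparty PSI / PSI-CA compute.
# A real protocol produces these outputs cryptographically, without
# any party (or the untrusted server) seeing the others' sets.
from functools import reduce

def psi(sets):
    """Intersection of all parties' private sets (the PSI output)."""
    return reduce(lambda a, b: a & b, sets)

def psi_ca(sets):
    """Cardinality-only variant (the PSI-CA output)."""
    return len(psi(sets))

parties = [
    {"alice@x.com", "bob@x.com", "carol@x.com"},
    {"bob@x.com", "carol@x.com", "dave@x.com"},
    {"carol@x.com", "bob@x.com", "erin@x.com"},
]
print(psi(parties))     # the two common elements
print(psi_ca(parties))  # 2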
Communication networks have been extensively deployed as an important infrastructure of power grid. To ensure the robustness of power communication networks, the fault prediction mechanism plays a pivotal role for the...
With the development of the economy and the industrial structure regulation, the load characteristics of users and industries are affected by a growing number of factors. The accuracy of load forecasting methods that ...
ISBN:
(Print) 9781450388160
Rapidly generated data and the sheer magnitude of data analytical jobs put great pressure on the underlying computing facilities. A distributed multi-cluster computing environment such as a hybrid cloud consequently becomes necessary, owing to its advantages in incorporating geographically distributed and potentially cloud-based computing resources. The clusters forming such an environment can be heterogeneous and may be resource-elastic as well. From the analytical perspective, in line with the increasing need for streaming applications and timely analytics, many data analytical jobs nowadays are time-critical in terms of their temporal urgency, and the overall workload of the computing environment can be a hybrid of both time-critical and general applications. This calls for an efficient resource management approach capable of apprehending both computing environment and application features. However, the added complexity and high dynamics of the system greatly hinder the performance of traditional rule-based approaches. In this work, we propose to utilize deep reinforcement learning to develop elasticity-compatible resource management for a heterogeneous distributed computing environment, aiming for fewer missed temporal deadlines while maintaining a low average execution time ratio. Alongside reinforcement learning, we design a deep model that employs a Long Short-Term Memory (LSTM) structure and partial model sharing as a multi-target learning mechanism. The experimental results show that the proposed approach greatly outperforms the baselines and serves as robust resource management under varying workloads.
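A minimal sketch of one plausible reading of "partial model sharing" follows: a shared LSTM trunk with separate per-target heads, written in PyTorch; the dimensions and architecture are assumptions, not the paper's exact model.

# Assumed architecture sketch: a shared recurrent trunk encodes the
# cluster/job state sequence, and two separate heads serve the two
# targets -- deadline satisfaction and execution time ratio. Only the
# trunk's parameters are shared across targets.
import torch
import torch.nn as nn

class SharedLSTMPolicy(nn.Module):
    def __init__(self, state_dim=16, hidden_dim=64, n_actions=8):
        super().__init__()
        self.trunk = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.deadline_head = nn.Linear(hidden_dim, n_actions)
        self.exec_ratio_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, states):
        # states: (batch, seq_len, state_dim) -- recent scheduling steps.
        out, _ = self.trunk(states)
        last = out[:, -1, :]              # summary of the state history
        return self.deadline_head(last), self.exec_ratio_head(last)

policy = SharedLSTMPolicy()
batch = torch.randn(4, 10, 16)            # 4 trajectories, 10 steps each
deadline_q, ratio_q = policy(batch)       # per-action scores per target
print(deadline_q.shape, ratio_q.shape)    # torch.Size([4, 8]) twice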
The Internet of Things (IoT) plays a significant role in shaping different aspects of our lives. IoT devices have become increasingly important due to their ability to connect, collect, and analyze data, automate processes, improve safety and efficiency, and deliver personalized experiences. However, the advancement in quantum computer development poses a significant threat to resource-constrained IoT devices: this new generation of computers can break the classic public-key cryptographic schemes and digital signatures implemented in these devices. While protecting IoT devices from quantum computer attacks poses many challenges, researchers are continuously making progress in developing lightweight post-quantum cryptographic algorithms for efficient key exchange mechanisms and digital signature algorithms tailored to IoT devices. This paper proposes Q-SECURE, a post-Quantum resistant Security Enhancing Cryptography for Unified Resource-constrained device Encryption: a novel scheme that enables any IoT system to leverage the assistance of other devices in the network to gain the capability to generate any proposed post-quantum cryptographic key of a given size using distributed and parallel computing.
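Purely as a conceptual sketch (the combiner and helper interface below are assumptions, not Q-SECURE's actual construction), device-assisted key generation can be illustrated as helper devices contributing random shares that the constrained device XOR-combines.

# Conceptual sketch only: a constrained device collects random shares
# from N helper devices in parallel and XOR-combines them, so the final
# key is unpredictable as long as at least one contribution is, and no
# single helper learns the assembled key.
import secrets

def helper_share(n_bytes: int) -> bytes:
    """Each helper device contributes fresh randomness."""
    return secrets.token_bytes(n_bytes)

def combine_shares(shares) -> bytes:
    """XOR combiner over equal-length shares."""
    key = bytes(len(shares[0]))
    for s in shares:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key

# e.g. a 32-byte (256-bit) key assembled from 4 helpers plus the device.
shares = [helper_share(32) for _ in range(4)] + [secrets.token_bytes(32)]
key = combine_shares(shares)
print(len(key) * 8, "bit key")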