Unikernels provide an efficient and lightweight way to deploy cloud computing services in application-specialized, single-address-space virtual machines (VMs). Hundreds of unikernel-based VMs can be deployed efficiently on a single physical server. In such a cloud computing platform, main memory is the primary bottleneck resource for high-density application deployment. Recently, non-volatile memory (NVM) technologies have become increasingly popular in cloud data centers because they offer extremely large memory capacity at low cost. However, many challenges remain in utilizing NVMs for unikernel-based VMs, such as the difficulty of heterogeneous memory allocation and the high performance overhead of address translation. In this paper, we present UCat, a heterogeneous memory management mechanism that supports multi-grained memory allocation for unikernels. We propose front-end/back-end cooperative address space mapping to expose the host memory heterogeneity to unikernels. UCat exploits large pages to reduce the cost of two-layer address translation in virtualization environments, and leverages slab allocation to reduce memory waste due to internal memory fragmentation. We implement UCat based on a popular unikernel, OSv, and conduct extensive experiments to evaluate its efficiency. Experimental results show that UCat can reduce the memory consumption of unikernels by 50% and the TLB miss rate by 41%, and improve the throughput of real-world benchmarks such as memslap and YCSB by up to 18.5% and 14.8%, respectively.
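The claim that slab allocation reduces waste from internal fragmentation can be illustrated with a toy model. The sketch below is not UCat's allocator: the 4 KB page size, the size classes, and all function names are assumptions chosen for illustration, and a real slab allocator also packs many objects of one class into each page.

```python
PAGE = 4096  # assumed base page size in bytes (not stated in the abstract)
SLAB_CLASSES = [32, 64, 128, 256, 512, 1024, 2048]  # hypothetical size classes


def page_rounded(size):
    """Bytes reserved if every request is rounded up to whole pages."""
    return -(-size // PAGE) * PAGE  # ceiling division, then scale back up


def slab_rounded(size):
    """Bytes reserved if the request is served from the smallest fitting class."""
    for cls in SLAB_CLASSES:
        if size <= cls:
            return cls
    return page_rounded(size)  # large requests fall back to whole pages


def internal_waste(sizes, rounder):
    """Total bytes reserved but never used by the requester."""
    return sum(rounder(s) - s for s in sizes)
```

In this model, small requests waste far less space under size classes than under page-granularity allocation, which is the effect the abstract attributes to slab allocation.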
The massive integration of communication and information technology with the large-scale power grid has enhanced the efficiency, safety, and economical operation of cyber-physical systems. However, the open and diversified communication environment of the smart grid is exposed to cyber-attacks. Data integrity attacks that can bypass conventional security techniques have been considered critical threats to the operation of the grid. Current detection techniques cannot learn the dynamic and heterogeneous characteristics of the smart grid and are unable to deal with non-Euclidean data types. To address the issue, we propose a novel Deep-Q-Network scheme empowered with a graph convolutional network (GCN) framework to detect data integrity attacks in cyber-physical systems. The simulation results show that the proposed framework is scalable and achieves higher detection accuracy than other benchmark techniques.
Residential burglary is a severe crime that affects millions of residents each year. It is critical to analyze patterns of human behavior in surveillance video data and discover suspicious actions to avoid and deter t...
Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Graph processing exhibits irregular computational parallelism and random memory accesses, unlike traditional ***, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio with few performance benefits, which can be, in many cases, even slower than a specialized single-thread graph *** domain-specific hardware designs are essential for graph processing, it is still challenging to translate hardware capability into a performance boost without coupled software *** article presents a graph processing ecosystem from hardware to *** start by introducing a series of hardware accelerators as the foundation of this ***, the codesigned parallel graph systems and their distributed techniques are presented to support graph ***, we introduce our efforts on novel graph applications and hardware *** results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
The emergence of multimodal disease risk prediction signifies a pivotal shift towards healthcare by integrating information from various sources and enhancing the reliability of predicting susceptibility to specific d...
Cardiac disease has the highest mortality and morbidity of any disease worldwide. Every year, millions of people are affected by cardiac diseases, and many die from them. There are various di...
Most optimization problems of practical significance are typically solved by highly configurable parameterized *** achieve the best performance on a problem instance, a trial-and-error configuration process is required, which is very costly and even prohibitive for problems that are already computationally intensive, *** problems associated with machine learning *** the past decades, many studies have been conducted to accelerate the tedious configuration process by learning from a set of training *** article refers to these studies as learn to optimize and reviews the progress achieved.
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to take full advantage of different memory *** previous proposals usually migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM *** this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat address space physically, but manages them in a cache/memory *** the commercial NVM device, Intel Optane DC Persistent Memory Modules (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at the 256-byte size to adapt to this feature of *** design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane *** also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache ***, we exploit a utility-based caching mechanism to filter cold blocks in the NVM, and further improve the efficiency of the DRAM *** implement Mocha in an architectural *** results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), and reduce energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
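The granularity argument in the Mocha abstract comes down to simple arithmetic. The sketch below is not taken from the paper: it uses only the 4 KB page size and the 256-byte Optane block size stated in the abstract, while the function names and the simplified traffic model (one page, a given number of hot blocks) are hypothetical.

```python
PAGE_SIZE = 4096   # bytes per page, as stated in the abstract
BLOCK_SIZE = 256   # bytes per Optane block, as stated in the abstract


def blocks_per_page():
    """Number of 256-byte blocks that make up one 4 KB page."""
    return PAGE_SIZE // BLOCK_SIZE


def block_index(addr):
    """Index of the 256-byte block that contains a physical byte address."""
    return addr // BLOCK_SIZE


def migration_bytes(hot_blocks, granularity):
    """Bytes copied into DRAM when `hot_blocks` blocks of one page are hot.

    Simplified model: page granularity moves the whole page as soon as any
    block is hot; block granularity moves only the hot blocks.
    """
    if hot_blocks == 0:
        return 0
    if granularity == "page":
        return PAGE_SIZE
    if granularity == "block":
        return hot_blocks * BLOCK_SIZE
    raise ValueError("granularity must be 'page' or 'block'")
```

Under this model, a page with only two hot blocks costs 4096 bytes of traffic at page granularity but 512 bytes at block granularity, which is the kind of bandwidth and DRAM-capacity waste the abstract describes.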
Modern recommendation systems are widely used in modern data *** random and sparse embedding lookup operations are the main performance bottleneck for processing recommendation systems on traditional platforms, as they induce abundant data movements between computing units and ***-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are ***, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM ***, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance *** this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource ***, we make a full analysis of the computation pattern and access pattern on the decomposed *** on the computation pattern, we unify the operations of each layer of the decomposed model in multiply-and-accumulate *** on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource *** the unified computation and mapping strategy, we can coordinate the inter-processing elements *** evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in terms of performance and 56.06×, 6.45×, and 1.71× in terms of energy savings, respectively.
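One general way to see how a sparse embedding lookup fits a multiply-and-accumulate (MAC) engine is that a sum-pooled lookup equals the product of a multi-hot vector with the embedding table. The sketch below illustrates only that identity; it is not ARCHER's mapping or its decomposed model, and the sum pooling, table layout, and function names are assumptions.

```python
def embedding_lookup_sum(table, indices):
    """Gather the rows named by `indices` and sum them (sum pooling)."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def multi_hot(indices, num_rows):
    """Encode the lookup indices as a vector of per-row counts."""
    vec = [0.0] * num_rows
    for i in indices:
        vec[i] += 1.0
    return vec


def mac_lookup(table, indices):
    """Same result as embedding_lookup_sum, using only multiply-accumulate."""
    weights = multi_hot(indices, len(table))
    dim = len(table[0])
    out = [0.0] * dim
    for row, w in enumerate(weights):
        for d in range(dim):
            out[d] += w * table[row][d]  # one MAC per table element
    return out
```

The MAC form touches every row, so it trades extra arithmetic for a regular dataflow. That trade-off is a property of this illustration, not a claim about the paper's design.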
Graphs, which model real-world entities as vertices and the relationships among entities as edges, have proven to be a powerful tool for describing real-world problems in *** most real-world scenarios, entities and their relationships are subject to constant *** that record such changes are called dynamic *** recent years, the widespread application scenarios of dynamic graphs have stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics *** the scale of dynamic graphs becomes larger, higher performance is demanded of dynamic graph processing *** the massive parallel processing power and high memory bandwidth, GPUs become mainstream vehicles to accelerate dynamic graph processing ***-based dynamic graph processing systems mainly address two challenges: maintaining the graph data when updates occur (i.e., graph updating) and producing analytics results in time (i.e., graph computing). In this paper, we survey GPU-based dynamic graph processing systems and review their methods for addressing both graph updating and graph *** comprehensively discuss existing dynamic graph processing systems on GPUs, we first introduce the terminologies of dynamic graph processing and then develop a taxonomy to describe the methods employed for graph updating and graph *** addition, we discuss the challenges and future research directions of dynamic graph processing on GPUs.