ISBN:
(Print) 9798350339864
A distributed persistent key-value store (KVS) plays an important role in today's storage infrastructure. The development of persistent memory (PM) and remote direct memory access (RDMA) makes it possible to build distributed persistent KVSs that provide fast data access. However, prior works focus on either PM-oriented or RDMA-oriented optimizations for key-value stores, and we find that these optimizations prevent simply porting an RDMA-enabled KVS to PM or vice versa. This paper proposes FastStore, a high-performance distributed persistent KVS that fully exploits both RDMA features and PM-friendly optimizations. First, FastStore utilizes RDMA-enabled PM exposure to establish direct indexing at the client side, reducing the RTTs needed to read values. Meanwhile, PM exposure allows PM sharing among cluster nodes, which helps mitigate attribute-value skewness. Second, FastStore designs a PM-friendly ownership-transferring log and a failure-atomic slotted-page allocator to achieve highly efficient PM management without PM leakage. Finally, FastStore adds volatile search keys to its B+-tree indexing to reduce excessive PM accesses. We implement FastStore, and our evaluation shows that it outperforms Sherman, the state-of-the-art ordered KVS, with 2.8x higher throughput and 71.5% fewer RTTs.
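To make the client-side direct-indexing idea concrete, here is a minimal Python sketch under our own assumptions (the `Server`, `Client`, and `rdma_read` names are illustrative stand-ins, not FastStore's API): once a client caches the location of a value inside an RDMA-exposed PM region, a read needs a single one-sided access instead of an index lookup plus a value fetch.

```python
# Minimal sketch of client-side direct indexing (not FastStore's code).
# rdma_read() simulates a one-sided read of RDMA-exposed PM: no server CPU.

class Server:
    def __init__(self):
        self.pm = bytearray(4096)   # stand-in for an RDMA-exposed PM region
        self.index = {}             # server-side index: key -> (offset, length)
        self.free = 0

    def put(self, key, value):
        off = self.free
        self.pm[off:off + len(value)] = value
        self.free += len(value)
        self.index[key] = (off, len(value))

def rdma_read(server, offset, length):
    # Stand-in for a one-sided RDMA read: returns bytes from exposed PM.
    return bytes(server.pm[offset:offset + length])

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}             # client-side direct index: key -> (offset, length)

    def get(self, key):
        if key in self.cache:       # fast path: one RTT
            off, ln = self.cache[key]
            return rdma_read(self.server, off, ln)
        # slow path: consult the server-side index first (extra RTT), then cache
        off, ln = self.server.index[key]
        self.cache[key] = (off, ln)
        return rdma_read(self.server, off, ln)

srv = Server()
srv.put(b"user:42", b"alice")
cli = Client(srv)
print(cli.get(b"user:42"))  # slow path populates the cache
print(cli.get(b"user:42"))  # fast path: single one-sided read
```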
ISBN:
(Print) 9798350388138; 9798350388145
This paper presents an intelligent vision chip featuring self-powered in-pixel computing and in-memory computing (Selfputing) for distributed vision sensor nodes. The sensed light signal is processed with a computing-in-pixel (CIP) scheme in the pixel array, which uses the light-induced photocurrent to extract convolutional features. The output of the CIP array is then processed by an SRAM-based charge-domain computing-in-memory (CIM) array to derive the image classification result. A power management unit driven by an energy-harvesting photovoltaic (PV) cell is designed to power the CIP and CIM arrays. The Selfputing chip is fabricated in a 180 nm CMOS process and consumes 0.57 μW at a frame rate of 15 fps with 95.1% accuracy on the MNIST dataset, achieving self-powered operation when driven by a PV cell harvesting 1.26 μW.
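As a rough numerical illustration only (this is a model we constructed, not the chip's circuits), the pipeline can be thought of as an in-pixel convolution over photocurrent values followed by a quantized in-memory matrix product:

```python
# Toy numerical model of the two-stage CIP + CIM pipeline (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # stand-in for per-pixel photocurrents
kernel = rng.choice([-1.0, 1.0], (3, 3))   # binary weights, common in CIP designs

# Stage 1: computing-in-pixel convolution (valid padding, stride 1)
H, W = image.shape
feat = np.array([[np.sum(image[i:i+3, j:j+3] * kernel)
                  for j in range(W - 2)] for i in range(H - 2)])

# Stage 2: charge-domain CIM MAC, modeled as a quantized matrix product
def quantize(x, bits=4):
    scale = (2**(bits - 1) - 1) / max(np.max(np.abs(x)), 1e-9)
    return np.round(x * scale) / scale

weights = rng.standard_normal((10, feat.size))  # hypothetical classifier weights
logits = quantize(weights) @ quantize(feat.ravel())
print("predicted class:", int(np.argmax(logits)))
```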
ISBN:
(Digital) 9798350352917
ISBN:
(Print) 9798350352924; 9798350352917
Fast and accurate numerical simulations are crucial for designing large-scale geological carbon storage projects that ensure safe long-term CO2 containment as a climate change mitigation strategy. These simulations involve solving numerous large and complex linear systems arising from the implicit Finite Volume (FV) discretization of the PDEs governing subsurface fluid flow. Combined with highly detailed geomodels, solving these linear systems is expensive in both computation and memory, and accounts for the majority of the simulation time. Modern memory hierarchies are insufficient to meet the latency and bandwidth needs of large-scale numerical simulations, so it is crucial to explore algorithms that can leverage alternative, balanced paradigms such as dataflow and in-memory computing. This work introduces a matrix-free algorithm that solves FV-based linear systems on a dataflow architecture to significantly reduce memory latency and bandwidth bottlenecks. Our implementation achieves a two-orders-of-magnitude speedup over a GPGPU-based reference implementation, and up to 1.2 PFlops on a single dataflow device.
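The core matrix-free idea can be sketched independently of the dataflow hardware: apply the FV stencil directly inside an iterative solver, never assembling the sparse matrix. The NumPy sketch below, using a 2D 5-point Laplacian and plain conjugate gradients, is our illustrative stand-in, not the paper's kernel:

```python
# Matrix-free operator apply: y = A @ u computed from the stencil alone.
import numpy as np

def apply_laplacian(u, n):
    """2D 5-point Laplacian on an n x n grid, zero Dirichlet boundaries."""
    U = u.reshape(n, n)
    Y = 4.0 * U
    Y[1:, :]  -= U[:-1, :]   # north neighbor
    Y[:-1, :] -= U[1:, :]    # south neighbor
    Y[:, 1:]  -= U[:, :-1]   # west neighbor
    Y[:, :-1] -= U[:, 1:]    # east neighbor
    return Y.ravel()

def cg(apply_A, b, n, tol=1e-8, maxit=1000):
    """Unpreconditioned conjugate gradients using only the operator apply."""
    x = np.zeros_like(b)
    r = b - apply_A(x, n)
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = apply_A(p, n)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 64
b = np.ones(n * n)
x = cg(apply_laplacian, b, n)
print("residual:", np.linalg.norm(b - apply_laplacian(x, n)))
```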
ISBN:
(Print) 9783031814037; 9783031814044
Leader election is a critical and extensively studied problem in distributed computing. This paper introduces the study of leader election using mobile agents. Consider n agents initially placed arbitrarily on the nodes of an arbitrary n-node, m-edge graph G. These agents move autonomously across the nodes of G and elect one agent as the leader such that the leader is aware of its status and every other agent knows it is not the leader. The goal is to minimize both time and memory usage. We study the leader election problem in a synchronous setting where all agents perform operations simultaneously, allowing time complexity to be measured in rounds. We assume the agents have prior knowledge of the number of nodes n and the maximum degree Δ of the graph. We first elect a leader deterministically in O(n log² n + DΔ log n) rounds with each agent using O(log n) bits of memory, where D is the diameter of the graph. Leveraging this leader election result, we then present a deterministic algorithm for constructing a minimum spanning tree of G in O(m + n log n) rounds, with each agent using O(Δ log n) bits of memory. Finally, using the same leader election result, we improve the time and memory bounds for other key distributed graph problems, including gathering, maximal independent set, and minimal dominating set. For all the aforementioned problems, our algorithms remain memory-optimal.
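For intuition about the model (this is a randomized toy, not the paper's deterministic algorithm), the following Python simulation runs synchronous rounds in which agents walk a small hypothetical graph and co-located agents exchange the smallest identifier seen; the agent holding the global minimum ends up as the unique leader:

```python
# Toy simulation of synchronous mobile-agent leader election (illustrative).
import random

random.seed(1)
# A small hypothetical connected graph as an adjacency list
G = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4],
     4: [3, 5, 7], 5: [4, 6], 6: [5, 7], 7: [6, 4]}

agents = list(range(len(G)))
pos = {a: random.choice(list(G)) for a in agents}   # arbitrary initial placement
known_min = {a: a for a in agents}                  # each agent knows only its own ID

rounds = 0
while len(set(known_min.values())) > 1 and rounds < 10_000:
    rounds += 1
    # Synchronous step: lazy random walk (staying put avoids parity lock-step)
    for a in agents:
        pos[a] = random.choice(G[pos[a]] + [pos[a]])
    # Agents sharing a node exchange the smallest ID seen so far
    at_node = {}
    for a in agents:
        at_node.setdefault(pos[a], []).append(a)
    for group in at_node.values():
        m = min(known_min[a] for a in group)
        for a in group:
            known_min[a] = m

print(f"agent {min(known_min.values())} elected leader after {rounds} rounds")
```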
ISBN:
(Print) 9798350381993; 9798350382006
Erasure codes have been widely applied to in-memory key-value storage systems for high reliability and low redundancy. In distributed in-memory key-value storage systems, update operations are relatively frequent, especially partial-stripe updates, which make data updates challenging. Recent research accelerates parity writes by appending logs; however, storing those logs on disk degrades system performance significantly. We therefore propose a novel in-memory key-value storage architecture, DNVPL, which uses NVRAM to log parity data. Our main idea is an append-only update scheme that trades off memory cost against update overhead. We implement DNVPL in an in-memory key-value storage prototype called LogKV and evaluate it under different workloads. The experiments show that our scheme achieves high update performance across several metrics: it reduces update latency by up to 49% and saves 48% of storage space compared to state-of-the-art schemes.
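The append-only parity-logging idea can be sketched for the simplest single-parity (XOR) case; the code below is our illustration under that assumption, not DNVPL's implementation, with a Python list standing in for the NVRAM log:

```python
# Append-only parity-delta logging for partial-stripe updates (XOR parity).
# Instead of rewriting parity in place on every small update, the delta
# (old_block XOR new_block) is appended to an NVRAM-like log and folded
# into the parity lazily, e.g. when the log fills.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data = [bytearray(b"AAAAAAAA"), bytearray(b"BBBBBBBB"), bytearray(b"CCCCCCCC")]
parity = bytearray(xor(xor(data[0], data[1]), data[2]))
nvram_log = []          # stand-in for the persistent parity-delta log

def update(i, new_block):
    delta = xor(data[i], new_block)   # parity delta for XOR coding
    nvram_log.append(delta)           # persisted before the data write
    data[i][:] = new_block

def flush():
    global parity
    for delta in nvram_log:           # fold all logged deltas into parity
        parity = bytearray(xor(parity, delta))
    nvram_log.clear()

update(1, b"bbbbbbbb")
update(2, b"cccccccc")
flush()
assert bytes(parity) == xor(xor(data[0], data[1]), data[2])
print("parity consistent after log replay")
```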
ISBN:
(Print) 1577358872
Distributed ML addresses the challenges of increasing data and model complexity. Peer-to-peer (P2P) networks in distributed ML offer scalability and fault tolerance, but they also face growing resource consumption and communication overhead as the number of participating peers increases. This research introduces a novel architecture that combines serverless computing with P2P networks for distributed training. Serverless computing enhances this model with parallel processing and cost-effective scalability, making it suitable for resource-intensive tasks. Preliminary results show that peers can offload expensive computational tasks to serverless platforms. However, the inherent statelessness of serverless functions necessitates robust communication methods, suggesting a pivotal role for databases. To this end, we have enhanced an in-memory database to support ML training tasks.
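A minimal sketch of the resulting pattern, under our own naming (not the paper's system): a stateless training step that pulls model state from an in-memory store, applies a gradient update, and writes the state back, with a dict standing in for the database:

```python
# Stateless "serverless" training step with state held in an in-memory store.
import numpy as np

store = {"w": np.zeros(3)}   # stand-in for the shared in-memory database

def serverless_train_step(batch_x, batch_y, lr=0.1):
    """Stateless function: all state comes from, and returns to, the store."""
    w = store["w"]                                       # pull current model state
    pred = batch_x @ w
    grad = batch_x.T @ (pred - batch_y) / len(batch_y)   # least-squares gradient
    store["w"] = w - lr * grad                           # push updated state back
    return float(np.mean((pred - batch_y) ** 2))

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
for step in range(200):
    X = rng.standard_normal((32, 3))                     # one peer's local mini-batch
    y = X @ true_w
    loss = serverless_train_step(X, y)
print("final loss:", round(loss, 6), "w:", np.round(store["w"], 3))
```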
ISBN:
(Print) 9798350369458; 9798350369441
Load forecasting has a significant impact on energy management and planning, facilitating efficient resource allocation and grid operations. In this study, a comparative analysis of traditional statistical methods and deep learning techniques is conducted on a real-world dataset from the Ikaria islanded grid. The paper focuses on four forecasting approaches: Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX), Long Short-Term Memory (LSTM) networks, and Deep Neural Networks (DNN). After appropriate data processing, extensive experimentation was carried out to capture the complex, nonlinear patterns of the dataset. The results indicate that LSTM and DNN outperform both ARIMA and SARIMAX on all three evaluation metrics, achieving an RMSE of 0.13, an MAE of 0.09, and a MAPE of 2.11%. This study thus validates the superiority of deep learning techniques in real-world islanded grid environments, accurately predicting future load values from historical data.
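For readers unfamiliar with the LSTM approach, a minimal PyTorch sketch on synthetic daily-periodic data (not the Ikaria dataset; architecture and hyperparameters are illustrative) looks like this:

```python
# Minimal LSTM load-forecasting sketch: predict the next value from 24 steps.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic daily-periodic "load" signal standing in for real measurements
t = torch.arange(0, 2000, dtype=torch.float32)
load = torch.sin(2 * torch.pi * t / 24) + 0.1 * torch.randn_like(t)

WIN = 24
X = torch.stack([load[i:i + WIN] for i in range(len(load) - WIN)]).unsqueeze(-1)
y = load[WIN:].unsqueeze(-1)

class LoadLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq, hidden)
        return self.head(out[:, -1])   # predict from the last time step

model = LoadLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: MSE={loss.item():.4f}")
```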
To solve the problem of high memory usage and heavy GPU computation on a single machine, in this paper MobileNet, the PyTorch deep learning framework, and the Flink big data computing framework are dee...
ISBN:
(Print) 9798350386066; 9798350386059
The commercial adoption of Edge Computing (EC) will require pricing schemes that cater to the financial interests of both operators and users. Pricing in EC is particularly challenging because it has to account for the limited amount of edge resources as well as the stochasticity of user workloads due to location-specific workload characteristics and differences in user activity. We formulate the problem of maximizing the revenue of a serverless edge operator by dynamically pricing compute and memory resources under time-varying workloads as a sequential decision-making problem under uncertainty. We provide analytical results for the optimal pricing strategy in a Markovian setting in steady state. For the general case, we propose a novel Generalized Hidden Parameter Markov Decision Process (GHP-MDP) formulation of the revenue maximization problem, together with a dual Bayesian neural network approximator as its solution. The key novelty of the proposed solution is that it can be pre-trained on synthetic traces and adapts quickly to previously unseen workload characteristics. Simulations on synthetic and real traffic traces show that the proposed solution is sample-efficient thanks to effective transfer learning, and that it outperforms state-of-the-art learning approaches in revenue and learning rate by up to 50% on real traces.
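As a toy illustration of the steady-state pricing question (an assumed Poisson demand model with made-up elasticity, not the paper's GHP-MDP), one can sweep a price grid for the revenue-maximizing static price under limited capacity:

```python
# Toy static pricing under capacity-limited Poisson demand (illustrative).
import math

CAPACITY = 20          # simultaneous function slots on the edge node

def expected_served(rate, cap):
    """E[min(N, cap)] for N ~ Poisson(rate)."""
    s, tail = 0.0, 1.0
    for k in range(cap):
        p = math.exp(-rate) * rate**k / math.factorial(k)
        s += k * p
        tail -= p          # tail ends up as P(N >= cap)
    return s + cap * tail  # arrivals beyond capacity are capped

def demand_rate(price):
    # Hypothetical elasticity: demand falls exponentially with price
    return 40.0 * math.exp(-0.8 * price)

best = max(((p / 10, (p / 10) * expected_served(demand_rate(p / 10), CAPACITY))
            for p in range(1, 101)), key=lambda t: t[1])
print(f"optimal static price ~ {best[0]:.1f}, expected revenue ~ {best[1]:.2f}")
```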
ISBN:
(Print) 9798350339864
Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches rely heavily on training data to guide the pruning strategy, making them ineffective for federated learning over distributed and confidential datasets. Additionally, the memory- and computation-intensive pruning process is infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. FedTiny introduces two key modules that adaptively search for coarse- and finer-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module mitigates pruning biases caused by the heterogeneity of local data. Second, a lightweight progressive pruning module prunes the models further under strict memory and computational budgets, allowing the pruning policy for each layer to be determined gradually rather than by evaluating the overall model structure. The experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models into extremely sparse tiny models: it improves accuracy by 2.61% while reducing computational cost by 95.91% and memory footprint by 94.01% compared to state-of-the-art methods.
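To illustrate batch-norm-guided pruning in spirit (this uses the common BN-scale heuristic, not FedTiny's exact modules), a short PyTorch sketch ranks channels by the magnitude of their BN scale γ and masks the smallest:

```python
# BN-scale channel pruning heuristic: channels with small |gamma| are
# assumed least important and are masked out first (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 16, 3, padding=1)
bn = nn.BatchNorm2d(16)
bn.weight.data = torch.rand(16)     # pretend these gammas were learned

def bn_prune_mask(bn_layer, keep_ratio=0.5):
    """Keep the channels with the largest |gamma|; zero out the rest."""
    gamma = bn_layer.weight.detach().abs()
    k = max(1, int(keep_ratio * gamma.numel()))
    kept = torch.topk(gamma, k).indices
    mask = torch.zeros_like(gamma)
    mask[kept] = 1.0
    return mask

mask = bn_prune_mask(bn, keep_ratio=0.25)
x = torch.randn(1, 3, 8, 8)
out = bn(conv(x)) * mask.view(1, -1, 1, 1)   # masked channels contribute nothing
print("kept channels:", int(mask.sum().item()), "of", mask.numel())
```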