We consider the problem of evaluating arbitrary multivariate polynomials over a massive dataset, in a distributedcomputing system with a master node and multiple worker nodes. Generalized Lagrange codedcomputing (GL...
详细信息
ISBN:
(纸本)9781665421607;9781665421591
We consider the problem of evaluating arbitrary multivariate polynomials over a massive dataset, in a distributedcomputing system with a master node and multiple worker nodes. Generalized Lagrange codedcomputing (GLCC) codes are proposed to provide robustness against stragglers who do not return computation results in time, adversarial workers who deliberately modify results for their benefit, and information-theoretic security of the dataset amidst possible collusion of workers. GLCC codes are constructed by first partitioning the dataset into multiple groups, and then encoding the dataset using carefully designed interpolation polynomials, such that interference computation results across groups can be eliminated at the master. Particularly, GLCC codes include the state-of-theart Lagrange codedcomputing (LCC) codes as a special case, and achieve a more flexible tradeoff between communication and computation overheads in optimizing system efficiency.
The modern information age is heralded by exciting paradigms that generate a tremendous amount of data, such as health records, financial transactions, and social media user information. As information becomes increas...
详细信息
The modern information age is heralded by exciting paradigms that generate a tremendous amount of data, such as health records, financial transactions, and social media user information. As information becomes increasingly available, distributedcomputing becomes more and more important, especially in information retrieval, search, and computation. The high demand for distributedcomputing brings many new challenges and concerns. This dissertation focuses on privacy, security, and flexibility issues in distributedcomputing from an information-theoretic perspective. Specifically, we study the privacy problem via private information retrieval (PIR) with a focus on the scenarios with available side information and private search. The security and flexibility problems are investigated via coded distributed computing. Two settings for matrix multiplication are studied: the secure multi-party computation and the distributed computation with a flexible number of available servers. PIR allows a user to retrieve one message from databases while preserving the privacy of the desired message index. We investigate the role of side information in PIR, meaning a subset of the messages known by the user. Assume T is the number of possible colluding databases, despite which the privacy of the desired message and the side information should still be maintained. We focus on T-Private Information Retrieval with Private Side Information (TPIR-PSI) and characterize its capacity. A novel achievable scheme is proposed. As a special case obtained by setting T = 1, our result settles the capacity of PIR-PSI, an open problem previously noted by Kadhe et al. We extend our results to the problem of symmetric-TPIR with private side information (STPIR-PSI). Then, we consider private search, where a user searches for all records matching a symbol from the servers, without revealing any information about the queried symbol. Private search is shown to be highly related to PIR with arbitrarily depe
Codes that possess combination property (CP) and zigzag decoding (ZD) simultaneously (CP-ZD) has broad application into edge aided distributed systems, including distributed storage, coded distributed computing (CDC),...
详细信息
Codes that possess combination property (CP) and zigzag decoding (ZD) simultaneously (CP-ZD) has broad application into edge aided distributed systems, including distributed storage, coded distributed computing (CDC), and CDC-structured distributed training. Existing CP-ZD code designs are based on scalar code, where one node stores exactly one encoded packet. The drawback is that the induced overhead is high. In order to significantly reduce the overhead, vector CP-ZD codes are designed, where vector means the number of stored encoded packets in one node is extended from one to multiple. More specifically, in detailed code construction, cyclic shift is proposed, and the shifts are carefully designed for cases that each node stores two, three, and four packets, respectively. Comparisons show that the overhead is reduced significantly.
The coded distributed computing(CDC) proposed by Li et *** a high-efficiency method to decrease the communication load in most distributedcomputing ***,as there are more and more nodes,the numbers of output functio...
详细信息
The coded distributed computing(CDC) proposed by Li et *** a high-efficiency method to decrease the communication load in most distributedcomputing ***,as there are more and more nodes,the numbers of output functions and input files in these CDC scenarios grow too fast to be applied in *** the course of our work,we make an attempt to significantly reduce the minimum requirement of N and Q,where N denote the number of input files and Q denote the number output functions in our proposed CDC *** results indicate that the minimum requirement of Q in our new CDC scheme is only a factor of the total number of computing nodes K,and the minimum requirement of N is far less than that of the optimal scheme provided by Li et al.,while(L/(L)≤1.03125,where L and L are the communication loads of the Li scheme provided by Li et *** our derived CDC scheme,respectively.
暂无评论