Modern data centers use erasure codes to provide high storage efficiency and fault tolerance. Reed-Solomon code is commonly deployed in large-scale distributed storage systems due to its ease of implementation, but it...
详细信息
Modern data centers use erasure codes to provide high storage efficiency and fault tolerance. Reed-Solomon code is commonly deployed in large-scale distributed storage systems due to its ease of implementation, but it consumes massive bandwidth during node repair. Minimum storage regenerating (MSR) codes is a class of maximum distance separable (mds) codes that achieve the lower bound on repair bandwidth. However, an exponential sub-packetization level is inevitable for MSR codes, resulting in massive disk I/O consumption during node repair. Disk I/O is becoming the bottleneck of the performance in data centers where the storage system needs to frequently provide high-speed data access to clients. In this paper, we consider disk I/O as an important metric to evaluate the performance of a code and construct mds array codes with efficient repair under small sub-packetization level. Specifically, two explicit families of mdscodes with efficient repair are proposed at the sub-packetization level of O(r), where r denotes the number of parities. The first family of codes are constructed over a finite field F-q(m) where q >= n is a prime power, m>r(l-1)+1, n and l denote the code length and sub-packetization level, respectively. The second family of codes are built upon a special binary polynomial ring where the computation operations during node repair and file reconstruction are only XORs and cyclic shifts, avoiding complex multiplications and divisions over large finite fields.
Erasure codes are widely implemented in distributed storage systems to achieve fault tolerance with high storage efficiency. Reed-Solomon code is commonly deployed in data centers due to its optimal storage efficiency...
详细信息
The T-Private Information Retrieval (T-PIR) problem is that a user wishes to retrieve a single record from N servers, without revealing the identity of the desired record even if the number of colluding servers up to ...
详细信息
The T-Private Information Retrieval (T-PIR) problem is that a user wishes to retrieve a single record from N servers, without revealing the identity of the desired record even if the number of colluding servers up to T. Here every server stores all M records and each record can be viewed as a vector over the q-ary finite field F-q. To reduce the field size, a capacity-achieving T-PIR scheme based on some mds array codes for all possible N > T, M >= 2 was constructed by Xu and Zhang in 2019. A key idea in that scheme is to construct some mds array codes in query phase, whose generator matrices satisfy the recovery property (see (C1) in Example 1, page 6), i.e., there exist some specific columns in each generator matrix such that the number of them is equal to the rank of such matrix. We call the set of each column index of those columns in each generator matrix as the column index set. However, that scheme didn't give an explicit selection of the column index set satisfying the recovery property. In this paper, let N = d(2t - 1), N > T = dt > 1, M >= 3 and d >= 1, under the constraint of 2(t-1) >= N, we present a novel method to construct capacity-achieving T-PIR schemes over the binary field with an explicit selection of the column index set. Moreover, our scheme has sub-packetization N(2t - 1)(M-2), which is optimal for all linear capacity-achieving T-PIR schemes determined by Zhang and Xu in 2019. The key to our method is that all locators and column multipliers are precisely selected such that a Generalized Reed-Solomon (GRS) code defined by them has a zero-dimensional subfield sub-code. Compared with all the known capacity-achieving T-PIR schemes for the same non-trivial parameters, ours is the first explicit capacity-achieving T-PIR scheme over the binary field.
暂无评论