Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by initiating the study of the DNA coverage depth problem, which seeks to minimize the number of reads required to retrieve information from the storage system. Under this framework, our main goal is to understand the effect of error-correcting codes and retrieval algorithms on the required sequencing coverage depth. We establish that the expected number of reads required for information retrieval is minimized when the channel follows a uniform distribution. We also derive upper and lower bounds on the probability distribution of this number of required reads and provide comprehensive upper and lower bounds on its expected value. We further prove that for a noiseless channel and uniform distribution, MDS codes are optimal in terms of minimizing the expected number of reads. Additionally, we study the DNA coverage depth problem under the random-access setup, in which the user aims to retrieve only a specific information unit from the entire DNA storage system. We prove that the expected retrieval time is at least k for [n, k] MDS codes as well as for other families of codes. Furthermore, we present explicit code constructions that achieve expected retrieval times below k and evaluate their performance through analytical methods and simulations. Lastly, we provide lower bounds on the maximum expected retrieval time. Our findings offer valuable insights for reducing the cost and latency of DNA storage.
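In the simplest setting named above (a noiseless channel, each read drawing one of the n encoded strands uniformly at random with replacement, and an [n, k] MDS code so that any k distinct strands suffice), the expected number of reads reduces to a coupon-collector sum, E = n/n + n/(n-1) + ... + n/(n-k+1). The Python sketch below computes that sum exactly and checks it by Monte Carlo simulation. The function names are illustrative, and the sketch covers only this baseline, not the paper's general bounds or its random-access constructions.

```python
import random
from fractions import Fraction

def expected_reads_mds(n: int, k: int) -> Fraction:
    """Expected number of uniform draws (with replacement) from n strands
    until k distinct strands have been seen: sum_{i=0}^{k-1} n / (n - i)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum((Fraction(n, n - i) for i in range(k)), Fraction(0))

def simulate_reads(n: int, k: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, reads = set(), 0
        while len(seen) < k:  # keep reading until k distinct strands are recovered
            seen.add(rng.randrange(n))
            reads += 1
        total += reads
    return total / trials

print(expected_reads_mds(10, 5), float(expected_reads_mds(10, 5)))
print(simulate_reads(10, 5))
```

For n = 10 and k = 5 the exact value is 1627/252, about 6.46 reads, and the simulated average should land close to it.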
We apply automata theory and Karp's minimum mean weight cycle algorithm to minimum density problems in coding theory. Using this method, we find the new upper bound 53/126 ≈ 0.4206 for the minimum density of an identifying code on the infinite hexagonal grid, down from the previous record of 3/7 ≈ 0.4286.
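The automaton the authors build for the hexagonal grid is not described here, so the following Python sketch shows only the generic algorithm being applied: Karp's minimum mean weight cycle algorithm on a directed graph with integer edge weights. Here D[k][v] is the minimum weight of a walk with exactly k edges ending at v (from any start vertex), and the minimum cycle mean is min over v of max over k of (D[n][v] - D[k][v]) / (n - k). The function name and graph encoding are assumptions for illustration.

```python
import math
from fractions import Fraction

def min_mean_cycle(n: int, edges: list[tuple[int, int, int]]):
    """Karp's algorithm. Vertices are 0..n-1 and edges are (u, v, weight).
    Returns the minimum mean cycle weight as a Fraction, or None if acyclic."""
    INF = math.inf
    # D[k][v]: minimum weight of a walk with exactly k edges ending at v (any start).
    D = [[INF] * n for _ in range(n + 1)]
    D[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] != INF and D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = None
    for v in range(n):
        if D[n][v] == INF:
            continue  # no walk of length n ends at v
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] != INF)
        if best is None or worst < best:
            best = worst
    return best

# A triangle with weights 1, 2, 3 has a single cycle of mean 2.
print(min_mean_cycle(3, [(0, 1, 1), (1, 2, 2), (2, 0, 3)]))
```

Exact rational arithmetic via `Fraction` is a deliberate choice here, since density bounds such as 53/126 are stated as exact fractions.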
This paper provides new bounds on the size of spheres in any coordinate-additive metric with a particular focus on improving existing bounds in the sum-rank metric. We derive improved upper and lower bounds based on the entropy of a distribution related to the Boltzmann distribution, which work for any coordinate-additive metric. Additionally, we derive new closed-form upper and lower bounds specifically for the sum-rank metric that outperform existing closed-form bounds.
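As a point of reference for such bounds, the exact sphere size in a coordinate-additive metric is a coefficient of a product of per-coordinate weight enumerators. The Python sketch below computes it by polynomial convolution, assuming all coordinates (or sum-rank blocks) are identical. For the sum-rank metric, it uses the standard count of m × η matrices of rank i over F_q. The entropy-based bounds of the paper are not reproduced, and the helper names are hypothetical.

```python
from math import comb

def rank_count(q: int, m: int, eta: int, i: int) -> int:
    """Number of m x eta matrices over F_q with rank exactly i."""
    num, den = 1, 1
    for j in range(i):
        num *= (q**m - q**j) * (q**eta - q**j)
        den *= q**i - q**j
    return num // den  # exact: the product formula is an integer

def sphere_size(per_coord: list[int], ell: int, r: int) -> int:
    """Size of a sphere of radius r in a coordinate-additive metric with ell
    identical coordinates, where per_coord[w] = number of symbols of weight w."""
    poly = [1]
    for _ in range(ell):  # multiply the weight enumerators of the coordinates
        new = [0] * (len(poly) + len(per_coord) - 1)
        for a, ca in enumerate(poly):
            for b, cb in enumerate(per_coord):
                new[a + b] += ca * cb
        poly = new
    return poly[r] if 0 <= r < len(poly) else 0

# Hamming metric over F_q: one symbol of weight 0 and q - 1 symbols of weight 1.
q, n, r = 3, 6, 2
assert sphere_size([1, q - 1], n, r) == comb(n, r) * (q - 1) ** r

# Sum-rank metric: ell blocks, each an m x eta matrix over F_q weighted by its rank.
q, m, eta, ell = 2, 2, 2, 2
block = [rank_count(q, m, eta, i) for i in range(min(m, eta) + 1)]
print(block, sphere_size(block, ell, 1), sphere_size(block, ell, 2))  # [1, 9, 6] 18 93
```

Exact counts like these grow quickly and become expensive for long codes, which is one reason closed-form bounds are useful.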
The concept of entropy has played a significant role in thermodynamics and information theory and remains an active research topic. Information entropy, as a measure of information, takes many different forms, such as Shannon entropy and Deng entropy, but there is no unified interpretation of information from a measurement perspective. To address this issue, this article proposes Generalized Information Entropy (GIE), which unifies entropies based on the mass function. GIE also establishes a relationship among entropy, fractal dimension, and the number of events. On this basis, the Generalized Information Dimension (GID) is proposed, which extends the definition of information dimension from probability distributions to mass functions. GIE plays a role in approximation calculations and coding systems. In coding applications, information viewed through GIE exhibits a certain particle-like nature, in that the same event can have different representational states, similar to the number of microscopic states in Boltzmann entropy.
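The GIE formula itself is not given in this abstract, so the Python sketch below does not implement it. It only contrasts the two entropies named above. Deng entropy, as it is usually defined for a mass function m over focal elements A, is E_d(m) = -Σ m(A) log2(m(A) / (2^|A| - 1)). It coincides with Shannon entropy when all mass sits on singletons and is larger when mass is assigned to multi-element sets. Representing focal elements as `frozenset`s is an illustrative choice.

```python
from math import log2

def shannon_entropy(p: list[float]) -> float:
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)

def deng_entropy(mass: dict[frozenset, float]) -> float:
    """Deng entropy in bits of a mass function {focal element: mass}:
    E_d(m) = -sum_A m(A) * log2(m(A) / (2**|A| - 1))."""
    return -sum(v * log2(v / (2 ** len(A) - 1)) for A, v in mass.items() if v > 0)

singletons = {frozenset("a"): 0.5, frozenset("b"): 0.5}  # Bayesian mass function
vacuous = {frozenset("ab"): 1.0}                         # total ignorance over {a, b}
print(shannon_entropy([0.5, 0.5]), deng_entropy(singletons), deng_entropy(vacuous))
```

The first two values are both 1 bit, while the vacuous mass function gives log2 3 ≈ 1.585 bits, reflecting the extra uncertainty of a non-singleton focal element.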
The (Schur) squares of linear codes are an interesting research topic in coding theory, and they have important applications in cryptography. Linear complementary dual codes (LCD codes) have been widely applied in data storage, communication systems, consumer electronics, and cryptography. Given these applications of squares and LCD codes, we mainly focus on the squares of LCD cyclic codes in this paper. It will be proved that the square of an LCD cyclic code is still an LCD cyclic code. As a subclass of cyclic codes, Bose-Chaudhuri-Hocquenghem codes (BCH codes) have explicit defining sets consisting of consecutive integers, which makes the parameters of BCH codes and their related codes easier to analyze. We investigate the squares $\mathcal{C}_{(t)}^2$ and $(\mathcal{C}_{(t)}^c)^2$ of the primitive LCD BCH codes $\mathcal{C}_{(t)}$ and their complements $\mathcal{C}_{(t)}^c$, respectively, where $\mathcal{C}_{(t)} = \mathcal{C}_{(q, q^m-1, 2t, -t+1)}$ is the BCH code of length $n = q^m-1$ over $\mathbb{F}_q$ with designed distance $2t$. Two necessary and sufficient conditions guaranteeing that $\mathcal{C}_{(t)}^2 \neq \mathbb{F}_q^n$ and $(\mathcal{C}_{(t)}^c)^2 \neq \mathbb{F}_q^n$ are proposed by restricting the designed distances. Furthermore, the dimensions and lower bounds on the minimum distances of $\mathcal{C}_{(t)}^2$ and $(\mathcal{C}_{(t)}^c)^2$ are presented in some cases. The parameters of the squares of the complements of the Melas codes $\mathcal{M}(q,m)$ are also investigated.
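The Schur square of a linear code C is the span of all componentwise products c ∗ c' of codewords, so it is spanned by the products g_i ∗ g_j (i ≤ j) of the rows of a generator matrix. The Python sketch below computes its dimension and checks the LCD property with Massey's criterion: a code with full-rank generator matrix G is LCD if and only if G Gᵀ is nonsingular. As a simplifying assumption, it works only over prime fields F_p (the paper treats general F_q), and the helper names are hypothetical. It does not construct the BCH codes studied in the paper.

```python
def rank_mod_p(rows: list[list[int]], p: int) -> int:
    """Rank of a matrix over the prime field F_p (Gauss-Jordan elimination)."""
    rows = [[x % p for x in row] for row in rows]
    rank, ncols = 0, (len(rows[0]) if rows else 0)
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def schur_square_dim(G: list[list[int]], p: int) -> int:
    """dim C^2, spanned by componentwise products g_i * g_j (i <= j) of generator rows."""
    k = len(G)
    prods = [[(a * b) % p for a, b in zip(G[i], G[j])] for i in range(k) for j in range(i, k)]
    return rank_mod_p(prods, p)

def is_lcd(G: list[list[int]], p: int) -> bool:
    """Massey's criterion: full-rank G generates an LCD code iff G G^T is nonsingular."""
    M = [[sum(a * b for a, b in zip(gi, gj)) % p for gj in G] for gi in G]
    return rank_mod_p(M, p) == len(G)

# Binary cyclic code of length 3 generated by g(x) = 1 + x (the even-weight code).
G = [[1, 1, 0], [0, 1, 1]]
print(is_lcd(G, 2), schur_square_dim(G, 2))
```

In this toy example the code is LCD and its square is all of F_2^3, which is trivially LCD and cyclic. The conditions in the paper are precisely about ruling out this degenerate case C^2 = F_q^n for BCH codes.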
Folded Reed-Solomon (FRS) and univariate multiplicity codes are prominent polynomial codes over finite fields renowned for achieving list decoding capacity. These codes have found many applications beyond the traditional scope of coding theory. In this paper, we introduce improved bounds on the list size for list decoding these codes, achieved through a more streamlined proof method. Additionally, we refine an existing randomized algorithm to output the codewords on the list, which enhances its success probability and reduces its running time. Lastly, we establish list-size bounds for a fixed decoding parameter. Notably, our results demonstrate that FRS codes asymptotically attain the generalized Singleton bound for a list of size 2 over a relatively small alphabet, marking the first explicit instance of a code with this property.
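For readers unfamiliar with folding, the Python sketch below shows only FRS encoding, not list decoding. A message polynomial f of degree less than k is evaluated at consecutive powers of an element γ, and each block of m consecutive evaluations becomes one symbol over the alphabet F^m. As assumptions made for simplicity, the sketch uses a prime field F_p, takes γ to be a primitive element so that all N·m evaluation points are distinct, and uses a hypothetical function name.

```python
def frs_encode(msg: list[int], p: int, gamma: int, m: int, N: int) -> list[tuple[int, ...]]:
    """Folded Reed-Solomon encoding over the prime field F_p (sketch).
    msg holds the coefficients f_0..f_{k-1} of f(x) = sum_i f_i x^i.
    Codeword symbol j (0 <= j < N) is (f(gamma^{jm}), ..., f(gamma^{jm+m-1}))."""
    if N * m > p - 1:
        raise ValueError("need N*m <= p - 1 distinct powers of gamma")

    def f(x: int) -> int:
        acc = 0
        for c in reversed(msg):  # Horner's rule modulo p
            acc = (acc * x + c) % p
        return acc

    return [tuple(f(pow(gamma, j * m + i, p)) for i in range(m)) for j in range(N)]

# f(x) = x over F_7 with primitive element 3, folding parameter m = 2, N = 3 symbols.
print(frs_encode([0, 1], p=7, gamma=3, m=2, N=3))  # [(1, 3), (2, 6), (4, 5)]
```

With m = 1 this reduces to ordinary Reed-Solomon encoding at the points 1, γ, γ², and so on.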
While neural BP-based (NBP) decoders exhibit superior error-correction performance compared to belief-propagation (BP) decoders, their high computational and memory requirements impede practical deployment in communication systems. To overcome this challenge, we propose a Coded Neural BP (CNBP) scheme to accelerate the NBP decoder in distributed environments, while considering storage constraints and providing resilience to stragglers. The key idea is to reformulate the primary operations of the NBP decoder as matrix-vector multiplications by introducing weight matrices and transformations. Building on this reformulation, the NBP decoder is accelerated by speeding up its matrix-vector multiplications using coded distributed computing. Extensive experiments conducted on an Amazon EC2 cluster demonstrate that CNBP achieves notable acceleration and scalability without any loss in error-correction performance.
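The coded-computing primitive referred to above can be illustrated with a generic MDS-coded matrix-vector multiplication. The NumPy sketch below splits A into k row blocks and encodes them into n coded blocks with a real Vandermonde matrix. Each worker multiplies its coded block by x, and A·x is recovered from the first k workers to respond, so up to n - k stragglers are tolerated. This is an assumed, textbook-style scheme for illustration, not CNBP's actual reformulation of the NBP decoder or its storage-aware design. The function names are hypothetical.

```python
import numpy as np

def encode_blocks(A: np.ndarray, k: int, n: int):
    """Split A into k row blocks and form n coded blocks B_i = sum_j V[i, j] * A_j,
    where V is an n x k real Vandermonde matrix (any k of its rows are invertible)."""
    if A.shape[0] % k:
        raise ValueError("number of rows must be divisible by k")
    blocks = np.split(A, k)
    V = np.vander(np.arange(1, n + 1, dtype=float), k, increasing=True)
    coded = [sum(V[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, V

def decode(results: dict, V: np.ndarray, k: int) -> np.ndarray:
    """Recover A @ x from any k worker outputs {worker index: B_i @ x}."""
    if len(results) < k:
        raise ValueError("need at least k results")
    idx = sorted(results)[:k]
    Y = np.stack([results[i] for i in idx])  # k x (rows / k)
    X = np.linalg.solve(V[idx, :], Y)        # row j equals A_j @ x
    return X.reshape(-1)

# Usage: 5 workers, any 3 suffice; workers 1 and 3 are treated as stragglers.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)
coded, V = encode_blocks(A, k=3, n=5)
results = {i: coded[i] @ x for i in (0, 2, 4)}
print(np.allclose(decode(results, V, k=3), A @ x))
```

Real Vandermonde matrices become ill-conditioned as n grows, so practical systems often prefer other encoding matrices. The small sizes here keep the example numerically safe.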
In this paper, we introduce curve-lifted codes over fields of arbitrary characteristic, inspired by Hermitian-lifted codes over $\mathbb{F}_{2^r}$. These codes are designed for locality and availability, and their particular parameters depend on the choice of curve and its properties. Due to the construction, the numbers of rational points of intersection between curves and lines play a key role. To demonstrate this and to generate new families of locally recoverable codes (LRCs) with high availability, we focus on norm-trace-lifted codes.
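Because line–curve intersection counts drive the construction, a small brute-force count helps build intuition. The Python sketch below counts affine rational points of the norm-trace curve N(x) = Tr(y) over F_{2^r} (taken relative to F_2, so N(x) = x^(2^r - 1) and Tr(y) = y + y² + ... + y^(2^(r-1))) that lie on each line y = a·x + b. For r = 2 this is the Hermitian curve x³ = y² + y over F_4. Restricting to characteristic 2, q = 2, and small r is an assumption for the sketch. The field-arithmetic helpers are hypothetical, and the lifted codes themselves are not constructed.

```python
# Irreducible polynomials defining GF(2^r); bit i is the coefficient of x^i.
IRRED = {2: 0b111, 3: 0b1011, 4: 0b10011}

def gf_mul(a: int, b: int, r: int) -> int:
    """Multiply in GF(2^r) by shift-and-add, reducing modulo IRRED[r]."""
    poly, res = IRRED[r], 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if (a >> r) & 1:
            a ^= poly
    return res

def gf_pow(a: int, e: int, r: int) -> int:
    res = 1
    for _ in range(e):
        res = gf_mul(res, a, r)
    return res

def norm(x: int, r: int) -> int:
    """Norm from F_{2^r} to F_2: x^(2^r - 1)."""
    return gf_pow(x, 2 ** r - 1, r)

def trace(y: int, r: int) -> int:
    """Trace from F_{2^r} to F_2: y + y^2 + y^4 + ... + y^(2^(r-1))."""
    t, z = 0, y
    for _ in range(r):
        t ^= z
        z = gf_mul(z, z, r)
    return t

def points_on_line(a: int, b: int, r: int) -> int:
    """Affine points of N(x) = Tr(y) over F_{2^r} lying on the line y = a*x + b."""
    return sum(1 for x in range(2 ** r) if norm(x, r) == trace(gf_mul(a, x, r) ^ b, r))

def all_points(r: int) -> int:
    """Total number of affine F_{2^r}-points of the curve."""
    return sum(1 for x in range(2 ** r) for y in range(2 ** r) if norm(x, r) == trace(y, r))

r = 3
print(all_points(r), [points_on_line(1, b, r) for b in range(2 ** r)])
```

The total matches the known count q^(2r-1) of affine points (8 for r = 2, 32 for r = 3). Because the parallel lines with a fixed slope partition the plane, their intersection counts sum to that total.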
The increasing demand for data storage has prompted the exploration of new techniques, with molecular data storage being a promising alternative. In this work, we develop coding schemes for a new storage paradigm that can be represented as a collection of two-dimensional arrays. Motivated by error patterns observed in recent prototype architectures, our study focuses on correcting erasures in the last few symbols of each row, and also correcting arbitrary deletions across rows. We present code constructions and explicit encoders and decoders that are shown to be nearly optimal in many scenarios. We show that the new coding schemes are capable of effectively mitigating these errors, making these emerging storage platforms potentially promising solutions.