ISBN: (print) 9783319270302; 9783319270296
An XML document D often has a regular structure, i.e., it is composed of many similarly named and structured subtrees. The entropy of such a tree's structure should therefore be relatively low, and the tree should be highly compressible once it is transformed into an intermediate form. In general, this idea is used in permutation-based XML-conscious compressors. One such compressor is XSAQCT, whose compressible form is called an annotated tree. While XSAQCT has proved useful for various applications, it was never shown to be a lossless compressor. This paper provides the formal background for the definition of an annotated tree and a formal proof that the compression is lossless. It also presents properties of annotated trees that are useful for various applications and discusses a measure of compressibility under this approach, followed by experimental results showing the compressibility of annotated trees.