ISBN: 9781450383431 (print)
In the era of big data, a growing number of applications need historical data to support rich analytics, learning, and mining operations. In these cases, it is highly desirable to retrieve information about previous versions of the data. Traditionally, multi-version databases store all historical values of the data in order to support historical queries. However, storing all historical data can be impractical because of the space it consumes. In this paper, we propose the concept of at-the-time persistent (ATTP) and back-in-time persistent (BITP) sketches, which approximately answer queries on previous versions of data using small space. We then provide several implementations of ATTP/BITP sketches, which our empirical studies show to be more efficient than existing state-of-the-art solutions.
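To make the idea of answering a query "as of" an earlier time with small space concrete, here is a minimal sketch of a single counter that supports at-the-time style queries. It is an illustration only and not one of the paper's constructions: the class name `CheckpointedCounter`, the parameter `eps`, and the methods `update` and `query_at` are all hypothetical, and the sketch assumes positive weights and non-decreasing timestamps.

```python
import bisect


class CheckpointedCounter:
    """Hypothetical illustration: a running count that can be queried as of a past time.

    Instead of storing every historical value, it records a checkpoint only when
    the count has grown by a factor of (1 + eps) since the last checkpoint.
    """

    def __init__(self, eps):
        self.eps = eps
        self.count = 0
        self.times = []   # timestamps of recorded checkpoints (non-decreasing)
        self.values = []  # count recorded at each checkpoint

    def update(self, timestamp, weight=1):
        # Assumption: weight > 0 and timestamps arrive in non-decreasing order.
        self.count += weight
        last = self.values[-1] if self.values else 0
        if self.count > last and self.count >= (1 + self.eps) * last:
            self.times.append(timestamp)
            self.values.append(self.count)

    def query_at(self, t):
        # Return the most recent checkpoint taken at or before time t.
        i = bisect.bisect_right(self.times, t)
        return self.values[i - 1] if i else 0
```

For a returned estimate `v` and true historical count `c`, the checkpoint rule gives `v <= c < (1 + eps) * v`, while the number of stored checkpoints grows only logarithmically with the total count.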
The data structures underpinning collection APIs (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure and Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations; however, due to their tree-based nature, their memory footprints and the runtime performance of iteration and equality checking lag behind those of array-based counterparts. This particularly prohibits their application in programs that process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general-purpose design increases cache locality and features a canonical representation. It outperforms Scala's and Clojure's data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3x to 6.7x) and equality checking (3x to 25.4x).
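For readers unfamiliar with HAMTs, the following is a minimal sketch of a plain immutable HAMT map with lookup and insert. It shows the baseline structure only, not the optimized design proposed in the paper, and it omits deletion. All names (`Node`, `insert`, `lookup`, `BITS`, `HASH_BITS`) are hypothetical, and the choice of a 30-bit hash prefix with 32-way branching is an assumption made for brevity.

```python
BITS = 5                        # 32-way branching per trie level
MASK = (1 << BITS) - 1
HASH_BITS = 30                  # assumption: use only the low 30 bits of hash()
HASH_MASK = (1 << HASH_BITS) - 1


class Node:
    """Immutable trie node. slots holds (key, value) tuples or child Nodes."""
    __slots__ = ("bitmap", "slots")

    def __init__(self, bitmap=0, slots=()):
        self.bitmap = bitmap
        self.slots = slots


def _index(bitmap, bit):
    # Position in the compact slots tuple = number of set bits below `bit`.
    return bin(bitmap & (bit - 1)).count("1")


def lookup(node, key, default=None):
    h = hash(key) & HASH_MASK
    shift = 0
    while shift < HASH_BITS:
        bit = 1 << ((h >> shift) & MASK)
        if not node.bitmap & bit:
            return default
        entry = node.slots[_index(node.bitmap, bit)]
        if not isinstance(entry, Node):
            return entry[1] if entry[0] == key else default
        node, shift = entry, shift + BITS
    for k, v in node.slots:     # hash bits exhausted: linear collision bucket
        if k == key:
            return v
    return default


def insert(node, key, value, h=None, shift=0):
    """Return a new root; the old trie is left untouched (path copying)."""
    if h is None:
        h = hash(key) & HASH_MASK
    if shift >= HASH_BITS:      # collision bucket: bitmap is unused
        rest = tuple(e for e in node.slots if e[0] != key)
        return Node(0, rest + ((key, value),))
    bit = 1 << ((h >> shift) & MASK)
    idx = _index(node.bitmap, bit)
    if not node.bitmap & bit:   # empty position: splice in a new entry
        slots = node.slots[:idx] + ((key, value),) + node.slots[idx:]
        return Node(node.bitmap | bit, slots)
    entry = node.slots[idx]
    if isinstance(entry, Node):
        new = insert(entry, key, value, h, shift + BITS)
    elif entry[0] == key:       # overwrite existing key
        new = (key, value)
    else:                       # two keys share this position: push both down
        old_h = hash(entry[0]) & HASH_MASK
        child = insert(Node(), entry[0], entry[1], old_h, shift + BITS)
        new = insert(child, key, value, h, shift + BITS)
    return Node(node.bitmap, node.slots[:idx] + (new,) + node.slots[idx + 1:])
```

Each insert copies only the nodes on the path from the root to the changed slot, so earlier versions of the map stay valid and share the rest of the structure. The mixing of entries and child nodes in one `slots` tuple is a simplification of this sketch.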