Performance of Generalized Deduplication Under Different Input Conditions

Conference: European Wireless 2021 - 26th European Wireless Conference
11/10/2021 - 11/12/2021 at Verona, Italy

Proceedings: European Wireless 2021

Pages: 7
Language: English
Type: PDF

Authors:
Talasila, Prasad; Lucani, Daniel (Department of ECE, Aarhus University, Denmark)

Abstract:
In an era of exponential data growth, data compression is needed to limit the growth of the required storage space and supporting infrastructure. Recently, generalized deduplication has been proposed as a better way of performing data deduplication. We use the Golomb-Rice transform to provide generalized deduplication. With this setup, we evaluate the performance of generalized deduplication for input conditions modeled using binomial, geometric, Poisson, and uniform distributions. We derive closed-form expressions for the probability mass function (pmf) of the data after generalized deduplication. For all input conditions, generalized deduplication transforms the input pmf into a new pmf that is highly suitable for deduplication, reduces the size of the deduplication table, and provides comparable compression gain with fewer input chunks.
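
The idea of using a Golomb-Rice style split for generalized deduplication can be illustrated with a minimal sketch. It is not taken from the paper: it assumes each chunk is a non-negative integer, that the transform splits a chunk into a quotient (the "base", which is deduplicated) and a k-bit remainder (the "deviation", stored as-is), and that the parameter k is chosen by the user. All function names here are hypothetical.

```python
from collections import Counter


def rice_split(x: int, k: int) -> tuple[int, int]:
    """Split a non-negative integer into (base, deviation).

    Assumption: base is the quotient x // 2**k and deviation is the
    k-bit remainder, as in the Golomb-Rice decomposition.
    """
    return x >> k, x & ((1 << k) - 1)


def rice_merge(base: int, deviation: int, k: int) -> int:
    """Inverse of rice_split."""
    return (base << k) | deviation


def generalized_dedup(chunks: list[int], k: int):
    """Deduplicate bases only; keep one deviation per input chunk."""
    table: dict[int, int] = {}  # base -> index in the deduplication table
    records = []
    for x in chunks:
        base, deviation = rice_split(x, k)
        index = table.setdefault(base, len(table))
        records.append((index, deviation))
    # dicts preserve insertion order, so list(table) matches the indices
    return list(table), records


def restore(bases: list[int], records, k: int) -> list[int]:
    """Rebuild the original chunks from the table and the records."""
    return [rice_merge(bases[index], deviation, k) for index, deviation in records]


def base_pmf(chunks: list[int], k: int) -> dict[int, float]:
    """Empirical pmf of the bases, i.e. of the data after the transform."""
    counts = Counter(x >> k for x in chunks)
    total = len(chunks)
    return {base: count / total for base, count in sorted(counts.items())}


if __name__ == "__main__":
    chunks = [37, 36, 33, 5, 7, 37]
    bases, records = generalized_dedup(chunks, k=3)
    print("distinct chunks:", len(set(chunks)), "distinct bases:", len(bases))
    print("pmf of bases:", base_pmf(chunks, k=3))
    assert restore(bases, records, k=3) == chunks
```

In this toy input, nearby values share a base, so the deduplication table holds fewer entries than a table of distinct chunks would, while the stored deviations keep the reconstruction lossless. The empirical pmf of the bases is only a numerical stand-in for the closed-form pmf expressions mentioned in the abstract.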