www.data-compression.info The Data Compression Resource on the Internet
Inversion Frequencies (IF)
The Inversion Frequencies Algorithm (IF) is used as a replacement of the Move To Front stage (MTF) within the Burrows-Wheeler Compression algorithm and was introduced by Arnavut and Magliveras in 1997. Its aim is to obtain a output sequence, which is better compressible than the output sequence of the MTF stage. The IF stage scans the input sequence for each alphabet symbol C, and if the current element of the input sequence is equal to C, the number of symbols greater than C between the current index and the index of the last occurrence of C is output. Since the difference of the indices can be greater than 255, the bit size of the output values is usually 20 bit or more. One important property of IF is the fact, that the output values for the last symbol of the alphabet are all 0. Since a sequence of zeroes does not give any information, it can be omited. Therefore, the output sequence of the IF stage is shorter than the input sequence. On the other hand the IF stage needs substantially more overhead than the MTF stage to be transmitted like the symbol distribution or terminator symbols, which makes IF less attractive for small amounts of compressible data.
This article by Ziya Arnavut and Spyros Magliveras from 1997 describes a "Lexical Permutation Sorting Algorithm" (LPSA), which is a generalization of the Burrows-Wheeler Compression Algorithm (BWCA), and a new post-BWT stage, called Inversion Frequencies (IF).
This conference proceeding paper from Ziya Arnavut at DCC 2000 compares the MTF stage with the IF stage. The IF stage, called inversion coding (ranking) technique, yields better compression gain than the recency ranking (MTF coder) for almost all the test data files.
Ziya and Meral Arnavut investigate in this 2004 article special cases of permutations, called multiset permutations, together with a transform different to the BWT, called Linear Ordering Transformation (LOT). LOT sorts rows of a cyclic rotated matrix with respect to the first element of each row.
A paper from 2004 by Ziya Arnavut presents and compares different inversion ranking schemes (IF) with recency ranking schemes (MTF). For large files, image and DNA files, IF achieves better compression rate than MTF schemes.
This publication of Sebastian Deorowicz in "Software-Practice and Experience" from 2002 give a quite complete overview of the post BWT stages used within the Burrows-Wheeler Compression Algorithm. Besides his own Weighted Frequency Count Algorithm and other post BWT stages, Sebastian describes briefly but clearly variantions of the Move To Front scheme, the Inversion Frequencies algorithm and the Distance Coding algorithm with the 3 properties of the newsgroup posting. This is one of my favourite BWCA papers. This BWCA approach achieves a compression rate of 2.25 bps for the Calgary Corpus.
Meral Arnavut is an assistant orofessor at the Department of Mathematical Sciences at the State University of New York, Fredonia, United States of America. Her research interest is in Commutative Algebra, Ring Theory, Number Theory, Combinatorics and Data Compression.
Ziya Arnavut is a associate professor at the State University of New York, Fredonia, United States of America. He published several papers about data compression, particularly lossless image compression. He developed the Inversion Coder, which compresses very well DNA, image, audio and other signal-based data, in addition to large text files. His research interests are in data compression, image processing, multimedia, communications and combinatorial algorithms.
Spyros Magliveras is a professor of Mathematics at the Florida Atlantic University, United States of America. His main research interests are in cryptology, algebraic combinatorics, coding theory, and algorithms and their complexity, particularly those related to computational group theory.