Burrows-Wheeler with Inversion Coder (BWIC)Note: Previously, BWIC was called BSIC. However, to give more attention to Burrows-Wheeler's excellent work, I changed the name BSIC to BWIC. |
The Move-to-Front (or Recency Ranking) has beeen introduced in 1986 by Benetley et. al. (also, discovered independently by Elias). The MTF coder ha s been especially popular since the introduction of the Block-Sorting coder by Bur rows and Wheeler in 1994, which opened new frontiers in data compression. We introduced a new coder, called Inversion Coder (or Inversion Ranks). Inversion coder yields better compression than MTF-based coders on most of the f iles and on large text files. Our extensive simulation results have shown that a Inversion-based Coding technique is superior in compressing DNA, Image, Audio and other Signal-based data, in addition to large text files. It simply yields better compression than MTF-based coders. The compression gain with respect to MTF-based coders in terms of weighted average bit per symbol is in the range of 4-12% for the test files utilized. We have utilized a varity of test files. Some of the test files can be obtained here for testing/research purposes. For s ome others, we provide links.
As we keep improving our BWIC, we will update this
page! At this moment, for research/testing purposes, we are releasing the
BWIC's executables which runs on a
Context-based arithmetic coder. Hence, it requires large memory.
With the current
binaries you can compress
files of any size. The block (window) size for BWT is set to 8-MB. Experimentally, we determined that
8 MB gives us good compression and and speed. Hence, for files bigger than 8-MB,
compression is achieved by processing the file in chuncks of 8-MB. You may run the
executables on a Linux or Windows OS
as follows:
ubwic compressed-file output-filename
The people who are involved, contributed or helped us in this project
are:
My friend David Leavitt, and Fredonia Computer Science students Jeffrey Jones, Ryan Smith and Eric Penoyer.
You can obtain the current test version of the BWIC encoder and decoder for
LINUX in
in compressed form
by clicking the link
BWIC and
UBWIC . To obtain executables for
MS Windows OS (32-bit version),
click onto the following links
BWIC and
UBWIC . For windows, encoder and decoder are uncompressed.For 64-bit MS Windows OS, please use BWIC.exe. When using BWIC.exe, to encode a raw file type
BWIC -e inputfile outfile and to decode a compressed file with BWIC, type
BWIC -d compressedfile decompressedfile .
You can also obtain some of test files used in our work. All the
DNA test files are tarred and
compressed with gzip.
Similarly all the Image and
Sound files utilized in
our work are tarred and compressed with gzip. Other test files used in
our work are from Calgary and Cantarbury Corpuses.
You can acess to the files or web sites of these corpuses by
clicking calgary.tar.gz
or visiting the link
Canterburry Corpus .
|