This page contains results for three different corpora:
- the standard Calgary Corpus (14 files) from Ian Witten, Tim Bell and John Cleary,
- the standard Canterbury Corpus (11 files) and
- the Large Canterbury Corpus (3 files) from Ross Arnold and Tim Bell.
In contrast to many other comparison pages, the results presented here are based on the compression of single files, not of archives of files. If you are looking for results on archives, look at the excellent ACT site of Jeff Gilchrist. The tables list the average compression rate, i.e. the mean of all single-file compression rates. The compression rate is measured in bits per symbol (bps): the size of the output in bits divided by the size of the input in bytes. A value of 8 bps means no compression; smaller values represent better (stronger) compression.
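As a small illustration of the definition above, the sketch below computes the rate of a single file and the unweighted average over several files. The function names and the file sizes are hypothetical and serve only as an example; they are not taken from the tables on this page.

```python
def bits_per_symbol(input_size: int, output_size: int) -> float:
    """Compression rate in bps: output size in bits divided by input size in bytes."""
    return 8 * output_size / input_size


def average_rate(sizes: list[tuple[int, int]]) -> float:
    """Unweighted mean of the single-file rates, as in the 'Average' columns."""
    rates = [bits_per_symbol(inp, out) for inp, out in sizes]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical (input bytes, output bytes) pairs, not measured data.
    files = [(1000, 1000), (1000, 250)]
    print(bits_per_symbol(1000, 1000))  # 8.0: no compression
    print(bits_per_symbol(1000, 250))   # 2.0
    print(average_rate(files))          # 5.0
```

Every file counts equally in this average, regardless of its size.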
Compression and Decompression Times
To compare the compression and decompression speed of the programs, all times were measured in seconds on the same computer: an Intel Pentium III processor at 735 MHz running Windows 2000. For comparison, the time tables also contain the average compression rate and the total (weighted) compression rate, i.e. the total output size in bits divided by the total input size in bytes.
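A minimal sketch of how such figures could be collected is shown below. It assumes Python's standard `gzip` module at level 9 as a stand-in compressor (it writes the gzip file format but is not the GZIP V1.2.4 binary used here, so sizes and times will differ), and `measure` is a hypothetical helper, not the harness used for these tables. Times are wall-clock seconds from `time.perf_counter`.

```python
import gzip
import time


def measure(files: dict[str, bytes]) -> dict[str, float]:
    """Time one compression and one decompression per file; report totals and rates.

    Hypothetical harness: gzip level 9 stands in for the program under test.
    """
    if not files or any(len(data) == 0 for data in files.values()):
        raise ValueError("need at least one non-empty file")
    comp_time = decomp_time = 0.0
    rates = []
    total_in = total_out = 0
    for name, data in files.items():
        start = time.perf_counter()
        packed = gzip.compress(data, compresslevel=9)
        comp_time += time.perf_counter() - start

        start = time.perf_counter()
        unpacked = gzip.decompress(packed)
        decomp_time += time.perf_counter() - start
        if unpacked != data:
            raise ValueError(f"round trip failed for {name}")

        rates.append(8 * len(packed) / len(data))  # single-file rate in bps
        total_in += len(data)
        total_out += len(packed)
    return {
        "compression_time": comp_time,
        "decompression_time": decomp_time,
        "sum_time": comp_time + decomp_time,
        "average_rate": sum(rates) / len(rates),  # mean of single-file rates
        "total_rate": 8 * total_out / total_in,   # weighted by file size
    }


if __name__ == "__main__":
    sample = {"text": b"abracadabra " * 1000, "zeros": bytes(10000)}
    for key, value in measure(sample).items():
        print(f"{key}: {value:.3f}")
```

For a single file the average and total rates coincide; they diverge as soon as files of different sizes compress differently well.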
The algorithms of all programs listed here have been published, either in papers or as source code. For more information, follow the links given for each program.
The Calgary Corpus
Compression Rates / bps for the Calgary Corpus (14 files) sorted by Average
| Program | bib | book1 | book2 | geo | news | obj1 | obj2 | paper1 | paper2 | pic | progc | progl | progp | trans | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABC02 | 1.888 | 2.226 | 1.929 | 4.190 | 2.399 | 3.733 | 2.365 | 2.382 | 2.332 | 0.706 | 2.420 | 1.659 | 1.659 | 1.440 | 2.238 |
| B00 | 1.901 | 2.257 | 1.950 | 4.156 | 2.393 | 3.739 | 2.410 | 2.394 | 2.335 | 0.720 | 2.439 | 1.652 | 1.630 | 1.427 | 2.243 |
| D00 | 1.896 | 2.274 | 1.958 | 4.152 | 2.409 | 3.695 | 2.414 | 2.403 | 2.347 | 0.717 | 2.431 | 1.670 | 1.672 | 1.452 | 2.249 |
| YBS02 | 1.915 | 2.218 | 1.921 | 4.487 | 2.385 | 3.919 | 2.444 | 2.372 | 2.313 | 0.710 | 2.438 | 1.674 | 1.684 | 1.467 | 2.282 |
| F02 | 1.927 | 2.357 | 2.014 | 4.428 | 2.465 | 3.798 | 2.433 | 2.441 | 2.389 | 0.753 | 2.479 | 1.698 | 1.703 | 1.489 | 2.312 |
| BAR | 2.059 | 2.558 | 2.171 | 4.878 | 2.608 | 4.161 | 2.550 | 2.592 | 2.553 | 0.847 | 2.629 | 1.818 | 1.808 | 1.554 | 2.485 |
| GZIP93 | 2.516 | 3.256 | 2.702 | 5.355 | 3.072 | 3.839 | 2.628 | 2.792 | 2.880 | 0.816 | 2.679 | 1.807 | 1.812 | 1.611 | 2.698 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Calgary Corpus (14 files) sorted by Sum
| Program | Total Compression Time (s) | Total Decompression Time (s) | Sum of Compr. and Decompr. Time (s) | Average Compression Rate (bps) | Total Compression Rate (bps) |
|---|---|---|---|---|---|
| GZIP93 | 2.64 | 0.86 | 3.50 | 2.698 | 2.595 |
| YBS02 | 2.66 | 1.62 | 4.28 | 2.282 | 1.991 |
| ABC02 | 6.06 | 5.67 | 11.73 | 2.238 | 1.977 |
| F02 | 14.30 | 4.96 | 19.26 | 2.312 | 2.062 |
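The Average and Total columns can differ noticeably (for ABC02 on the Calgary Corpus, 2.238 vs 1.977 bps) because the total rate weights each file by its size. The sketch below recomputes both values from the ABC02 row of the Calgary rate table, assuming the standard Calgary Corpus file sizes in bytes, which this page does not list. Since the per-file rates are rounded to three decimals, the recomputed total can only be expected to agree to within about 0.001 bps.

```python
# Per-file rates (bps) of ABC02, copied from the Calgary rate table on this page.
ABC02 = {
    "bib": 1.888, "book1": 2.226, "book2": 1.929, "geo": 4.190,
    "news": 2.399, "obj1": 3.733, "obj2": 2.365, "paper1": 2.382,
    "paper2": 2.332, "pic": 0.706, "progc": 2.420, "progl": 1.659,
    "progp": 1.659, "trans": 1.440,
}

# Assumption: standard Calgary Corpus file sizes in bytes (not given on this page).
SIZES = {
    "bib": 111261, "book1": 768771, "book2": 610856, "geo": 102400,
    "news": 377109, "obj1": 21504, "obj2": 246814, "paper1": 53161,
    "paper2": 82199, "pic": 513216, "progc": 39611, "progl": 71646,
    "progp": 49379, "trans": 93695,
}


def mean_rate(rates: dict[str, float]) -> float:
    """Average compression rate: every file counts equally."""
    return sum(rates.values()) / len(rates)


def weighted_total_rate(rates: dict[str, float], sizes: dict[str, int]) -> float:
    """Total compression rate: total output bits divided by total input bytes."""
    out_bits = sum(rates[name] * sizes[name] for name in rates)  # bps * bytes = bits
    return out_bits / sum(sizes[name] for name in rates)


if __name__ == "__main__":
    print(f"average {mean_rate(ABC02):.3f}")                   # average 2.238
    print(f"total   {weighted_total_rate(ABC02, SIZES):.3f}")  # total   1.977
```

Under these assumed sizes, large and well-compressible files such as pic (0.706 bps) pull the total below the average, while comparatively small, poorly compressible files such as geo and obj1 count for less.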
The Canterbury Corpus
Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Average
| Program | alice29.txt | asyoulik.txt | cp.html | fields.c | grammar.lsp | kennedy.xls | lcet10.txt | plrabn12.txt | ptt5 | sum | xargs.1 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B00 | 2.158 | 2.415 | 2.348 | 2.056 | 2.535 | 0.616 | 1.902 | 2.293 | 0.720 | 2.537 | 3.085 | 2.060 |
| ABC02 | 2.160 | 2.419 | 2.359 | 2.078 | 2.505 | 1.058 | 1.888 | 2.249 | 0.706 | 2.486 | 3.074 | 2.089 |
| YBS02 | 2.127 | 2.381 | 2.461 | 2.134 | 2.647 | 0.895 | 1.870 | 2.236 | 0.710 | 2.626 | 3.174 | 2.115 |
| F02 | 2.228 | 2.483 | 2.410 | 2.097 | 2.548 | 1.243 | 1.968 | 2.361 | 0.753 | 2.572 | 3.098 | 2.160 |
| BAR | 2.379 | 2.666 | 2.580 | 2.323 | 3.031 | 1.449 | 2.133 | 2.562 | 0.847 | 2.862 | 3.541 | 2.398 |
| GZIP93 | 2.849 | 3.118 | 2.594 | 2.249 | 2.670 | 1.579 | 2.704 | 3.229 | 0.816 | 2.672 | 3.320 | 2.527 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Sum
| Program | Total Compression Time (s) | Total Decompression Time (s) | Sum of Compr. and Decompr. Time (s) | Average Compression Rate (bps) | Total Compression Rate (bps) |
|---|---|---|---|---|---|
| YBS02 | 2.34 | 1.33 | 3.67 | 2.115 | 1.420 |
| GZIP93 | 4.87 | 0.69 | 5.56 | 2.527 | 2.061 |
| ABC02 | 3.79 | 3.34 | 7.13 | 2.089 | 1.484 |
| F02 | 14.67 | 4.11 | 18.78 | 2.160 | 1.600 |
The Large Canterbury Corpus
Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Average
| Program | bible.txt | e.coli | world192.txt | Average |
|---|---|---|---|---|
| ABC02 | 1.451 | 1.954 | 1.306 | 1.570 |
| B00 | 1.480 | 1.918 | 1.334 | 1.577 |
| YBS02 | 1.488 | 1.983 | 1.394 | 1.622 |
| F02 | 1.533 | 2.023 | 1.353 | 1.636 |
| BAR | 1.753 | 2.092 | 1.623 | 1.823 |
| GZIP93 | 2.330 | 2.244 | 2.337 | 2.304 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Sum
| Program | Total Compression Time (s) | Total Decompression Time (s) | Sum of Compr. and Decompr. Time (s) | Average Compression Rate (bps) | Total Compression Rate (bps) |
|---|---|---|---|---|---|
| YBS02 | 11.94 | 6.24 | 18.18 | 1.622 | 1.673 |
| GZIP93 | 20.62 | 2.64 | 23.26 | 2.304 | 2.296 |
| ABC02 | 19.71 | 12.15 | 31.86 | 1.570 | 1.628 |
| F02 | 96.35 | 19.65 | 116.00 | 1.636 | 1.697 |
| Program | Description |
|---|---|
| ABC02 | ABC V2.2, Author Jürgen Abel, 2002, based on BWT, http://www.data-compression.info/ABC/ |
| B00 | Author Bernhard Balkenhol, 2002, based on BWT, http://www.mathematik.uni-bielefeld.de/~bernhard |
| BAR | Author Frank Jennings, 2004, based on BWT, http://fermatjen.tripod.com/bar |
| D00 | Author Sebastian Deorowicz, 2000, based on BWT, http://sun.iinf.polsl.gliwice.pl/~sdeor |
| F02 | Author Peter Fenwick, 2002, based on BWT, private communication, http://www.cs.auckland.ac.nz/~peter-f |
| GZIP93 | GZIP V1.2.4 with option -9, Authors Jean-loup Gailly and Mark Adler, 1993, based on LZ77, http://www.gzip.org |
| YBS02 | YBS, Author Vadim Yoockin, 2002, based on BWT, details in his Russian-language data compression book, ISBN 5-86404-170-X, http://compression.graphicon.ru/ybs/ |
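Most of the programs listed above are based on the Burrows-Wheeler transform (BWT). To illustrate the transform they build on, here is a naive sketch of the forward and inverse BWT with an explicit end marker (the sentinel parameter is an assumption of this sketch). Real BWT compressors use efficient suffix sorting and follow the transform with further stages and entropy coding, which differ per program and are described in the linked papers; this is only the first step, not the method of any listed program.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive forward BWT: sort all rotations of text+sentinel, return the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\x00") -> str:
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the last column to every row and re-sort.
        table = sorted(c + row for c, row in zip(last, table))
    # The only rotation ending with the sentinel is the original text + sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana", sentinel="$")
    print(encoded)                              # annb$aa
    print(inverse_bwt(encoded, sentinel="$"))   # banana
```

The transform groups characters that share a similar right context, which is what makes the output easier to compress in the later stages.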
Results of other data compression programs are welcome. If you would like the results of your program published here, the program must satisfy the following requirements:
- Compression rates: all algorithms of the program must have been published, either in papers or as source code.
- Compression/decompression times: the program must run under Windows 2000, and all algorithms of the program must have been published, either in papers or as source code.
Please mail your results or your executable program, together with a link to the corresponding paper or source code.