This page contains results for three different corpora:
- Standard Calgary Corpus (14 files) from Ian Witten, Tim Bell and John Cleary,
- Standard Canterbury Corpus (11 files) and
- Large Canterbury Corpus (3 files) from Ross Arnold and Tim Bell.
In contrast to many other comparison pages, the results presented here are based on the compression of single files, not of archives of files. If you are looking for archive results, have a look at the excellent ACT site of Jeff Gilchrist. The average compression rate listed here is the average of all single-file compression rates. The compression rate is measured in bits per symbol (bps), computed as the size of the output in bits divided by the size of the input in bytes. A value of 8 bps means no compression; smaller values represent better (stronger) compression.
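The definition of the compression rate can be sketched in a few lines of Python. This is only an illustration of the formula stated above; the function name `compression_rate_bps` and the sizes in the usage lines are hypothetical and are not taken from any of the corpora.

```python
def compression_rate_bps(input_bytes: int, output_bytes: int) -> float:
    """Compression rate in bits per symbol (bps).

    Output size in bits divided by input size in bytes,
    treating each input byte as one symbol.
    """
    if input_bytes <= 0:
        raise ValueError("input size must be positive")
    return output_bytes * 8 / input_bytes


# Hypothetical sizes, for illustration only.
no_gain = compression_rate_bps(1000, 1000)  # output as large as input
quarter = compression_rate_bps(1000, 250)   # output is a quarter of the input

print(no_gain)  # 8.0, i.e. no compression
print(quarter)  # 2.0
```

A file whose output is as large as its input yields 8 bps, which matches the "no compression" case described above.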
Compression and Decompression Times
To compare the compression and decompression speed of the programs, all times are measured in seconds on the same computer: an Intel Pentium III processor at 735 MHz running Windows 2000. For comparison, the time tables also contain the average compression rate and the total (weighted) compression rate (the total output size in bits divided by the total input size in bytes).
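The difference between the average and the total (weighted) compression rate can be illustrated with a minimal sketch. The helper names `average_rate` and `total_rate` and the two file sizes are assumptions made for this example only; they do not come from the measurements on this page.

```python
from typing import List, Tuple

# Each entry is (input size in bytes, output size in bytes).
FileSizes = List[Tuple[int, int]]


def average_rate(files: FileSizes) -> float:
    """Unweighted mean of the per-file compression rates in bps."""
    rates = [out_bytes * 8 / in_bytes for in_bytes, out_bytes in files]
    return sum(rates) / len(rates)


def total_rate(files: FileSizes) -> float:
    """Total output bits divided by total input bytes (size-weighted)."""
    total_in = sum(in_bytes for in_bytes, _ in files)
    total_out = sum(out_bytes for _, out_bytes in files)
    return total_out * 8 / total_in


# Hypothetical example: a small file that compresses poorly (4.0 bps)
# and a large file that compresses well (1.0 bps).
files = [(1000, 500), (9000, 1125)]

print(average_rate(files))  # 2.5
print(total_rate(files))    # 1.3
```

Because the total rate weights each file by its size, a large and highly compressible file pulls the total rate below the average rate, while the average treats every file equally.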
The algorithms of all programs listed here have been published, either in papers or as source code. For more information, follow the links for the individual programs.
The Calgary Corpus
Compression Rates / bps for the Calgary Corpus (14 files) sorted by Average
| Program | bib | book1 | book2 | geo | news | obj1 | obj2 | paper1 | paper2 | pic | progc | progl | progp | trans | Average |
| ABC02 | 1.888 | 2.226 | 1.929 | 4.190 | 2.399 | 3.733 | 2.365 | 2.382 | 2.332 | 0.706 | 2.420 | 1.659 | 1.659 | 1.440 | 2.238 |
| B00 | 1.901 | 2.257 | 1.950 | 4.156 | 2.393 | 3.739 | 2.410 | 2.394 | 2.335 | 0.720 | 2.439 | 1.652 | 1.630 | 1.427 | 2.243 |
| D00 | 1.896 | 2.274 | 1.958 | 4.152 | 2.409 | 3.695 | 2.414 | 2.403 | 2.347 | 0.717 | 2.431 | 1.670 | 1.672 | 1.452 | 2.249 |
| YBS02 | 1.915 | 2.218 | 1.921 | 4.487 | 2.385 | 3.919 | 2.444 | 2.372 | 2.313 | 0.710 | 2.438 | 1.674 | 1.684 | 1.467 | 2.282 |
| F02 | 1.927 | 2.357 | 2.014 | 4.428 | 2.465 | 3.798 | 2.433 | 2.441 | 2.389 | 0.753 | 2.479 | 1.698 | 1.703 | 1.489 | 2.312 |
| BAR | 2.059 | 2.558 | 2.171 | 4.878 | 2.608 | 4.161 | 2.550 | 2.592 | 2.553 | 0.847 | 2.629 | 1.818 | 1.808 | 1.554 | 2.485 |
| GZIP93 | 2.516 | 3.256 | 2.702 | 5.355 | 3.072 | 3.839 | 2.628 | 2.792 | 2.880 | 0.816 | 2.679 | 1.807 | 1.812 | 1.611 | 2.698 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Calgary Corpus (14 files) sorted by Sum
| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| GZIP93 | 2.64 | 0.86 | 3.50 | 2.698 | 2.595 |
| YBS02 | 2.66 | 1.62 | 4.28 | 2.282 | 1.991 |
| ABC02 | 6.06 | 5.67 | 11.73 | 2.238 | 1.977 |
| F02 | 14.30 | 4.96 | 19.26 | 2.312 | 2.062 |
The Canterbury Corpus
Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Average
| Program | alice29.txt | asyoulik.txt | cp.html | fields.c | grammar.lsp | kennedy.xls | lcet10.txt | plrabn12.txt | ptt5 | sum | xargs.1 | Average |
| B00 | 2.158 | 2.415 | 2.348 | 2.056 | 2.535 | 0.616 | 1.902 | 2.293 | 0.720 | 2.537 | 3.085 | 2.060 |
| ABC02 | 2.160 | 2.419 | 2.359 | 2.078 | 2.505 | 1.058 | 1.888 | 2.249 | 0.706 | 2.486 | 3.074 | 2.089 |
| YBS02 | 2.127 | 2.381 | 2.461 | 2.134 | 2.647 | 0.895 | 1.870 | 2.236 | 0.710 | 2.626 | 3.174 | 2.115 |
| F02 | 2.228 | 2.483 | 2.410 | 2.097 | 2.548 | 1.243 | 1.968 | 2.361 | 0.753 | 2.572 | 3.098 | 2.160 |
| BAR | 2.379 | 2.666 | 2.580 | 2.323 | 3.031 | 1.449 | 2.133 | 2.562 | 0.847 | 2.862 | 3.541 | 2.398 |
| GZIP93 | 2.849 | 3.118 | 2.594 | 2.249 | 2.670 | 1.579 | 2.704 | 3.229 | 0.816 | 2.672 | 3.320 | 2.527 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Sum
| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| YBS02 | 2.34 | 1.33 | 3.67 | 2.115 | 1.420 |
| GZIP93 | 4.87 | 0.69 | 5.56 | 2.527 | 2.061 |
| ABC02 | 3.79 | 3.34 | 7.13 | 2.089 | 1.484 |
| F02 | 14.67 | 4.11 | 18.78 | 2.160 | 1.600 |
Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Average
| Program | bible.txt | e.coli | world192.txt | Average |
| ABC02 | 1.451 | 1.954 | 1.306 | 1.570 |
| B00 | 1.480 | 1.918 | 1.334 | 1.577 |
| YBS02 | 1.488 | 1.983 | 1.394 | 1.622 |
| F02 | 1.533 | 2.023 | 1.353 | 1.636 |
| BAR | 1.753 | 2.092 | 1.623 | 1.823 |
| GZIP93 | 2.330 | 2.244 | 2.337 | 2.304 |
Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Sum
| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| YBS02 | 11.94 | 6.24 | 18.18 | 1.622 | 1.673 |
| GZIP93 | 20.62 | 2.64 | 23.26 | 2.304 | 2.296 |
| ABC02 | 19.71 | 12.15 | 31.86 | 1.570 | 1.628 |
| F02 | 96.35 | 19.65 | 116.00 | 1.636 | 1.697 |