www.data-compression.info
The Data Compression Resource on the Internet

 Comparisons


This page contains results for three different corpora:
 - Standard Calgary Corpus (14 files) from Ian Witten, Tim Bell and John Cleary,
 - Standard Canterbury Corpus (11 files) and
 - Large Canterbury Corpus (3 files) from Ross Arnold and Tim Bell.

 Compression Rates


In contrast to many other comparison pages, the results presented here are based on the compression of single files and not on archives of files. If you are looking for results on archives, have a look at the excellent ACT site of Jeff Gilchrist.
Here, the average compression rate (the average of all single-file compression rates) is listed. The compression rate is measured in bits per symbol (bps) as the quotient of the size of the output in bits to the size of the input in bytes. A value of 8 bps means no compression; smaller values represent better (stronger) compression.
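
The definition above can be illustrated with a minimal sketch. The function names `compression_rate_bps` and `average_rate` are hypothetical and not taken from any of the listed programs, and the file sizes in the first two calls are made-up values. The list `abc02_calgary` holds the 14 per-file rates of the ABC02 row from the Calgary table on this page.

```python
def compression_rate_bps(input_bytes: int, output_bytes: int) -> float:
    """Output bits per input symbol, where one symbol is one input byte."""
    if input_bytes <= 0:
        raise ValueError("input size must be positive")
    return 8.0 * output_bytes / input_bytes


def average_rate(rates):
    """Unweighted mean of the single-file compression rates."""
    rates = list(rates)
    return sum(rates) / len(rates)


# Made-up sizes, for illustration only.
print(compression_rate_bps(1000, 1000))  # 8.0, i.e. no compression
print(compression_rate_bps(1000, 250))   # 2.0

# Per-file rates of ABC02 on the Calgary Corpus (bib ... trans).
abc02_calgary = [1.888, 2.226, 1.929, 4.190, 2.399, 3.733, 2.365,
                 2.382, 2.332, 0.706, 2.420, 1.659, 1.659, 1.440]
print(round(average_rate(abc02_calgary), 3))  # 2.238, the listed Average
```

Because every file counts equally in this average, a small file influences the result as much as a large one.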

 Compression and Decompression Times


In order to compare the compression and decompression speed of the programs, all times are measured in seconds on the same computer. The computer has an Intel Pentium III processor at 735 MHz and runs Windows 2000.
For comparison, the time tables also contain the average compression rate and the total (weighted) compression rate (the total output size in bits divided by the total input size in bytes).
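
The difference between the average and the total (weighted) rate can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the `(input_bytes, output_bytes)` pairs are invented sizes, not measurements from this page.

```python
def average_rate_bps(sizes):
    """Mean of the per-file rates; every file has the same weight."""
    sizes = list(sizes)
    return sum(8.0 * out / inp for inp, out in sizes) / len(sizes)


def total_rate_bps(sizes):
    """Total output bits divided by total input bytes; large files dominate."""
    sizes = list(sizes)
    total_in = sum(inp for inp, _ in sizes)
    total_out = sum(out for _, out in sizes)
    return 8.0 * total_out / total_in


# Hypothetical corpus: one small, poorly compressible file
# and one large, well compressible file.
sizes = [(1_000, 500), (100_000, 12_500)]
print(average_rate_bps(sizes))  # 2.5 (mean of 4.0 and 1.0)
print(total_rate_bps(sizes))    # about 1.03, close to the large file's rate
```

This weighting is why the two columns in the time tables can differ noticeably for the same program.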

 Algorithms


The algorithms of all programs listed here have been published, either in papers or as source code. For more information, follow the links of the programs.

 

 

The Calgary Corpus

 Compression Rates / bps for the Calgary Corpus (14 files) sorted by Average


| Program | bib | book1 | book2 | geo | news | obj1 | obj2 | paper1 | paper2 | pic | progc | progl | progp | trans | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ABC02 | 1.888 | 2.226 | 1.929 | 4.190 | 2.399 | 3.733 | 2.365 | 2.382 | 2.332 | 0.706 | 2.420 | 1.659 | 1.659 | 1.440 | 2.238 |
| B00 | 1.901 | 2.257 | 1.950 | 4.156 | 2.393 | 3.739 | 2.410 | 2.394 | 2.335 | 0.720 | 2.439 | 1.652 | 1.630 | 1.427 | 2.243 |
| D00 | 1.896 | 2.274 | 1.958 | 4.152 | 2.409 | 3.695 | 2.414 | 2.403 | 2.347 | 0.717 | 2.431 | 1.670 | 1.672 | 1.452 | 2.249 |
| YBS02 | 1.915 | 2.218 | 1.921 | 4.487 | 2.385 | 3.919 | 2.444 | 2.372 | 2.313 | 0.710 | 2.438 | 1.674 | 1.684 | 1.467 | 2.282 |
| F02 | 1.927 | 2.357 | 2.014 | 4.428 | 2.465 | 3.798 | 2.433 | 2.441 | 2.389 | 0.753 | 2.479 | 1.698 | 1.703 | 1.489 | 2.312 |
| BAR | 2.059 | 2.558 | 2.171 | 4.878 | 2.608 | 4.161 | 2.550 | 2.592 | 2.553 | 0.847 | 2.629 | 1.818 | 1.808 | 1.554 | 2.485 |
| GZIP93 | 2.516 | 3.256 | 2.702 | 5.355 | 3.072 | 3.839 | 2.628 | 2.792 | 2.880 | 0.816 | 2.679 | 1.807 | 1.812 | 1.611 | 2.698 |

 Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Calgary Corpus (14 files) sorted by Sum

| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| --- | --- | --- | --- | --- | --- |
| GZIP93 | 2.64 | 0.86 | 3.50 | 2.698 | 2.595 |
| YBS02 | 2.66 | 1.62 | 4.28 | 2.282 | 1.991 |
| ABC02 | 6.06 | 5.67 | 11.73 | 2.238 | 1.977 |
| F02 | 14.30 | 4.96 | 19.26 | 2.312 | 2.062 |

 

 

The Canterbury Corpus

 Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Average


| Program | alice29.txt | asyoulik.txt | cp.html | fields.c | grammar.lsp | kennedy.xls | lcet10.txt | plrabn12.txt | ptt5 | sum | xargs.1 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B00 | 2.158 | 2.415 | 2.348 | 2.056 | 2.535 | 0.616 | 1.902 | 2.293 | 0.720 | 2.537 | 3.085 | 2.060 |
| ABC02 | 2.160 | 2.419 | 2.359 | 2.078 | 2.505 | 1.058 | 1.888 | 2.249 | 0.706 | 2.486 | 3.074 | 2.089 |
| YBS02 | 2.127 | 2.381 | 2.461 | 2.134 | 2.647 | 0.895 | 1.870 | 2.236 | 0.710 | 2.626 | 3.174 | 2.115 |
| F02 | 2.228 | 2.483 | 2.410 | 2.097 | 2.548 | 1.243 | 1.968 | 2.361 | 0.753 | 2.572 | 3.098 | 2.160 |
| BAR | 2.379 | 2.666 | 2.580 | 2.323 | 3.031 | 1.449 | 2.133 | 2.562 | 0.847 | 2.862 | 3.541 | 2.398 |
| GZIP93 | 2.849 | 3.118 | 2.594 | 2.249 | 2.670 | 1.579 | 2.704 | 3.229 | 0.816 | 2.672 | 3.320 | 2.527 |

 Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Canterbury Corpus (11 files) sorted by Sum

| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| --- | --- | --- | --- | --- | --- |
| YBS02 | 2.34 | 1.33 | 3.67 | 2.115 | 1.420 |
| GZIP93 | 4.87 | 0.69 | 5.56 | 2.527 | 2.061 |
| ABC02 | 3.79 | 3.34 | 7.13 | 2.089 | 1.484 |
| F02 | 14.67 | 4.11 | 18.78 | 2.160 | 1.600 |

 Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Average


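| Program | bible.txt | e.coli | world192.txt | Average |
| --- | --- | --- | --- | --- |
| ABC02 | 1.451 | 1.954 | 1.306 | 1.570 |
| B00 | 1.480 | 1.918 | 1.334 | 1.577 |
| YBS02 | 1.488 | 1.983 | 1.394 | 1.622 |
| F02 | 1.533 | 2.023 | 1.353 | 1.636 |
| BAR | 1.753 | 2.092 | 1.623 | 1.823 |
| GZIP93 | 2.330 | 2.244 | 2.337 | 2.304 |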

 Compression and Decompression Times / sec and Average and Total Compression Rates / bps for the Large Canterbury Corpus (3 files) sorted by Sum

| Program | Total Compression Time | Total Decompression Time | Sum of Compr. and Decompr. Time | Average Compression Rate | Total Compression Rate |
| --- | --- | --- | --- | --- | --- |
| YBS02 | 11.94 | 6.24 | 18.18 | 1.622 | 1.673 |
| GZIP93 | 20.62 | 2.64 | 23.26 | 2.304 | 2.296 |
| ABC02 | 19.71 | 12.15 | 31.86 | 1.570 | 1.628 |
| F02 | 96.35 | 19.65 | 116.00 | 1.636 | 1.697 |

 

 

 Program Information


| Program | Description |
| --- | --- |
| ABC02 | ABC V2.2, Author Jürgen Abel, 2002, based on BWT, http://www.data-compression.info/ABC/ |
| B00 | Author Bernhard Balkenhol, 2002, based on BWT, http://www.mathematik.uni-bielefeld.de/~bernhard |
| BAR | Author Frank Jennings, 2004, based on BWT, http://fermatjen.tripod.com/bar |
| D00 | Author Sebastian Deorowicz, 2000, based on BWT, http://sun.iinf.polsl.gliwice.pl/~sdeor |
| F02 | Author Peter Fenwick, 2002, based on BWT, private communication, http://www.cs.auckland.ac.nz/~peter-f |
| GZIP93 | GZIP V1.2.4 with option -9, Authors Jean-loup Gailly and Mark Adler, 1993, based on LZ77, http://www.gzip.org |
| YBS02 | YBS, Author Vadim Yoockin, 2002, based on BWT, details in his Russian data compression book ISBN 5-86404-170-X, http://compression.graphicon.ru/ybs/ |

 More Results


Results of other data compression programs are welcome. If you would like to have the results of your program published here, the program must satisfy some requirements.

Compression Rates
To have your compression rates published here, all algorithms of the program must be published, either in papers or as source code.

Compression/Decompression Times
To have your compression times published here, your program must run under Windows 2000 and all algorithms of the program must be published, either in papers or as source code.

Please mail your results or your executable program together with a link to the corresponding paper or source code.
 

 

Copyright © 2002-2022 Dr.-Ing. Jürgen Abel, Lechstraße 1, 41469 Neuß, Germany. All rights reserved.