This database is a repository of various corpora for benchmarking the performance of compression algorithms.
Various benchmarking corpora for compressors exist on the
Internet and are used by most researchers in the field of
data compression. However, those researchers mostly publish
their results only in papers.
This makes it hard for professionals and amateurs alike to
compare their results and verify their effectiveness
against those of others, not least because most are
unaware of their existence.
I, for example, still miss corpora for a variety of
interesting applications and advancing technologies:
These are very specialized topics, but they are expected
to be in wide use in the near future as memory and
processing power increase.
But some present-day aspects are also nearly unrepresented:
Especially for the latter I created some personal corpora
that may eventually lead to official datasets for those
areas of interest.
Another interesting but not crucial aspect is that it is
nearly impossible to get a global overview of how
compression algorithms have been created and changed over
time. For example, the impact of the introduction of
wavelet algorithms on image compression cannot currently
be clearly shown or proven.
So these are some of the reasons that led to the creation of this centralized repository of results for any kind of compression, covering a broad range of well-known and accepted corpora as well as some experimental and personal ones.
Before you use this system you should read the Terms of use. Mention the origin, help this site, make your implementation free. :-) If you can verify tests, please drop me a mail and I will strengthen the authenticity of the results by mentioning your confirmation.
TODO: There is always much to complete, and much left incomplete. Please be patient if not everything that shows up works (like the profiles in the bar to the left). A major addition I want to make within the next year is to list all available papers that do their evaluation on the files mentioned here.
Select one of the following corpora to receive more information and refine your results: