Math-Info project carried out in 2019, as part of studies in the engineering preparatory cycle. The aim of the project is to reduce the size of the files, as much as possible, as desired using the Huffman method.
All scripts are written in UTF-8, the docstring and comments are in UTF-8 comply with the standard PEP 484 as follows than the standard for Docstrings Google.
All the libraries used are native.
Implement a compression algorithm, recursively, reducing the size of the files. With the Huffman method.
It is possible to decompress previously compressed files. Always by the Huffman method.
To do this, open the Main.py
file in a terminal.
A graphical window opens :
You can choose between compressing or decompressing a document.
Only .hf
files can be unzipped. This is an extension created for de Huffman compression.
To compress a file, of any size or type, click on the Compression
button. And choose the file you are interested in.
You can then choose the directory and name of your file to be saved. Here logo.hf
.
When the file compression is complete, you will hear an audible alert (Windows only). And you see the statistics of the compressed file.
So we have :
- The name of file to be compressed
- Size in bytes
- Number of different bytes in the file
- The Huffman tree's construction time
- The size of the file after compression
- Byte gain in percent
- Entropy in base 2 designates the minimum number of bits per symbol that our information takes
- Average byte size after compression
- Writing time of the file
File name | File size (bytes) | Different number of bytes | Gain | Entropy |
---|---|---|---|---|
fichier_1.txt | 10 347 | 38 | 41,49% | 4,10 |
fichier_2.txt | 20 597 | 42 | 44,57% | 4,10 |
fichier_3.txt | 102 843 | 42 | 47,71% | 4,10 |
fichier_4.txt | 1 028 560 | 42 | 48,42% | 4,10 |
fichier_5.txt | 10 285 870 | 42 | 48,49% | 4,10 |
fichier_6.txt | 102 864 419 | 42 | 48,50% | 4,10 |
These are only examples of the results of file compression according to their size. Depending on the files these results change. As a general rule, the larger the file, the greater the byte gain. Up to 90%. But the operation takes longer.
As before, click on the Décompression
button. Once you have selected the file you are interested in, save it where you want.
Only .hf
files can be unzipped and are displayed in the file explorer.
When the file decompression is complete, you will hear an audible alert (Windows only). And you see the statistics of the compressed file.
So we have :
- The name of the file to decompress
- The name of the file decompressed
- Number of bytes of the dictionary, where the data of the binary aber is stored.
- The size of the file after decompression
- Execution time
File name | Dictionary size (bytes) |
---|---|
fichier_1.txt | 690 |
fichier_2.txt | 793 |
fichier_3.txt | 787 |
fichier_4.txt | 788 |
fichier_5.txt | 785 |
fichier_6.txt | 787 |
These are only examples of the results of file decompression according to their size. Depending on the files these results change.
- The code is in French 🇫🇷
- It is not possible to compress folders, only files
- We choose to have a strong compression but an important execution time
- The binary tree is specific to each file. It increases the execution time, but is more efficient.
- The binary tree is saved in the compressed file. It increases its size
- Little or no efficiency with small files for the reasons mentioned above