TruePolyglot is a polyglot file generator: the files it produces are valid in several file formats at once. For example, the same file can be opened both as a ZIP file and as a PDF file. The idea for this project comes from the work of Ange Albertini, the International Journal of Proof-of-Concept or Get The Fuck Out, and Julia Wolf, which explains how polyglot files can be built.
Polyglot files can be tedious to build, even more so if you want to respect each file format correctly.
That's why I decided to build a tool to generate them.
My main motivation was the technical challenge.
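To make the "one file, two formats" idea concrete, here is a minimal sketch that checks whether a blob of bytes looks like both a PDF and a ZIP. It is not part of TruePolyglot; the helper names `detect_formats` and `make_demo_polyglot` are hypothetical, and the PDF check assumes a lenient reader that accepts a header anywhere in the first 1024 bytes.

```python
import io
import zipfile


def detect_formats(data: bytes) -> dict:
    """Report which of the two formats a byte string appears to satisfy."""
    # Assumption: like many lenient PDF readers, accept a header that
    # appears anywhere in the first 1024 bytes, not only at offset 0.
    looks_like_pdf = b"%PDF-" in data[:1024]
    # zipfile locates the end-of-central-directory record from the end of
    # the data, so leading non-ZIP bytes do not prevent detection.
    looks_like_zip = zipfile.is_zipfile(io.BytesIO(data))
    return {"pdf": looks_like_pdf, "zip": looks_like_zip}


def make_demo_polyglot() -> bytes:
    """Build a toy blob: a PDF-style header followed by a small ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello.txt", "hello from the ZIP side")
    # Not a complete PDF: only the header is present, for detection purposes.
    return b"%PDF-1.4\n% toy header only\n" + archive.getvalue()
```

The toy blob only demonstrates detection. A real polyglot, such as the ones this tool generates, also has to carry a complete and consistent PDF structure.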
This repository is forked from truepolyglot.hackade.org and includes a few commits to provide a setup.py for pip installations, along with a number of other opinionated changes.
You can install this version from master with:
pip install git+https://github.com/ansemjo/truepolyglot
Notably, this fork uses PyPDF2's cloneReaderDocumentRoot, which may hiccup on malformed PDFs more easily but copies the entire document, including cross-references and section labels. The setup.py also installs a pdfzip command, which only creates polyglot files of this particular format, since I believe this to be the most useful output format:
pdfzip -p document.pdf -z archive.zip polyglot.zip.pdf
Below you will find the rest of the original README. Parts of it may be outdated and may not apply to this fork. For example, I did not test compatibility beyond Firefox and Evince.
Description | Version |
---|---|
Build a polyglot file valid as PDF and ZIP format and that can be opened with 7Zip and Windows Explorer | POC |
Add a stream object in the PDF part | POC |
Polyglot file checked without warning with pdftocairo | >= 1.0 |
Polyglot file checked without warning with caradoc | >= 1.0 |
Rebuild the PDF Xref Table | >= 1.0 |
Stream object with the correct length header value | >= 1.0 |
Add the format "zippdf", file without offset after the Zip data | >= 1.1 |
Polyglot file keeps the original PDF version | >= 1.1.1 |
Add the "szippdf" format without offset before and after the Zip data | >= 1.2 |
Fix /Length stream object value and the PDF offset for the szippdf format | >= 1.2.1 |
Reorder PDF object numbers after insertion | >= 1.3 |
Add the format "pdfany" a valid PDF with custom payload content in the first and the last object | >= 1.5.2 |
Add "acrobat-compatibility" option to allow szippdf to be read with Acrobat Reader (thanks Ange Albertini) | >= 1.5.3 |
Add the format "zipany" a valid ZIP with custom payload content at the start and between LFH and CD | >= 1.6 |
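Two of the items above, a stream object with a correct /Length value and a rebuilt xref table, can be illustrated with a short sketch. This is not TruePolyglot's actual code; the function names are hypothetical, and the layout assumes a classic (non-compressed) cross-reference table in which every entry is exactly 20 bytes long.

```python
from typing import List, Sequence, Tuple


def make_stream_object(obj_num: int, payload: bytes) -> bytes:
    """Wrap raw bytes in a PDF stream object whose /Length matches the payload."""
    header = b"%d 0 obj\n<< /Length %d >>\nstream\n" % (obj_num, len(payload))
    return header + payload + b"\nendstream\nendobj\n"


def assemble_body(objects: Sequence[bytes]) -> Tuple[bytes, List[int]]:
    """Concatenate objects after a PDF header and record each byte offset."""
    body = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # offset of this object from file start
        body += obj
    return body, offsets


def make_xref_table(offsets: Sequence[int]) -> bytes:
    """Build a classic xref table for objects 1..n at the given offsets."""
    lines = [b"xref\n", b"0 %d\n" % (len(offsets) + 1)]
    # Entry 0 is the head of the free list.
    lines.append(b"0000000000 65535 f \n")
    for off in offsets:
        # 10-digit offset, 5-digit generation, 'n' for in-use, 2-byte EOL.
        lines.append(b"%010d 00000 n \n" % off)
    return b"".join(lines)
```

Whenever bytes are inserted before an object, its offset changes, which is why the table has to be recomputed after embedding a payload.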
Software | Formats | Status |
---|---|---|
Acrobat Reader | pdfzip, zippdf, szippdf, pdfany | OK |
Sumatra PDF | pdfzip, zippdf, szippdf, pdfany | OK |
Foxit PDF Reader | pdfzip, zippdf, szippdf, pdfany | OK |
Edge | pdfzip, zippdf, szippdf, pdfany | OK |
Firefox | pdfzip, zippdf, szippdf, pdfany | OK |
7zip | pdfzip, zippdf, zipany | OK with warning |
7zip | szippdf | OK |
Windows Explorer | pdfzip, zippdf, szippdf, pdfany, zipany | OK |
Info-ZIP (unzip) | pdfzip, zippdf, szippdf, pdfany, zipany | OK |
Evince | pdfzip, zippdf, szippdf, pdfany | OK |
pdftocairo -pdf | pdfzip, zippdf, szippdf, pdfany | OK |
caradoc stats | pdfzip, pdfany | OK |
java -jar | szippdf | OK |
First input file | Second input file | Format | Polyglot | Comment |
---|---|---|---|---|
doc.pdf | archive.zip | pdfzip | polyglot.pdf | PDF/ZIP polyglot - 122 KB |
orwell_1984.pdf | file-FILE5_32.zip | pdfzip | polyglot.pdf | PDF/ZIP polyglot - 1.3 MB |
x86asm.pdf | fasmw17304.zip | pdfzip | polyglot.pdf | PDF/ZIP polyglot - 1.8 MB |
doc.pdf | archive.zip | zippdf | polyglot.pdf | PDF/ZIP polyglot - 112 KB |
electronics.pdf | hello_world.jar | szippdf | polyglot.pdf | PDF/JAR polyglot - 778 KB |
hexinator.pdf | eicar.zip (scan virustotal.com) | pdfzip | polyglot.pdf (scan virustotal.com) | PDF/ZIP polyglot with the Eicar test in Zip - 2.9 MB |
doc.pdf | page.html | pdfany | polyglot.pdf | PDF/HTML polyglot - 26 KB |
logo.zip | nc.exe | zipany | polyglot.zip | ZIP/PE polyglot - 96 KB |
usage: truepolyglot format [options] output-file
Generate a polyglot file.
Formats availables:
* pdfzip: Generate a file valid as PDF and ZIP. The format is closest to PDF.
* zippdf: Generate a file valid as ZIP and PDF. The format is closest to ZIP.
* szippdf: Generate a file valid as ZIP and PDF. The format is strictly a ZIP. Archive is modified.
* pdfany: Generate a valid PDF file with payload1 file content as the first object or/and payload2 file content as the last object.
* zipany: Generate a valid ZIP file with payload1 file content at the start of the file or/and payload2 file content between LFH and CD.
positional arguments:
{pdfzip,zippdf,szippdf,pdfany,zipany} Output polyglot format
output_file Output polyglot file path
optional arguments:
-h, --help show this help message and exit
--pdffile PDFFILE PDF input file
--zipfile ZIPFILE ZIP input file
--payload1file PAYLOAD1FILE Payload 1 input file
--payload2file PAYLOAD2FILE Payload 2 input file
--acrobat-compatibility Add a byte at the start for Acrobat Reader compatibility with the szippdf format
--verbose {none,error,info,debug} Verbosity level (default: info)
TruePolyglot v1.6.2
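For scripted use, the usage text above can be turned into an argument list. The following is only a sketch: `build_command` is a hypothetical helper, and it assumes an executable named `truepolyglot` is available on the PATH. The flags themselves are taken from the help output above.

```python
from typing import List, Optional

# Format names as listed in the help output.
FORMATS = ("pdfzip", "zippdf", "szippdf", "pdfany", "zipany")


def build_command(
    fmt: str,
    output_file: str,
    pdf_file: Optional[str] = None,
    zip_file: Optional[str] = None,
    payload1_file: Optional[str] = None,
    payload2_file: Optional[str] = None,
    acrobat_compatibility: bool = False,
) -> List[str]:
    """Build an argv list following 'truepolyglot format [options] output-file'."""
    if fmt not in FORMATS:
        raise ValueError("unknown format: %r" % fmt)
    # Assumption: the tool is installed under the name 'truepolyglot'.
    cmd = ["truepolyglot", fmt]
    options = (
        ("--pdffile", pdf_file),
        ("--zipfile", zip_file),
        ("--payload1file", payload1_file),
        ("--payload2file", payload2_file),
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, value]
    if acrobat_compatibility:
        cmd.append("--acrobat-compatibility")
    cmd.append(output_file)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.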
git clone https://git.hackade.org/truepolyglot.git/
or download truepolyglot-1.6.2.tar.gz
You can run binwalk on a file to check whether it is composed of multiple files.
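As a rough illustration of what such a scan looks for, here is a very small signature search. It is not how binwalk is implemented, since binwalk uses a far larger signature database and further validation. The `SIGNATURES` table and `scan_signatures` function are hypothetical names chosen for this sketch.

```python
from typing import List, Tuple

# A handful of well-known magic byte sequences for the formats discussed here.
SIGNATURES = {
    b"%PDF-": "PDF header",
    b"PK\x03\x04": "ZIP local file header",
    b"PK\x01\x02": "ZIP central directory header",
    b"PK\x05\x06": "ZIP end of central directory",
}


def scan_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, label) pairs for every signature found, sorted by offset."""
    hits = []
    for magic, label in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, label))
            # Continue searching just past the start of this match.
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Finding both a PDF header and ZIP records in one file is a hint, not proof, that the file is a polyglot, because these byte sequences can also occur by chance inside compressed data.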
Copyright © 2018-2019 ben@hackade.org
TruePolyglot is released under the Unlicense except for the following libraries: