Researching file formats 20: gzip

This blog post is part of a series on file formats research. See this introduction post for more information.

Update: The official format definition is now online here: GZIP. Comments welcome directly to the Library of Congress.

Am I running out of steam or is there not that much to say about gzip? I think the biggest struggle in writing about the sustainability of this format is making sure the format itself (a compressed data format built on the DEFLATE algorithm, in this case) isn’t conflated with the software tool of the same name that creates it. The two are tightly linked, but different, so I had to make my research notes explicit in every case, which should make things easier when I turn them over to the writer/editor on this project.

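A gzip file contains a 10-byte header (the magic number 1f 8b, the compression method, a flags byte, a 4-byte timestamp, extra compression flags, and an operating system ID), optional extra header fields such as the original filename, a body holding the DEFLATE-compressed payload, and an 8-byte trailer with a CRC-32 checksum and the length of the original uncompressed data, modulo 2^32.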
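To make that layout a bit more concrete, here is a minimal Python sketch that reads the fixed 10-byte header and the 8-byte trailer as laid out in RFC 1952. The function name read_gzip_framing and the sample payload are my own placeholders, not part of any library, and the sketch assumes a single-member stream (a gzip file can hold several members back to back, in which case the last 8 bytes belong only to the final member).

```python
import gzip
import struct
import zlib


def read_gzip_framing(blob: bytes) -> dict:
    """Return the fixed header and trailer fields of a single-member gzip stream.

    Hypothetical helper for illustration; it does not parse the optional
    header fields (extra data, filename, comment, header CRC).
    """
    # 10-byte fixed header + 8-byte trailer is the smallest possible framing.
    if len(blob) < 18:
        raise ValueError("too short to be a gzip member")

    # Little-endian: 2-byte magic, method, flags, 4-byte mtime, extra flags, OS id.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")

    # Trailer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])

    return {
        "magic": magic,
        "method": method,  # 8 means DEFLATE
        "flags": flags,
        "mtime": mtime,
        "extra_flags": xfl,
        "os_id": os_id,
        "crc32": crc32,
        "isize": isize,
    }


# Sample data made up for this example; mtime=0 keeps the header reproducible.
payload = b"file formats research " * 100
blob = gzip.compress(payload, mtime=0)
info = read_gzip_framing(blob)

print(info["method"], info["isize"])
print(info["crc32"] == zlib.crc32(payload))
```

The trailer check is the useful part from a preservation point of view: comparing the stored CRC-32 and length against the decompressed data is how a reader can tell the payload survived intact, independently of whichever tool wrote the file.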

I did come across this very thorough Stack Overflow answer about gzip (and how it relates to, and differs from, other formats with similar names), and the comments are funny, because someone is like “It’d be great if this had cited sources” and the reply from the answerer, who is one of the original authors of gzip, was “I am the reference, having been part of all of that.”