Researching file formats 33: Android Package

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: APK. Comments welcome directly to the Library of Congress. First off, this work was made incredibly easier thanks to the work of Johan van der Knijff, particularly the blog series around working with mobile apps, the latest being “Towards a preservation workflow for mobile apps” from 2021,...

Researching file formats 27: Core Audio Format

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Apple Core Audio Format. Comments welcome directly to the Library of Congress. Core Audio Format (CAF) is a file format for storing and transporting digital audio data. It’s published by Apple. Fully disclosed, thoroughly, with helpful contextual statements. It was introduced in 2005. I am a bit...

Researching file formats 31: XYZ Point Cloud

This blog post is part of a series on file formats research. See this introduction post for more information. After a handful of difficult and technically (or socially/politically) complex formats, it was nice to work on this simple (if unspecified) format, taking me back to one of the earliest formats I worked on, FASTA. This format originally belonged in that initial category (see RGBE Image Format), but there was a mix-up. While it is structurally...

Researching file formats 30: Autodesk Maya Binary

This blog post is part of a series on file formats research. See this introduction post for more information. You know, I saved a spot for this format but it really is essentially the same as last week’s, except stored in a binary format instead of a text-based one. I checked to see if it was simply gzipped, but it wasn’t – it’s its own secret-sauce thing courtesy of Autodesk (as usual, comments welcome, maybe...

Researching file formats 29: Autodesk Maya Project

This blog post is part of a series on file formats research. See this introduction post for more information. This format describes 3D scenes – geometry, lighting, animation, rendering, other stuff. It’s fairly straightforward, and I felt the documentation was relatively thorough while being succinct. Preservation issues seem to mostly be around concerns with this file needing to be connected to other files: Here’s what the specification has to say about this format: “If you...

Researching file formats 28: Virtual Reality Modeling Language

This blog post is part of a series on file formats research. See this introduction post for more information. I was getting so overwhelmed surfing the old web for this format. It was absolutely thrilling but exhausting too, like when you’re on your 13th day of vacation and you’re so worn out but there’s so much more to see. Nothing is more exciting to me than looking into VIRTUAL WORLDS. I felt like it looked...

Researching file formats 27: Microstation DGN

This blog post is part of a series on file formats research. See this introduction post for more information. This format was really a family of formats, and there’s specifically a break between an older version of the format, documented, and a later version of the format, semi-documented. At the time of writing this blog post, I still have some work to do in sorting out all the details and nuances here, when there’s not...

Researching file formats 26: 3DM

This blog post is part of a series on file formats research. See this introduction post for more information. The next six formats are part of Set 3: 3D, VR and Animation! All of these formats are quite different from each other (except two closely related, which will be obvious). First up: 3DM. Rhino 3D Model file format family. Or the openNURBS 3D model? Something important to note off the top is that openNURBS is...

Researching file formats 25: Nullsoft Streaming Video

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Nullsoft Streaming Video. Comments welcome directly to the Library of Congress. “Support for more codecs will be added soon.” – Nullsoft, 2004 Last week was an audio codec, this week is an audio/video container. (See my training site for nuances there, if needed!) And it’s a container...

Researching file formats 24: Unified Speech and Audio Coding

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Unified Speech and Audio Coding. Comments welcome directly to the Library of Congress. Okay, this format was hard! First, the format is standardized via ISO/IEC, which means it’s expensive. Next, the specification is extremely long and technical, with lots of math. And this is my area of...

Researching file formats 23: Audio Definition Model

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Audio Definition Model. Comments welcome directly to the Library of Congress. This format is about audio, but it’s a text-based document that describes audio. You can say it defines an audio model. It is typically stored as XML, but JSON is a valid option, too. While the...

Researching file formats 22: Sibelius

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Sibelius Music Notation Format. Comments welcome directly to the Library of Congress. We are entering the Audio-Video set of formats! The next four posts will be a/v formats. I have had some very brief, non-hands-on experience with this format, because one of my college roommates was a...

Researching file formats 21: bzip

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: bzip2. Comments welcome directly to the Library of Congress. Last week was gzip, this week is bzip. Or, I think, bzip2. I struggled with what to say about gzip, but bzip/bzip2 is more interesting because of PATENT PROBLEMS! bzip2 is based off of its predecessor bzip. bzip2...

Researching file formats 20: gzip

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: GZIP. Comments welcome directly to the Library of Congress. Am I running out of steam or is there not that much to say about gzip? I think the biggest struggle in writing about the sustainability of this format is ensuring there isn’t conflation between the format itself...

Researching file formats 19: Java class file

This blog post is part of a series on file formats research. See this introduction post for more information. Update: The official format definition is now online here: Java Virtual Machine Class File Format. Comments welcome directly to the Library of Congress. Java configuration class file format. Might be candidate for least appealing documentation/specification (legacy, here.) The hardest part of this format was having to explain the JVM in a way that makes sense for...