MediaInfoInfo: Metadata tags

This is the eighth and (perhaps) penultimate post in a series on a project I call “MediaInfoInfo”. (It is also the last until the project can be wrapped up after code review, which hasn’t started yet, so it could be a while.) I suppose that makes this the last of the weekly posts. This project is focused on improving documentation for the ubiquitous media analysis tool MediaInfo.

  1. MediaInfoInfo: Initialize project
  2. MediaInfoInfo: Parameters / making a plan
  3. MediaInfoInfo: General streams
  4. MediaInfoInfo: Video streams
  5. MediaInfoInfo: Audio streams
  6. MediaInfoInfo: Image, Menu, Text and Other streams
  7. MediaInfoInfo: MediaInfo-specific words
  8. MediaInfoInfo: Metadata tags

Tags

Something I started looking into, but decided to come back to later, is the tags sections. Mostly, I wanted to know whether it was worth getting granular enough to define where each tag comes from, and whether it is part of a certain standard or derived from a certain specification. I think that many are, but some are a mix of whichever sources are available, so I determined that they could not each have their own definition citing their “provenance.” Plus, I don’t know how these will continue to be defined in the future, with new formats or further analysis of each tag. But I learned where each of the parsing steps takes place, and how tags are mapped from the embedded metadata codes into something presented to the end user. Below are a few examples. That’s not to say that metadata doesn’t get collected, parsed, and written all over the place, but sometimes it is in little bundles that are easy to read if you’re willing to get your hands dirty and dig into the codebase a little bit.

This isn’t always a direct mapping, because the variable definitions end up getting mapped again to this long list of default language fields. For a perfect mapping, you’ll have to make the connection between the original piece of code, the variable it gets mapped to, and this list, but splitting “General_DistributedBy” at the underscore into “General” and “Distributed by” should be roughly enough. You would only need that second mapping if the result has to be perfect-perfect, like for a code-parsing thing, and even then there’s probably a better way to do it.
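To make that rough shortcut concrete, here is a minimal Python sketch. The function name `rough_label` is my own invention and not part of MediaInfo. It only guesses at the displayed wording by splitting the variable name, so it will not always match the real default language list.

```python
import re

def rough_label(field):
    """Guess the (stream, label) pair from a name like 'General_DistributedBy'.

    Hypothetical helper: it approximates the displayed wording and does not
    consult MediaInfo's actual default language list.
    """
    # Everything before the first underscore is treated as the stream kind.
    stream, _, name = field.partition("_")
    # Insert a space wherever a lowercase letter is followed by an uppercase one.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    words = spaced.split(" ")
    # Keep the first word as-is; lowercase later words unless they look like acronyms.
    rest = [w if w.isupper() else w.lower() for w in words[1:]]
    return stream, " ".join([words[0]] + rest)

print(rough_label("General_DistributedBy"))
```

Anything that needs the exact displayed string should still go through the real language list rather than a guess like this.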

Here’s a roundup of some of the tags and where in the code they get mapped to the words that MediaInfo displays, and what term is used when that happens (or if it doesn’t happen – some get passed over).

ID3v2 tags

Many metadata roads lead to ID3 tag version 2.x. This is not the latest version, but it’s the easiest to read. This format was intended for MP3s, but it has extended into and influenced other tagging systems that embed metadata inside other kinds of files in different ways.

MediaInfo does some interesting things with the ID3v2 mappings, and you can see how things work here.

Quicktime tags

Like ID3v2 moving beyond MP3, Quicktime tags seem to extend beyond just its original Quicktime File Format (QTFF) scope. Here is where embedded information stored in the moov atom, like from iTunes metadata, is identified and extracted from the 4CC data.
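For anyone who hasn’t looked at 4CC data before, here is a small Python sketch of reading one atom header: a 32-bit big-endian size followed by a four-character type code. The function name and the sample bytes are made up for illustration, and this is not how MediaInfo itself does it (its parser is C++). It also ignores the special size values used for extended or open-ended atoms.

```python
import struct

def read_atom_header(data, offset=0):
    """Return (size, fourcc) for the atom that starts at `offset`.

    Simplified sketch: assumes a plain 32-bit size field and skips the
    special-case size values.
    """
    size, raw_type = struct.unpack_from(">I4s", data, offset)
    # Latin-1 keeps bytes such as 0xA9 (the copyright sign seen at the start
    # of some iTunes-style codes) readable as a single character.
    return size, raw_type.decode("latin-1")

# Hypothetical sample: a 16-byte atom with an iTunes-style name code and empty payload.
sample = struct.pack(">I4s", 16, b"\xa9nam") + b"\x00" * 8
print(read_atom_header(sample))
```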

Broadcast Wave Format metadata

Here is where MediaInfo takes all the header data that makes BWF files become extra-special WAV files and transforms them into human-readable parameters.

Matroska tags

Here is where Matroska tags are mapped into MediaInfo elements. This is not an exhaustive list of possible Matroska tags, just the ones deemed worth pulling out and presenting.

Vorbis tags

Here are a bunch of tags that fit into the Vorbis Comments metadata container. This also applies to Theora, Opus, FLAC, and some other file formats. Vorbis tags seem a bit tricky because they can be anything in FieldName=Data format, so a tag just has to be common enough to be added here, for example because specific pieces of software allow this metadata to be added (Hello, MusicBrainz).
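As a rough illustration of why the FieldName=Data format is so open-ended, here is a minimal Python sketch that groups already-decoded comment strings by field name. The function name and the sample values are hypothetical. It leans on two things from the Vorbis comment convention: field names are case-insensitive, and the same field may appear more than once.

```python
from collections import defaultdict

def parse_vorbis_comments(comments):
    """Group 'FieldName=Data' strings by upper-cased field name."""
    fields = defaultdict(list)
    for comment in comments:
        # Split on the first '=' only, since the data itself may contain '='.
        name, sep, value = comment.partition("=")
        if not sep:
            continue  # not a FieldName=Data pair, so skip it
        fields[name.upper()].append(value)
    return dict(fields)

# Hypothetical sample values, not taken from a real file.
sample_comments = [
    "TITLE=Example",
    "artist=Someone",
    "ARTIST=Someone Else",
    "MUSICBRAINZ_TRACKID=placeholder-id",
]
print(parse_vorbis_comments(sample_comments))
```

Any field name can show up here, which is why a tool has to pick which ones are common enough to map to its own output fields.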

Flash Video tags

Here are a few sparse places where metadata tags are grabbed from the source and mapped into output fields. Flash isn’t an open specification so this could be useful to people that have to work with recovering these little files.

Other resources

Outside of MediaInfo, the Matroska folks have made a metadata mapping spreadsheet to compare Matroska fields with other common tags. This could be helpful too!

Something that might also be useful for thinking about embedded metadata is the W3C Ontology for Media Resources 1.0.

Okay, that’s all for now as I put together a final pull request and wait for review.