MediaInfoInfo: Initialize Project!

This is the first of what will be a series of blog posts (probably) on a project I call “MediaInfoInfo”. This project is focused on improving documentation for the ubiquitous media analysis tool MediaInfo.

Other posts:

  1. MediaInfoInfo: Initialize project
  2. MediaInfoInfo: Parameters / making a plan
  3. MediaInfoInfo: General streams
  4. MediaInfoInfo: Video streams
  5. MediaInfoInfo: Audio streams
  6. MediaInfoInfo: Image, Menu, Text and Other streams
  7. MediaInfoInfo: MediaInfo-specific words
  8. MediaInfoInfo: Metadata Tags

My initial idea was to have a slow-moving, drive-by, hackday-esque event where folks could contribute details, research, and feedback on the changes I would be proposing within MediaInfo. I could then collect, concatenate, and parse through these extra details to come up with the best definition of each of the parameters. I wrestled with this for a long time. GitHub is still, for many, an intimidating space. Google Docs would become extremely sluggish with the necessary size of a document like this. Google Sheets is okay but it’s not as easy to have robust commenting or version control. (Yes, both are in Google Sheets, but they are somewhat hidden features, relative to how I am used to working.) I didn’t think too much about the “too many cooks in the kitchen” problem, but as I thought through this project, I did think about how most of the feedback I am likely to personally receive is going to come from the audiovisual preservation/conservation sector. I don’t want to bias the definitions toward that relatively small sector. MediaInfo is used widely in technology in general and thus should have generic technical definitions, and I want to keep that perspective.

While speaking to someone else about this project, I surprised myself with the realization that I am one of the few people who 1) have this knowledge of how digital video works at this level and 2) are willing to share it. It also helps that I am part of the official MediaArea team and have worked on many other MediaArea-created audiovisual analysis projects. Also, I am weird in that I enjoy doing tedious work. I find it “relaxing.”

Why am I doing this? I think MediaInfo provides a great gateway for people to better understand how video works. A lot of folks want the documentation to be improved, but no one has offered to actually pay for it. Personally, I am privileged in that I get to work on several open source projects for a living, but I also want to be able to “give back” by donating my time and building up the documentation, pro bono, and I hope it influences deeper media analysis work in the future. (If you do appreciate this work I am doing, I recommend donating to MediaArea in acknowledgement of the years and years of pro bono technical support that has been given! Even a tiny donation is very appreciated as a way to acknowledge there is massive value to all the work that has been done for free and made available to everybody over the 16 years MediaInfo has been around!)

I got the go-ahead on my idea to update the MediaInfo documentation from the lead (and historically solo) maintainer Jérôme at No Time To Wait 4, a conference we organize. That wasn’t strictly necessary, but it’s a good idea to share your plan with the people who will be responsible for accepting pull requests before starting to work on it, especially since the review and merging of code is itself a burden that is not compensated. Fortunately, Jérôme thought this was a good idea and would be happy to take the pull request and perform the necessary steps to get it into the codebase eventually. And I hope that this task will help cut down on some of the non-technical “technical debt” he and I both receive regularly, when we need to explain what certain fields mean.

This project will start with updating the output from mediainfo --Info-Parameters. This gives the user all of the definitions used to describe files and it’s 1644 lines long. That is a lot of lines! The definitions that power this command are located in this part of the codebase. I already did some of this work in Summer 2019, but I feel like it could use a deeper consideration and consistent phrasing across all of the parameters. This output is underutilized, I think. I didn’t know about it until I went digging around for it. And after doing this documentation project, it’d be pretty easy to map the output to something machine-readable, and translate that into something structured and suitable for rendering as HTML. I’d like to do that next, or in conjunction if/when I get bored. I think these definitions are beneficial even outside of the context of this open source library.
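To make the "machine-readable" idea a little more concrete, here is a minimal sketch of how that mapping might look in Python. It assumes the output is laid out as a stream-kind name on its own line (General, Video, and so on), followed by "Name : Definition" lines, with blank lines between groups. That layout is my working assumption, not a documented guarantee, and the sample text and the function name parse_info_parameters are made up for illustration.

```python
import json

# Hypothetical sample, shaped the way I assume `mediainfo --Info-Parameters`
# output is laid out. The definitions here are placeholders, not real ones.
SAMPLE = """\
General
Count                     : Count of objects available in this stream
Format                    : Format used

Video
Width                     : Width in pixels
"""


def parse_info_parameters(text):
    """Group "Name : Definition" lines under the header line above them."""
    sections = {}
    current = None  # name of the group we are currently inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current group.
            current = None
            continue
        # Split on the first colon only, since definitions may contain colons.
        name, sep, definition = line.partition(":")
        if sep and current is not None:
            sections[current][name.strip()] = definition.strip()
        else:
            # Anything else is treated as the start of a new group.
            current = line
            sections.setdefault(current, {})
    return sections


if __name__ == "__main__":
    print(json.dumps(parse_info_parameters(SAMPLE), indent=2))
```

To try this on real output, the sample string could be swapped for the text captured from running mediainfo --Info-Parameters (for example with subprocess.run). Once the definitions are in a nested dictionary like this, turning them into JSON or an HTML table is a small step.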

Jérôme suggested that I create one PR with a [WIP] flag on it, as a place to ask questions about things I’m unsure of, and generally continue our shared ethos of wanting to work in the open, where others can contribute feedback. I also realized that I could still turn this into an educational opportunity by writing on this blog about my deep-dive investigations into some of the media parameters and sharing that experience, which I hope will be informative. A large part of working in public means being wrong in public, but fortunately that is something I’ve mostly grown accustomed to.

Here is the new WIP PR with a little more information (and, in the future, shared knowledge): https://github.com/MediaArea/MediaInfoLib/pull/1207

For the in-between state, I am also working in the open. I mapped each parameter group to a spreadsheet, and these are available here, if people are curious about the original and proposed definitions and the questions I am asking myself during this process, which serve as reminders for myself to review or follow up on some of the details. However, I will ask all questions on the main Pull Request, and diffs will be available there too. To keep conversations centralized, these spreadsheets are view-only.

Thanks for reading! Future blog posts will be more technical in detail, going into how I’m deriving the information to reach these conclusions and update the parameter fields appropriately, in a way that makes the most sense for a wide and diverse audience. If I am good, I will manage to update at least once a week.