Exploring the Vasulka PDF Archive

A series of short investigations into meaningful archival access, by Ashley Blewer during Recurse Center's annual Never Graduate Week.

Introduction

The Vasulka PDF Archive is a great, rich collection of files covering the Vasulkas' legendary work and the work of many other video and computer artists from the 20th century.


However... getting a quick understanding of what is inside of the collection, picking out things that seem interesting, or understanding the folder structure is difficult.


I spent at most five days exploring quick ways to provide better, more user-friendly access to the collection (and too much time writing docs and building this webpage), and these experiments are the results of that rapid-fire research! I wanted to work on things that were archivist-sized: something that could be done using one's own computer and free online resources, without leaning on things archivists usually don't have access to, like cloud computing (which I don't really know much about anyway; maybe next year). See this blog for technical process notes. I enjoyed working on this, and I hope you enjoy the results, too!

Experiment X - Freetext search

I extracted the text from the PDFs and added it to a relational database, then made a little search engine that links back to the original hosted material.
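To give a feel for the idea, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not the code behind the demo: the table name, column names, and example URLs are all made up for illustration, and it uses a plain substring match rather than a proper full-text index.

```python
import sqlite3


def build_index(documents):
    """Create an in-memory database with one row per PDF's extracted text.

    `documents` is an iterable of (url, body) pairs. The schema is a
    hypothetical one chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO documents (url, body) VALUES (?, ?)", documents)
    conn.commit()
    return conn


def search(conn, keyword):
    """Return the URLs of documents whose text contains the keyword.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    # Escape LIKE wildcards so the keyword is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT url FROM documents WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return [url for (url,) in rows]


if __name__ == "__main__":
    # Placeholder documents; the real archive text is not reproduced here.
    conn = build_index([
        ("https://example.org/a.pdf", "Notes on video synthesis"),
        ("https://example.org/b.pdf", "Correspondence about an exhibition"),
    ])
    print(search(conn, "video"))
```

Returning the URL with each hit is what lets a results page link back to the hosted original.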

Demo (N.B. if the Glitch machine is asleep, your first search may take quite a while as it wakes up; subsequent searches should be quick!)

Next steps would be: fix the slow start, add a blurb, and highlight where the keyword appears in documents

Experiment X - Group and montage

PDF thumbnails were grouped by likeness and montage sheets were generated.
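One simple way to group images by likeness is an "average hash" compared by Hamming distance. The Python sketch below shows that technique only; it is not necessarily how the montage sheets here were made. It assumes each thumbnail has already been decoded (by an image library of your choice) into a grid of grayscale values, all the same size. The function names and the distance threshold are illustrative.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_likeness(thumbnails, max_distance=2):
    """Greedily put each thumbnail in the first group whose first member is close enough.

    `thumbnails` maps a name to a grayscale grid (list of rows of ints).
    """
    groups = []  # list of (representative_hash, [names])
    for name, pixels in thumbnails.items():
        h = average_hash(pixels)
        for representative, names in groups:
            if hamming(representative, h) <= max_distance:
                names.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((h, [name]))
    return [names for _, names in groups]
```

Each resulting list of names could then be handed to a montage tool to produce one sheet per group.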

See all ten (resized) montage sheets here.

Next steps would be: associate with filenames, make linkable

Experiment X - Folder structure

How is the collection structured? One difficulty with the collection as it appears above is that there are so many documents and folders (nearly 700 folders) that it's hard to understand how they were arranged.
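A quick first look at a structure like this is to count how many folders sit at each level of nesting. Here is a small Python sketch, assuming a local copy of the collection on disk; the function name is made up for illustration.

```python
import os
from collections import Counter


def folders_per_depth(root):
    """Count the folders at each depth below `root` (root itself is depth 0)."""
    root = os.path.abspath(root)
    counts = Counter()
    for dirpath, _dirnames, _filenames in os.walk(root):
        relative = os.path.relpath(dirpath, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        counts[depth] += 1
    return dict(sorted(counts.items()))
```

A lopsided result (say, most folders at one depth) is a hint about how the collection was arranged before drawing anything.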

Demo

Next steps would be: (basically refactor completely) fix scale issues and add files

Experiment X - Tree map

Improving upon the previous experiment, I used D3 to display folders by the size of their contents.
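D3's hierarchy layouts read nested objects with a "children" array by default, and leaf values can be summed up the tree, so the main preparation step is turning a flat file listing into that nested shape. A rough Python sketch follows, assuming a list of (path, size in bytes) pairs; the function names and the "archive" root label are hypothetical.

```python
import json


def build_tree(entries, root_name="archive"):
    """Nest (path, size) pairs into {"name", "children"} nodes with "value" on leaves."""
    root = {"name": root_name, "children": []}
    for path, size in entries:
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue  # skip empty paths
        node = root
        for part in parts[:-1]:
            # Reuse the folder node if it already exists, otherwise create it.
            child = next(
                (c for c in node["children"] if c["name"] == part and "children" in c),
                None,
            )
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["children"].append({"name": parts[-1], "value": size})
    return root


def total_size(node):
    """Sum leaf values under a node, mirroring what a size-based layout displays."""
    if "children" not in node:
        return node["value"]
    return sum(total_size(child) for child in node["children"])


if __name__ == "__main__":
    # Placeholder listing, not real paths from the archive.
    tree = build_tree([("a/x.pdf", 10), ("a/y.pdf", 5), ("b/c/z.pdf", 7)])
    print(json.dumps(tree, indent=2))
```

The same JSON can feed different size-based layouts, since only the drawing changes and not the data.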

Demo

Next steps would be: make linkable

Experiment X - Circle map

Well, what about a circle map instead? Again, I used D3 to display folders by the size of their contents.

Demo

Next steps would be: make linkable

Experiment X - Word cloud

No project is complete without word clouds!
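Underneath any word cloud is a table of word frequencies, which the renderer uses to size each word. A minimal Python sketch of that counting step is below; the stopword list is a tiny illustrative one, not the list used for the clouds here.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "and", "of", "a", "to", "in", "is", "it", "for", "on", "with"}


def word_frequencies(text, top_n=50):
    """Return the `top_n` most common words, skipping stopwords and very short words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)
```

Text extracted from scanned PDFs tends to contain OCR noise, so a first pass will likely show junk tokens that need filtering too.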

See all word clouds (including my first error-filled one) here.

Next steps would be: add masking to color and shape word clouds

Experiment X - Text network

Stepping up from word clouds is generating a map of the text and the relationships between key words. This is the closest I came to accidentally making art with this collection, and my computer spent a long time processing what looked like a thousand tiny spiders.
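One common way to build such a map is to count how often pairs of key words appear together, then treat words as nodes and pair counts as weighted links. The Python sketch below shows that approach under loud assumptions: it is not the tool used for the screenshots, it splits naively on whitespace (so punctuation attached to a word will hide it), and it expects lowercase key words. The output is the nodes-and-links shape that D3 force-layout examples commonly use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, keywords):
    """Count how often each pair of keywords appears in the same sentence."""
    keywords = set(keywords)
    edges = Counter()
    for sentence in sentences:
        # Naive tokenization: lowercase and split on whitespace.
        present = sorted(keywords & set(sentence.lower().split()))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def to_node_link(edges):
    """Convert pair counts into a {"nodes": [...], "links": [...]} dictionary."""
    nodes = sorted({word for pair in edges for word in pair})
    return {
        "nodes": [{"id": word} for word in nodes],
        "links": [
            {"source": a, "target": b, "weight": count}
            for (a, b), count in sorted(edges.items())
        ],
    }
```

Passing the result through json.dumps gives a file a browser-side renderer could load.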

See miscellaneous screenshot outputs and download the files here.

Next steps would be: explore this topic further, improve access to files by converting to JSON and rendering in D3

Experiment X - Pixplot

This tool from the Yale DH Lab made it super-easy to put a touch of machine learning onto the exported thumbnails. It looks like it'd be a pretty powerful tool with an associated CSV full of metadata, which I don't have.

Next steps would be: unsure, deploy it?

Experiment X - Ridgelines

I was so desperate to create a ridgeline graph that I didn't care what it was about. This is the word count for the first nine files of the first nine directories. It doesn't even look cool!
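The data behind a chart like this is just a short list of word counts per directory. Here is a small Python sketch of that step, assuming the extracted text is already in memory as a dictionary of directory name to {filename: text}; taking "first" to mean first in sorted name order is my assumption for the sketch, and the function name is made up.

```python
def word_counts(texts_by_directory, limit=9):
    """Word counts for the first `limit` files of the first `limit` directories.

    "First" means first in sorted name order, which is an assumption of this sketch.
    """
    result = {}
    for directory in sorted(texts_by_directory)[:limit]:
        files = texts_by_directory[directory]
        result[directory] = [
            len(text.split()) for _name, text in sorted(files.items())[:limit]
        ]
    return result
```

Each directory's list would become one ridge in the plot.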

Next steps would be: collect and display useful data (and more of it)

And that's a wrap!