Youtube-dl Metadata Server

METAdata

I’ve come to rely on youtube-dl for archiving old footage from YouTube and for saving content for offline viewing. It complements home server setups running Plex or Jellyfin well, with one small caveat: metadata.

Scraping metadata from a site after downloading can be difficult, especially since descriptions may change or videos may be removed. Luckily, youtube-dl comes with a handy flag, --write-info-json, that saves the metadata at the time of download. Jellyfin and Plex cannot read this JSON file, however, as they use Kodi-style .nfo XML files. So I decided to write a small script to do the conversion for me.

ytdl-nfo

youtube-dl JSON metadata to Kodi-style NFO converter

GitHub link: https://github.com/owdevel/ytdl-nfo

Design Considerations

The tool needed to process a series of *.json files created by youtube-dl and convert them into .nfo files compatible with Jellyfin/Plex. I also wanted it to be useful both as a library and as a CLI tool.
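As a rough sketch of that batch step, the snippet below walks a folder of metadata files and works out where each .nfo would go. The function names (`nfo_path_for`, `iter_metadata`) are hypothetical and not taken from ytdl-nfo itself. It assumes the metadata files end in `.info.json` and that the .nfo should sit next to the video with the same base name.

```python
import json
from pathlib import Path


def nfo_path_for(json_path: Path) -> Path:
    """Map 'name.info.json' (or 'name.json') to 'name.nfo' in the same folder."""
    name = json_path.name
    # Strip the longest matching metadata suffix first.
    for suffix in (".info.json", ".json"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return json_path.with_name(name + ".nfo")


def iter_metadata(folder: Path):
    """Yield (path, parsed metadata) for every *.json file in a folder."""
    for json_path in sorted(folder.glob("*.json")):
        with json_path.open(encoding="utf-8") as fh:
            yield json_path, json.load(fh)


# Usage: the output name follows the video's base name.
print(nfo_path_for(Path("My Video-abc123.info.json")))
```

Keeping the path logic and the file loading in plain functions means the same code can back both a library import and a thin CLI wrapper.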

youtube-dl produces different metadata tags depending on the extractor (website) it uses. Given this, it was important that the conversion system use modular configs to account for the different fields produced by different websites.
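One way to keep configs modular is to pick a config file based on the extractor name stored in the metadata. The sketch below is an illustration only: the `configs/` layout, the `generic` fallback, and the `config_path_for` helper are assumptions, and it relies on the metadata carrying an `extractor` field.

```python
from pathlib import Path

# Hypothetical layout: one YAML file per site inside a configs/ directory.
CONFIG_DIR = Path("configs")


def config_path_for(info: dict, config_dir: Path = CONFIG_DIR) -> Path:
    """Choose a config file from the extractor recorded in the metadata."""
    # Fall back to a generic config when no extractor is recorded (assumption).
    extractor = str(info.get("extractor", "generic")).lower()
    # Names such as "youtube:playlist" are reduced to the site part (assumption).
    site = extractor.split(":")[0]
    return config_dir / f"{site}.yaml"


# Usage with a minimal metadata dict.
print(config_path_for({"extractor": "youtube", "id": "abc123"}).as_posix())
```

Supporting a new website then only means dropping another YAML file into the folder, with no change to the conversion code.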

XML Nodes and YAML

Luckily, the .nfo XML style uses a single root node, so the rest of the document could be generated recursively from that node using Python’s xml module. The tricky part was deciding how to structure the YAML file and how to parse the values as it is read.
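The recursive generation from a single root can be sketched with `xml.etree.ElementTree`. This is not the actual ytdl-nfo code: `build_nodes` is a hypothetical helper, and the nested dict stands in for an already-parsed config where each list entry maps one tag name to a string, a dict with `attr`/`value`, or a further list of children.

```python
import xml.etree.ElementTree as ET


def build_nodes(parent, children):
    """Recursively attach child nodes described by a list of {tag: spec} dicts."""
    for child in children:
        for tag, spec in child.items():
            node = ET.SubElement(parent, tag)
            if isinstance(spec, list):
                # Nested children: recurse with the new node as parent.
                build_nodes(node, spec)
            elif isinstance(spec, dict):
                # Node with attributes and a text value.
                node.attrib.update(spec.get("attr", {}))
                node.text = spec.get("value", "")
            else:
                # Plain text node.
                node.text = str(spec)


# A parsed config reduced to plain Python data, with fixed text values.
config = {
    "episodedetails": [
        {"title": "Example"},
        {"uniqueid": {"attr": {"type": "youtube"}, "value": "abc123"}},
    ]
}

# The single top-level key becomes the root node.
root_tag, children = next(iter(config.items()))
root = ET.Element(root_tag)
build_nodes(root, children)
print(ET.tostring(root, encoding="unicode"))
# <episodedetails><title>Example</title><uniqueid type="youtube">abc123</uniqueid></episodedetails>
```

The sketch uses fixed text values; filling them in from the downloaded metadata is the remaining piece.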

Python to the rescue! The built-in str.format(**data) substitutes variables into the relevant strings automatically. Combined with date formatting for the strings, this allowed very simple code driven by a simple YAML document.

# /configs/youtube.yaml
episodedetails:
  - title: '{title}'
  - showtitle: '{uploader}'
  - uniqueid:
      attr:
        type: 'youtube'
        default: 'true'
      value: '{id}'
  - plot: '{description}'
  - premiered:
      convert: 'date'
      input_f: '%Y%m%d'
      output_f: '%Y-%m-%d'
      value: '{upload_date}'
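To show how entries like these could turn into final text, here is a minimal sketch of the value step. The `resolve` helper is hypothetical and the specs are written as plain Python dicts rather than loaded from YAML. It assumes a spec is either a template string or a dict using the `value`, `convert`, `input_f`, and `output_f` keys from the config above.

```python
from datetime import datetime


def resolve(spec, data):
    """Return the final text for one node spec, given the youtube-dl metadata."""
    if isinstance(spec, str):
        # Simple case: the spec is just a template such as '{title}'.
        return spec.format(**data)
    text = spec["value"].format(**data)
    if spec.get("convert") == "date":
        # Re-express the date in the output format.
        parsed = datetime.strptime(text, spec["input_f"])
        text = parsed.strftime(spec["output_f"])
    return text


# Sample metadata, reduced to the fields used here.
data = {"title": "My Video", "uploader": "Some Channel", "upload_date": "20200131"}

print(resolve("{title}", data))  # My Video
premiered = {
    "convert": "date",
    "input_f": "%Y%m%d",
    "output_f": "%Y-%m-%d",
    "value": "{upload_date}",
}
print(resolve(premiered, data))  # 2020-01-31
```

Because the template is the format string and the metadata is only passed as arguments, braces inside a video description do not interfere with the substitution.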

Why not string replace an XML document?

It would have been simpler to create a series of .nfo templates and use a string replace to substitute values in. However, I also wanted some basic conversion functionality, for example for the date format. Eventually I would like to add tag support as well, which means iterating over a list to generate a certain XML node multiple times. This could have been done in a similar fashion to Vue.js’s v-for directive, but that approach would have added more complexity in dealing with nested data. Using YAML for configuration is cleaner in my opinion, and it allows for more extensibility in the future given the full control over the creation of every node.

Future Plans

This project is still in its early stages, but it is functional and has helped me organise my own personal server. It still needs decent error handling and documentation, as well as many more extractors for all the possible sites. Given it was mostly a learning exercise for me, I’ll continue to expand it as I need to, but I probably won’t work on it much beyond that unless people request it.