Posted 24-Dec-2008 in Software Development

All Music Guide (AMG) Tagging Script

So recently I decided that I hated how my music collection was tagged. I try to break apart the genres but ideally most of my music falls under multiple genres. Well All Music Guide (AMG, allmusic.com) does a pretty good job of classifying everything. They use broad genres, and then further define music based on the styles it falls into. Most of us consider those styles to be genres of their own, but ideally you can't really have more than one genre, and if you use iTunes (like I do) you know that comma separating the genres is a mess (each unique genre string is treated as such). My solution is to use the broad genre as the actual genre, and then plug the styles into the comment tag. This worked out great, except that it was a lot of leg work to look up those artists and update the tags manually. Unfortunately I did that with my entire library before writing this script. The script I wrote is in Python and looks up the artist / album on the AMG website, then loads in the genre and styles, and updates the files accordingly (currently only supports MP3 files). Just to warn you, this script is completely proof of concept and may be considered a malicious spider by AMG as it does mine their site for data. "Why not use one of those other services out there to update my library," you ask? Well it's because I don't really consider "this is shit" to be a valid tag for any of my music. AMG is for the most part the best repository of such data (albeit it's mostly mainstream music) and it's not biased the way other community driven databases are. Enjoy allmusictagger.py and if you end up expanding upon it, drop me a line and let me know.

#!/usr/bin/env python
# Updates the genre and comment tags of MP3 files with data from
# allmusic.com. Specifically, the genre is populated with the fairly broad
# genre listing and the comment tag is populated with the more specific
# styles for a particular album.
#
# Furthermore, I don't have permission from allmusic.com to do this, so if
# they come after you for using it, please respect their wishes.
#
# Be warned that this script is fairly buggy and is more of a proof of
# concept than anything else.  Also, it was developed on Ubuntu Linux, no
# efforts were made to get it working on other platforms.
#
# @author Josh Sherman
# @link   http://joshtronic.com
from mutagen.id3 import ID3, TCON, COMM
from numpy import *
import os
import pickle
import re
import sys
import urllib
# Recursively processes the files
def process(path):
global _cache
for file in os.listdir(path):
file = path + '/' + file
if os.path.isdir(file) == True:
process(file)
else:
# Checks the file extension
# TODO Add support for m4a and wma
if file[-3:] == 'mp3':
audio = ID3(file)
# Pulls the artist and album from existing ID3 tags
artist = audio['TPE1'][0].lower()
album  = audio['TALB'][0].lower()
# Note: This is a hack for NIN albums prefixed with 'Halo # - '
pattern = re.compile('^halo [0-9]{2} - ')
if pattern.match(album):
album = pattern.sub('', album)
genre = styles = ''
# Tries to load the data from cache, else does a full look up
try:
if _cache[artist][album] != None:
genre  = _cache[artist][album]['genre']
styles = _cache[artist][album]['styles']
except:
# Performs an album search
params  = urllib.urlencode({'P': 'amg', 'sql': album, 'opt1': 2, 'samples': 1})
webpage = urllib.urlopen("http://www.allmusic.com/cg/amg.dll", params)
html	= webpage.read()
# Breaks apart the results and tries to find our artist
if html.find('') != -1:
# Isolates the table of results
html = html.lower()
html = html.split('')
html = html[1].split('')
html = html[0].split('', html[i])
# Checks the artist, and then the album
if result[3].replace('&', '&') == artist:
result_album = result[5].split('"')
if result_album[2][1:-4] == album:
# Opens the album's URL
webpage = urllib.urlopen('http://www.allmusic.com' + result_album[1].replace('&', '&'))
html = webpage.read()
if html.find('') != -1:
# Extracts the Genre & Styles block of HTML
html = html.split('')
html = html[1].split('')
# Extracts the genre name
genre = html[0].split('');
genre = genre[0].split('>');
genre = genre[-1]
# Extracts all of the styles
temp = list()
styles = html[1].split('');
for style in styles:
style = style.split('>')
style = style[-1]
if style.strip() != '':
temp.append(style)
# Converts the list to a string (comma delimited)
styles = temp
temp   = ''
for style in styles:
if temp != '':
temp += ', '
temp += style.replace('/ ', '/')
styles = temp
temp   = ''
# Adds the data to the cache
_cache[artist] = {}
_cache[artist][album] = {}
_cache[artist][album]['genre']  = genre
_cache[artist][album]['styles'] = styles
break
# Caches the artist / album as None if no results were obtained
if genre == '' and styles == '':
_cache[artist] = {}
_cache[artist][album] = None
else:
exit('Error: The search yielded unexpected results.')
try:
# Updates the genre and comments with the extracted values
if genre != '' and styles != '':
audio.add(TCON(encoding=3, text=unicode(genre)))
audio.add(COMM(encoding=3, text=unicode(styles), lang='eng', desc=''))
audio.save()
else:
print 'Error: The file (' + file + ') was not able to be updated, no entry was found on allmusic.com'
except:
print 'Error: The file (' + file + ') was not able to be updated.'
if name == 'main':
global _cache
# Check if a path is passed in and is valid
try:
path = sys.argv[1]
except:
exit('Usage: ./allmusic-tagger.py /path/to/music')
if os.path.exists(path) == False or os.path.isdir(path) == False:
exit('Error: The specified path (' + path + ') is not a valid directory.')
else:
cache_filename = '/tmp/allmusic-tagger_cache.pkl'
# Loads the cache
if os.path.exists(cache_filename) and os.path.isfile(cache_filename):
cache_file = open(cache_filename, 'rb')
_cache	 = pickle.load(cache_file)
cache_file.close()
else:
_cache = {}
# Strips trailing slash
if path[-1] == '/':
path = path[0:-1]
# Processes our files
process(path)
# Stores the cache
cache_file = open(cache_filename, 'wb')
pickle.dump(_cache, cache_file)
cache_file.close()

:wq

Like this drivel? There's a whole RSS feed of it, or subscribe via email.