Allmusic ID3 Tag Fixer

Introduction:

One of the major problems with using Allmusic.com as a metadata source is the difficulty of searching for the correct album. Ideally we would search for artist and album together. However, Allmusic.com only allows searching by either album or artist, not both. So the best way I have found to match an album such as "Greatest Hits" is to first search for the artist and then search that artist's discography for the album.

Part 2: Downloading Artist ID from Allmusic

Allmusic.com uses individual SQL IDs to identify each artist and album. So what we need is a script to search for and parse out the correct SQL ID and add it to our MySQL database. I decided to use Perl over PHP because this script will be run from a command prompt rather than a browser, given the length of time it needs to complete its task.

It is pretty simple to use libwww to download an HTML page. In order to get the most accurate results we POST our data to Allmusic.com. Then we need to parse the correct information out of the downloaded page. The main problem I ran into was how to make sure that we had picked the right artist. I discovered a very simple Perl module named String::Compare. It is very basic and returns a value from 0 to 1, with 1 being an identical string. After we parse the info we just need to add it to our MySQL database.
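To illustrate the kind of score involved, here is a minimal sketch of a 0-to-1 similarity measure built on Levenshtein edit distance. It is a stand-in for illustration only and is not the algorithm String::Compare uses; the names `edit_distance` and `similarity` are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Classic dynamic-programming Levenshtein distance between two strings.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Scale the distance into 0..1, where 1 means the strings are identical
# (compared case-insensitively, which is an assumption of this sketch).
sub similarity {
    my ($s, $t) = map { lc } @_;
    my $longest = max(length $s, length $t);
    return 1 if $longest == 0;
    return 1 - edit_distance($s, $t) / $longest;
}

printf "%.2f\n", similarity('The Beatles', 'Beatles, The');
printf "%.2f\n", similarity('Radiohead', 'Radiohead');
```

A score like this is what gets stored in the artistcomp column, so that poor matches can be found again later.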

Now there is a tricky part with Allmusic.com. In order to prevent commercial websites from overwhelming the Allmusic.com website, they impose a ban if you request more pages in a given period of time than they are comfortable with. To prevent this from happening I had to add a delay to the script, which causes it to take nearly three times as long. I ran through a list of about 700 artists in about an hour.
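The delay can be sketched on its own as a small counter that pauses after every batch of requests. The batch size of 25 and the 120-second pause come from the script; the name `make_throttle` and the replaceable `$sleeper` callback are hypothetical, added so the idea can be tried without actually waiting.

```perl
use strict;
use warnings;

# Build a counter that pauses after every $batch calls.
# $sleeper defaults to the built-in sleep; passing a stub makes it testable.
sub make_throttle {
    my ($batch, $pause, $sleeper) = @_;
    $sleeper ||= sub { sleep $_[0] };
    my $count = 0;
    return sub {
        $count++;
        if ($count >= $batch) {
            $sleeper->($pause);
            $count = 0;
            return 1;    # a pause happened on this call
        }
        return 0;
    };
}

# Demonstration with a stub sleeper so nothing actually waits.
my @pauses;
my $tick = make_throttle(25, 120, sub { push @pauses, $_[0] });
$tick->() for 1 .. 60;    # simulate 60 search requests
print "paused ", scalar(@pauses), " times\n";
```

In a real run you would call the returned closure once per search request and leave the sleeper argument out.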

The script:

#!/usr/bin/perl
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
#use our little string comparison module
use String::Compare;
#These are the same mySQL functions again
sub Create_DB_Connection{
  use DBI;
  $DSN  = "DBI:mysql:database=DATABASE;host=HOST";
  $user = "USERNAME";
  $pw   = "PASSWORD";
  $dbh  = DBI->connect($DSN,$user,$pw)
    || die "Cannot connect: $DBI::errstr\n" unless $dbh;
  return;
} # End of Create_DB_Connection subroutine.
sub Do_SQL{
  eval{
    $sth = $dbh->prepare($SQL);
  }; # End of eval
  # Check for errors.
  if($@){
    $dbh->disconnect;
    print "Content-type: text/html\n\n";
    print "An ERROR occurred! $@\n";
    exit;
  } else {
    $sth->execute;
  } # End of if..else
  return ($sth);
} # End of Do_SQL subroutine
sub filter{
  $_[0]=~s/\'/\'\'/g;
  $_[0]=~s/\\/\\\\/g;
  return $_[0];
} # End of filter subroutine
&Create_DB_Connection;
#We want distinct artists that either have no comparison
#value yet or have a comparison value below 65%.
#This enables us to run the script multiple times
#without requesting an artist twice that we already
#know is correct.

$SQL = "SELECT DISTINCT artist FROM albums WHERE ";
$SQL = $SQL . "artistcomp < '.65' OR artistcomp IS NULL";
&Do_SQL;
$i=0;
while ($pointer = $sth->fetchrow_hashref){
  $sqlArtist = $pointer->{'artist'};
  push(@db, $sqlArtist);
}

foreach $sqlArtist (@db){
  $i++;
  ###Search URL to use
  $url = "http://www.allmusic.com/cg/amg.dll";

  ##The script runs a little faster than Allmusic
  ##would like. It looks like they have a max
  ##of 50 searches in a given period of time.
  ##To prevent you from being banned we will sleep
  ##for 2 minutes for every 25 search requests.

  if ($i == 25){
    sleep 120;
    $i = 0;
  }
  ##Get Web Page
  $ua = LWP::UserAgent->new;
  $req = POST $url, [ P => 'amg', OPT1 => '1', SQL => $sqlArtist];
  $total = $ua->request($req)->as_string;
  ##Collapse tabs, line endings and repeated spaces into single spaces
  $total =~ tr/\n\r\t / /s;
  $locate = index($total,'width:190px;word-wrap:break-word;',0) + 33;
  if ($locate < 33){
    ##We are not on a list page or there was an error
    ##NEED TO ADD CHECK TO SEE IF WE HAVE BEEN TAKEN TO THE ARTIST PAGE

    $locate = index($total,'Discography',0) - 40;
    if ($locate < 0){
      ##No idea where we are. It must be an error page
      ##or the artist does not exist, so skip it and try another

      next;
    }
    $locate = index($total,'sql=',$locate) + 4;
    $end = index($total, '~', $locate);
    $where = $end - $locate;
    $artistid = substr($total,($locate),$where);

    $locate = index($total,'class="title">',0) + 14;
    $end = index($total, '<', $locate);
    $where = $end - $locate;
    $artist = substr($total,($locate),$where);
  }
  else {
    ##We are on the list page.
    $locate = index($total,'sql=',$locate) + 4;
    $end = index($total, '"', $locate);
    $where = $end - $locate;
    $artistid = substr($total,($locate),$where);

    $locate = index($total,'>',$locate) + 1;
    $end = index($total, '<', $locate);
    $where = $end - $locate;
    $artist = substr($total,($locate),$where);
  }
  #Compare first, then escape characters that would cause SQL errors
  $artistcomp = compare($artist, $sqlArtist);
  $artist = filter($artist);
  $artistid = filter($artistid);
  $sqlArtist = filter($sqlArtist);

  $SQL = "UPDATE albums SET artistid = '$artistid', newartist = '$artist',";
  $SQL = $SQL . " artistcomp = '$artistcomp' WHERE artist = '$sqlArtist'";
  &Do_SQL;
  print $i . "\n";
}
$dbh->disconnect;
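
The script repeats the same index/substr pattern four times. As a rough sketch, that pattern could be factored into one helper that also reports a missing marker instead of returning a garbage substring. The name `extract_between` is hypothetical, and the sample HTML is invented for illustration; it is not real Allmusic markup.

```perl
use strict;
use warnings;

# Return the text between $start and the next $stop, searching from $from.
# Returns undef if either marker is missing.
sub extract_between {
    my ($text, $start, $stop, $from) = @_;
    my $begin = index($text, $start, $from // 0);
    return undef if $begin < 0;
    $begin += length $start;
    my $end = index($text, $stop, $begin);
    return undef if $end < 0;
    return substr($text, $begin, $end - $begin);
}

# Invented sample markup, only shaped like the page the script expects.
my $html = '<a href="/cg/amg.dll?p=amg&sql=11:abc123">Some Artist</a>';
my $id   = extract_between($html, 'sql=', '"');
my $name = extract_between($html, '>', '<');
print "$id / $name\n";
```

Checking for undef before updating the database would avoid storing a bogus ID when the page layout is not what the script expects.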

Notes:

I would leave the sleep function alone. If you alter it too much you will get banned temporarily or possibly permanently.

As I have stated before this is a very rudimentary script. I have only designed it to run from the command line. I imagine that you could make this much more user friendly or design it into a website. If you did this I would recommend only 20-25 artists at a time.

Next:

After we have identified all of the artists we need to make sure that we have identified them correctly. The easiest way to do this is to rely partly on the comparison value. I have found that anything above 65% has only 1 error in about 200. So next I will show you a simple PHP script to look at and correct any errors that were missed or created in the parsing process.
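
Before any manual review, the 65% cut-off can be applied mechanically. The sketch below is in Perl to match the script above, not the PHP script mentioned here; `split_by_score` is a hypothetical name, the rows are invented sample data, and treating a missing score as needing review is an assumption.

```perl
use strict;
use warnings;

my $THRESHOLD = 0.65;    # cut-off taken from the text above

# Split rows into those trusted automatically and those needing a manual look.
# A missing score (undef) is treated as needing review.
sub split_by_score {
    my (@rows) = @_;
    my (@accepted, @review);
    for my $row (@rows) {
        my $score = $row->{artistcomp};
        if (defined $score && $score >= $THRESHOLD) {
            push @accepted, $row;
        } else {
            push @review, $row;
        }
    }
    return (\@accepted, \@review);
}

# Invented sample rows, shaped like the columns the script writes.
my @rows = (
    { artist => 'Example Artist A', artistcomp => 0.92 },
    { artist => 'Example Artist B', artistcomp => 0.40 },
    { artist => 'Example Artist C', artistcomp => undef },
);
my ($ok, $todo) = split_by_score(@rows);
print "accepted: ", scalar(@$ok), ", to review: ", scalar(@$todo), "\n";
```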

8 Responses to Allmusic ID3 Tag Fixer

  1. Brian says:

    Hi Kevin,

    I’ve been googling for a util to use Allmusic to update my MP3s with genre, styles, themes and moods and stumbled across your tagger…. sadly I’m not a coder and mainly tinker with http://www.meedio.com for my media management. Did you / Or do you plan to convert these scripts into a “dumb” util for us poor schmucks 😉 if it’s not your plan no problems but thought I would reach out 😀

  2. Kevin Keegan says:

    As you can see the program is written in Perl. This is mostly because it is the language that I know best and also because it is very programmer friendly.

    Unfortunately it is not very user friendly. I am working on creating a version with a graphical interface that can be distributed without Perl. However the interface will not look as smooth as a normal Windows app.

    My script is really only a hack too. It cannot fix problems if you have a bad name or album title. And it also misses about 10% of the files because it can’t match an album perfectly.

    I will keep you in mind though, I still have to fine tune the underlying code. Then I will try and make a distributable version with a GUI.

  3. john says:

    ditto brian’s comment

  4. adam says:

    I’m glad to hear that. I just stumbled upon this site, and I’ll be watching it.

    Right now I use tag & Rename, manually grabbing genre/mood/reviews/ etc… from allmusic. This would make things so much easier!

  5. Gerrit says:

    Hi,

    If you create an account at allmusic and login a cookie will be set. That cookie can be used to do unlimited requests to allmusic.

    some code(was written a while ago):

    my $ua = LWP::UserAgent->new();
    my $cookie_jar = HTTP::Cookies::Netscape->new(
        'file' => 'cookies.txt'
    );
    $ua->cookie_jar($cookie_jar);
    $ua->agent("Mozilla/4.76 [en] (Windows NT 5.0; U)");

  6. Chrisname15 says:

    I couldn’t phrase what Brian said any better. I’ve been looking to give my huge music collection some real depth and allmusic seems to be the only place that has any real quality data in terms of genre, mood, bios etc. Although, since allmusic is primarily a commercial subscription service, it would take some real time to copy and paste everything I’m looking for into each mp3. I think what you have developed is exactly what I’m looking for, although since I have almost no background in Perl/programming this code looks like %^&*&(* to me. Love to see this developed into something user friendly. THANKS

  7. Buddy says:

    Kevin,

    I have been wanting a time machine for years now, not because I want to see the building of the pyramids or buy shares in Google when it was an idea but so that I could use your AllMusic script to clean up my music genres. I read that this script stopped in 2007, are the constraints the same now do you know? Is this something you are still interested in or were you lucky enough to finish your tagging before you had to stop it?

    Can you please offer me some advice as to how I can glean their genres/moods/styles to automatically append to my music collection?

    Thanks,

    Buddy

    • krkeegan says:

      Oh sorry, this was something I finished a long time ago. It was always a hack, I had to do a fair bit of cleaning by hand, but it at least got me over the hurdle. Today there are applications that cost on the order of $20 that do the same thing. Given the amount of time I had to put in to get this to work, I would recommend that option.
