Docendo discimus

$self->explain

  • Calendar

    May 2010
    M T W T F S S
    « Apr   Jul »
     12
    3456789
    10111213141516
    17181920212223
    24252627282930
    31  
  • Archives

  • Recent Posts

  • Bling

Vedalken improves

Posted by brunorc on May 2, 2010

Nobody’s perfect.

Vedalken, apart from their strictly structured minds and a tendency towards logic, aren’t perfect either. After some heavy testing and inspecting the database populated by the script, I came up with several bugfixes.

Here’s the new version:

#!/usr/bin/env perl

use strict;
use warnings;

use WWW::Mechanize;
use Data::Dumper;
use DBI;

my $dbh = DBI->connect( "dbi:SQLite:dbname=vedalken.db", "", "" )
    or die "Cannot open or create the DB file: $DBI::errstr";

$dbh->do( q/
    CREATE TABLE IF NOT EXISTS Card (
        id          INTEGER PRIMARY KEY,
        name        VARCHAR(64) UNIQUE NOT NULL,
        manacost    VARCHAR(8) NOT NULL DEFAULT 0,
        color       VARCHAR(10),
        cardtype    VARCHAR(32),
        subtype     VARCHAR(32),
        raw         TEXT,
        amount      INTEGER NOT NULL DEFAULT 0
    )
/ ) or die "Cannot create table: ", $dbh->errstr;

my $sth = $dbh->prepare( q/
    INSERT INTO Card (name, manacost, color, cardtype, subtype, raw) 
    VALUES (?, ?, ?, ?, ?, ?)
/ ) or die "Cannot prepare insert statement: ", $dbh->errstr;

my %cards;

my $url = "http://www.essentialmagic.com/cardsets/default.asp";
my @unwanted_sets = qw/32 23 1 30 53/;

my $mech = WWW::Mechanize->new;

$mech->get($url);

my @sets = $mech->find_all_links( url_regex => qr/Checklist/ );

foreach my $set (@sets) {
    next
        if grep { $set->url =~ /ID=$_$/ } @unwanted_sets;

    print "Processing set pointed by ", $set->url;

    $mech->get( $set->url );

    my @cards = $mech->find_all_links( url_regex => qr{em2/Cards/default.aspx} );

    print " - ", scalar @cards, " cards found\n";

    foreach my $card (@cards) {
        next
            if $card->text eq 'Buy';

        $cards{ $card->text } = { raw => $card->attrs->{onmouseover} };
    }
}

print scalar keys %cards, " unique cards found\n";

my @rarity = qw/Common Uncommon Rare Mythic Unknown/;

my %colors = (
    W => 'White',
    B => 'Black',
    U => 'Blue',
    R => 'Red',
    G => 'Green'
);

foreach my $card ( sort keys %cards ) {
    ( my $raw = join( ' ', split( /\n/, $cards{$card}->{raw} ) ) )
        =~ s/^showCardPopup\((.+)\);$/$1/;

    my @info = split( /, /, $raw );

    splice @info, 0, 2;

    @info = map {
        $_ =~ s/^'//;
        $_ =~ s/'$//;
        $_ =~ s/\\'/'/g;
        $_ } @info;

    while ( scalar @info > 6 and not grep { $info[6] eq $_ } @rarity ) {
        shift @info;
    }

    my $mana = $info[0];
    my $type = $info[1];
    my $subt = $info[2];
    my $color;

    next
        if $type =~ /^Basic/;

    $color = $type eq 'Land'
        ?   'Land'
        :   $mana =~ /^[0-9]+$/
            ?   'Colorless'
            :   'Multicolor';

    if ( $color eq 'Multicolor' ) {
        foreach my $c ( keys %colors ) {
            $color = $colors{$c}
                if $mana =~ /$c/
                    and not grep { $mana =~ /$_/ } grep { $_ ne $c } keys %colors;
        }
    }

    $sth->execute( $card, $mana, $color, $type, $subt, join( ', ', @info ) )
        or die "Cannot execute insert statement: ", $dbh->errstr;
}

$dbh->do( q/UPDATE Card SET cardtype = 'Instant' WHERE cardtype IN ('Interrupt', 'Mana Source')/ );
$dbh->do( q/UPDATE Card SET cardtype = 'Creature' WHERE cardtype = 'Summon'/ );

$dbh->disconnect;

First of all, since the information about card type and subtype is already present, why not to use it? So I added new columns in the table. Also, because of the database the %cards_by_color hash is no longer needed, so it was removed. I added a hash with colors, which is later used to loop over it in lines 106-112 in order to determine the single color. In fact, the whole logic of discovering the color was rewritten – as well as handling the call to showCardPopup:

    ( my $raw = join( ' ', split( /\n/, $cards{$card}->{raw} ) ) )
        =~ s/^showCardPopup\((.+)\);$/$1/;

First, I get rid of multiline descriptions (there were some), by splitting on a newline and joining on a space. The result of this operation is then assigned to $raw. Then I use a regular expression to get rid of JavaScript bits; since the parentheses are used in regular expressions to capture matches, I have to escape the “real” parentheses. Actually, I should have used the /x modifier for regexp, which would definitely improve the readiness of this code:

    ( my $raw = join( ' ', split( /\n/, $cards{$card}->{raw} ) ) )
        =~ s/^showCardPopup \( (.+) \) ;$/$1/x;

And now for determining the colors:

    $color = $type eq 'Land'
        ?   'Land'
        :   $mana =~ /^[0-9]+$/
            ?   'Colorless'
            :   'Multicolor';

    if ( $color eq 'Multicolor' ) {
        foreach my $c ( keys %colors ) {
            $color = $colors{$c}
                if $mana =~ /$c/
                    and not grep { $mana =~ /$_/ } grep { $_ ne $c } keys %colors;
        }
    }

The ternary operator is being chained to allow setting the “colorless” colors. After that, every card is assumed to be multicolor. Then, for every color in the table, this color is set if mana cost matches the letter describing this color, and in the same time mana cost doesn’t match (first grep) any color different from the particular color being checked (second grep).

Since the vocabulary of Magic cards has been changed many times, some cards may get registered in the database under some archaic categories. So I added two lines which fixes those cases, calling SQL UPDATE on the populated database.

Update: Some cards appears more than one time with postfixes from “(1)” to “(4)”. I fixed it by deleting them from the table, because I really didn’t want to scrape Essential Magic over and over again.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: