Programming Tools
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
 
User Name:
Password:
Remember me
 
Go Back   Dev Articles Community ForumsProgrammingProgramming Tools

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Display Modes
 
Unread Dev Articles Community Forums Sponsor:
  #1  
Old August 1st, 2003, 01:09 AM
benos benos is offline
Contributing User
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Feb 2003
Posts: 233 benos User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 6
Article Discussion: Create a IP-Country Database Using PERL and MySQL

If you have any questions or comments on this article then please post them here.

This forum post relates to this article

Reply With Quote
  #2  
Old August 8th, 2003, 09:07 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
IP::Country

Of course you could build a system yourself, but why would you?

Seems an awful lot of trouble when you could have used IP::Country from CPAN.

use IP::Country;
my $locator = IP::Country->new();

# prints 'US'
print $locator->inet_atocc('slashdot.org');

# prints 'US'
print $locator->inet_atocc('66.35.250.150');

The database is updated every month, is at least an order of magnitude faster than mysql, and is produced using the full ripe splits rather than the stats (which have huge holes in them). Plus the module installs a command-line utility 'ip2cc' which is fairly handy:

[nwetters@torvalds nwetters]$ ip2cc slashdot.org

IP::Country modules (v2.15)
Copyright (c) 2002,2003 Nigel Wetters
Database updated Mon Aug 4 14:10:07 2003

Name: slashdot.org
Address: 66.35.250.150
Country: US (United States)

[nwetters@torvalds nwetters]$

Reply With Quote
  #3  
Old August 8th, 2003, 09:10 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
converting IP to a number

faster method:

use Socket qw( inet_aton );
unpack('N',inet_aton($ip));

Reply With Quote
  #4  
Old August 8th, 2003, 09:22 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
finding the registry

use IP::Authority;

Reply With Quote
  #5  
Old August 8th, 2003, 09:34 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
downloading stats

You're downloading the registry stats 3-4 times a day!? Don't you think that might be a waste of their bandwidth?

The stats aren't even particularly accurate. If you're building a geolocation tool, use the ripe split. There's instructions in the documentation for IP::Country on how to rebuild the database from the ripe split.

You're not going to produce a measurable increase in accuracy by updating your database every day, let alone four times a day.

Talk to RIPE. They'll offer advice on how to use the registry data properly.

And if you produce any worthwhile code, have you considered releasing through CPAN or discussing on a mainstream site (perlmonks, use.perl.org, etc.). Great suffering has been caused in the past by hacked-together, semi-working code being presented as ready-made solutions for Perl novices (e.g. Matt's script archive).

Reply With Quote
  #6  
Old August 8th, 2003, 09:55 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Naive SQL

When you have 50,000 ranges, the following SQL is acceptable:

mysql> select code FROM ip_map ip
-> WHERE (IPfrom <= 1359937540 ) AND
-> (IPto >= 1359937540 );

However, when you start using the proper data, from the ripe split, you'll find that this is the wrong way to search through a miilion IP ranges.

Reply With Quote
  #7  
Old August 10th, 2003, 04:09 AM
mikkom mikkom is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 4 mikkom User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Re: IP::Country

Quote:
Originally posted by nwetters
[B]
The database is updated every month, is at least an order of magnitude faster than mysql, and is produced using the full ripe splits rather than the stats (which have huge holes in them). Plus the module installs a command-line utility 'ip2cc' which is fairly handy:

Just a simple question, isn't it against the copyrights to use the split database for geographical mapping purposes? This is explicitly said in apnic faqs

URL
Quote:
Can I use the database to map IP addresses to countries?

No (see Can I download the full contents of the database?). However, APNIC does produce another data source (APNIC allocation and assignment report) that allows for convenient geographic mapping.


Other RIRs contain the EXACTLY same copyright as the apnic.:

Quote:
# Restricted rights.
#
# Except for agreed Internet operational purposes, no part of this
# publication may be reproduced, stored in a retrieval system, or
# transmitted, in any form or by any means, electronic, mechanical,
# recording, or otherwise, without prior permission of the RIPE NCC
# on behalf of the copyright holders. Any use of this material to
# target advertising or similar activities is explicitly forbidden
# and may be prosecuted. The RIPE NCC requests to be notified of
# any such activities or suspicions thereof.

I know lots of things are possible, it all comes down to legalibility.

edit: Also I'm not quite sure how the statistics file offered by RIR that contains ripped DB information could contain different information than DB, could you please provide some example ip that maps to different country in stat file compared to raw DB?

Last edited by mikkom : August 10th, 2003 at 04:23 AM.

Reply With Quote
  #8  
Old August 10th, 2003, 06:25 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
copyright details agreed with RIRs

All of the copyright statements have been agreed with the regional internet registries. In correspondance with the RIRs, they have been extremely helpful, and their main concern was to protect people from abuse of country data (e.g. from spam, advertising).

http://search.cpan.org/author/NWETT...ry.pm#COPYRIGHT

If you are going to seriously promote your method of finding country from IP address, you should talk to the RIRs, as I have done.

Finding differences between databases is a simple task of looking up random IP numbers and comparing the two systems. I have done this with my system, TJ Mather's and the Quova geolocation system, and have written several articles on the accuracy of geolocation systems at various granularities.

Google is your friend.

[edit: snipped the copyright stuff]

Last edited by nwetters : August 11th, 2003 at 05:32 AM.

Reply With Quote
  #9  
Old August 10th, 2003, 06:35 AM
mikkom mikkom is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 4 mikkom User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Re: copyright details agreed with RIRs

Quote:
Originally posted by nwetters
However, it looks like your work was originally inspired by TJ Mather's GPL Licensed code (Maxmin)


I think you have mistaken me for another person, I'm not the writer of the article. I just thought it might be a good idea to bring in the point of legalibility.

The reason why I'm interested in this is that I'm developing a site statistics service (URL) and I'm currently trying to decide if I should use some existing commercial product (the best of them are VERY expensive, cheap ones are based on only whois records and it seems that they are not accurate enogh nor complying with the copyright conditions), or start developing my own algorithm.

Currently my implemented _very simple_ algorithm uses plain stat lists from RIRs but I would much rather use full whois because of their accuracy at city/region level. My algorithm has nothing to do with anyone elses, in fact it's not quite difficult to create such an algorithm (that parses string tokens from text file) but you propably know this quite well.

This said, my opinion is that you can't get accurate geomapping based on plain RIR whois information, you have to implement in addtion a traceroute engine that has a large list of geomapping regexps, example of such engine with small list of regexps can be found here: URL This is basically method at least quova and infosplit are using. Both have added their own additional mechanisms but it's still traceroute with regexps.

What RIRs have you talked with about geomapping/ips? Have you talked to APNICs people?

I would be very interested if you would like to give some direct urls to your articles, by google I won't propably find them all anyway.

edit: Hmm.. It seems that your program uses stat files from everywhere else but from ripe? arin dump you are referring to does not exist.
edit2: traceroute part

cheers,
Mikko Mattila CEO URL

Last edited by mikkom : August 14th, 2003 at 03:07 AM.

Reply With Quote
  #10  
Old August 11th, 2003, 05:04 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
reply

arin dump is hidden - if you GET via FTP it'll work.

here's a script I use monthly:

#!/usr/local/bin/perl -w
use strict;

use Net::FTP;

my $now = time();
my $yesterday = $now - (60 * 60 * 24);

my $dl =
{
'ftp.arin.net' => [ { dir => '/other/dump',
name => 'aup_dump.txt.gz',
gzip => 1 },

{ dir => '/pub/stats/arin',
name => undef,
gzip => 0 } ],

'ftp.ripe.net' => [ { dir => '/ripe/dbase/split',
name => 'ripe.db.inetnum.gz',
gzip => 1 },

{ dir => '/ripe/stats',
name => undef,
gzip => 0 } ],

'ftp.lacnic.net' => [ { dir => '/pub/stats/lacnic',
name => undef,
gzip => 0 } ],

'ftp.apnic.net' => [ { dir => '/public/apnic/stats/apnic',
name => undef,
gzip => 0 } ],

};

foreach my $site (keys %$dl){

print "connecting to $site\n";
my $connection = Net::FTP->new($site,(Timeout => 15));
unless ($connection){
warn("couldn't connect to ".$site);
next;
}

unless ($connection->login()){
warn("couldn't login to ".$site);
$connection->quit();
next;
}

foreach my $remote_file (@{$dl->{$site}}){

unless ($connection->cwd($remote_file->{dir})){
warn ("couldn't change directory to ".$site.$remote_file->{dir});
next;
}

unless ($remote_file->{name}){
unless ($remote_file->{name} = latest_file($connection->ls())){
warn("couldn't get directory listing of ".$site.$remote_file->{dir});
next;
}
}

my $local_filename = $remote_file->{name};
if ($remote_file->{gzip}){
$local_filename =~ s/\.gz$//
unless (-e $local_filename);
unless ($connection->binary()){
warn ("could set type to binary on ".$site);
next;
}
} else {
unless ($connection->ascii()){
warn("could set type to ascii on ".$site);
next;
}
}

if (-e $local_filename){
my $local_mdtm = (stat($local_filename))[9];
my $remote_mdtm = $connection->mdtm($remote_file->{name}) || $now;
print "Remote MDTM is ".localtime($remote_mdtm)."\n";
print "Local MDTM is ".localtime($local_mdtm)."\n";
if ($local_mdtm > $remote_mdtm){
my $local_size = -s $local_filename;
my $remote_size = $connection->size($remote_file->{name}) || 0;
print "Remote SIZE is $remote_size bytes\n";
print "Local SIZE is $local_size bytes\n";
if (($local_size == $remote_size) ||
($local_filename ne $remote_file->{name})){
print "skipping\n\n";
next;
}
}
print "deleting partial or old download\n";
unlink $local_filename;
}


print "downloading ".$remote_file->{dir}.'/'.$remote_file->{name}."\n";
unless ($connection->get($remote_file->{name})){
warn("couldn't get ".$remote_file->{name}." from ".$site);
next;
}


if ($remote_file->{gzip}){
print "unzipping\n";
my @args = ("gunzip", $remote_file->{name});
system(@args) == 0
or warn "gunzip ".$remote_file->{name}." failed: $?";
}

}
print "disconnecting\n\n";
$connection->quit();
}

sub latest_file
{
my @dir = @_;
my $latest_file = "";
my $latest_year = 0;
my $latest_month = 0;
my $latest_day = 0;
foreach my $file (@dir){
my ($day,$month,$year) = (0,0,0);
if (($file =~ /(\d{4})(\d{2})(\d{2})$/) ||
($file =~ /(\d{4})-(\d{2})-(\d{2})$/)){
($day,$month,$year) = ($3,$2,$1);
} else {
next;
}
if(($year > $latest_year) ||
(($year == $latest_year) && ($month > $latest_month)) ||
(($year == $latest_year) && ($month == $latest_month) && ($day > $latest_day))){
$latest_year = $year;
$latest_month = $month;
$latest_day = $day;
$latest_file = $file;
}
}
return $latest_file;
}

Reply With Quote
  #11  
Old August 11th, 2003, 05:13 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
link

google:

http://www.google.com/search?q=wett...ocator&filter=0

If you are interested in geolocation, perhaps the best way of finding out more is to join Joshua's list and introduce yourself:

http://lists.burri.to/mailman/listinfo/geowanking

Last edited by nwetters : August 11th, 2003 at 05:21 AM.

Reply With Quote
  #12  
Old August 11th, 2003, 05:26 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
availon

It looks like your site is using a java environment. Have you looked at http://javainetlocator.sf.net/ ?

Reply With Quote
  #13  
Old August 11th, 2003, 05:30 AM
nwetters nwetters is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 10 nwetters User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
terminology

when I used RIR, I meant the regional internet registries:

RIPE - europe and north africa
ARIN - north america and sub saharan africa
APNIC - asia pacific
LACNIC - latin america / caribbean

All produce stats and full database dumps, but only RIPE publish a useable full inetnum dump (known as the 'RIPE split').

Reply With Quote
  #14  
Old August 11th, 2003, 06:56 AM
mikkom mikkom is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 4 mikkom User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Re: terminology

Quote:
Originally posted by nwetters
when I used RIR, I meant the regional internet registries:


So did I.

Quote:
All produce stats and full database dumps, but only RIPE publish a useable full inetnum dump (known as the 'RIPE split').


So does at least apnic, try the following path:
URL

If you can tell me paths to full database dumps of arin/lacnic (I've browsed all the ftp directories), I would be very glad. I seriously doubt that you can get them without signing bulk data agreement

URL
(lacnic has similar agreement with penalties if data is spread)

I didn't find the file you were referring to:
Code:
ProFTPD 1.2.7 Server (ftp.arin.net) [ftp1]
Logging in...
Anonymous access granted, restrictions apply.
Logged in to ftp.arin.net.
ncftp / > ls
archives/   erx/        inaddr/     junk/       netinfo/    other/      pub/        rir-coord/  templates/
domain@     home/       info/       ls-ltR      netprog/    policy/     rfc@        template@   usr/
ncftp / > get other/dump/aup_dump.txt.gz
get other/dump/aup_dump.txt.gz: server said: other/dump/aup_dump.txt.gz: No such file or directory
ncftp / > cd other
ncftp /other > ls
ncftp /other > cd ..
ncftp / > cd other/dump
Could not chdir to other/dump: server said: dump: No such file or directory
ncftp /other >

It seems that there is no such directory or file. I don't know what you mean by "hidden", there is no such thing in unix file system except for the .-paths. Your script did not fetch it either, just stat files.

Are you sure you have really downloaded that file in last months, I know that it used to exist some months ago.

Quote:
It looks like your site is using a java environment. Have you looked at URL ?


I have evaluated it (as I have many commercial products) but it was not sufficient for my purposes. I'm interested in more detailed information, ie. city, region and isp.

btw. my script is quite a bit shorter if you would like to test it, it's plain shell script (it's all in one row, the forum formats it to several rows)

Quote:
wget URL URL URL --output-document=country_ips


I'm currently not very interested in joining any geolocation lists, I know quite well how the location can be done (by combination of whois information, traceroutes and nslookups), all I'm missing is the full RIR whois dumps.

Cheers,
Mikko Mattila CEO URL

Last edited by mikkom : August 14th, 2003 at 03:06 AM.

Reply With Quote
  #15  
Old September 18th, 2003, 07:42 AM
mikkom mikkom is offline
Junior Member
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Aug 2003
Posts: 4 mikkom User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Availon

Just a little demo link if someone is interested, the Availon uses our own geolocation engine. The demo can be viewed at:

URL

See especially the visitors/geo and visitors/by organization sections.

I'm still working on a more accurate location but now I'm aware of one more method that I'm going to use, the location will be very precise and will require a very little CPU/networking power.

cheers,
mikkom