Nitrooos

Developer's thoughts

Geolocalization using GeoLite2

nitrooos

You may know or not, but the knowledge about where the visitors of your webpage are using it, allow you to better adjust the content or even make the right decisions in the business context. So how can we actually implement such magic in our webpage? If you’re interested in the answer, this article is for you! We’ll be adding geolicalization to the sample page using a GeoLite2 library.

TLDR; GeoLite2 is a free, solid database, containing the mappings of IP addresses to cities and countries. There exists many ready to use APIs for that database, among others for C#, C, Java, JavaScript (node.js), Perl, PHP, Python and Ruby languages.

What is GeoLite2?

GeoLite2 is a database (encoded in the binary format), whic maps IP addresses from around the world to countries and cities. So knowing an IP address of the user (an access to it isn’t problematic in the case of internet websites and applications) we can localize the country and even city from which the page was visited. Of course such database doesn’t (and won’t) have 100% accuracy, so please be aware sometimes it’ll give you a false information. However, my experience says they are rare and the database is very accurate. How exactly accurate? The producer says the paid version (named GeoIP2 ) includes 99.9999% of IP addresses being in use. The accuracy of pointing the country is supposed to be 99.8%. These numbers are impressive, but we don’t have any related to the free version (so GeoLite2). The producer says only that it’s less accurate and being updated not so often like the paid one.

Connecting with the GeoLite2 database

We start with downloading GeoLite2 database, available on the producer’s website (the MaxMind company) in this link. Let’s notice it’s a database in the country version, which means it only maps IP addresses to countries, not cities. It’s a smaller version of the free database, there exists city version (also freely available), but because it maps IPs also to cities, it’s over 10 times larger than the country version.

Then we should install one of the available APIs for GeoLite2 (in our case a Python API):

    $ pip install geoip2

Now we can connect to database, easiest using the script below:

>>> import geoip2.database
>>> reader = geoip2.database.Reader('/path/to/GeoLite2-Country.mmdb')

Please notice the Reader object, connecting to the database, is supposed to be used for many queries. It shouldn’t be created for every single query, because it’s a costly operation. The first, simple things we can do with the database are:

>>> response = reader.country('135.200.12.84')
>>>
>>> response.country.iso_code
'US'
>>> response.country.name
'United States'
>>> response.subdivisions.most_specific.name
'Indiana'
>>> response.subdivisions.most_specific.iso_code
'IN'
>>> reader.close()

We get a concrete information about the country and region the IP comes from (in this case an USA state). If we had connected with the city database, we would receive some additional information:

>>> response = reader.city('135.200.12.84')
>>>
>>> response.city.name
'Indianapolis'
>>> response.postal.code
'46226'
>>> response.location.latitude
39.7722
>>> response.location.longitude
-86.1565
>>> reader.close()

After finishing work with GeoLite2 we can call the .close() method on Reader object (it closes the connection in an explicit way).

An interesting thing is the fact you can use GeoIP2 (so the paid version) also in the form of web service. In such case the API remains almost the same, but you don’t have to store locally the city nor country databases.

Geolocalization in Django

It turns out that (remaining in the Python world), the most popular web framework for this language also supports using GeoIP2 and GeoLite2. The proper package inside Django is called django.contrib.gis.geoip2 and makes it possible to import GeoIP2 class (equivalent of the Reader object). This class is a wrapper around geoip2 library. An example work session with this package looks like (taken from the official Django documentation):

>>> from django.contrib.gis.geoip2 import GeoIP2
>>>
>>> geoip = GeoIP2()
>>> geoip.country('facebook.com')
{'country_code': 'US', 'country_name': 'United States'}
>>> geoip.city('72.14.207.99')
{'city': 'Mountain View',
'continent_code': 'NA',
'continent_name': 'North America',
'country_code': 'US',
'country_name': 'United States',
'dma_code': 807,
'latitude': 37.419200897216797,
'longitude': -122.05740356445312,
'postal_code': '94043',
'region': 'CA',
'time_zone': 'America/Los_Angeles'}
>>> geoip.lat_lon('salon.com')
(39.0437, -77.4875)
>>> geoip.lon_lat('uh.edu')
(-95.4342, 29.834)
>>> geoip.geos('24.124.1.80').wkt
'POINT (-97 38)'

You can see the API is slightly different, using it is however still comfortable. Django’s documentation is here very helpful. But wait, how the framework does know where the GeoIP2 (or GeoLite2) database is located? There are 3 configuration variables, which we can set in the settings.py file of our project:

GEOIP_PATH = os.path.join(BASE_DIR, 'common', 'geoip', 'country_dataset')
GEOIP_COUNTRY = 'Geo_db_country.mmdb'  # defaults to 'GeoLite2-Country.mmdb'
GEOIP_CITY = 'Geo_db_city.mmdb'        # defaults to 'GeoLite2-City.mmdb'

First of them, GEOIP_PATH is required. The rest are optional as long as we stick with the default names for database files for cities and countries.

The idea: intelligent language and localization choosing on the page

In this part of the article we use geolocalization to implement an intelligent mechanism of automatically choosing language and country on the page. It means that user visiting our page from e.g. Madrid will see it in spanish language and Spain will be set as a localization (of course assuming the application supports spanish translations ;). Such behavior can be important from the business point of view, because sometimes when we sell some products/services in the Internet, our offer can differ between countries. How can we implement it in Django?

Example middleware

The good starting point here is implementing function’s decorator (e.g. @set_country_and_language) or custom middleware (it’s a mechanism used heavily in Djangos’ applications, makes possible to process a request before it will be actually handled by the view function). In this case we choose the second option.

Let’s start with creating a new file (often called middleware.py in Django) and defining inside it few imports needed:

from django.utils import translation
from common.constans import DEFAULT_COUNTRY
from common.geoip.utils import detect_country_by_IP
from common.models import CountryModel
from common.utils import get_language_associated_with_country

The django.utils.translation package is responsible for enabling proper language on page after country detection. DEFAULT_COUNTRY is a normal text constant, may be equal to e.g. ‘GB’. It means the default country will be Great Britain, in the case we won’t be able to detect country based on IP. detect_country_by_IP is a helper function making an actual detection. CountryModel is a country model in the application. And the last import, get_language_associated_with_country is a function mapping country on the language assigned to it.

class GeoIPMiddleware:
    """
    Custom geo location middleware which tries to detect user country and
    language in case of country is not already set in session (like when user
    visits site for the first time).

    Location is set based on request's IP address and by using django.contrib.gis.geoip2 package.
    """

The middleware is a class like any other, it doesn’t need to inherit from any base class. We add also some concise description.

Every middleware implements one or more methods called by Django, most often it’s a process_request method with the request as an argument. Framework calls it with every request, before it even decide which view should handle it:

def process_request(self, request):
    if 'country' not in request.session:
        detected_country_code = detect_country_by_IP(
            request.META['REMOTE_ADDR'])
        application_country_code = self._get_application_country(
            detected_country_code)
        associated_language = get_language_associated_with_country(
            application_country_code)

        request.session['country'] = application_country_code
        translation.activate(associated_language)

        return None

Our middleware is basically handling the case when the key country is not set in session. Then we run the country detection (using detect_country_by_IP function, the IP address is taken from request object as request.META[‘REMOTE_ADDR’]). Next we check whether in the database exists the country returned from detection process or not (_get_application_country method, implementation in the next part of the article) - it may be the case detection pointed e.g. Democratic Republic of the Kongo, but our app doesn’t handle this country ;) Then we get the language associated with the country (get_language_associated_with_country), set country in session dict and activate proper language (translations) on the page.

Every middleware in Django should return None or an object of class HttpResponse. In our case process_request method returns None, so Django will process the requests further.

Helper functions implementation

def detect_country_by_IP(ip_address):

    """
    Auto-detects user's country based on IP address
    Implementation uses GeoIP2 package

    :param ip_address: IP address as string to look up
    :return: Country code as string
    """

    try:
        geoip = GeoIP2()
        geoip_guess = geoip.country_code(ip_address)
        if geoip_guess is not None:
            detected_country_code = geoip_guess
        else:
            detected_country_code = DEFAULT_COUNTRY
    except GeoIP2Error:
        detected_country_code = DEFAULT_COUNTRY

    return detected_country_code

First of the used helper functions (detect_country_by_IP) creates the GeoIP2 object and tries to get country code based on IP address. When this step completes, geolocalization was successful! In the other case we use a default country.

def _get_application_country(self, proposed_country_code):
    proposed_country_exists = CountryModel.objects.filter(
        country=proposed_country_code).exists()
    if proposed_country_exists:
        return proposed_country_code

    return DEFAULT_COUNTRY

Next we call _get_application_country method. When application doesn’t handle the country detected by geolocalization, we also choose the default one.

I won’t put an implementation of get_language_associated_with_country function, because it’s specific to the application. Each country can just have assigned to it some default language and the responsibility of this function is to point which language should be used with given country.

Add middleware to the application’s settings

Now we only need to add our middleware to the list of these being used by the application (in settings.py of our project):

MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    ...
    'common.geoip.middleware.GeoIPMiddleware',
]

It’s best to do it at the end of the list, because it will be used after processing by any previous middleware, coming from the framework itself.

Geolocalization - the summary

As I tried to prove, the geolocalization topic isn’t a hard functionality to implement in your webpage. There exists a ready-to-use, free library GeoLite2. Using it, you can add geolocalization to your website in an easy way. Additionally, this library has a full support in Django, which is also a plus. I hope I explained this topic to you a little, thanks for reading and see you in the next post!