Matt Camilli's Blog It's something

Django's URLField and Underscores

Written on September 18, 2014

A week ago The Engine Room was faced with a problem. A customer was trying to add a blog to our system with a subdomain that had an underscore in it. While this is a frowned-upon convention (as domains are not supposed to have underscores), it is still a valid URL. The problem arose because Django's default URLField does not allow any underscores and raised a ValidationError on save. After finding the Django issue addressing this bug, we found out that the behavior is intentional and that the project will not be changing it, as URLField abides by the official rules (RFC 1034/1035).
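
For context, the rule behind this comes down to which characters a single hostname label may contain. The snippet below is a rough sketch of that letters-digits-hyphen rule (including the later RFC 1123 relaxation that lets a label start with a digit). `is_ldh_label` is a hypothetical helper written for illustration; it is not part of Django.

```python
import re

# One hostname label: letters, digits and hyphens only, 1 to 63 characters,
# and it may not start or end with a hyphen. Underscore is not in the set.
LDH_LABEL = re.compile(r'\A[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\Z', re.IGNORECASE)


def is_ldh_label(label):
    """Return True if `label` follows the letters-digits-hyphen rule."""
    return LDH_LABEL.match(label) is not None


if __name__ == '__main__':
    for label in ('blog', 'my-blog', 'my_blog', '-blog'):
        print(label, is_ldh_label(label))
```

A label such as `my_blog` fails this check, which is the same reason the stock validator rejects it.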

While this is all well and dandy, we still needed the ability to add blog URLs with underscores in their subdomains. The solution was to implement our own URLField, identical to Django's except for a more fine-tuned validator. After perusing the Django GitHub repo, we simply copied their basic URLValidator and changed the regex so that it allows underscores only in subdomains.

from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import force_text
from django.utils.six.moves.urllib.parse import urlsplit, urlunsplit
import re

class BetterURLValidator(RegexValidator):
    """This validator allows underscores within the subdomains of URLs."""
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http:// or https://
        r'(?:(?:(?:[A-Z0-9](?:[A-Z0-9-_]{0,61}[A-Z0-9])?\.)?'  # subdomain that allows underscores
        r'(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}(?<!-)\.))|' # domain
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
        r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    message = _('Enter a valid URL')

    def __call__(self, value):
        try:
            super(BetterURLValidator, self).__call__(value)
        except ValidationError as e:
            # Trivial case failed. Try for possible IDN domain
            if value:
                value = force_text(value)
                scheme, netloc, path, query, fragment = urlsplit(value)
                try:
                    netloc = netloc.encode('idna').decode('ascii')  # IDN -> ACE
                except UnicodeError:  # invalid domain part
                    raise e
                url = urlunsplit((scheme, netloc, path, query, fragment))
                super(BetterURLValidator, self).__call__(url)
            else:
                # Empty value: nothing to retry, re-raise the original error
                raise
        else:
            url = value
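
The fallback branch in `__call__` leans on Python's built-in `idna` codec to turn an internationalized hostname into its ASCII (ACE) form before validating it a second time. Here is a small stdlib-only sketch of that one step. The function name `to_ace` and the sample URL are made up for illustration, and Python 3's `urllib.parse` stands in for the `six` import used above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ace(url):
    """Re-encode the host part of `url` with the idna codec (IDN -> ACE)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # encode('idna') raises UnicodeError when the host cannot be encoded;
    # the validator catches that and re-raises its ValidationError.
    netloc = netloc.encode('idna').decode('ascii')
    return urlunsplit((scheme, netloc, path, query, fragment))


if __name__ == '__main__':
    print(to_ace('http://bücher.example/katalog'))
    # http://xn--bcher-kva.example/katalog
```
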

The field itself then only needs to swap in the new validator:

from django.db import models
from .validators import BetterURLValidator

class BetterURLField(models.URLField):
    """This field allows underscores in the subdomains of URLs."""
    default_validators = [BetterURLValidator()]

    def __init__(self, verbose_name=None, name=None, **kwargs):
        super(BetterURLField, self).__init__(verbose_name, name, **kwargs)

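Note: The only use case where this doesn't quite work is with domains. Underscores are only allowed in the leftmost label of the hostname, so an underscore in the domain name itself (for example http://my_domain.com) is still rejected.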
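
One way to see what the pattern accepts, without setting up a Django project, is to exercise the regex directly with the standard `re` module. This is only a sketch: `BETTER_URL_RE` is a copy of the expression from the validator above, `accepts` is a hypothetical helper, and the sample hostnames are made up for illustration.

```python
import re

# Copy of the pattern used by BetterURLValidator above.
BETTER_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:(?:[A-Z0-9](?:[A-Z0-9-_]{0,61}[A-Z0-9])?\.)?'  # first label, "_" allowed
    r'(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}(?<!-)\.))|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def accepts(url):
    """Return True if the pattern matches `url` (RegexValidator uses search)."""
    return BETTER_URL_RE.search(url) is not None


if __name__ == '__main__':
    samples = (
        'http://my_blog.example.com/',
        'http://example.com',
        'http://my_domain.com',
        'http://a.my_blog.example.com',
        'http://my_domain.co.uk',
    )
    for url in samples:
        print(url, accepts(url))
```

Keep in mind that the pattern has no notion of which label is the registered domain, so in this sketch `http://my_domain.co.uk` is accepted as well.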

Tags: django urlfield validator regex
