Going Deeper Into Email Address Validation

We all recognize an email address when we see one. It consists of a unique identifier, the @ symbol, and a domain name (e.g. example.com). But is it really that simple? Can you tell which of the following email addresses are valid and which are not?

user.name+tag+sorting@example.com
admin@mailserver1
" "@example.org
"john..doe"@example.org
mailhost!username@example.org

According to RFC-822, one of the earliest RFCs to define email address formats, all of the above email addresses are in a valid format.

See Wikipedia – Email Address for a detailed explanation of what’s valid and the complexities of email address formats and examples of valid and invalid email addresses.

But does that mean they are actually valid email addresses? Even if you correctly define what a properly formatted email address is and validate an email address against that definition, you still do not know whether the address can receive email or whether it is in the possession of the user who claims to use it. Depending on your business rules, having a properly formatted email address may be enough. Or you may need to confirm the user can receive emails at the provided email address for the business relationship to progress.

Below we will explore what it means to fully validate an email address.

Email Validation 101: Format

As we will cover below, validating an email address is not as simple as ensuring it is in the proper format, but format is the obvious place to start. It is one thing to recognize a valid email address and another to actually write code that validates one. Why is it so difficult? A valid email address format is not as simple as user at domain dot com. As the examples at the beginning of this article demonstrate, valid email addresses can take many forms, and writing a (simple) validation method that accepts all of them is difficult. So difficult, in fact, that no completely accurate validation code appears to be available in any language.

RFC-3696 provides specific advice for validating Internet identifiers, including email addresses.
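As a rough illustration of a pragmatic format check, here is a minimal Python sketch. It is not a full RFC grammar: it only confirms there is a non-empty local part and a domain made of well-formed labels, and it applies the length limits commonly cited from RFC-3696 and its errata (64 octets for the local part, 254 characters overall). The function name `has_plausible_format` and the decision to leave the local part otherwise unchecked are assumptions made for this example.

```python
import re

# One DNS label: letters, digits, and hyphens, 1-63 characters,
# not starting or ending with a hyphen. Internationalized domains
# would need converting to their ASCII (punycode) form first.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def has_plausible_format(address: str) -> bool:
    """Cheap structural check; deliberately not a full RFC parser."""
    # Overall length limit as commonly cited (assumption, see lead-in).
    if len(address) > 254:
        return False

    # Split on the LAST "@", since quoted local parts may contain "@".
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False

    # Local part limit of 64 octets.
    if len(local.encode("utf-8")) > 64:
        return False

    # Every dot-separated label of the domain must be well formed.
    # Dotless domains (e.g. "mailserver1") are accepted on purpose.
    return all(_LABEL.fullmatch(label) for label in domain.split("."))
```

A check like this accepts unusual but legal addresses such as admin@mailserver1 and rejects only input that is structurally broken, which keeps false rejections low at the cost of letting some invalid addresses through.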

Regular Expressions Don’t Work

Perhaps the most common form of email validation found in software today is checking the format of the email address with a regular expression. Although this is often the best option available to most developers (see below), regular expressions are a less than ideal solution. It seems every developer has their own idea of what a regular expression for validating an email address should look like. As of this writing, RegExLib.com has 931 unique regexes for validating an email address. Some have lax requirements for what constitutes a valid email address; others are stricter. None of them actually satisfies RFC-822 or RFC-3696.

One particular pitfall of using a regular expression is explicitly validating against a list of valid Top Level Domains (TLDs). Back when there were only .com, .net, .org, and country-specific TLDs, this was feasible, if unwieldy. However, many new TLDs have sprung into existence and new ones can appear at any time. Any regular expression that explicitly validates against a list of valid TLDs becomes obsolete with the introduction of a new TLD. This creates technical debt: either the list requires regular maintenance or, if it is allowed to go stale, it causes production defects and/or a degraded user experience.
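To make the TLD pitfall concrete, here is a small Python sketch comparing two deliberately simplified patterns. Both patterns and their names (`HARDCODED_TLDS`, `STRUCTURAL_TLD`) are made up for this illustration and neither is a recommended email regex; the point is only how the hard-coded list behaves when it meets a newer TLD.

```python
import re

# Pattern that only accepts a fixed list of TLDs (illustrative only).
HARDCODED_TLDS = re.compile(r"[^@\s]+@[^@\s]+\.(com|net|org)", re.IGNORECASE)

# Pattern that only checks the TLD's shape, not its name (illustrative only).
STRUCTURAL_TLD = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z0-9-]{2,}")


def accepted_by(pattern: re.Pattern, address: str) -> bool:
    """Return True when the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


# A classic TLD passes both patterns.
old_style = "user@example.com"
# A newer TLD is rejected by the hard-coded list, even though it is valid.
new_style = "user@example.photography"

results = {
    "hardcoded_old": accepted_by(HARDCODED_TLDS, old_style),
    "hardcoded_new": accepted_by(HARDCODED_TLDS, new_style),
    "structural_old": accepted_by(STRUCTURAL_TLD, old_style),
    "structural_new": accepted_by(STRUCTURAL_TLD, new_style),
}
```

The hard-coded pattern silently turns away a legitimate user until someone updates the list, while the structural pattern needs no maintenance when new TLDs appear.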

Built-in Language Support

Unfortunately, most programming languages do not provide native functionality for validating an email address, so developers are left to invent their own validation schemes. The result is often inconsistent, overcomplicated, or oversimplified implementations.

PHP offers the built-in filter_var() function which, when used with the FILTER_VALIDATE_EMAIL flag, validates an email address against the syntax in RFC-822, with the exceptions that comments, whitespace folding, and dotless domain names are not supported. This implementation is simple and straightforward and will satisfy most business use cases.

Taking Validation A Bit Further: MX Record Validation

To receive email, a domain typically publishes one or more MX (Mail eXchange) records. An MX record is a DNS entry that tells the sending server where to deliver mail for that domain. If a domain has no MX record, the sending server falls back to the domain's address (A or AAAA) record, as RFC-5321 specifies. If neither exists, email addresses at that domain cannot receive mail, which makes them useless.

When validating an email address, you can check whether its domain has MX records set up. If it does not, the address is probably not valid for your business use case. Keep in mind that MX records are only needed to receive email, not to send it.
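The decision logic behind such a check can be sketched in Python as follows. To keep the example self-contained and free of network access, the DNS lookups are passed in as callables; `lookup_mx` and `lookup_address` are hypothetical names for functions you would back with a DNS library of your choice. The sketch also accounts for two standard behaviors: a sending server falls back to the domain's address record when no MX record exists (RFC-5321), and a single MX record pointing at "." declares that the domain accepts no mail (RFC-7505).

```python
from typing import Callable, Iterable


def domain_can_receive_mail(
    domain: str,
    lookup_mx: Callable[[str], Iterable[str]],
    lookup_address: Callable[[str], bool],
) -> bool:
    """Decide whether a domain appears able to receive email.

    lookup_mx      -- hypothetical callable returning the MX target host
                      names for the domain (empty if there are none)
    lookup_address -- hypothetical callable returning True when the domain
                      has an A or AAAA record
    """
    # Normalize targets: "mail.example.com." -> "mail.example.com",
    # and the null MX target "." -> "" (empty string).
    mx_hosts = [host.rstrip(".") for host in lookup_mx(domain)]

    if mx_hosts:
        # At least one real target means mail can be routed.
        # Only empty targets means the domain published a null MX.
        return any(mx_hosts)

    # No MX records at all: fall back to the address record.
    return lookup_address(domain)
```

Injecting the lookups also makes the rule easy to test without touching real DNS, and lets you add caching or timeouts in one place.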

Weeding Out Unwanted Email Addresses

Often the purpose of capturing an email address is to establish a relationship with the user (e.g. a membership or newsletter subscription). Users who provide short-lived email addresses offer no value: they will not receive newsletters or membership correspondence, nor will they be able to use the address for account actions (e.g. password resets). The only way to determine whether an email address is short-lived is to maintain a list of providers that issue such addresses and validate against it.

Free Email Providers

It is common for users to create an email address at a free email provider like Gmail or Yahoo Mail whenever they need to provide an email address but do not want to use their primary account, either to avoid spam or to receive a benefit without establishing a relationship with the benefit provider. The new account is available immediately. These accounts require no infrastructure or software investment by the user and can usually be accessed from a common web browser.

Disposable Email Providers

Disposable email providers, also known as temporary email providers, allow a user to have an email address that exists for anywhere from ten minutes to a few hours. No account needs to be created and the email address is usually available for use immediately. These email addresses typically only allow the reading of emails, but this is usually enough for the user to respond to calls to action (e.g. following links or instructions within the email) and establish the relationship needed to receive the benefit they seek.

Unlike signing up with a free email provider, using a disposable email provider is usually frictionless and creates no lasting relationship. Since users with disposable email addresses offer no value when you are trying to establish a relationship with them, rejecting domains associated with disposable email providers may help reduce the acquisition rate of low-quality users.
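Validating against a provider list can be as simple as a set lookup on the address's domain. The Python sketch below assumes you maintain the list yourself; the two domains in `DISPOSABLE_DOMAINS` are placeholders under the reserved .example TLD, not real providers, and the function name `is_disposable` is likewise just for illustration. It also matches subdomains of a listed domain, on the assumption that providers may hand out addresses under them.

```python
# Placeholder entries only; a real list would be loaded from a
# maintained source and refreshed regularly.
DISPOSABLE_DOMAINS = frozenset({"tempmail.example", "throwaway.example"})


def is_disposable(address: str, blocklist: frozenset = DISPOSABLE_DOMAINS) -> bool:
    """Return True when the address's domain, or any parent of it,
    appears in the blocklist."""
    # Take everything after the last "@", case-insensitively,
    # ignoring a trailing dot.
    domain = address.rpartition("@")[2].lower().rstrip(".")
    labels = domain.split(".")

    # Check "mx.tempmail.example", then "tempmail.example", then "example".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

Because new disposable providers appear constantly, the quality of this check depends entirely on how current the list is, which is the same maintenance burden described for TLD lists above.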

Email Address Ownership

When you combine all of the validation methods outlined above, you can be reasonably sure that the email address is a functional email address and likely not fake or temporary. But those checks still lack one critical element of email validation: ownership of that email address.

In most business use cases, it is important that the email address you capture belongs to the user who provided it. It will typically be used to authenticate and communicate with them, so an email address the user cannot access is just as useless as one that fails the above validations. Such an address could be a random made-up address that passes the above checks (user@gmail.com would be considered valid) or the address of someone who is not associated with the user.

The best way to verify ownership of an email address is to require the user to provide you with information that only someone with access to that email account would know. If you send an email containing a unique token to that address, and then ask the user to provide that token value in your web application, you can be reasonably sure they have access to that email account. The workflow would be the following:

1. Create a temporary token. It could be a simple four- to six-digit number, a 32-character string of random characters, or something in between. Keep in mind that some users may need to key that value into a browser, so simpler is more user friendly.
2. Store that value in some form of persistent storage.
3. Send that token to the email address you wish to validate.
   - Ideally, include the token in a clickable link for maximum usability.
   - Also provide a plain text version of the token for users who are not viewing their emails as HTML.
4. When the user arrives at your site via the link in their email, or successfully provides the token manually, the email address is validated.

The key here is to withhold or limit any offerings or services until the token has been provided and the email address validated.

Conclusion

Is that email address valid? That all depends on what your business rules consider “valid”. Verifying the basic format is a good start, but if you want to ensure the email address you are validating actually exists, or is even under the control of the user providing it, you will need to extend your validation far beyond formatting.

John Conde
