Common Regular Expressions

Below are regular expressions useful for common validation scenarios. No claim is made that they are complete or perfect (please comment if you see a flaw and I will update). I also can’t claim authorship of most of these; they were liberally lifted from elsewhere.

URL

^((((https?|ftps?|gopher|telnet|nntp)://)|(mailto:|news:))(%[0-9A-Fa-f]{2}|[-()_.!~*';/?:@&=+$,A-Za-z0-9])+)([).!';/?:,][[:blank:]])?$

IPv4 IP

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
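As a rough illustration, here is how the pattern above might be applied from Python with the standard re module. The helper names is_ipv4 and find_ipv4 are made up for this sketch. re.fullmatch is used for validation so that the whole string must be an address, while re.findall picks addresses out of surrounding text.

```python
import re

# The IPv4 pattern from above, unchanged (split across two lines for readability).
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)

def is_ipv4(text: str) -> bool:
    """Return True when the entire string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None

def find_ipv4(text: str) -> list:
    """Return every IPv4 address embedded in a longer piece of text."""
    return IPV4.findall(text)

print(is_ipv4("192.168.1.1"))
print(is_ipv4("256.1.1.1"))
print(find_ipv4("ping 10.0.0.1 then 172.16.254.3"))
```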

IPv6 IP

(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])
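A small Python sketch of the pattern above follows. Two things are worth knowing: the character classes only list uppercase A-F, and only the full eight-group form is matched (the "::" shorthand is not). The re.IGNORECASE flag and the find_ipv6 helper name are additions for this example, not part of the original pattern.

```python
import re

# The IPv6 pattern from above; IGNORECASE is added so lowercase hex digits match too.
IPV6 = re.compile(
    r"(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])",
    re.IGNORECASE,
)

def find_ipv6(text: str) -> list:
    """Return every fully written (uncompressed) IPv6 address found in text."""
    return IPV6.findall(text)

print(find_ipv6("host 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is up"))
print(find_ipv6("compressed form 2001:db8::1 is not matched"))
```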

Email Address

^[\w\-\+\&\*]+(?:\.[\w\-\_\+\&\*]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7}$
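For illustration, the email pattern can be wrapped in a small Python helper. The name is_email is hypothetical, and this is only a sketch of one way to call the pattern; it inherits the pattern's own limits (for example, top-level domains of 2 to 7 letters).

```python
import re

# The email pattern from above, unchanged.
EMAIL = re.compile(r"^[\w\-\+\&\*]+(?:\.[\w\-\_\+\&\*]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7}$")

def is_email(text: str) -> bool:
    """Return True when the whole string looks like an email address."""
    return EMAIL.fullmatch(text) is not None

print(is_email("first.last+tag@example.com"))
print(is_email("user@localhost"))
```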

Credit Card

All Major Cards

Bare ^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$
Grouped ^(?:4\d{3}[ -]*\d{4}[ -]*\d{4}[ -]*\d(?:\d{3})?|5[1-5]\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}|6(?:011|5[0-9]{2})[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}|3[47]\d{2}[ -]*\d{6}[ -]*\d{5}|3(?:0[0-5]|[68][0-9])\d[ -]*\d{6}[ -]*\d{4}|(?:2131|1800)[ -]*\d{6}[ -]*\d{5}|35\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4})$
Spaces/dashes ^[ -]*(?:4[ -]*(?:\d[ -]*){11}(?:(?:\d[ -]*){3})?\d|5[ -]*[1-5](?:[ -]*[0-9]){14}|6[ -]*(?:0[ -]*1[ -]*1|5[ -]*\d[ -]*\d)(?:[ -]*[0-9]){12}|3[ -]*[47](?:[ -]*[0-9]){13}|3[ -]*(?:0[ -]*[0-5]|[68][ -]*[0-9])(?:[ -]*[0-9]){11}|(?:2[ -]*1[ -]*3[ -]*1|1[ -]*8[ -]*0[ -]*0|3[ -]*5(?:[ -]*[0-9]){3})(?:[ -]*[0-9]){11})[ -]*$

All Major Cards, Named

Bare ^(?:

(?<visa>4[0-9]{12}(?:[0-9]{3})?) |

(?<mastercard>5[1-5][0-9]{14}) |

(?<discover>6(?:011|5[0-9][0-9])[0-9]{12}) |

(?<amex>3[47][0-9]{13}) |

(?<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |

(?<jcb>(?:2131|1800|35\d{3})\d{11})

)$

Grouped ^(?:

(?<visa>4\d{3}[ -]*\d{4}[ -]*\d{4}[ -]*\d(?:\d{3})?) |

(?<mastercard>5[1-5]\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}) |

(?<discover>6(?:011|5[0-9]{2})[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}) |

(?<amex>3[47]\d{2}[ -]*\d{6}[ -]*\d{5}) |

(?<diners>3(?:0[0-5]|[68][0-9])\d[ -]*\d{6}[ -]*\d{4}) |

(?<jcb>(?:2131|1800)[ -]*\d{6}[ -]*\d{5}|35\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4})

)$

Spaces/dashes ^[ -]*(?:

(?<visa>4[ -]*(?:\d[ -]*){11}(?:(?:\d[ -]*){3})?\d) |

(?<mastercard>5[ -]*[1-5](?:[ -]*[0-9]){14}) |

(?<discover>6[ -]*(?:0[ -]*1[ -]*1|5[ -]*\d[ -]*\d)(?:[ -]*[0-9]){12}) |

(?<amex>3[ -]*[47](?:[ -]*[0-9]){13}) |

(?<diners>3[ -]*(?:0[ -]*[0-5]|[68][ -]*[0-9])(?:[ -]*[0-9]){11}) |

(?<jcb>(?:2[ -]*1[ -]*3[ -]*1|1[ -]*8[ -]*0[ -]*0|3[ -]*5(?:[ -]*[0-9]){3})(?:[ -]*[0-9]){11})

)[ -]*$

American Express

Bare ^3[47][0-9]{13}$
Grouped ^3[47]\d{2}[ -]*\d{6}[ -]*\d{5}$
Spaces/dashes ^[ -]*3[ -]*[47][ -]*(?:\d[ -]*){13}$

Diners Club

Bare ^3(?:0[0-5]|[68][0-9])[0-9]{11}$
Grouped ^3(?:0[0-5]|[68][0-9])\d[ -]*\d{6}[ -]*\d{4}$
Spaces/dashes ^[ -]*3[ -]*(?:0[ -]*[0-5]|[68][ -]*[0-9])[ -]*(?:\d[ -]*){11}$

Discover

Bare ^6(?:011|5[0-9]{2})[0-9]{12}$
Grouped ^6(?:011|5[0-9]{2})[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}$
Spaces/dashes ^[ -]*6[ -]*(?:0[ -]*1[ -]*1|5[ -]*\d[ -]*\d)[ -]*(?:\d[ -]*){12}$

JCB

Bare ^(?:2131|1800|35\d{3})\d{11}$
Grouped ^(?:(?:2131|1800)[ -]*\d{6}[ -]*\d{5}|35\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4})$
Spaces/dashes ^[ -]*(?:2[ -]*1[ -]*3[ -]*1|1[ -]*8[ -]*0[ -]*0|3[ -]*5(?:[ -]*\d){3})[ -]*(?:\d[ -]*){11}$

MasterCard

Bare ^5[1-5][0-9]{14}$
Grouped ^5[1-5]\d{2}[ -]*\d{4}[ -]*\d{4}[ -]*\d{4}$
Spaces/dashes ^[ -]*5[ -]*[1-5][ -]*(?:[0-9][ -]*){14}$

Visa

Bare ^4[0-9]{12}(?:[0-9]{3})?$
Grouped ^4\d{3}[ -]*\d{4}[ -]*\d{4}[ -]*\d(?:\d{3})?$
Spaces/dashes ^[ -]*4[ -]*(?:\d[ -]*){12}(?:(?:\d[ -]*){3})?$

The following regular expression matches every run of non-numeric characters in a credit card number; replacing each match with an empty string puts the number in the “bare” format above: [^0-9]+
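Here is a minimal Python sketch that combines the stripping expression with the bare "All Major Cards, Named" pattern. Python's re module writes named groups as (?P<name>...) rather than (?<name>...), so the group syntax is adapted; the card_type helper is a name invented for this example. The numbers used are the widely published test numbers, not real accounts.

```python
import re

# Bare named pattern from above, with (?<name>...) rewritten as (?P<name>...)
# because that is the spelling Python's re module expects.
CARD = re.compile(
    r"""^(?:
        (?P<visa>4[0-9]{12}(?:[0-9]{3})?) |
        (?P<mastercard>5[1-5][0-9]{14}) |
        (?P<discover>6(?:011|5[0-9][0-9])[0-9]{12}) |
        (?P<amex>3[47][0-9]{13}) |
        (?P<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
        (?P<jcb>(?:2131|1800|35\d{3})\d{11})
    )$""",
    re.VERBOSE,
)

def card_type(raw: str):
    """Strip non-digits, then return the name of the matching brand or None."""
    bare = re.sub(r"[^0-9]+", "", raw)
    match = CARD.match(bare)
    return match.lastgroup if match else None

print(card_type("4111 1111 1111 1111"))
print(card_type("3782-822463-10005"))
print(card_type("1234"))
```

Note that this only checks prefix and length; it says nothing about whether the number was ever issued.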

National ID

Austrian SSN \b\d{4}(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|1[0-5])\d{2}\b
Bulgarian UCN \b\d{2}(?:[024][1-9]|[135][0-2])(?:0[1-9]|[12]\d|3[01])[-+]?\d{4}\b
Canadian SIN \b[1-9]\d{2}[- ]?\d{3}[- ]?\d{3}\b
Chinese CNICN \b\d{6}(?:19|20)\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])\d{4}\b
Croatian MCN \b(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|1[0-2])(?:9\d{2}|0[01]\d)\d{6}\b
Danish CRN \b(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|1[0-2])\d{2}[-+]?\d{4}\b
Finnish SSN \b(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|1[0-2])\d{2}[-+a]\d{3}[a-z0-9]\b
Indian PAN \b[a-z]{3}[abcfghjlpt][a-z]\d{4}[a-z]\b
Italian FC \b(?:[bcdfghj-np-tv-z][a-z]{2}){2}\d{2}[a-ehlmprst](?:[04][1-9]|[1256]\d|[37][01])(?:\d[a-z]{3}|z\d{3})[a-z]\b
Norwegian SSN \b(?:0[1-9]|[12]\d|3[01])(?:[04][1-9]|[15][0-2])\d{7}\b
Romanian PNC \b[1-8]\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|[1-4]\d|5[0-2]|99)\d{4}\b
South Korean RRN \b\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])-[0-49]\d{6}\b
Swedish PIN \b(?:19|20)?\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])[-+]?\d{4}\b
Taiwanese NICN \b[a-z][12]\d{8}\b
UK NIN \b[abceghj-prstw-z][abceghj-nprstw-z] ?\d{2} ?\d{2} ?\d{2} ?[a-dfm]?\b
US SSN \b(?!000)(?!666)(?:[0-6]\d{2}|7(?:[0-356]\d|7[012]))[- ](?!00)\d{2}[- ](?!0000)\d{4}\b
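As a sketch of how these might be used from Python, the example below wraps two of the patterns above. Several of the patterns list only lowercase letters (such as [a-z]), so the example assumes they are meant to be applied case-insensitively and passes re.IGNORECASE; that flag and the helper names are assumptions of this example.

```python
import re

# US SSN pattern from above, unchanged.
US_SSN = re.compile(
    r"\b(?!000)(?!666)(?:[0-6]\d{2}|7(?:[0-356]\d|7[012]))"
    r"[- ](?!00)\d{2}[- ](?!0000)\d{4}\b"
)

# Taiwanese NICN pattern from above; IGNORECASE is an assumption so "A..." matches.
TW_NICN = re.compile(r"\b[a-z][12]\d{8}\b", re.IGNORECASE)

def is_us_ssn(text: str) -> bool:
    """Return True when the whole string is a separator-delimited US SSN."""
    return US_SSN.fullmatch(text) is not None

def is_tw_nicn(text: str) -> bool:
    """Return True when the whole string has the Taiwanese NICN shape."""
    return TW_NICN.fullmatch(text) is not None

print(is_us_ssn("123-45-6789"))
print(is_us_ssn("000-45-6789"))
print(is_tw_nicn("A123456789"))
```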

US Phone Number

\(?\b[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}\b

The above regex matches 2223334444, 222 333 4444, 222-333-4444, 222.333.4444, (222) 333 4444 and all combinations of those separators. For better granularity, break the area code, prefix, and suffix into three different form fields so that only the digits need to be validated, not the format they were delivered in.
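One way to use the pattern in Python is to validate first and then reduce the input to its ten digits, so that later code only ever sees one format. The function name normalize_us_phone is hypothetical, and the approach is only a sketch.

```python
import re

# The US phone pattern from above, unchanged.
PHONE = re.compile(r"\(?\b[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}\b")

def normalize_us_phone(text: str):
    """Return the ten digits of a matching number, or None if it does not match."""
    if PHONE.fullmatch(text) is None:
        return None
    return re.sub(r"[^0-9]+", "", text)

print(normalize_us_phone("(222) 333 4444"))
print(normalize_us_phone("222.333.4444"))
print(normalize_us_phone("222-333-444"))
```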

US Zip Code

\b[0-9]{5}(?:-[0-9]{4})?\b

VAT Number

Austria ^(AT)?U[0-9]{8}$
Belgium ^(BE)?0?[0-9]{9}$
Bulgaria ^(BG)?[0-9]{9,10}$
Cyprus ^(CY)?[0-9]{8}L$
Czech Republic ^(CZ)?[0-9]{8,10}$
Denmark ^(DK)?[0-9]{8}$
Estonia ^(EE)?[0-9]{9}$
Finland ^(FI)?[0-9]{8}$
France ^(FR)?[0-9A-Z]{2}[0-9]{9}$
Germany ^(DE)?[0-9]{9}$
Greece ^(EL|GR)?[0-9]{9}$
Hungary ^(HU)?[0-9]{8}$
Ireland ^(IE)?[0-9]S[0-9]{5}L$
Italy ^(IT)?[0-9]{11}$
Latvia ^(LV)?[0-9]{11}$
Lithuania ^(LT)?([0-9]{9}|[0-9]{12})$
Luxembourg ^(LU)?[0-9]{8}$
Malta ^(MT)?[0-9]{8}$
Netherlands ^(NL)?[0-9]{9}B[0-9]{2}$
Poland ^(PL)?[0-9]{10}$
Portugal ^(PT)?[0-9]{9}$
Romania ^(RO)?[0-9]{2,10}$
Slovakia ^(SK)?[0-9]{10}$
Slovenia ^(SI)?[0-9]{8}$
Spain ^(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]$
Sweden ^(SE)?[0-9]{12}$
UK ^(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})$
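The per-country patterns lend themselves to a lookup table. The Python sketch below copies three of the rows above into a dictionary keyed by country code; the VAT_PATTERNS and is_vat names are invented for the example, and the remaining countries would be added the same way.

```python
import re

# A few of the patterns from the list above, keyed by two-letter country code.
VAT_PATTERNS = {
    "DE": r"^(DE)?[0-9]{9}$",
    "NL": r"^(NL)?[0-9]{9}B[0-9]{2}$",
    "GB": r"^(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})$",
}

def is_vat(country: str, number: str) -> bool:
    """Return True when number matches the pattern registered for country."""
    pattern = VAT_PATTERNS.get(country)
    if pattern is None:
        return False  # no pattern registered for this country in the sketch
    return re.fullmatch(pattern, number) is not None

print(is_vat("DE", "DE123456789"))
print(is_vat("NL", "NL123456789B01"))
print(is_vat("GB", "GB12345"))
```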

Date

It is easiest to validate the date if the day, month, and year are broken up into three separate explicit fields.  Not only does that simplify the regex used for validation, it removes ambiguity as to whether the day or month is first.

d/m/yy and dd/mm/yyyy \b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b
dd/mm/yyyy (0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)[0-9]{2}
m/d/yy and mm/dd/yyyy \b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b
mm/dd/yyyy (0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}
yy-m-d or yyyy-mm-dd \b(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])\b
yyyy-mm-dd (19|20)[0-9]{2}[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])
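These patterns check the shape of a date but not the calendar, so 31/02/2000 passes the dd/mm/yyyy expression. A possible way to close that gap in Python is to run the regex first and then hand the parts to datetime.date, which rejects impossible days. The parse_ddmmyyyy name is made up for this sketch.

```python
import re
from datetime import date

# The strict dd/mm/yyyy pattern from above, unchanged.
DDMMYYYY = re.compile(
    r"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)[0-9]{2}"
)

def parse_ddmmyyyy(text: str):
    """Return a date for a well-formed, real calendar day; otherwise None."""
    if DDMMYYYY.fullmatch(text) is None:
        return None
    # Split on the same separators the pattern allows.
    day, month, year = (int(part) for part in re.split(r"[- /.]", text))
    try:
        return date(year, month, day)
    except ValueError:
        return None  # shape was fine, but the day does not exist in that month

print(parse_ddmmyyyy("31/12/1999"))
print(parse_ddmmyyyy("31/02/2000"))
```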

XSD Validation

XML Schema Definition (XSD) validation is a useful mechanism for validating data in the XML parser before it reaches application logic, for example when analyzing web service input at a web services gateway. There are drawbacks to validating data via XSD, foremost that it makes analyzing specific errors in the data difficult. However, the easy maintainability of an XSD and its separation from application logic generally make up for the shortcomings.

XSD data validation is typically handled by the xs:restriction XSD element, though it is limited to xs:simpleType and xs:simpleContent declarations.  Below is an introduction to using the various restrictions of xs:restriction:

xs:minExclusive and xs:maxExclusive

minExclusive and maxExclusive define exclusive lower and upper bounds for an element, analogous to the > and < mathematical inequalities.  Setting minExclusive to 0 would mandate that only values greater than 0 are accepted, while a maxExclusive of 100 would mandate that only values less than 100 are accepted.

Example:

<xsd:simpleType name="Age">
  <xsd:restriction base="xsd:integer">
    <xsd:minExclusive value="0" />
    <xsd:maxExclusive value="120" />
  </xsd:restriction>
</xsd:simpleType>

Age is an integer that must be greater than 0 and less than 120.

Restrictions on Use:

  • xs:minExclusive may not be used simultaneously with xs:minInclusive.  Similarly xs:maxExclusive may not be used simultaneously with xs:maxInclusive.
  • xs:minExclusive and xs:maxExclusive must be valid with respect to the base type specified in the parent xsd:restriction.  For example, if the base type is an integer, specifying a decimal value for either minExclusive or maxExclusive is invalid.
  • xs:minExclusive must be a smaller value than any complementary xs:maxExclusive or xs:maxInclusive elements.  Similarly xs:maxExclusive must be a larger value than any complementary xs:minExclusive or xs:minInclusive elements.

xs:minInclusive and xs:maxInclusive

minInclusive and maxInclusive define inclusive lower and upper bounds for an element, analogous to the >= and <= mathematical inequalities.  Setting minInclusive to 0 would mandate that only values greater than or equal to 0 are accepted, while a maxInclusive of 100 would mandate that only values less than or equal to 100 are accepted.

Example:

<xsd:simpleType name="Age">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="0" />
    <xsd:maxInclusive value="120" />
  </xsd:restriction>
</xsd:simpleType>

Age is an integer that must be greater than or equal to 0 and less than or equal to 120.

Restrictions on Use:

  • xs:minInclusive may not be used simultaneously with xs:minExclusive.  Similarly xs:maxInclusive may not be used simultaneously with xs:maxExclusive.
  • xs:minInclusive and xs:maxInclusive must be valid with respect to the base type specified in the parent xsd:restriction.  For example, if the base type is an integer, specifying a decimal value for either minInclusive or maxInclusive is invalid.
  • xs:minInclusive must be a smaller value than any complementary xs:maxExclusive or xs:maxInclusive elements.  Similarly xs:maxInclusive must be a larger value than any complementary xs:minExclusive or xs:minInclusive elements.
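To make the difference between the exclusive and inclusive bounds concrete, here is a small Python sketch that mimics the four facets. It is not how an XSD validator is implemented; the within_bounds function and its keyword names are invented for illustration, and it also enforces the rule that the inclusive and exclusive form of the same bound cannot be combined.

```python
def within_bounds(value, *, min_inclusive=None, min_exclusive=None,
                  max_inclusive=None, max_exclusive=None):
    """Mimic the XSD bound facets: return True when value satisfies all given bounds."""
    if min_inclusive is not None and min_exclusive is not None:
        raise ValueError("minInclusive and minExclusive cannot be combined")
    if max_inclusive is not None and max_exclusive is not None:
        raise ValueError("maxInclusive and maxExclusive cannot be combined")
    if min_inclusive is not None and value < min_inclusive:
        return False  # >= bound violated
    if min_exclusive is not None and value <= min_exclusive:
        return False  # > bound violated
    if max_inclusive is not None and value > max_inclusive:
        return False  # <= bound violated
    if max_exclusive is not None and value >= max_exclusive:
        return False  # < bound violated
    return True

# The two Age examples differ only at the edges of the range.
print(within_bounds(0, min_exclusive=0, max_exclusive=120))
print(within_bounds(0, min_inclusive=0, max_inclusive=120))
```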

xs:totalDigits

totalDigits restricts the maximum number of digits allowed for a numeric data element. Values with fewer digits still pass validation.

Example:

<xsd:simpleType name="AreaCode">
  <xsd:restriction base="xsd:integer">
    <xsd:totalDigits value="3" />
  </xsd:restriction>
</xsd:simpleType>

AreaCode is an integer restricted to a maximum of 3 digits, as US area codes are no more than 3 digits long.

xs:fractionDigits

fractionDigits indicates the maximum number of digits in the fractional part (the part to the right of the decimal) of a numeric data element derived from decimal base type.

Example:

<xsd:simpleType name="Price">
  <xsd:restriction base="xsd:decimal">
    <xsd:fractionDigits value="2" />
  </xsd:restriction>
</xsd:simpleType>

Price is a decimal restricted to a maximum of 2 digits after the decimal point, as US currency does not offer a denomination smaller than a cent.
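The two digit-counting facets can be illustrated with Python's decimal module. The sketch below is only an approximation of what a validator does: it counts digits after dropping trailing fractional zeros, and the digit_counts and satisfies_digits names are invented for the example.

```python
from decimal import Decimal

def digit_counts(lexical: str):
    """Return (total_digits, fraction_digits) for a decimal literal."""
    value = Decimal(lexical).normalize()  # drops trailing zeros, e.g. 12.50 -> 12.5
    _sign, digits, exponent = value.as_tuple()
    fraction = max(0, -exponent)
    if exponent < 0:
        total = max(len(digits), fraction)
    else:
        total = len(digits) + exponent  # e.g. 100 is stored as 1E+2
    return total, fraction

def satisfies_digits(lexical, total_digits=None, fraction_digits=None):
    """Check a literal against optional totalDigits / fractionDigits limits."""
    total, fraction = digit_counts(lexical)
    if total_digits is not None and total > total_digits:
        return False
    if fraction_digits is not None and fraction > fraction_digits:
        return False
    return True

print(satisfies_digits("212", total_digits=3))
print(satisfies_digits("19.999", fraction_digits=2))
```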

xs:length

length specifies the exact length of an element, either in number of characters for string data or number of octets for binary data (type hexBinary or base64Binary).  Provided data may not be shorter or longer; it may only be exactly the specified length to pass validation.

Example:

<xsd:simpleType name="StateCode">
  <xsd:restriction base="xsd:string">
    <xsd:length value="2" />
  </xsd:restriction>
</xsd:simpleType>

StateCode is a string corresponding to the two-character US state abbreviations (AK, CA, NY and so forth).

Restrictions on Use:

  • xs:length may not be used simultaneously with xs:minLength or xs:maxLength.
  • The value of xs:length must be a non-negative integer.

xs:minLength and xs:maxLength

minLength and maxLength define inclusive lower and upper bounds on the length of an element, analogous to the >= and <= mathematical inequalities. The length is measured in characters for string data, in octets for binary data (type hexBinary or base64Binary), or in items for list types.

Example:

<xsd:simpleType name="password">
  <xsd:restriction base="xsd:string">
    <xsd:minLength value="8" />
    <xsd:maxLength value="256" />
  </xsd:restriction>
</xsd:simpleType>

Password is a string corresponding to the ADP Password Length Policy requiring passwords to be at least 8 characters long and no more than 256 characters long.

Restrictions on Use:

  • xs:minLength and xs:maxLength may not be used simultaneously with xs:length.
  • The values of xs:minLength and xs:maxLength must be non-negative integers, and minLength must be less than or equal to maxLength if both are used.
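The point that length is counted in characters for strings but in octets for binary types is easy to miss, so here is a short Python sketch of the idea. The xsd_length and length_ok helpers are hypothetical names; the binary cases simply decode the text and count the resulting bytes.

```python
import base64

def xsd_length(lexical: str, base_type: str = "string") -> int:
    """Return the length the facets would measure for the given base type."""
    if base_type == "string":
        return len(lexical)                    # characters
    if base_type == "hexBinary":
        return len(bytes.fromhex(lexical))     # octets, two hex digits each
    if base_type == "base64Binary":
        return len(base64.b64decode(lexical))  # octets after decoding
    raise ValueError("unsupported base type in this sketch: " + base_type)

def length_ok(lexical, base_type="string", min_length=0, max_length=None):
    """Apply inclusive minLength / maxLength limits to the measured length."""
    n = xsd_length(lexical, base_type)
    return n >= min_length and (max_length is None or n <= max_length)

print(xsd_length("NY"))
print(xsd_length("0FB7", "hexBinary"))
print(length_ok("hunter2", min_length=8, max_length=256))
```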

xs:enumeration

enumeration defines an enumeration of values for a data element.  Any values not within the enumeration cause validation to fail.  This is useful for validating data elements for a fixed and small set of accepted values.

Example:

<xsd:simpleType name="StateCode">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AK"/>
    ...
    <xsd:enumeration value="WV"/>
    <xsd:enumeration value="WI"/>
    <xsd:enumeration value="WY"/>
  </xsd:restriction>
</xsd:simpleType>

StateCode is a string corresponding to the two-character US state abbreviations (AK, CA, NY and so forth). Since there are a limited number of states and territories defined by the US postal system, an enumeration works well to validate them.

xs:whiteSpace

Rather than acting specifically as a validation directive, whiteSpace specifies how the XML validator should normalize whitespace content prior to handing the data off to the application.  xs:whiteSpace may be one of the following three values:

  • preserve: No normalization takes place, the value remains unchanged.
  • replace: Each occurrence of the ASCII characters #x9 (tab), #xA (line feed) and #xD (carriage return) is substituted by #x20 (space).
  • collapse: The same processing as replace, but additionally each sequence of one or more consecutive #x20 (space) characters is converted to a single #x20, and any leading or trailing #x20 is removed.
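The three modes are easy to emulate in Python, which may help when predicting what the application will receive. This is only a sketch with invented function names; the collapse step also trims leading and trailing spaces, which is what the XML Schema specification prescribes for that mode.

```python
import re

def ws_preserve(text: str) -> str:
    """preserve: the value is passed through untouched."""
    return text

def ws_replace(text: str) -> str:
    """replace: every tab, line feed, and carriage return becomes a space."""
    return re.sub(r"[\t\n\r]", " ", text)

def ws_collapse(text: str) -> str:
    """collapse: replace, squeeze runs of spaces, then trim both ends."""
    return re.sub(r" +", " ", ws_replace(text)).strip(" ")

sample = "  New\tYork \n City  "
print(repr(ws_preserve(sample)))
print(repr(ws_replace(sample)))
print(repr(ws_collapse(sample)))
```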

xs:pattern

pattern validates a string data element against a specified regular expression.  Multiple xs:pattern elements can be included within a given xs:restriction element; when they are, the validator accepts any data that is matched by at least one of the included patterns.  A list of common regular expressions can be found here.

Example:

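<xsd:simpleType name="password">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\A(?=\S*?[A-Z])(?=\S*?[a-z])(?=\S*?[0-9])\S{8,256}\z"/>
  </xsd:restriction>
</xsd:simpleType>

Password is a string corresponding to a specific password policy: 8 to 256 non-whitespace characters, including at least one uppercase letter, one lowercase letter, and one digit. Note that the \A and \z anchors and the (?=...) lookaheads are not part of the regular expression syntax defined by the XML Schema specification, where patterns are implicitly anchored, so this pattern relies on a validator that accepts such extensions.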
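To see what the password pattern accepts, it can be tried out in Python, whose re module does support lookaheads. In this sketch the \A and \z anchors are dropped because re.fullmatch already requires the whole string to match; the is_valid_password name is invented for the example.

```python
import re

# Same policy as the pattern above: three lookaheads (uppercase, lowercase, digit)
# followed by 8 to 256 non-whitespace characters.
PASSWORD = re.compile(r"(?=\S*?[A-Z])(?=\S*?[a-z])(?=\S*?[0-9])\S{8,256}")

def is_valid_password(candidate: str) -> bool:
    """Return True when the whole candidate satisfies the policy."""
    return PASSWORD.fullmatch(candidate) is not None

print(is_valid_password("Passw0rdX"))
print(is_valid_password("password1"))
print(is_valid_password("Has Space1A"))
```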

Where did all of the content go?

After several years of self-hosting my own blog I have finally moved it to wordpress.com.  I will eventually export the old content and make it available here, but given my current schedule that isn’t high on my list of priorities.  I will be trying to post here a bit more regularly, but again, there is a chronic lack of time in my schedule.

There are some definite benefits to using wordpress.com over the self-hosted instance: fairly painless spam filtering, automatic updating (though the current hosted instances do a great job of making that painless), and generally making all the nuts and bolts someone else’s problem.  Of course there are some drawbacks as well: much less control over the install, fewer themes, ads (unless I pay to get rid of them), and so forth.  Anyway, laziness wins for the moment.