JavaScript Regular Expressions.

In this section we will talk about how to check the correctness of the information entered by the user. By correctness we mean that if the user was prompted for a zip code, then we need to check that he/she actually entered a 5-digit number and not some text, or, if we are expecting an e-mail address, that it is in the form name@server.network.

A simple way to validate an entry is to compare it with a given constant, but this is obviously not the case here: we do not know up front what the user will enter. Another way is applicable to numbers. If we expect to get a number, then we can convert the entered string into a number and check the range. This method is simple, but unfortunately it doesn't always work. For example, suppose the user by mistake entered a street address in the field for the area code, where we are expecting a 3-digit number. If the user lives at "313 Someway St.", then our check passes, because the parseInt() function converts 313 and ignores the rest. Besides, this method doesn't work for, for example, phone numbers. A phone number can be entered as:

  • 123-4567
  • 123 4567
  • 123 45 67
  • 123.45.67
  • and in other ways (not to mention that it may include an area code).
    Thus, we would like to have a tool that allows us to check if a string matches a specific pattern.
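The pitfall of the number-conversion approach can be reproduced in a few lines. This is only a sketch: the function name looksLikeAreaCode and the range 100 to 999 are assumptions made for illustration.

```javascript
// Naive check: convert the entry to a number and test its range.
// looksLikeAreaCode is a hypothetical helper, not part of any library.
function looksLikeAreaCode(entry) {
  var n = parseInt(entry, 10);   // parsing stops at the first non-digit character
  return n >= 100 && n <= 999;
}

console.log(looksLikeAreaCode("313"));              // true
console.log(looksLikeAreaCode("313 Someway St."));  // true, although it is a street address
console.log(looksLikeAreaCode("abc"));              // false (parseInt returns NaN)
```

The second call shows why a range check alone is not enough: the trailing text is silently ignored.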

    Regular expression

    JavaScript has such a tool: the RegExp object allows us to do this kind of checking. There are several ways to create an instance of the regular expression object. The following example creates several RegExp objects:
    var reg = new RegExp(" wEb");
    var re = / /;
    var Re = /Web/;
    var RE = /web/gi;
    
    All the regular expressions from the example are designed to do very simple checking. For example, the very first line creates an object to check if a test string contains " wEb" as a substring. Please notice that we are not yet talking about how to use regular expressions. For the time being we are talking only about how to create them.
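The two creation styles differ mainly in how backslashes and flags are written. The sketch below compares them; the variable names and the sample word are assumptions, and test() is used only to show that the objects behave the same.

```javascript
// Literal syntax: backslashes are written once, flags follow the closing slash.
var literal = /\d+/gi;

// Constructor syntax: the pattern is a string, so each backslash is doubled;
// flags go in the optional second argument.
var constructed = new RegExp("\\d+", "gi");

console.log(literal.source === constructed.source);       // true
console.log(constructed.global, constructed.ignoreCase);  // true true

// The constructor is handy when the pattern is only known at run time.
var word = "web";                       // hypothetical text obtained elsewhere
var dynamic = new RegExp(word, "i");
console.log(dynamic.test("My WEB page"));                 // true
```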

    A simple regular expression uses no special characters to define the string to be used in a search; it contains only the string you want to find in your test string. If we need to specify a pattern (rules) that the test string should match, we have to use special characters. For example, if we want to find out whether a test string contains a phone number in the form ***-****, we need to use the special character \d, which matches any digit and nothing but a digit. The regular expression would then be:

    var reg = /\d\d\d-\d\d\d\d/;
    
    This regular expression requires that the test string contain any three digits, then a dash, and then four more digits. The following tables list other special characters we can use in JavaScript regular expressions:

    Matching metacharacters
    Character  Matches                               Example
    \b         Word boundary                         /\bor/ matches "origami" and "or" but not "normal"
                                                     /or\b/ matches "traitor" and "or" but not "perform"
                                                     /\bor\b/ matches "or" and nothing else
    \B         Word nonboundary                      /\Bor/ matches "normal" but not "origami"
                                                     /or\B/ matches "normal" and "origami" but not "traitor"
                                                     /\Bor\B/ matches "normal" but not "origami" or "traitor"
    \d         Numeral 0 through 9                   /\d\d\d/ matches "212" and "415" but not "B17" or "ABC"
    \D         Nonnumeral                            /\D\D\D/ matches "ABC" and "GEF" but not "B17" or "123"
    \s         Single white space                    /over\sbite/ matches "over bite" but not "overbite" or "over  bite"
    \S         Single nonwhite space                 /over\Sbite/ matches "over-bite" but not "overbite" or "over bite"
    \w         Letter, numeral, or underscore        /A\w/ matches "A1" and "AC" but not "A+"
    \W         Not a letter, numeral, or underscore  /A\W/ matches "A+" but not "AC" or "A2"
    .          Any character except new line         /.../ matches "abC", "12f", "1+ ", or any three characters
    [...]      Character set                         /[AN]BC/ matches "ABC" and "NBC" but not "BBC"
    [^...]     Negated character set                 /[^AN]BC/ matches "BBC" and "CBC" but not "ABC" or "NBC"
    Counting metacharacters
    Character  Matches last character       Example
    *          Zero or more times           /Ja*vaScript/ matches "JvaScript", "JavaScript", and "JaaaavaScript" but not "JuvaScript"
    ?          Zero or one time             /Ja?vaScript/ matches "JvaScript" or "JavaScript" but not "JaavaScript"
    +          One or more times            /Ja+vaScript/ matches "JavaScript" or "JaaaavaScript" but not "JvaScript"
    {n}        Exactly n times              /Ja{2}vaScript/ matches "JaavaScript" but not "JvaScript" or "JaaaavaScript"
    {n,}       n or more times              /Ja{2,}vaScript/ matches "JaavaScript" or "JaaaavaScript" but not "JvaScript"
    {n,m}      At least n, at most m times  /Ja{2,3}vaScript/ matches "JaavaScript" or "JaaavaScript" but not "JvaScript" or "JaaaaavaScript"
    Positional metacharacters
    Character  Matches located                 Example
    ^          At the beginning of the string  /^Fred/ matches "Fred is OK" but not "I'm with Fred" or "Is Fred here?"
    $          At the end of the string        /Fred$/ matches "I'm with Fred" but not "Fred is OK" or "Is Fred here?"

    For example, if you want to make sure that a match for a Roman numeral is found only when it is at the start of the string and has a dot after it, you check for the match

       /^[IVXLCDM]+\./
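A few of the table entries can be verified directly. The sketch below uses the test() method only as a quick way to check a pattern against a sample string; the sample strings are made up, and the Roman numeral class here includes L (50), which is an assumption about the intended set of letters.

```javascript
// Word boundary: "or" as a whole word only.
console.log(/\bor\b/.test("this or that"));             // true
console.log(/\bor\b/.test("normal"));                   // false

// Counting: the letter "a" two or three times.
console.log(/Ja{2,3}vaScript/.test("JaaavaScript"));    // true
console.log(/Ja{2,3}vaScript/.test("JaaaaavaScript"));  // false (five letters "a")

// Positional: beginning and end of the string.
console.log(/^Fred/.test("Fred is OK"));                // true
console.log(/Fred$/.test("Fred is OK"));                // false

// Roman numeral followed by a dot at the very start of the string.
var roman = /^[IVXLCDM]+\./;
console.log(roman.test("XIV. Regular expressions"));    // true
console.log(roman.test("Chapter XIV."));                // false
```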

    Not to be confused with the metacharacters listed in the tables above are the escape sequences for control characters and for characters that otherwise have a special meaning in a pattern:
    Symbol    Escape symbol  Description
    tab       \t             Tabulation symbol
    newline   \n             New line symbol
    return    \r             Carriage return
    formfeed  \f             Formfeed symbol (printer command)
    vtab      \v             Vertical tabulation symbol
    .         \.             Dot
    ^         \^             Caret symbol
    $         \$             Dollar sign
    \         \\             Backslash
    /         \/             Slash
    -         \-             Dash
    (         \(             Open parenthesis
    )         \)             Close parenthesis
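The difference between a metacharacter and its escaped form is easy to miss, so here is a small sketch. The patterns and sample strings are assumptions chosen for illustration.

```javascript
var loose  = /^\d+.\d+$/;   // unescaped dot: any character is accepted between the digits
var strict = /^\d+\.\d+$/;  // escaped dot: only a literal "." is accepted

console.log(loose.test("3.14"));   // true
console.log(loose.test("3x14"));   // true, probably not what was intended
console.log(strict.test("3.14"));  // true
console.log(strict.test("3x14"));  // false

// A slash must be escaped inside the /.../ syntax, otherwise it would end the pattern.
var monthDay = /^\d\d\/\d\d$/;
console.log(monthDay.test("12/25"));  // true
```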

    Properties of RegExp object

    When you create a regular expression (even via the /.../ syntax), JavaScript invokes the RegExp() constructor. The regular expression object returned by the constructor has several properties that describe it. Namely, any instance of the RegExp object has properties such as source, global, and ignoreCase. Example:
    var re = /\bbe\b/g;
    
    The variable re created as shown above, before any match search has been executed, has the following properties:
    Object.PropertyName  Value
    re.source            "\bbe\b"
    re.global            true
    re.ignoreCase        false

    Using Regular Expressions

    There are several ways to execute the search for a pattern match against a string. If we just need to know whether or not a bigger string contains a substring matching the pattern, we can use the test() method of the RegExp object. This method takes only one argument, the string in which to look for a match, and returns a Boolean value that indicates whether or not there is a match.
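A minimal sketch of test(), assuming we are looking for the whole word "be"; the sample sentences are made up.

```javascript
var re = /\bbe\b/;   // the word "be" surrounded by word boundaries

console.log(re.test("To be or not to be"));  // true
console.log(re.test("Beware of the bee"));   // false: "be" occurs only inside longer words
```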

    If you also need to know where the substring matching the pattern is located in a big string, you can use the search() method of the String object. This method takes only one argument, the regular expression containing the pattern, and performs the search against the string on which the method is invoked. It returns the character offset of the substring matching the pattern if there is a match and -1 otherwise.
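A minimal sketch of search(), using the same hypothetical word pattern; offsets are counted from zero.

```javascript
var text = "To be or not to be";

console.log(text.search(/\bbe\b/));            // 3, the offset of the first "be"
console.log("nothing here".search(/\bbe\b/));  // -1, no match
```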

    Another way to find a match in a string is to invoke the exec() method of the regular expression, for example when we are interested in finding the substring " be " in a string. The method returns an array, called foundArray here. If the return value foundArray is null, that means there is no match. The first element of the result array always contains the substring matching the pattern; the values of the other elements depend on the parentheses put in the pattern. For more details see the next section.
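A minimal sketch of exec() for the substring " be "; the test sentences are assumptions, and foundArray is simply the name given to the result.

```javascript
var re = / be /;
var foundArray = re.exec("To be or not to be");

if (foundArray !== null) {
  console.log(foundArray[0]);     // " be ", the substring that matched
  console.log(foundArray.index);  // 2, the offset where the match starts
}

console.log(re.exec("Maybe"));    // null: no " be " surrounded by spaces
```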

    The following page contains an example that demonstrates all three methods described above.

    Getting information about a match

    You can not only verify that a one-field date entry is in the desired format, but also extract components of the match. To get a piece of information from inside a substring matching the pattern, we need to enclose the corresponding part of the pattern in parentheses. Please notice that parentheses themselves are special symbols inside patterns and do not require any match in the string. If a pattern includes one or more sets of parentheses, then the substring of the match corresponding to the pattern inside each set will be returned by the exec() method as an additional element of the array.

    For example, if we are checking that a date was entered in either "mm/dd/yyyy" or "mm-dd-yyyy" format and also need to know the values of the month, day, and year, we can use the following regular expression:

     
    var re = /\b(1[0-2]|0?[1-9])[\-\/](0?[1-9]|[12][0-9]|3[01])[\-\/]((19|20)\d{2})/
    
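    Let's take a closer look at this expression. The first parenthesized part, (1[0-2]|0?[1-9]), matches the month: 10 through 12, or 1 through 9 with an optional leading zero. The second part, (0?[1-9]|[12][0-9]|3[01]), matches the day in the same manner, and the third part, ((19|20)\d{2}), matches a four-digit year starting with 19 or 20. Combining these three things together and adding the possibility of different separators, [\-\/], we come up with the expression above. The same method exec() can then be used to get this additional information about the matching data.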
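The array returned by exec() for the date pattern can be inspected as follows. This is a sketch; the sample sentence is an assumption.

```javascript
var re = /\b(1[0-2]|0?[1-9])[\-\/](0?[1-9]|[12][0-9]|3[01])[\-\/]((19|20)\d{2})/;
var found = re.exec("Born on 7/04/1976 in Ohio");

console.log(found[0]);  // "7/04/1976", the whole match
console.log(found[1]);  // "7",    first set of parentheses: the month
console.log(found[2]);  // "04",   second set: the day
console.log(found[3]);  // "1976", third set: the year
console.log(found[4]);  // "19",   the nested set inside the year

console.log(re.exec("no date here"));  // null
```

Note that the nested parentheses around 19|20 produce an element of their own, so the array has five elements rather than four.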

    String Replacement

    Let's consider a small example about credit card numbers. A credit card number can be entered as
  • 6432-2345-2342-2342
    or
  • 6432 2345 2342 2342
    or
  • 6432234523422342
    Our goal is to recognize a valid number and transform it to the first form. To replace a part of a string that matches a regular expression we can use the replace() method of the String object. This method takes two arguments:
  • the regular expression to be replaced
  • and the string to replace with
    We can use the same example to illustrate the difference between global and nonglobal regular expressions: a nonglobal expression replaces only the first match, while a global one (created with the g flag) replaces every match. A more sophisticated usage of the same method relies on the $1, $2, ..., $9 properties of the global RegExp object. As a regular expression method executes, any parenthesized result is stored in RegExp's nine properties reserved for just that purpose (called backreferences), and the replacement string can refer to them as $1, $2, and so on.
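The credit card transformation could be sketched as below. The pattern card_format and the helper normalize_card are assumptions made for illustration, not the expression used in the original example.

```javascript
// Hypothetical pattern: four groups of four digits, optionally separated by a dash or a space.
var card_format = /^(\d{4})[\- ]?(\d{4})[\- ]?(\d{4})[\- ]?(\d{4})$/;

// Returns the number in the form ####-####-####-####, or null if the entry is not recognized.
function normalize_card(entry) {
  if (!card_format.test(entry)) return null;
  return entry.replace(card_format, "$1-$2-$3-$4");  // $1..$4 are the parenthesized groups
}

console.log(normalize_card("6432 2345 2342 2342"));  // "6432-2345-2342-2342"
console.log(normalize_card("6432234523422342"));     // "6432-2345-2342-2342"
console.log(normalize_card("6432-2345"));            // null

// Global versus nonglobal replacement of the separators:
console.log("6432 2345 2342 2342".replace(/ /, "-"));   // "6432-2345 2342 2342" (first space only)
console.log("6432 2345 2342 2342".replace(/ /g, "-"));  // "6432-2345-2342-2342" (every space)
```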

    Some other properties of RegExp object

    After a match is found in the course of one of the regular expression methods, the RegExp object is informed of some key contextual information about the match. The leftContext property contains the part of the main string to the left of (up to but not including) the matched string. Be aware that the leftContext starts its string from the point at which the most recent search began. Therefore, for the second or subsequent times through the same string with the same regular expression, the leftContext substring varies widely from the first time through.

    The rightContext consists of a string starting immediately after the current match and extending to the end of the main string. As subsequent method calls work on the same string and regular expression, this value obviously shrinks in length until no more matches are found. At this point, both properties revert to null. The short versions of these properties are $` and $' for leftContext and rightContext, respectively.
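The sketch below computes the text to the left and to the right of the first match from the array returned by exec(), which does not depend on the RegExp.leftContext and RegExp.rightContext properties (these are legacy features whose availability and exact behavior may differ between environments). The sample string is an assumption.

```javascript
var text = "to be or not to be";
var found = /be/.exec(text);

// Portable equivalents of the context properties for the first match.
var left  = text.slice(0, found.index);                // everything before the match
var right = text.slice(found.index + found[0].length); // everything after the match

console.log(left);   // "to "
console.log(right);  // " or not to be"
```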

    Data-Entry Validation

    In this section we would like to discuss how JavaScript regular expressions can be used to validate the user's input. There are several different ways to do it; we will start with one of them. Namely, we will check the correctness of the input only when the user clicks on the Submit button. First of all, we need to mention that this button can be either a real submit button (that is, created with input type=submit) or just a button. In either case we need to create an onClick handler that will actually check the correctness:
    <form action=do_something.php name=tform>
     Name: <input type=text name=uname><br>
     Phone: <input type=text name=phone><br>
     <input type=submit value=OK onClick="return check_input();">
    </form>
    
    The function check_input() we set as a click event handler has to do the following:
    1. check the input in the name field and make sure it's a real name (doesn't start with a digit, doesn't contain underscores, punctuation signs, etc.)
    2. if the name is incorrect, print an error message and focus the name field.
    3. if the name is correct, check the correctness of the phone number (make sure it satisfies the 123-3445 format).
    4. if the phone is incorrect, print an error message and focus the phone input field.
    5. only if all data is entered correctly, send the information to the do_something.php script.

    To check the correctness of the data we need to come up with regular expressions for each field we need to check. Let's say that for these two fields we decided to use:

    var name_format = /^[A-Za-z][A-Za-z ]*$/;   // letters and spaces only (\w would also allow digits and underscores)
    var phone_format = /^[1-9]\d\d[\- ]\d\d\d\d$/;
    

    Since we do not want to extract any information from these fields, it's enough for us to use the test() method to check the values in the text fields. We know how to make a small window with an error message, but what do we need to use to bring focus to the element whose data is incorrect? It turns out that input elements have a special method focus() that does the job. Thus, if there is no match we need to do three things:

    1. print an error message
    2. bring focus to the element under consideration
    3. make sure we are not invoking the do_something.php script (we need this step only if we defined the button as a submit button)
    We've discussed the first two steps. To achieve the last one, all we need to do is use the returnValue property of the event object. The following example shows how to check the phone field:
    function check_input()
    {
      var phone_format = /^[1-9]\d\d[\- ]\d\d\d\d$/;
      var phone = document.tform.phone;   // the phone input field
      var msg = "";
      if( phone.value == "" )             // nothing was entered
        msg = "you need to enter a phone number.";
      else if( !phone_format.test(phone.value) )   // wrong format
        msg = "please enter the correct phone (format: ###-####).";
      if( msg != "" ){
        alert("Error: "+msg);
        phone.focus();
        event.returnValue = false; // do NOT submit the data
        return false;
      }
      return true;
    }
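The checking logic can also be kept separate from the page so that it is easy to try out without a browser. The sketch below is one possible arrangement, not the handler used above: find_first_error and the shape of its argument are hypothetical, and the name pattern (letters and spaces only) is an assumption.

```javascript
var name_format  = /^[A-Za-z][A-Za-z ]*$/;      // assumption: letters and spaces only
var phone_format = /^[1-9]\d\d[\- ]\d\d\d\d$/;  // phone pattern from the text

// Returns null when everything is correct, otherwise the first problem found,
// in the same order as the steps listed earlier (name first, then phone).
function find_first_error(values) {
  if (!name_format.test(values.uname))
    return { field: "uname", msg: "please enter a valid name." };
  if (values.phone === "")
    return { field: "phone", msg: "you need to enter a phone number." };
  if (!phone_format.test(values.phone))
    return { field: "phone", msg: "please enter the correct phone (format: ###-####)." };
  return null;
}

console.log(find_first_error({ uname: "John Smith", phone: "123-4567" }));  // null
console.log(find_first_error({ uname: "4ever", phone: "123-4567" }).field); // "uname"
console.log(find_first_error({ uname: "John", phone: "1234567" }).field);   // "phone"
```

In the page, a click handler could call this function with the values of the two fields, show msg in an alert, and focus the element named by field.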
    

    If we use a simple button of the type button, we do not need the event.returnValue property, but we do need to actually submit the data (that is, send it to the server script) if everything is correct. To do that we need to use the submit() method of the form element:

    function check_input()
    {
      ...
      // if everything is entered correctly
      document.tform.submit();
    }
    

    Another way to do the checking is called real-time checking. This method differs from the one described above only in that we do the checking for each element separately, immediately after the user has completed entering data. To do that we need to define a handler for the Blur event of the element we want to check:

    ...
    <input type=text name=expdate onBlur="check_date();">
    ...
    
    This code assigns an onBlur event handler; that is, the function check_date() gets control when this input element loses focus. The checking itself is completely identical.
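A sketch of what the real-time check could look like. Everything here is an assumption made for illustration: the accepted date format (mm/dd/yyyy or mm-dd-yyyy), the helper is_valid_date, and the field name expdate. The wiring to the page runs only where a document object exists.

```javascript
// Hypothetical format for the date field: mm/dd/yyyy or mm-dd-yyyy and nothing else.
var date_format = /^(1[0-2]|0?[1-9])[\-\/](0?[1-9]|[12][0-9]|3[01])[\-\/](19|20)\d{2}$/;

function is_valid_date(value) {
  return date_format.test(value);
}

// Browser-only wiring; skipped where no document exists (for example under Node.js).
if (typeof document !== "undefined") {
  var field = document.querySelector("input[name=expdate]");
  if (field) {
    field.onblur = function () {
      // Runs each time the field loses focus.
      if (!is_valid_date(this.value))
        alert("Error: please enter the date as mm/dd/yyyy.");
    };
  }
}

console.log(is_valid_date("12/25/2023"));  // true
console.log(is_valid_date("12/25/23"));    // false: the year must have four digits
```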

    Here is an example that checks the data entry before submitting the form. Please notice that we used real-time checking for the first and the second entry and batch validation for all of them. Please also notice that the first entry is checked by an onChange handler, whereas an onBlur handler was set to check the date information.



    For additional information about regular expressions see official documentation.