Regular expression functions in PHP.

PHP regular expressions.

Regular expression symbols.

Regular expressions in PHP are very similar to the ones in JavaScript, except that there are no special symbols (\b, \d, \s, \w, \B, \D, \S, \W) and the syntax is a little different (a pattern is an ordinary string, without the surrounding slashes):
$reg = "^This";  // will match the word "This" at the beginning of a string
$Reg = "This$";  // will match the word "This" at the end of a string
$REG = "^This$"; // will match the string "This"
Here is the list of special characters you can use in PHP regular expressions:
Special characters

Matching metacharacters
Character | Matches | Example
. | Any character except a newline | "..." matches "abC", "12f", "1+ ", or any three characters
[...] | Character set | "[AN]BC" matches "ABC" and "NBC" but not "BBC"
[^...] | Negated character set | "[^AN]BC" matches "BBC" and "CBC" but not "ABC" or "NBC"

Matching character classes
Character class | Matches | Example
[[:alpha:]] | Any letter | "[[:alpha:]]+" matches "PHP", "JavaScript", and "text" but not "123"
[[:digit:]] | Any digit | "[[:digit:]]" matches "12", "23", and "1text" but not "abc"
[[:alnum:]] | Any letter or digit | "[[:alnum:]]" matches "PHP", "12", and "text21" but not "\n\t\n"
[[:space:]] | Any whitespace | "[[:space:]]" matches "PHP is a script language", "JavaScript\tPHP", and "text\nanother text" but not "123"
[[:upper:]] | Any uppercase letter | "[[:upper:]]" matches "PHP" and "JavaScript" but not "text"
[[:lower:]] | Any lowercase letter | "[[:lower:]]" matches "JavaScript" and "text" but not "PHP"
[[:punct:]] | Any punctuation mark | "[[:punct:]]" matches "PHP,Javascript", "JavaScript!", and "text:" but not "123"
[[:xdigit:]] | Any hexadecimal digit | "[[:xdigit:]]" matches "A", "10Ae", and "123" but not "tst"

Counting metacharacters
Character | Preceding character occurs | Example
* | Zero or more times | "Ja*vaScript" matches "JvaScript", "JavaScript", and "JaaaavaScript" but not "JuvaScript"
? | Zero or one time | "Ja?vaScript" matches "JvaScript" or "JavaScript" but not "JaavaScript"
+ | One or more times | "Ja+vaScript" matches "JavaScript" or "JaaaavaScript" but not "JvaScript"
{n} | Exactly n times | "Ja{2}vaScript" matches "JaavaScript" but not "JvaScript" or "JaaaavaScript"
{n,} | n or more times | "Ja{2,}vaScript" matches "JaavaScript" or "JaaaavaScript" but not "JvaScript"
{n,m} | At least n and at most m times | "Ja{2,3}vaScript" matches "JaavaScript" or "JaaavaScript" but not "JvaScript" or "JaaaaavaScript"

Positional metacharacters
Character | Match is located | Example
^ | At the beginning of the string | "^Fred" matches "Fred is OK" but not "I'm with Fred" or "Is Fred here?"
$ | At the end of the string | "Fred$" matches "I'm with Fred" but not "Fred is OK" or "Is Fred here?"
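
The table entries above can be checked directly. The following is only a sketch under one stated assumption: since ereg() is not available in PHP 7 and later, it wraps each table pattern in / delimiters and passes it to preg_match() (described in the Perl-style section), which understands the same metacharacters, character classes, counters, and anchors. The helper name matches_pattern() is hypothetical.

```php
<?php
// Hypothetical helper: returns true if $pattern (written without
// delimiters, as in the tables above) matches somewhere in $subject.
function matches_pattern(string $pattern, string $subject): bool
{
    // Assumption: the pattern itself contains no '/' character.
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(matches_pattern('[AN]BC', 'NBC'));                    // bool(true)
var_dump(matches_pattern('[^AN]BC', 'ABC'));                   // bool(false)
var_dump(matches_pattern('[[:digit:]]', '1text'));             // bool(true)
var_dump(matches_pattern('[[:alpha:]]+', '123'));              // bool(false)
var_dump(matches_pattern('Ja{2,3}vaScript', 'JaaavaScript'));  // bool(true)
var_dump(matches_pattern('^Fred', "I'm with Fred"));           // bool(false)
var_dump(matches_pattern('Fred$', "I'm with Fred"));           // bool(true)
```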

Regular expression search.

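There are two PHP functions that allow us to search a string for a match. Both belong to the POSIX regular expression extension, which was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so they are available only in older PHP versions: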
  • ereg(pattern, source [,array])
  • eregi(pattern, source [,array])
    These functions return the length of the found match if the pattern is found in the source and the third argument is given (without the third argument they return 1), or false if the pattern is not found or an error has occurred.
    Example:
    $pattern = "^[a-zA-Z0-9_]+@[a-zA-Z0-9_]+(\\.[a-zA-Z0-9_]+)+";
    if( ereg($pattern, $email) ){
        echo "E-mail address is valid";
    }
    else{
        echo "Invalid e-mail address";
    }
    
    These two functions can accept a third argument. This optional argument is an array passed by reference. The very first element of the array is the found match, and all the other elements are the substrings matched by the parenthesized groups in the pattern.
    Example:
    $pattern = "^([a-zA-Z0-9_]+)@([a-zA-Z0-9_]+)\\.([a-zA-Z0-9_]+)$";
    if( ereg($pattern, $email, $arr) ){
        echo "E-mail address is valid";
        // $arr[0] is the whole match; $arr[1], $arr[2], ... are the groups
        for($i=0;isset($arr[$i]);$i++)
            echo "\$arr[$i] = $arr[$i]\n";
    }
    else{
        echo "Invalid e-mail address";
    }
    
    eregi() behaves identically to ereg(), except it ignores case distinctions when matching letters.
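
On PHP 7 and later, where ereg() and eregi() no longer exist, roughly the same check could be written with preg_match() from the Perl-style section. This is a minimal sketch, assuming the simplified address pattern used in the examples above; the function name split_email() is hypothetical, and the i modifier takes over the role of eregi().

```php
<?php
// Hypothetical helper: returns [whole match, user, domain, suffix],
// or null when the address does not fit the simplified pattern.
function split_email(string $email): ?array
{
    // Same pattern as above, with / delimiters added; the i modifier
    // makes it case-insensitive, so a-z also covers A-Z.
    $pattern = '/^([a-z0-9_]+)@([a-z0-9_]+)\.([a-z0-9_]+)$/i';
    if (preg_match($pattern, $email, $parts) === 1) {
        return $parts;
    }
    return null;
}

print_r(split_email('John_Doe@example.com'));
// [0] => John_Doe@example.com, [1] => John_Doe, [2] => example, [3] => com
var_dump(split_email('not an address'));   // NULL
```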

    Regular expression replace

    Functions
  • ereg_replace(pattern, replacement, string)
  • eregi_replace(pattern, replacement, string)
    Both functions search the string for the given pattern and replace all occurrences with the replacement. If a replacement took place they return the modified string; otherwise they return the original string.
    $card = "1234-2345-5677-5675";
    $pattern = "[0-9]{4}";
    $Card = ereg_replace($pattern, "****", $card);
    echo "$card";        // stays the same
    echo "$Card";        // now equals ****-****-****-****
    
    As you can guess, eregi_replace() behaves like ereg_replace(), but ignores case distinctions.

    The complete documentation can be found on the PHP site. Please also use this page to play with different regular expressions and PHP functions.

    Perl-style regular expressions in PHP.

    Besides the native PHP regular expressions we have seen above, PHP allows us to use Perl-style regular expressions (the ones JavaScript uses). There is a set of functions for working with such regular expressions:
  • preg_match()
  • preg_match_all()
  • preg_replace()
  • preg_split()

    Search for a match.

    There are two functions whose only job is to search for a match. They are:
    int preg_match(pattern, string [, matcharray]);
    int preg_match_all(pattern, string, matcharray[, flag]);
    
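    As a pattern these functions take Perl-style regular expressions; that is, expressions in the form /^start+/. The only difference is that in PHP these expressions are written as strings, so they must be put inside quotes. Single quotes are the safer choice, because PHP then leaves sequences such as \d and $ in the pattern untouched:
    $pregexep = '/^(\d{3}) \d{3}-\d{4}$/';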
    
    Yes, we can use the same symbols:
    Meta-character Description
    \d any decimal digit
    \D any character that is not a decimal digit
    \s any whitespace character
    \S any character that is not a whitespace character
    \w any "word" character
    \W any "non-word" character
    \b word boundary
    \B not a word boundary
    We can also use modifiers, which are written after the closing delimiter:
  • i to make the pattern case-insensitive
  • m to make ^ and $ match at the beginning and end of every line of a multi-line string
  • and some others.
    preg_match() returns 1 if a match was found, 0 if no match was found, and false if an error occurred.
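
A short example of preg_match() and its return value might look like the sketch below. It reuses the $pregexep pattern from above; the test strings are made up for illustration.

```php
<?php
// Pattern from the text: three digits (captured), a space,
// three digits, a hyphen, and four digits.
$pregexep = '/^(\d{3}) \d{3}-\d{4}$/';

var_dump(preg_match($pregexep, '555 123-4567'));   // int(1)
var_dump(preg_match($pregexep, '555-123-4567'));   // int(0)

// The i modifier makes the match case-insensitive.
var_dump(preg_match('/^php/i', 'PHP is a scripting language'));  // int(1)

// The m modifier lets ^ and $ match at the start and end of each line.
var_dump(preg_match('/^second$/m', "first\nsecond\nthird"));     // int(1)
var_dump(preg_match('/^second$/', "first\nsecond\nthird"));      // int(0)
```

Because the function returns 1 or 0 rather than true or false, a strict comparison such as === 1 is the most precise way to test for a match.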

    If the third (optional) argument is provided, it will be filled with the result of the search. The first element of this array will contain the substring of the subject string that matches the pattern, the second element will contain the part of the match that corresponds to the first sub-pattern (the pattern inside the first pair of parentheses), and so on. Please notice that preg_match() stops searching after the first match is found. To find all matches in a string we need to use the function preg_match_all().
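
How the match array is filled can be seen in the following sketch. The subject string and the variable names $subject, $found, and $match are assumptions made for illustration only.

```php
<?php
$subject = 'Call 555 123-4567 or 555 987-6543';

// Two sub-patterns: the first three digits and the rest of the number.
$found = preg_match('/(\d{3}) (\d{3}-\d{4})/', $subject, $match);

var_dump($found);   // int(1)
print_r($match);
// [0] => 555 123-4567   (the whole match)
// [1] => 555            (first pair of parentheses)
// [2] => 123-4567       (second pair of parentheses)
// The second phone number is not reported: the search stops at the first match.
```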

    This function takes the same arguments as preg_match(), but the result array is two-dimensional. It contains all substrings of the subject string that match the pattern, as well as all sub-pattern matches. How exactly these elements are stored in the array depends on the value of the last (optional) parameter, flag. We prefer the value PREG_SET_ORDER for this argument: in this case each element of the result array is exactly the array that preg_match() would fill for that match. A typical use is extracting opening and closing tags from an HTML file together with the text between them.
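
Extracting tags and the text between them could be sketched as follows. The HTML fragment and the pattern are assumptions chosen for illustration; the pattern handles only simple tags without attributes or nesting.

```php
<?php
$html = '<b>bold text</b> and <i>italic text</i>';

// Group 1: tag name, group 2: text between the tags.
// \1 requires the closing tag to repeat the name captured by group 1.
$pattern = '/<([a-z]+)>(.*?)<\/\1>/i';

$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

echo "Found $count matches\n";   // Found 2 matches
foreach ($matches as $m) {
    // Each $m has the same layout as the array filled by preg_match().
    echo "tag: $m[1], text: $m[2]\n";
}
// tag: b, text: bold text
// tag: i, text: italic text
```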

    Regular expression replacement.

    To replace regular expression matches with another string we use the function
    string preg_replace(pattern, replacement, string);
    
    The first argument of the function is the regular expression pattern to be replaced. The second argument is the string to replace pattern with, and the last argument is the string where we need to make the replacement. If the replacement took place preg_replace() returns the new string, otherwise it returns the original string.

    The replacement argument may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference will be replaced by the text captured by the n'th parenthesized pattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern.
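
The references described above can be tried with a sketch like this one. The input strings are made up; the card example repeats the earlier ereg_replace() example in Perl style. The replacement strings are single-quoted so that PHP itself does not try to expand $1, $2, and so on.

```php
<?php
// Reorder a date: $1 = year, $2 = month, $3 = day.
$date = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', '2024-03-15');
echo $date, "\n";     // 03/15/2024

// $0 refers to the text matched by the whole pattern.
$marked = preg_replace('/\d+/', '[$0]', 'room 101, floor 3');
echo $marked, "\n";   // room [101], floor [3]

// All occurrences are replaced, not only the first one.
$card = preg_replace('/[0-9]{4}/', '****', '1234-2345-5677-5675');
echo $card, "\n";     // ****-****-****-****
```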

    Using a regular expression as a delimiter.

    Sometimes we are not interested in the regular expression matches themselves, but in the substrings between them. For example, we may want to get all the words used in a sentence, but we do not know for sure how these words are separated: with spaces (how many?), tabs, commas, new lines, etc. In such situations we can use the function
    array preg_split(pattern, string);
    
    This function takes two arguments:
  • a regular expression pattern used as a delimiter
  • and a string to split
    and returns an array of the substrings separated by matches of this regular expression. Thus, the example we just talked about can be done with a pattern that matches any run of spaces, tabs, commas, and new lines.
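
Splitting a sentence into words could be sketched as below. The sentence and the delimiter set are assumptions for illustration. The sketch also uses two optional arguments of preg_split() that are not described in this lecture: a limit of -1 (no limit) and the PREG_SPLIT_NO_EMPTY flag, which drops empty pieces such as the one left after the final period.

```php
<?php
$sentence = "PHP, Perl\tand  JavaScript;\nall use regular expressions.";

// Any run of whitespace, commas, semicolons, or periods is one delimiter.
$words = preg_split('/[\s,.;]+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
// PHP, Perl, and, JavaScript, all, use, regular, expressions
```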

    Note: some of the regular expression functions have more options than described in this lecture. Please consult the PHP documentation for the complete description. You can also use the Perl-style regular expression playground to try your own regular expressions.

    Some string functions.

    Besides a set of very powerful regular expression functions, PHP has a huge list of different string functions. We will briefly describe only some of them.
    Function | Returns | Description
    ltrim(string) | String | Strips whitespace from the beginning of the specified string
    rtrim(string) | String | Removes trailing whitespace
    chop(string) | String | Alias of rtrim(). Removes trailing whitespace from the specified string
    ord(string) | Integer | Returns the ASCII code of the first character of the specified string
    chr(ascii) | String | Returns the character represented by the specified ASCII code
    strchr(haystack, needle) | String | Finds the first occurrence of the needle in the haystack and returns the rest of the haystack starting from it
    strlen(string) | Integer | Returns the length of the specified string
    substr(string, start[, length]) | String | Returns length characters of the string, starting from the position specified by start
    strpos(string, substr) | Integer | Returns the numeric position of the first occurrence of substr in string (false if it is not found)
    strrpos(string, substr) | Integer | Returns the numeric position of the last occurrence of substr in string (false if it is not found)
    stripcslashes(string) | String | Returns a string with backslashes stripped off. Recognizes C-like \n, \r ..., octal and hexadecimal representation.
    strtolower(string) | String | Returns string with all alphabetic characters converted to lowercase.
    strtoupper(string) | String | Returns string with all alphabetic characters converted to uppercase.
    nl2br(string) | String | Returns string with '<br />' inserted before all newlines.
    crypt(string[, salt]) | String | Returns a one-way hash of the specified string; the traditional DES-based algorithm uses a two-character salt
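
A few of the functions from the table are shown together in the sketch below. The sample string and variable names are assumptions for illustration. Positions are counted from 0, and since strpos() and strrpos() return false when nothing is found, their results are best compared with ===.

```php
<?php
$raw = "  Hello, World!\n";

$left  = ltrim($raw);    // "Hello, World!\n"
$clean = rtrim($left);   // "Hello, World!"

echo strlen($clean), "\n";           // 13
echo strtoupper($clean), "\n";       // HELLO, WORLD!
echo strtolower($clean), "\n";       // hello, world!
echo substr($clean, 7, 5), "\n";     // World
var_dump(strpos($clean, 'o'));       // int(4)
var_dump(strrpos($clean, 'o'));      // int(8)
var_dump(strpos($clean, 'z') === false);  // bool(true): not found
echo strchr($clean, 'W'), "\n";      // World!
echo ord('A'), ' ', chr(66), "\n";   // 65 B
echo nl2br("line one\nline two"), "\n";
// line one<br />
// line two
```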