A simple way to validate an entry is to compare it with a given constant, but this is obviously not the case here: we do not know up front what the user will enter. Another way applies to numbers. If we expect a number, we convert the entered string into a number and check its range. This method is simple, but unfortunately it doesn't always work. It doesn't work for phone numbers, for example, because a phone number can be entered in many different forms.
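The numeric check described above can be sketched as follows. This is only a minimal illustration: the helper name `is_number_in_range` and the limits 1 and 100 are assumptions made for the example, and `looks_like_number` comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true if $input is numeric and lies within [$min, $max].
sub is_number_in_range {
    my ($input, $min, $max) = @_;
    return 0 unless defined $input && looks_like_number($input);
    return ($input >= $min && $input <= $max) ? 1 : 0;
}

for my $entry ("42", "abc", "555-1234") {
    my $verdict = is_number_in_range($entry, 1, 100) ? "accepted" : "rejected";
    print "'$entry': $verdict\n";
}
```

The entry "555-1234" is rejected because it is not a number at all, which is why a range check cannot validate phone numbers.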
A simple regular expression uses no special characters: it contains only the string you want to find in your test string. If we need to specify a pattern (a set of rules) that the test string should match, we have to use special characters. For example, to find out whether a test string contains a phone number in the form ***-****, we use the special character \d, which matches any single digit and nothing else. The regular expression would then be:
/\d\d\d-\d\d\d\d/
This regular expression requires that the test string contain three digits, then a dash, and then four more digits. Other special characters we can use in Perl regular expressions are listed below:

Matching metacharacters:

| Character | Matches | Example |
|---|---|---|
| \b | Word boundary | /\bor/ matches "origami" and "or" but not "normal"; /or\b/ matches "traitor" and "or" but not "perform"; /\bor\b/ matches "or" as a whole word and none of the other words |
| \B | Word nonboundary | /\Bor/ matches "normal" but not "origami"; /or\B/ matches "normal" and "origami" but not "traitor"; /\Bor\B/ matches "normal" but not "origami" or "traitor" |
| \d | Numeral 0 through 9 | /\d\d\d/ matches "212" and "415" but not "B17" or "ABC" |
| \D | Nonnumeral | /\D\D\D/ matches "ABC" and "GEF" but not "B17" or "123" |
| \s | Single white space | /over\sbite/ matches "over bite" but not "overbite" or "over  bite" (two spaces) |
| \S | Single nonwhite space | /over\Sbite/ matches "over-bite" but not "overbite" or "over bite" |
| \w | Letter, numeral, or underscore | /A\w/ matches "A1" and "AC" but not "A+" |
| \W | Anything other than a letter, numeral, or underscore | /A\W/ matches "A+" but not "AC" or "A2" |
| \A | At the beginning of the string | /\AFred/ matches "Fred is OK" but not "I'm with Fred" or "Is Fred here?" |
| \Z | At the end of the string or before a newline at the end | /Fred\Z/ matches "I'm with Fred" and "I'm with Fred\n" but not "Fred is OK" or "Is Fred here?" |
| \z | At the end of the string | /Fred\z/ matches "I'm with Fred" but not "I'm with Fred\n", "Fred is OK", or "Is Fred here?" |
| . | Any character except newline | /.../ matches "abC", "12f", "1+ ", or any three characters |
| [...] | Character set | /[AN]BC/ matches "ABC" and "NBC" but not "BBC" |
| [^...] | Negated character set | /[^AN]BC/ matches "BBC" and "CBC" but not "ABC" or "NBC" |

Counting metacharacters:

| Character | Matches preceding character | Example |
|---|---|---|
| * | Zero or more times | /Ja*vaScript/ matches "JvaScript", "JavaScript", and "JaaaavaScript" but not "JuvaScript" |
| ? | Zero or one time | /Ja?vaScript/ matches "JvaScript" or "JavaScript" but not "JaavaScript" |
| + | One or more times | /Ja+vaScript/ matches "JavaScript" or "JaaaavaScript" but not "JvaScript" |
| {n} | Exactly n times | /Ja{2}vaScript/ matches "JaavaScript" but not "JvaScript" or "JaaaavaScript" |
| {n,} | n or more times | /Ja{2,}vaScript/ matches "JaavaScript" or "JaaaavaScript" but not "JvaScript" |
| {n,m} | At least n and at most m times | /Ja{2,3}vaScript/ matches "JaavaScript" or "JaaavaScript" but not "JvaScript" or "JaaaaavaScript" |

Positional metacharacters:

| Character | Matches | Example |
|---|---|---|
| ^ | At the beginning of the string | /^Fred/ matches "Fred is OK" but not "I'm with Fred" or "Is Fred here?" |
| $ | At the end of the string or before a newline at the end | /Fred$/ matches "I'm with Fred" and "I'm with Fred\n" but not "Fred is OK" or "Is Fred here?" |

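A few of the rows above can be checked directly in Perl. The following is a minimal sketch, not an exhaustive test: the helper `matches` and the choice of rows are made up for this illustration, and the patterns are written with `qr//` so they can be stored in a list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $pattern, 0 otherwise.
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Each row: pattern, test string, expected result.
my @checks = (
    [ qr/\bor/,            "origami",       1 ],
    [ qr/\bor/,            "normal",        0 ],
    [ qr/or\B/,            "traitor",       0 ],
    [ qr/\D\D\D/,          "B17",           0 ],
    [ qr/[^AN]BC/,         "BBC",           1 ],
    [ qr/Ja?vaScript/,     "JaavaScript",   0 ],
    [ qr/Ja{2,3}vaScript/, "JaaavaScript",  1 ],
    [ qr/^Fred/,           "Is Fred here?", 0 ],
    [ qr/Fred$/,           "I'm with Fred", 1 ],
);

for my $check (@checks) {
    my ($pattern, $text, $expected) = @$check;
    my $status = matches($pattern, $text) == $expected ? "as expected" : "UNEXPECTED";
    print "$pattern against '$text': $status\n";
}
```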
For example, if you want to make sure that a Roman numeral is matched only when it is at the start of a line and has a dot after it, you check for the match
/^[IVXLCDM]+\./
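As a rough sketch, the check can be wrapped in a small function. The name `starts_with_roman_item` and the sample lines are invented for this example, and the character class lists all seven Roman numeral letters. The pattern tests only which letters appear, not whether they form a valid numeral.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line starts with Roman numeral letters followed by a dot.
sub starts_with_roman_item {
    my ($line) = @_;
    return $line =~ /^[IVXLCDM]+\./ ? 1 : 0;
}

for my $line ("IV. Regular expressions", "See chapter IV. for details", "4. Not Roman") {
    my $label = starts_with_roman_item($line) ? "match:   " : "no match:";
    print "$label $line\n";
}
```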
The metacharacters listed above should not be confused with escape sequences, which stand for literal characters:
| Symbol | Escape symbol | Description |
|---|---|---|
| tab | \t | Tabulation symbol |
| newline | \n | New line symbol |
| return | \r | carriage return |
| formfeed | \f | Formfeed symbol (printer command) |
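| vtab | \v | Vertical tabulation symbol in many regular expression flavors; in Perl 5.10 and later \v matches any vertical whitespace character |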
| . | \. | Dot |
| ^ | \^ | Caret symbol |
| $ | \$ | Dollar sign |
| \ | \\ | Backslash |
| / | \/ | Slash |
| - | \- | Dash |
| ( | \( | Open parenthesis |
| ) | \) | Close parenthesis |
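The difference between an escaped and an unescaped character is easy to see with the dot. In this minimal sketch the helper `contains_literal` and the sample strings are made up for illustration. `\Q...\E` is Perl's way of escaping every metacharacter in an interpolated string, and `quotemeta` does the same for a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains $literal taken character for character.
sub contains_literal {
    my ($text, $literal) = @_;
    return $text =~ /\Q$literal\E/ ? 1 : 0;
}

my $total = "Total: 3514 dollars";

# An unescaped dot matches any character, so "3514" satisfies /3.14/.
print "unescaped dot matched\n" if $total =~ /3.14/;

# An escaped dot matches only a literal dot, so this test fails.
print "escaped dot matched\n" if $total =~ /3\.14/;

# The helper escapes the dot automatically.
print "literal 3.14 found\n" if contains_literal("pi is about 3.14", "3.14");
```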
$str = "My phone number is 123-3445. This is my home phone.";
if( $str =~ /\d{3}-\d{4}/ ){
print "There is a phone number in the string '$str'\n";
}
else{
print "There is not a phone number in the string '$str'\n";
}
As you can see, the character m of the match operator m// may be omitted when slashes are used as delimiters (m stands for "match").
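Writing the m explicitly becomes useful when the pattern itself contains slashes, because m allows other delimiters. The sketch below assumes a made-up helper `is_under_usr_local` and a sample path.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path lies under /usr/local/.
sub is_under_usr_local {
    my ($path) = @_;
    # With m{...} the slashes inside the pattern need no escaping.
    return $path =~ m{^/usr/local/} ? 1 : 0;
}

my $path = "/usr/local/bin/perl";

# The same test with slash delimiters needs every slash escaped.
print "matched with //\n"  if $path =~ /^\/usr\/local\//;
print "matched with m{}\n" if is_under_usr_local($path);
print "matched with m##\n" if $path =~ m#^/usr/local/#;
```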
Often we need to know not only whether there is a match, but also which substring matched the pattern. Perl provides several special variables for that purpose: $& holds the matched substring, $` the part of the string before the match, and $' the part after it:
$str = "My phone number is 123-3445. This is my home phone.";
if( $str =~ /\d{3}-\d{4}/ ){
print "There is a phone number in the string '$str'\n";
print "The phone is: $&\n";
print " Before: '$`'\n";
print " After: '$''\n";
}
else{
print "There is not a phone number in the string '$str'\n";
}
To find all matches in a string, we can use the operator =~ in a loop, setting the variable $str to the substring to the right of the match after each iteration:
$str = "My phones: 123-3456 (home), 234-4557 (office), 456-4564 (cell).";
while( $str =~ /\d{3}-\d{4}/ ){
print "The phone is: $&\n";
$str = $';
}
Alternatively, we can add the global modifier g and use the operator =~ in list context. On the right side of an assignment to an array, =~ with the modifier g returns the list of all matches:
$str = "My phones: 123-3456 (home), 234-4557 (office), 456-4564 (cell).";
@phones = $str =~ /\d{3}-\d{4}/g;
foreach $phone (@phones){
print "$phone\n";
}
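The modifier g can also be used in scalar context, for instance as a `while` condition. Each search then resumes where the previous match ended, so the loop visits every match without modifying the string. The following is a sketch of that variant. The helper `find_phones` is hypothetical, and `pos` is the Perl built-in that reports the offset where the last match ended.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash reference per phone number found in $text.
sub find_phones {
    my ($text) = @_;
    my @found;
    while ( $text =~ /(\d{3}-\d{4})/g ) {
        # pos($text) is the offset just past the current match.
        push @found, { number => $1, end => pos($text) };
    }
    return @found;
}

my $str = "My phones: 123-3456 (home), 234-4557 (office), 456-4564 (cell).";
for my $phone ( find_phones($str) ) {
    print "Found $phone->{number}, match ends at offset $phone->{end}\n";
}
```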
In addition to the operator =~, Perl has the operator !~, which returns true if there is no match and false otherwise.
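A typical use of !~ is to collect the entries that fail validation. This is a minimal sketch: the helper `invalid_phones` and the sample entries are assumptions for the example, and the pattern is anchored with ^ and $ so that the whole entry must have the form ***-****.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the entries that do NOT look like ***-****.
sub invalid_phones {
    my @entries = @_;
    return grep { $_ !~ /^\d{3}-\d{4}$/ } @entries;
}

my @rejected = invalid_phones("123-4567", "12-34567", "call me");
print "Rejected: $_\n" for @rejected;
```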
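Parentheses in a pattern capture the substrings they match and store them in the variables $1, $2, and so on, numbered by the position of the opening parenthesis. For example, if we are checking that a date was entered in either "mm/dd/yyyy" or "mm-dd-yyyy" format and also need to know the values of the month, day, and year, we can use the following regular expression: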
$today = "1/24/2003";
if( $today =~ /\b(1[0-2]|0?[1-9])[\-\/](0?[1-9]|[12][0-9]|3[01])[\-\/]((19|20)\d{2})/ ){
print "Date: $&\n";
print "Month: $1\n";
print "Day: $2\n";
print "Year: $3\n";
print "Century: $4\n";
}
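In list context a successful match returns the captured substrings, so they can be assigned to named variables instead of being read from $1, $2, and $3. The sketch below assumes a hypothetical helper `parse_date`. It also swaps the inner century group for a non-capturing group `(?:...)`, which is a change from the expression above and means no $4 is produced.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with month, day, and year,
# or nothing if $text contains no date in mm/dd/yyyy or mm-dd-yyyy form.
sub parse_date {
    my ($text) = @_;
    my ($month, $day, $year) =
        $text =~ /\b(1[0-2]|0?[1-9])[\-\/](0?[1-9]|[12][0-9]|3[01])[\-\/]((?:19|20)\d{2})/
        or return;
    return { month => $month, day => $day, year => $year };
}

my $date = parse_date("1/24/2003");
print "Month: $date->{month}, day: $date->{day}, year: $date->{year}\n" if $date;
```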
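Let's take a closer look at this expression. \b anchors the match at a word boundary. (1[0-2]|0?[1-9]) captures the month: 10 through 12, or 1 through 9 with an optional leading zero. [\-\/] matches a dash or a slash as the separator. (0?[1-9]|[12][0-9]|3[01]) captures the day, 1 through 31. ((19|20)\d{2}) captures a four-digit year starting with 19 or 20 in $3, and the inner group (19|20) captures the century in $4.
Captured groups can also be used to bring an entry to a normal form. The following example accepts a card number whose 4-digit groups may be separated by a dash or a white space and rewrites it with dashes: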
$card = "6432 23452342-2342";
if( $card =~ /(\d\d\d\d)[\-\s]?(\d\d\d\d)[\-\s]?(\d\d\d\d)[\-\s]?(\d\d\d\d)/ ){
$card = "$1-$2-$3-$4";
}
else{
$card = "Invalid credit card number!";
}
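The pattern above is not anchored, so it also accepts a longer run of digits that merely contains 16 digits. A stricter variant, sketched here under the assumption that the entry should contain nothing but the card number and optional surrounding white space, adds ^ and $. The helper name `normalize_card` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the card number as four dash-separated groups,
# or nothing if the entry is not exactly 16 digits with optional separators.
sub normalize_card {
    my ($input) = @_;
    return unless $input =~ /^\s*(\d{4})[\-\s]?(\d{4})[\-\s]?(\d{4})[\-\s]?(\d{4})\s*$/;
    return "$1-$2-$3-$4";
}

for my $entry ("6432 23452342-2342", "6432 2345 2342 23429") {
    my $card = normalize_card($entry);
    print defined $card ? "$card\n" : "Invalid credit card number!\n";
}
```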
To replace the part of a string that matches a regular expression, we can use the substitution operator s///. It takes a pattern (between the first and second slashes) and a replacement string (between the second and third slashes). The operator =~ performs the substitution when used with s///. The following example replaces the first 4 digits of a credit card number with stars:
$card = "6432 23452342-2342";
if( $card =~ s/\d{4}/****/ ){
print "$card\n";
}
else{
print "Sorry, there is no match\n";
}
With the modifier g, the substitution replaces all matches in the string. In the following example we first bring the card number to a normal form and then replace all digits but the last 4 with stars:
$card = "6432 23452342-2342";
$card =~ s/(\d\d\d\d)[\-\s]?/$1-/g; # separate 4-digit groups with dashes
$card =~ s/-$//; # remove the trailing dash
$card =~ s/\d{4}-/****-/g; # substitute 4-digit groups with 4 stars
print "$card\n";
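With the modifier g, s/// returns the number of substitutions it made, which can be useful for reporting. The sketch below shows one alternative way to mask the digits in a single pass. It relies on a lookahead `(?=...)`, a feature not covered above, and the helper `mask_all_but_last_four` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: masks every digit except the last four and returns
# the masked string together with the number of digits replaced.
sub mask_all_but_last_four {
    my ($card) = @_;
    # A digit is replaced only if at least four more digits follow it.
    my $count = ( $card =~ s/\d(?=(?:\D*\d){4})/*/g );
    return ( $card, $count || 0 );
}

my ($masked, $replaced) = mask_all_but_last_four("6432-2345-2342-2342");
print "$masked ($replaced digits masked)\n";
```

Unlike the three-step version, this variant keeps whatever separators the entry already has, so normalization remains a separate step.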
Please consult the Perl regular expression documentation (perlre) for more details.