Perl Regular Expressions
1. Overview
Perl is a scripting language widely used for system administration and programming on the World Wide Web. It originated in the UNIX community and has a strong UNIX slant, but is still very useful on Win32 platforms. perl (small 'p') is the program used to interpret the Perl language.
2. Introduction to Regular Expression.
In its simplest form, a regular expression is a string that must match the text exactly. The string can also contain special characters that carry a special meaning. These characters are not treated as ordinary characters and are not matched literally; they indicate that the string describes a more generic pattern.
Special characters that make a pattern more generic include . [ ] ( ) { } ? * + | ^ $ and the backslash. Each is described in the sections below.
These special characters are widely used to build patterns. Which ones you use depends on the pattern you need to match, and there is no limit on how they may be combined.
Using regular expressions, searching for a pattern in text becomes easy. A search done with a regular expression is called a potential search.
A regular expression is written between two forward slash ( '/' ) characters.
3. Literal Pattern.
A literal pattern is a string that contains no special characters. A literal pattern matches an identical string, but no other characters. These patterns do not contain any RegEx operators.
Example:
a. PERL Regular Expression
b. Pattern matching language.
These are simple examples of literal patterns. Using them is like searching for a word or string in any text editor.
4. Character Sets.
A list of characters that may appear at one position in the pattern is called a character set. There are many types of character sets, and each one has a special meaning. When the search engine meets a character set, it matches one character that is listed in the set.
Character sets are always enclosed in square brackets ( [ ] ).
Example: /T[ao]p/ matches Tap or Top.
5. Range.
A range is a short form for writing a list of characters. The range is always specified with the hyphen character ( - ).
Example: /[0-9]/ matches any single digit.
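Character sets and ranges can be tried out with a short script. The sketch below uses Python's `re` module rather than Perl, which is an assumption made for illustration; the bracket syntax is the same in both. The sample words and sentence are hypothetical.

```python
import re

# Character set: exactly one of the listed characters may appear here.
set_pattern = re.compile(r"T[ao]p")
# Range: a hyphen inside the brackets abbreviates a run of characters.
range_pattern = re.compile(r"[0-9]")

words = ["Tap", "Top", "Tip"]  # hypothetical sample data
set_hits = [w for w in words if set_pattern.search(w)]
digit_hits = range_pattern.findall("Room 42, floor 7")

print(set_hits)    # ['Tap', 'Top']
print(digit_hits)  # ['4', '2', '7']
```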
6. Any Character.
A character class or character set specifies the list of characters to match, and the regular expression compiler matches only the characters listed. When we need to match any character at all, we use the dot operator ( . ).
Dot tells the compiler to match any character.
Example:
/.at/ matches all of the following.
1. Bat
2. Cat
3. Eat
4. Fat
5. Rat
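Dot is a simple notation to match any character.
By default, dot does not match a newline ( \n ). It does match other control characters such as carriage return ( \r ), form feed ( \f ) and the NUL character ( \0 ). With the s modifier (see m//s) dot matches a newline as well.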
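The behaviour of dot can be checked with a small sketch. It is written with Python's `re` module as an assumption for illustration; `re.DOTALL` stands in for Perl's s modifier, and the word list is hypothetical.

```python
import re

words = ["Bat", "Cat", "Eat", "at", "\nat"]  # hypothetical sample data

# Dot needs exactly one character before "at"; by default that
# character may not be a newline.
default_hits = [w for w in words if re.fullmatch(r".at", w)]

# With DOTALL (Perl: the s modifier) dot also matches a newline.
dotall_hits = [w for w in words if re.fullmatch(r".at", w, re.DOTALL)]

print(default_hits)  # ['Bat', 'Cat', 'Eat']
print(dotall_hits)   # ['Bat', 'Cat', 'Eat', '\nat']
```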
7. Grouping
Combining a series of patterns or characters into a single element is called grouping. Grouped elements can be reproduced whenever necessary. This lets us cut a specific piece out of a text and reproduce or paste it at the appropriate place.
The grouping operator is a pair of parentheses ( ( ) ).
Characters enclosed in the parentheses are grouped into a single element, and the text they match is stored in a variable. The variables are named according to the order of the groups: the text matched by the 1st group is stored in the variable $1, the second in $2, and so on.
Example:
1. RegEx: /This is ([0-9]) testing/ Source Text: This is 1 testing
The above RegEx will match the text and store the number 1 in the $1 variable.
2. RegEx: /456 (ULRA) 73/ Source Text: This is sample text with 456 ULRA 73.
The above RegEx will match the text and store ULRA in the $1 variable.
There can be any number of groups. Each group is stored in a different variable.
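As a rough illustration of capture groups, the sketch below uses Python's `re` module (an assumption; the parenthesis syntax is the same as in Perl). Where Perl exposes $1 and $2, Python exposes `group(1)` and `group(2)`. The two-group pattern is a hypothetical addition.

```python
import re

# One group: the digit ends up in group 1 (Perl: $1).
m = re.search(r"This is ([0-9]) testing", "This is 1 testing")
first = m.group(1)

# Two groups: each pair of parentheses gets its own number.
m2 = re.search(r"([0-9]+) ([A-Z]+)", "This is sample text with 456 ULRA 73.")
pair = (m2.group(1), m2.group(2))

print(first)  # 1
print(pair)   # ('456', 'ULRA')
```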
8. Back references & Extraction.
The text matched by a group can be matched again, literally, using back references. Back references let us refer to a grouped element later in the same expression.
A back reference is written as the group number preceded by a backslash.
\1 - Refers back to the 1st group.
\2 - Refers back to the 2nd group.
Example 1:
RegEx: /([0-9]) \1 ([0-9])/
Source Text: 1 1 3
In the above example the variables will hold:
$1 = 1
$2 = 3
The \1 literally matches the text captured by the 1st group.
Example 2:
RegEx: /([0-9]) \1 ([0-9])/
Source Text: 1 2 3
The above RegEx will not match: $1 holds the value 1, so \1 requires another 1 next, but the text has 2.
Back referenced text is not stored in a separate variable.
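The two back reference examples can be reproduced with a short sketch. It uses Python's `re` module as an assumption for illustration; the \1 syntax is the same as in Perl.

```python
import re

# \1 must repeat whatever the first group captured.
pattern = re.compile(r"([0-9]) \1 ([0-9])")

hit = pattern.search("1 1 3")   # the second "1" satisfies \1
miss = pattern.search("1 2 3")  # "2" is not a repeat of "1"

print(hit.groups())  # ('1', '3')
print(miss)          # None
```

Note that `groups()` returns only two values: the back reference does not create a capture of its own.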
9. Optional Expressions.
A part of a pattern can be made optional in a regular expression with the ? operator.
Example:
RegEx: /[0-9]? This is sample/
Source Text1: /1 This is sample/
Source Text2: / This is sample/
Above regular expression will match both source text1 and source text2.
10. Counted Expressions.
An interval expression, {m,n} where 'm' and 'n' are non-negative integers with 'n >= m', applies to the preceding character, character set, subexpression or backreference. It indicates that the preceding element must match at least 'm' times and may match as many as 'n' times.
Example:
RegEx: /cat{1,4}/
Source Text: catttt.
The above regular expression will match the full text. The expression {1,4} applies only to the preceding 't' and says that it must match at least once and at most 4 times.
Types of Counted expressions
1. {n} Matches exactly n times.
2. {n,} Matches at least n times.
3. {n,m} Matches at least n but not more than m times.
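The optional and counted forms can be checked against the sample texts used above. This is a minimal sketch in Python's `re` module (an assumption for illustration; ? and {m,n} behave the same way in Perl). The five-t string is a hypothetical extra case.

```python
import re

# ? makes the preceding digit optional.
optional = re.compile(r"[0-9]? This is sample")
with_digit = optional.search("1 This is sample").group()
without_digit = optional.search(" This is sample").group()

# {1,4} applies to the preceding "t" only: one to four of them.
counted = re.compile(r"cat{1,4}")
four_t = counted.fullmatch("catttt") is not None   # four t's: allowed
five_t = counted.fullmatch("cattttt") is not None  # five t's: too many

print(repr(with_digit), repr(without_digit))
print(four_t, five_t)  # True False
```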
11. Alternative Expressions.
An alternative expression is one which matches any of a specified list of patterns. This lets us express OR conditions in our patterns.
Example:
RegEx: /(TEXT|text)/
Source Text1: This is sample TEXT.
Source Text2: This is sample text.
The regular expression will match both source text1 and source text2 because of the alternative expression. It will match either TEXT or text.
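A small sketch of alternation follows, written with Python's `re` module as an assumption for illustration (the | syntax is the same in Perl). The third, mixed-case sentence is a hypothetical extra case showing that only the listed alternatives match.

```python
import re

alt = re.compile(r"(TEXT|text)")

upper = alt.search("This is sample TEXT.").group(1)
lower = alt.search("This is sample text.").group(1)
mixed = alt.search("This is sample Text.")  # neither alternative fits

print(upper, lower, mixed)  # TEXT text None
```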
12. Repeated Expressions.
Repeated expressions match a part of a pattern many times. They are like counted expressions, but more generic.
Operators used in Repeated Expressions:
1. * ( Asterisk ) - Matches zero or more times.
2. + ( Plus ) - Matches one or more times.
Operator * means the preceding element is optional and may occur any number of times.
Operator + means the preceding element is mandatory and may occur any number of times.
Example 1:
RegEx: /[0-9]+/
Source Text1: 123 This is a sample text.
Result: It will match '123'.
Example 2:
RegEx: /This is a [0-9]*/
Source Text1: 123 This is a sample text.
Result: It will match 'This is a ' because [0-9] is optional.
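Both results can be confirmed with a short sketch. It uses Python's `re` module as an assumption for illustration; + and * behave the same way in Perl.

```python
import re

text = "123 This is a sample text."

# + needs at least one digit and takes as many as it can.
plus_match = re.search(r"[0-9]+", text).group()

# * accepts zero digits, so the match succeeds with nothing after "a ".
star_match = re.search(r"This is a [0-9]*", text).group()

print(repr(plus_match))  # '123'
print(repr(star_match))  # 'This is a '
```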
13. Short Cut Notations.
Perl provides a lot of shortcut notations for writing regular expressions. These notations make a regex easier to read and shorter to write.
List of short cut notations.
1. \w - Match a 'word' character ( alphanumeric & _ )
2. \W - Match a non-word character.
3. \s - Match a whitespace character. ( Tab ( \t ), NewLine ( \n ), Return ( \r ) & space )
4. \S - Match a non-whitespace character.
5. \d - Match a digit character.
6. \D - Match a non-digit character.
These short cut notations can also be used inside character classes. To match repeatedly, combine them with repeated expressions.
Example:
1. RegEx: /[\w]+/
This will match a word.
2. RegEx: /[^\w]+/
This will match a run of characters other than alphanumerics & _.
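The shortcut notations can be exercised as below. The sketch uses Python's `re` module as an assumption for illustration; \w, \d and \s are written the same way in Perl. The sample strings are hypothetical.

```python
import re

text = "Perl 5 was released in 1994"  # hypothetical sample sentence

words = re.findall(r"\w+", text)               # runs of word characters
numbers = re.findall(r"\d+", text)             # runs of digits
non_words = re.findall(r"[^\w]+", "a_b, c!")   # negated class with a shortcut
fields = re.split(r"\s+", "a  b\tc")           # split on any whitespace run

print(words)
print(numbers)    # ['5', '1994']
print(non_words)  # [', ', '!']
print(fields)     # ['a', 'b', 'c']
```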
14. Miscellaneous Information.
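The ^ operator tells the compiler to match at the beginning of the text. In multi-line mode (the m modifier) it also matches at the beginning of every line.
The $ operator tells the compiler to match at the end of the text. In multi-line mode it also matches at the end of every line.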
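The effect of the two anchors, with and without multi-line mode, can be sketched as follows. The example uses Python's `re` module as an assumption for illustration; `re.MULTILINE` plays the role of Perl's m modifier, and the two-line string is hypothetical.

```python
import re

lines = "first line\nsecond line"  # hypothetical two-line text

# Without multi-line mode, ^ only matches at the very start of the text.
start_default = re.search(r"^second", lines)
# With multi-line mode, ^ also matches right after each newline.
start_multi = re.search(r"^second", lines, re.MULTILINE)

# $ matches at the end of the text, or at each line end in multi-line mode.
end_default = re.findall(r"line$", lines)
end_multi = re.findall(r"line$", lines, re.MULTILINE)

print(start_default, start_multi is not None)  # None True
print(end_default, end_multi)  # ['line'] ['line', 'line']
```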
15. Summary
1. Literal matching. /Text/
2. Character Sets. /[a-z]/
3. Range /[0-9]/
4. Any character /./
5. Grouping /([0-9]+)/
6. Back references /([0-9]+) \1/
7. Optional Expression /[0-9]?/
8. Counted Expression /([0-9]){1,4}/
9. Alternative Expression /(TEXT|text)/
10. * - Zero or many times.
11. + - One or many times.
16. Quick Reference Guide
Regular Expression
Each character matches itself, unless it is one of the special characters + ? . * ^ $ ( ) [ ] { } | \. The special meaning of these characters can be escaped using a '\'.
. matches an arbitrary character, but not a newline unless it is a single-line match (see m//s).
(...) groups a series of pattern elements to a single element.
^ matches the beginning of the target. In multi-line mode (see m//m) it also matches after every newline character.
$ matches the end of the line. In multi-line mode it also matches before every newline character.
[...] denotes a class of characters to match. [^...] negates the class.
(...|...|...) matches one of the alternatives.
(?# TEXT ) Comment.
(?: REGEXP ) Like (REGEXP) but does not make back-references.
(?= REGEXP ) Zero width positive look-ahead assertion.
(?! REGEXP ) Zero width negative look-ahead assertion.
(? MODIFIER ) Embedded pattern-match modifier. MODIFIER can be one or more of i, m, s or x.
Quantified subpatterns match as many times as possible. When followed by a '?' they match the minimum number of times. These are the quantifiers:
+ matches the preceding pattern element one or more times.
? matches zero or one times.
* matches zero or more times.
{N,M} denotes the minimum N and maximum M match count. {N} means exactly N times; {N,} means at least N times.
A '\' escapes any special meaning of the following character if non-alphanumeric, but it turns most alphanumeric characters into something special:
\w matches alphanumeric, including '_', \W matches non-alphanumeric.
\s matches whitespace, \S matches non-whitespace.
\d matches numeric, \D matches non-numeric.
\A matches the beginning of the string, \Z matches the end.
\b matches word boundaries, \B matches non-boundaries.
\G matches where the previous m//g search left off.
\n, \r, \f, \t etc. have their usual meaning.
\w, \s and \d may be used within character classes, \b denotes backspace in this context.
Back-references:
\1...\9 refer to matched sub-expressions, grouped with (), inside the match. \10 and up can also be used if the pattern matches that many sub-expressions. See also $1...$9, $+, $&, $` and $' in section 'Special variables'.
With modifier x, whitespace can be used in the patterns for readability purposes.
Search & Replace
[ EXPR =~ ] [m]/PATTERN/ [g][i][m][o][s][x]
Searches EXPR (default: $_) for a pattern. If you prepend an m you can use almost any pair of delimiters instead of the slashes. If used in list context, a list is returned consisting of the sub-expressions matched by the parentheses in the pattern, i.e. ($1,$2,$3,...).
Optional modifiers: g matches as many times as possible; i searches in a case-insensitive manner; o interpolates variables only once; m treats the string as multiple lines; s treats the string as a single line; x allows for regular expression extensions.
If PATTERN is empty, the most recent pattern from a previous match or replacement is used. With g the match can be used as an iterator in scalar context.
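For readers who want to experiment, the two behaviours described above (returning the parenthesised sub-expressions, and iterating over successive matches) have rough counterparts in Python's `re` module. The sketch below is an analogy written under that assumption, not Perl code; the sample strings are hypothetical.

```python
import re

# Sub-expressions returned as a group, like ($1, $2) in Perl list context.
m = re.search(r"(\d+)-(\d+)", "range 10-20")
sub_expressions = m.groups()

# Stepping through successive matches, similar in spirit to m//g
# used as an iterator.
found = [hit.group() for hit in re.finditer(r"\d+", "a1 b22 c333")]

# re.IGNORECASE corresponds to the i modifier.
insensitive = re.search(r"monday", "See you on MONDAY", re.IGNORECASE).group()

print(sub_expressions)  # ('10', '20')
print(found)            # ['1', '22', '333']
print(insensitive)      # MONDAY
```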
?PATTERN?
This is just like the /PATTERN/ search, except that it matches only once between calls to the reset operator.
[ $VAR =~ ] s/PATTERN/REPLACEMENT/ [e][g][i][m][o][s][x]
Searches a string for a pattern, and if found, replaces that pattern with the replacement text. It returns the number of substitutions made, if any; otherwise it returns false.
Optional modifiers: g replaces all occurrences of the pattern; e evaluates the replacement string as a Perl expression; for the other modifiers, see /PATTERN/ matching. Almost any delimiter may replace the slashes; if single quotes are used, no interpolation is done on the strings between the delimiters, otherwise they are interpolated as if inside double quotes. If bracketing delimiters are used, PATTERN and REPLACEMENT may have their own delimiters, e.g. s(foo)[bar].
If PATTERN is empty, the most recent pattern from a previous match or replacement is used.
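Search and replace can be sketched with Python's `re.sub` and `re.subn`, used here as an assumed stand-in for Perl's s/// for illustration only. Replacing once resembles s/// without g, `subn` reports the number of substitutions much as s///g does, and a function as the replacement is loosely comparable to the e modifier. The sample strings are hypothetical.

```python
import re

text = "cat bat cat"  # hypothetical sample text

# Replace only the first occurrence (Perl: s/cat/dog/).
once = re.sub(r"cat", "dog", text, count=1)

# Replace every occurrence and report how many were made (Perl: s/cat/dog/g).
everywhere, how_many = re.subn(r"cat", "dog", text)

# Compute each replacement from the match (loosely like the e modifier).
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

print(once)                  # dog bat cat
print(everywhere, how_many)  # dog bat dog 2
print(doubled)               # 6 apples, 20 pears
```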
[ $VAR =~ ] tr/SEARCHLIST/REPLACEMENTLIST/ [c][d][s]
Translates all occurrences of the characters found in the search list into the corresponding character in the replacement list. It returns the number of characters replaced. y may be used instead of tr.
Optional modifiers: c complements the SEARCHLIST; d deletes all characters found in SEARCHLIST that do not have a corresponding character in REPLACEMENTLIST; s squeezes all sequences of characters that are translated into the same target character into one occurrence of this character.
pos SCALAR
Returns the position where the last m//g search left off for SCALAR. May be assigned to.
study [ $VAR ]
Studies the scalar variable $VAR in anticipation of performing many pattern matches on its contents before the variable is next modified.
The tables below are a reference to basic regex. While reading the rest of the site, when in doubt, you can always come back and look here. I encourage you to print the tables so you have a cheat sheet on your desk for quick reference.
The tables are not exhaustive, for two reasons. First, every regex flavor is different, and I didn't want to crowd the page with overly exotic syntax. For a full reference to the particular regex flavors you'll be using, it's always best to go straight to the source. In fact, for some regex engines (such as Perl, PCRE, Java and .NET) you may want to check once a year, as their creators often introduce new features.
The other reason the tables are not exhaustive is that I wanted them to serve as a quick introduction to regex. If you are a complete beginner, you should get a firm grasp of basic regex syntax just by reading the examples in the tables. I tried to introduce features in a logical order and to keep out oddities that I've never seen in actual use, such as the 'bell character'. With these tables as a jumping board, you will be able to advance to mastery by exploring the other pages on the site.
How to use the tables
The tables are meant to serve as an accelerated regex course, and they are meant to be read slowly, one line at a time. On each line, in the leftmost column, you will find a new element of regex syntax. The next column, 'Legend', explains what the element means (or encodes) in the regex syntax. The next two columns work hand in hand: the 'Example' column gives a valid regular expression that uses the element, and the 'Sample Match' column presents a text string that could be matched by the regular expression.
You can read the tables online, of course, but if you suffer from even the mildest case of online-ADD (attention deficit disorder), like most of us… Well then, I highly recommend you print them out. You'll be able to study them slowly, and to use them as a cheat sheet later, when you are reading the rest of the site or experimenting with your own regular expressions.
Enjoy!
Regex Accelerated Course and Cheat Sheet
For easy navigation, here are some jumping points to various sections of the page:
✽ Characters
✽ Quantifiers
✽ More Characters
✽ Logic
✽ More White-Space
✽ More Quantifiers
✽ Character Classes
✽ Anchors and Boundaries
✽ POSIX Classes
✽ Inline Modifiers
✽ Lookarounds
✽ Character Class Operations
✽ Other Syntax
Characters
Character | Legend | Example | Sample Match |
---|---|---|---|
\d | Most engines: one digit from 0 to 9 | file_\d\d | file_25 |
\d | .NET, Python 3: one Unicode digit in any script | file_\d\d | file_9੩ |
\w | Most engines: 'word character': ASCII letter, digit or underscore | \w-\w\w\w | A-b_1 |
\w | Python 3: 'word character': Unicode letter, ideogram, digit, or underscore | \w-\w\w\w | 字-ま_۳ |
\w | .NET: 'word character': Unicode letter, ideogram, digit, or connector | \w-\w\w\w | 字-ま‿۳ |
\s | Most engines: 'whitespace character': space, tab, newline, carriage return, vertical tab | a\sb\sc | a b c |
\s | .NET, Python 3, JavaScript: 'whitespace character': any Unicode separator | a\sb\sc | a b c |
\D | One character that is not a digit as defined by your engine's \d | \D\D\D | ABC |
\W | One character that is not a word character as defined by your engine's \w | \W\W\W\W\W | *-+=) |
\S | One character that is not a whitespace character as defined by your engine's \s | \S\S\S\S | Yoyo |
Quantifiers
Quantifier | Legend | Example | Sample Match |
---|---|---|---|
+ | One or more | Version \w-\w+ | Version A-b1_1 |
{3} | Exactly three times | \D{3} | ABC |
{2,4} | Two to four times | \d{2,4} | 156 |
{3,} | Three or more times | \w{3,} | regex_tutorial |
* | Zero or more times | A*B*C* | AAACC |
? | Once or none | plurals? | plural |
More Characters
Character | Legend | Example | Sample Match |
---|---|---|---|
. | Any character except line break | a.c | abc |
. | Any character except line break | .* | whatever, man. |
\. | A period (special character: needs to be escaped by a \) | a\.c | a.c |
\ | Escapes a special character | \.\*\+\? \$\^\/ | .*+? $^/ |
\ | Escapes a special character | \[\{\(\)\}\] | [{()}] |
Logic
Logic | Legend | Example | Sample Match |
---|---|---|---|
| | Alternation / OR operand | 22|33 | 33 |
( … ) | Capturing group | A(nt|pple) | Apple (captures 'pple') |
\1 | Contents of Group 1 | r(\w)g\1x | regex |
\2 | Contents of Group 2 | (\d\d)\+(\d\d)=\2\+\1 | 12+65=65+12 |
(?: … ) | Non-capturing group | A(?:nt|pple) | Apple |
More White-Space
Character | Legend | Example | Sample Match |
---|---|---|---|
\t | Tab | T\t\w{2} | T ab |
\r | Carriage return character | see below | |
\n | Line feed character | see below | |
\r\n | Line separator on Windows | AB\r\nCD | AB CD |
\N | Perl, PCRE (C, PHP, R…): one character that is not a line break | \N+ | ABC |
\h | Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator | | |
\H | One character that is not a horizontal whitespace | | |
\v | .NET, JavaScript, Python, Ruby: vertical tab | | |
\v | Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator | | |
\V | Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace | | |
\R | Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v) | | |
More Quantifiers
Quantifier | Legend | Example | Sample Match |
---|---|---|---|
+ | The + (one or more) is 'greedy' | \d+ | 12345 |
? | Makes quantifiers 'lazy' | \d+? | 1 in 12345 |
* | The * (zero or more) is 'greedy' | A* | AAA |
? | Makes quantifiers 'lazy' | A*? | empty in AAA |
{2,4} | Two to four times, 'greedy' | \w{2,4} | abcd |
? | Makes quantifiers 'lazy' | \w{2,4}? | ab in abcd |
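
The greedy and lazy rows of this table can be reproduced with a brief sketch. It uses Python's `re` module as an assumption for illustration; the trailing ? works the same way in Perl. The angle-bracket strings are a hypothetical extra case.

```python
import re

greedy_digits = re.match(r"\d+", "12345").group()     # takes everything
lazy_digits = re.match(r"\d+?", "12345").group()      # takes the minimum
lazy_star = re.match(r"A*?", "AAA").group()           # zero is enough
greedy_range = re.match(r"\w{2,4}", "abcd").group()
lazy_range = re.match(r"\w{2,4}?", "abcd").group()

# A common practical case: stop at the first closing bracket.
greedy_tags = re.findall(r"<.+>", "<a><b>")
lazy_tags = re.findall(r"<.+?>", "<a><b>")

print(greedy_digits, lazy_digits, repr(lazy_star))  # 12345 1 ''
print(greedy_range, lazy_range)                     # abcd ab
print(greedy_tags, lazy_tags)  # ['<a><b>'] ['<a>', '<b>']
```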
Character Classes
Character | Legend | Example | Sample Match |
---|---|---|---|
[ … ] | One of the characters in the brackets | [AEIOU] | One uppercase vowel |
[ … ] | One of the characters in the brackets | T[ao]p | Tap or Top |
- | Range indicator | [a-z] | One lowercase letter |
[x-y] | One of the characters in the range from x to y | [A-Z]+ | GREAT |
[ … ] | One of the characters in the brackets | [AB1-5w-z] | One of either: A,B,1,2,3,4,5,w,x,y,z |
[x-y] | One of the characters in the range from x to y | [ -~]+ | Characters in the printable section of the ASCII table. |
[^x] | One character that is not x | [^a-z]{3} | A1! |
[^x-y] | One of the characters not in the range from x to y | [^ -~]+ | Characters that are not in the printable section of the ASCII table. |
[\d\D] | One character that is a digit or a non-digit | [\d\D]+ | Any characters, including new lines, which the regular dot doesn't match |
[\x41] | Matches the character at hexadecimal position 41 in the ASCII table, i.e. A | [\x41-\x45]{3} | ABE |
Anchors and Boundaries
Anchor | Legend | Example | Sample Match |
---|---|---|---|
^ | Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means 'not') | ^abc .* | abc (line start) |
$ | End of string or end of line depending on multiline mode. Many engine-dependent subtleties. | .*? the end$ | this is the end |
\A | Beginning of string (all major engines except JS) | \Aabc[\d\D]* | abc (string... ...start) |
\z | Very end of the string. Not available in Python and JS | the end\z | this is...\n...the end |
\Z | End of string or (except Python) before final line break. Not available in JS | the end\Z | this is...\n...the end\n |
\G | Beginning of String or End of Previous Match. .NET, Java, PCRE (C, PHP, R…), Perl, Ruby | | |
\b | Word boundary. Most engines: position where one side only is an ASCII letter, digit or underscore | Bob.*\bcat\b | Bob ate the cat |
\b | Word boundary. .NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore | Bob.*\bкошка\b | Bob ate the кошка |
\B | Not a word boundary | c.*\Bcat\B.* | copycats |
POSIX Classes
Character | Legend | Example | Sample Match |
---|---|---|---|
[:alpha:] | PCRE (C, PHP, R…): ASCII letters A-Z and a-z | [8[:alpha:]]+ | WellDone88 |
[:alpha:] | Ruby 2: Unicode letter or ideogram | [[:alpha:]\d]+ | кошка99 |
[:alnum:] | PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z | [[:alnum:]]{10} | ABCDE12345 |
[:alnum:] | Ruby 2: Unicode digit, letter or ideogram | [[:alnum:]]{10} | кошка90210 |
[:punct:] | PCRE (C, PHP, R…): ASCII punctuation mark | [[:punct:]]+ | ?!.,:; |
[:punct:] | Ruby: Unicode punctuation mark | [[:punct:]]+ | ‽,:〽⁆ |
Inline Modifiers
None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m).
Modifier | Legend | Example | Sample Match |
---|---|---|---|
(?i) | Case-insensitive mode (except JavaScript) | (?i)Monday | monDAY |
(?s) | DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also known as 'single-line mode' because the dot treats the entire input as a single line | (?s)From A.*to Z | From A to Z |
(?m) | Multiline mode (except Ruby and JS). ^ and $ match at the beginning and end of every line | (?m)1\r\n^2$\r\n^3$ | 1 2 3 |
(?m) | In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks | (?m)From A.*to Z | From A to Z |
(?x) | Free-Spacing Mode (except JavaScript). Also known as comment mode or whitespace mode | (?x) # this is a # comment abc # write on multiple # lines [ ]d # spaces must be # in brackets | abc d |
(?n) | .NET, PCRE 10.30+: named capture only | Turns all (parentheses) into non-capture groups. To capture, use named groups. | |
(?d) | Java: Unix linebreaks only | The dot and the ^ and $ anchors are only affected by \n | |
(?^) | PCRE 10.32+: unset modifiers | Unsets ismnx modifiers | |
Lookarounds
Lookaround | Legend | Example | Sample Match |
---|---|---|---|
(?=…) | Positive lookahead | (?=\d{10})\d{5} | 01234 in 0123456789 |
(?<=…) | Positive lookbehind | (?<=\d)cat | cat in 1cat |
(?!…) | Negative lookahead | (?!theatre)the\w+ | theme |
(?<!…) | Negative lookbehind | \w{3}(?<!mon)ster | Munster |
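
The four lookaround rows can be tried out as below. The sketch uses Python's `re` module as an assumption for illustration; these four constructs are written the same way in Perl. The non-matching inputs are hypothetical extra cases.

```python
import re

# Positive lookahead: ten digits must follow, but only five are consumed.
ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()

# Positive lookbehind: "cat" must be preceded by a digit.
behind = re.search(r"(?<=\d)cat", "1cat").group()
behind_miss = re.search(r"(?<=\d)cat", "a cat")

# Negative lookahead: a "the..." word that is not "theatre".
not_theatre = re.search(r"(?!theatre)the\w+", "theatre theme").group()

# Negative lookbehind: three word characters that are not "mon".
munster = re.search(r"\w{3}(?<!mon)ster", "Munster").group()
monster = re.search(r"\w{3}(?<!mon)ster", "monster")

print(ahead, behind, behind_miss)     # 01234 cat None
print(not_theatre, munster, monster)  # theme Munster None
```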
Character Class Operations
Class Operation | Legend | Example | Sample Match |
---|---|---|---|
[…-[…]] | .NET: character class subtraction. One character that is in those on the left, but not in the subtracted class. | [a-z-[aeiou]] | Any lowercase consonant |
[…-[…]] | .NET: character class subtraction. | [\p{IsArabic}-[\D]] | An Arabic character that is not a non-digit, i.e., an Arabic digit |
[…&&[…]] | Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class. | [\S&&[\D]] | A non-whitespace character that is a non-digit. |
[…&&[…]] | Java, Ruby 2+: character class intersection. | [\S&&[\D]&&[^a-zA-Z]] | A non-whitespace character that is a non-digit and not a letter. |
[…&&[^…]] | Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class | [a-z&&[^aeiou]] | An English lowercase letter that is not a vowel. |
[…&&[^…]] | Java, Ruby 2+: character class subtraction | [\p{InArabic}&&[^\p{L}\p{N}]] | An Arabic character that is not a letter or a number |
Other Syntax
Syntax | Legend | Example | Sample Match |
---|---|---|---|
\K | Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned | prefix\K\d+ | 12 |
\Q…\E | Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters. | \Q(C++ ?)\E | (C++ ?) |