Perl Regular Expression Cheat Sheet

DESCRIPTION

This 'cheat sheet' is a handy reference, meant for beginning Perl programmers. Not everything is mentioned, but 195 features may already be overwhelming. A regular expression is a string of characters that defines the pattern or patterns you are looking for. The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions, such as sed, grep, and awk.

Regular expression operators, from the Perl Reference Card (version 2):

m/pattern/igmsoxc               matching pattern
qr/pattern/imsox                store regex in variable
s/pattern/replacement/igmsoxe   search and replace

Modifiers: i case-insensitive; o compile once; g global; x extended; m multiline; c don't reset pos (with g).

Perl Regular Expressions

A cheat sheet (also written 'cheatsheet') or crib sheet is a concise set of notes used for quick reference. This document presents a tabular summary of the regular expression (regexp) syntax in Perl, then illustrates it with a collection of annotated examples. The most basic metacharacters are:

Char   Meaning
^      beginning of string
$      end of string
.      any character except newline

1. Overview

Perl is a scripting language widely used for system administration and programming on the World Wide Web. It originated in the UNIX community and has a strong UNIX slant, but is still very useful on Win32 platforms. perl (small 'p') is the program used to interpret the Perl language.

2. Introduction to Regular Expression.

In its simplest form, a regular expression is a string that must match the text exactly. The string can also contain special characters, which have a different or special meaning. These characters are not treated as ordinary characters and are not matched literally; they indicate that the string describes a more generic pattern.

The special characters that make a pattern more generic are: \ ^ $ . | ? * + ( ) [ ] { }

These special characters are widely used to build patterns. Which ones you use depends on the pattern you need to describe, and there are no restrictions on combining them.

Using regular expressions, searching for a pattern in text becomes easy. A search done with a regular expression is called a pattern search.

A regular expression is written between two forward slash ( '/' ) characters.

3. Literal Pattern.

A literal pattern is a string that contains no special characters. A literal pattern matches an identical string and nothing else. Such patterns contain no regex operators.

Example:

a. PERL Regular Expression

b. Pattern matching language.

These are simple examples of literal patterns. They work like searching for a word or string in any text editor.

4. Character Sets.

A character set is a list of characters, any one of which may appear at that position in the pattern. There are several types of character sets, each with a special meaning. When the regex engine encounters a character set, it matches a single character that is in the list.

Character sets are always enclosed in square brackets ( [ ] ).

Example:
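As a minimal sketch of a character set in use (the word list and variable names are made up for illustration), the pattern /[bcr]at/ accepts exactly one of the listed characters before 'at':

```perl
use strict;
use warnings;

# Hypothetical sample words; only those starting with b, c or r should match.
my @words   = qw(bat cat rat fat eat);
my @matched = grep { /^[bcr]at$/ } @words;

print "@matched\n";   # prints "bat cat rat"
```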

5. Range.

A range is a short form for a list of consecutive characters. A range is always specified with the hyphen character ( - ) inside a character set.

Example:
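A small sketch of ranges, assuming some made-up inputs: [0-9] stands for any single digit and [a-f] for any single letter from a to f.

```perl
use strict;
use warnings;

# Hypothetical inputs used only to illustrate ranges.
my @inputs = ('7', 'c', 'x', '42');

my @single_digit = grep { /^[0-9]$/ } @inputs;   # exactly one digit
my @hex_letters  = grep { /^[a-f]$/ } @inputs;   # exactly one letter a..f

print "@single_digit | @hex_letters\n";   # prints "7 | c"
```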

6. Any Character.

A character class or character set specifies the list of characters to match, and the regular expression engine will match only the characters listed. When we need to match any character, we use the dot operator ( . ).

The dot tells the engine to match any character.

Example:

/.at/ matches all of the following:

1. Bat

2. Cat

3. Eat

4. Fat

5. Rat

Dot is a simple notation to match any character.

Dot does not match a newline ( \n ) unless the s modifier is used. It does match other control characters, such as carriage return ( \r ), form feed ( \f ) and the NUL character ( \0 ).
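The behaviour of the dot can be sketched as follows. The word list is illustrative only; the second half shows how a newline is treated with and without the s modifier.

```perl
use strict;
use warnings;

# The dot stands for exactly one character, so 'at' and 'Boat' are rejected.
my @hits = grep { /^.at$/ } qw(Bat Cat Eat at Boat);

# A newline between 'a' and 'b' is matched by the dot only under /s.
my $without_s = ("a\nb" =~ /a.b/)  ? 1 : 0;
my $with_s    = ("a\nb" =~ /a.b/s) ? 1 : 0;

print "@hits\n";                   # prints "Bat Cat Eat"
print "$without_s $with_s\n";      # prints "0 1"
```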

7. Grouping

Grouping combines a series of patterns or characters into a single element. Grouped elements can be reproduced whenever necessary, which lets us cut a specific piece out of a text and reproduce or paste it at the appropriate place.

The grouping operator is a pair of parentheses ( ( ) ).

Characters enclosed in the parentheses are grouped into a single element, and the text they match is stored in a variable. The variables are named according to the order of the groups: the 1st grouped element is stored in the variable $1, the second in $2, and so on.

Example:

1. RegEx: /This is ([0-9]) testing/ Source Text: /This is 1 testing/

The above RegEx will match the text and store the number 1 in $1 variable.

2. RegEx: /456 (ULRA) 73/ Source Text: /This is sample text with 456 ULRA 73./

The above RegEx will match the text and store ULRA in $1 variable.

There can be any number of groups. Each group is stored in a different variable.
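A short sketch of capturing with groups. The first match reuses the example text from above; the second string ('2024-06') is a made-up input showing that a match in list context returns the captured groups directly.

```perl
use strict;
use warnings;

my $text = 'This is 1 testing';
my $digit;
if ($text =~ /This is ([0-9]) testing/) {
    $digit = $1;                 # text captured by the first group
}

# In list context the match returns ($1, $2, ...).
my ($first, $second) = ('2024-06' =~ /([0-9]+)-([0-9]+)/);

print "$digit $first $second\n";   # prints "1 2024 06"
```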

8. Back references & Extraction.

Grouped elements can be rematched literally using back references. Back references let the same expression require that the text captured by a group appears again later in the match.

A back reference is written as the group number preceded by a backslash.

\1 - Back reference to the 1st group.

\2 - Back reference to the 2nd group.

Example 1:

RegEx: /([0-9]) \1 ([0-9])/

Source Text: /1 1 3/

In the above example the variables will hold:

$1 = 1

$2 = 3

The \1 literally matches the text captured by the 1st group.

Example 2:

RegEx: /([0-9]) \1 ([0-9])/

Source Text: /1 2 3/

The above RegEx will not match, because $1 captures the value 1 and \1 then requires another 1, which is not present.

Back referenced text will not be stored in a different variable.
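The two examples above can be checked with a short sketch. A back reference is written \1 (a backslash followed by the group number); the variable names are illustrative.

```perl
use strict;
use warnings;

# \1 must repeat whatever the first group captured.
my $re = qr/([0-9]) \1 ([0-9])/;

my @ok = ('1 1 3' =~ $re);   # captures from the groups only: (1, 3)
my @no = ('1 2 3' =~ $re);   # no match, so the list is empty

print "matched: @ok\n";                      # prints "matched: 1 3"
print "second text matched: ", scalar(@no), " groups\n";
```

Note that the text matched by \1 does not produce a capture of its own, so only two values are returned.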

9. Optional Expressions.

A part of a pattern can be made optional in a regular expression with the ? operator.

Example:

RegEx: /[0-9]? This is sample/

Source Text1: /1 This is sample/

Source Text2: / This is sample/

Above regular expression will match both source text1 and source text2.
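As a sketch, the same pattern can be anchored with ^ and $ (an addition made here so that a third, made-up input with two leading digits is rejected):

```perl
use strict;
use warnings;

# The leading digit is optional: zero or one occurrence.
my $re = qr/^[0-9]? This is sample$/;

my @accepted = grep { $_ =~ $re }
    ('1 This is sample', ' This is sample', '12 This is sample');

print scalar(@accepted), "\n";   # prints "2"
```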

10. Counted Expressions.

An interval expression, {m,n}, where 'm' and 'n' are non-negative integers with 'n >= m', applies to the preceding character, character set, subexpression or back reference. It indicates that the preceding element must match at least 'm' times and may match as many as 'n' times.

Example:

RegEx: /cat{1,4}/

Source Text: catttt.

The above regular expression will match the full text. The expression {1,4} applies to the preceding element only (here, the letter 't') and says that it should match at least once and at most 4 times.

Types of Counted expressions

1. {n} Matches exactly n times.

2. {n,} Matches at least n times.

3. {n,m} Matches at least n but not more than m times.
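The three forms can be tried out with a small sketch. The inputs are made up, and the patterns are anchored with ^ and $ so that the counts are enforced over the whole string.

```perl
use strict;
use warnings;

# {1,4} applies only to the 't': one to four of them after 'ca'.
my @accepted = grep { /^cat{1,4}$/ } qw(ca cat catt catttt cattttt);

my $exactly_three = ('123' =~ /^[0-9]{3}$/)  ? 1 : 0;   # {n}
my $at_least_two  = ('1'   =~ /^[0-9]{2,}$/) ? 1 : 0;   # {n,}

print "@accepted\n";                      # prints "cat catt catttt"
print "$exactly_three $at_least_two\n";   # prints "1 0"
```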

11. Alternative Expressions.

An alternative expression matches any one of a specified list of patterns. This lets us express OR conditions in our patterns.

Example:

RegEx: /(TEXT|text)/

Source Text1: This is sample TEXT.

Source Text2: This is sample text.

The regular expression will match both source text1 and source text2 because of the alternative expression: it matches either TEXT or text.
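A brief sketch of alternation using the two source texts, plus a third made-up input ('Text') that matches neither alternative:

```perl
use strict;
use warnings;

# For each input, record which alternative matched, or 'none'.
my @found = map { /(TEXT|text)/ ? $1 : 'none' }
    ('This is sample TEXT.', 'This is sample text.', 'This is sample Text.');

print "@found\n";   # prints "TEXT text none"
```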

12. Repeated Expressions.

Repeated expressions match a part of a pattern many times in a row. They work like counted expressions but are more generic.

Operators used in Repeated Expressions:

1. * ( Asterisk ) - Matches zero or more times.

2. + ( Plus ) - Matches one or more times.

The * operator means that the element is optional and may occur any number of times.

The + operator means that the element is mandatory and may occur any number of times.

Example 1:

RegEx: /[0-9]+/

Source Text1: 123 This is a sample text.

Result: It will match '123'.

Example 2:

RegEx: /This is a [0-9]*/

Source Text1: 123 This is a sample text.

Result: It will match 'This is a ' because [0-9] is optional.
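Both examples can be reproduced with a short sketch. Parentheses are added around each pattern here only so that the matched text can be captured and printed.

```perl
use strict;
use warnings;

my $text = '123 This is a sample text.';

my ($digits) = $text =~ /([0-9]+)/;             # one or more digits
my ($prefix) = $text =~ /(This is a [0-9]*)/;   # digits are optional here

print "[$digits] [$prefix]\n";   # prints "[123] [This is a ]"
```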

13. Short Cut Notations.

Perl provides many shortcut notations for writing regular expressions. These shortcuts make a regex easier to read and shorter to write.

List of short cut notations.

1. \w - Match a 'word' character ( alphanumeric & _ )

2. \W - Match a non-word character.

3. \s - Match a whitespace character. ( Tab ( \t ), NewLine ( \n ), Return ( \r ) & space )

4. \S - Match a non-whitespace character.

5. \d - Match a digit character.

6. \D - Match a non-digit character.

These shortcut notations can also be used inside character classes. To match repeatedly, combine them with repeated expressions.

Example:

1. RegEx: /[\w]+/

This will match a word.

2. RegEx: /[^\w]+/

This will match a run of characters other than alphanumerics & _.
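A minimal sketch of the shortcuts \w, \d and \s at work on a made-up line of text:

```perl
use strict;
use warnings;

# Hypothetical input line containing a tab and a date.
my $line = "Order 66:\tshipped on 2024-01-15";

my @words   = $line =~ /(\w+)/g;    # runs of word characters
my @numbers = $line =~ /(\d+)/g;    # runs of digits
my @fields  = split /\s+/, $line;   # split on any whitespace

print scalar(@words), " words, numbers: @numbers\n";
# prints "7 words, numbers: 66 2024 01 15"
print scalar(@fields), " fields\n";   # prints "5 fields"
```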

14. Miscellaneous Information.

The ^ operator anchors the match at the beginning of the string (or of each line when the m modifier is used).

The $ operator anchors the match at the end of the string (or of each line when the m modifier is used).
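The anchors can be sketched as follows, using made-up strings. The last two matches show the effect of the m modifier, which lets ^ match after every newline rather than only at the start of the string.

```perl
use strict;
use warnings;

my @lines = ('error: disk full', 'no error here', 'all done');

my @starts_with_error = grep { /^error/ } @lines;
my @ends_with_done    = grep { /done$/ }  @lines;

my $multi     = "first\nsecond\n";
my @without_m = $multi =~ /^(\w+)/g;    # ^ matches only at the string start
my @with_m    = $multi =~ /^(\w+)/mg;   # ^ matches at each line start

print "@starts_with_error | @ends_with_done\n";
# prints "error: disk full | all done"
print "@without_m | @with_m\n";         # prints "first | first second"
```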

15. Summary

1. Literal matching. /Text/

2. Character Sets. /[a-z]/

3. Range /[0-9]/

4. Any character /./

5. Grouping /([0-9]+)/

6. Back references /([0-9]+) \1/

7. Optional Expression /[0-9]?/

8. Counted Expression /([0-9]){1,4}/

9. Alternative Expression /(TEXT|text)/

10. * - Zero or many times.

11. + - One or many times.

16. Quick Reference Guide

Regular Expression

Each character matches itself, unless it is one of the special characters + ? . * ^ $ ( ) [ ] { } | \. The special meaning of these characters can be escaped using a '\'.

. matches an arbitrary character, but not a newline unless it is a single-line match (see m//s).

(...) groups a series of pattern elements to a single element.

^ matches the beginning of the target. In multi-line mode (see m//m) also matches after every newline character.

$ matches the end of the line. In multi-line mode also matches before every newline character.

[...] denotes a class of characters to match. [^...] negates the class.

(...|...|...) matches one of the alternatives.

(?# TEXT ) Comment.

(?: REGEXP ) Like (REGEXP) but does not make back-references.

(?= REGEXP ) Zero width positive look-ahead assertion.

(?! REGEXP ) Zero width negative look-ahead assertion.

(? MODIFIER ) Embedded pattern-match modifier. MODIFIER can be one or more of i, m, s or x.

Quantified subpatterns match as many times as possible. When followed with a '?' they match the minimum number of times. These are the quantifiers:

+ matches the preceding pattern element one or more times.

? matches zero or one times.

* matches zero or more times.

{N,M} denotes the minimum N and maximum M match count. {N} means exactly N times; {N,} means at least N times.
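The difference between a greedy quantifier and its minimal form (followed by '?'), and the zero-width look-ahead assertions listed above, can be sketched like this. The strings are made up for illustration.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy) = $html =~ /(<.+>)/;    # as much as possible: the whole string
my ($lazy)   = $html =~ /(<.+?>)/;   # as little as possible: '<b>'

# Look-ahead checks what follows without consuming it.
my ($ahead)    = 'foobar' =~ /(foo(?=bar))/;           # captures only 'foo'
my $neg_foobar = ('foobar' =~ /foo(?!bar)/) ? 1 : 0;   # 'bar' follows: no match
my $neg_foobaz = ('foobaz' =~ /foo(?!bar)/) ? 1 : 0;   # 'baz' follows: match

print "$lazy\n";                              # prints "<b>"
print "$ahead $neg_foobar $neg_foobaz\n";     # prints "foo 0 1"
```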

A '\' escapes any special meaning of the following character if non-alphanumeric, but it turns most alphanumeric characters into something special:

\w matches alphanumeric, including '_', \W matches non-alphanumeric.

\s matches whitespace, \S matches non-whitespace.

\d matches numeric, \D matches non-numeric.

\A matches the beginning of the string, \Z matches the end.

\b matches word boundaries, \B matches non-boundaries.

\G matches where the previous m//g search left off.

\n, \r, \f, \t etc. have their usual meaning.

\w, \s and \d may be used within character classes, \b denotes backspace in this context.

Back-references:

\1...\9 refer to matched sub-expressions, grouped with (), inside the match. \10 and up can also be used if the pattern matches that many sub-expressions. See also $1...$9, $+, $&, $` and $' in section 'Special variables'.

With modifier x, whitespace can be used in the patterns for readability purposes.
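As a sketch of the x modifier, a date pattern (a made-up example) can be spread over several lines with comments. The whitespace and comments inside the pattern are ignored by the regex engine.

```perl
use strict;
use warnings;

my $date_re = qr/
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
/x;

my ($year, $month, $day) = '2024-01-15' =~ $date_re;

print "$year $month $day\n";   # prints "2024 01 15"
```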

Search & Replace

[ EXPR =~ ][m]/PATTERN/ [ g ][i][m][o][s][x]

Searches EXPR (default: $_) for a pattern. If you prepend an m you can use almost any pair of delimiters instead of the slashes. If used in list (array) context, a list is returned consisting of the sub-expressions matched by the parentheses in the pattern, i.e. ($1,$2,$3,...).

Optional modifiers: g matches as many times as possible; i searches in a case-insensitive manner; o interpolates variables only once; m treats the string as multiple lines; s treats the string as a single line; x allows for regular expression extensions.

If PATTERN is empty, the most recent pattern from a previous match or replacement is used. With g the match can be used as an iterator in scalar context.
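A minimal sketch of the two ways to use the g modifier, on a made-up string: as an iterator in scalar context (the while loop), and in list context, where all captured sub-expressions are returned at once.

```perl
use strict;
use warnings;

my $text = 'a=1, b=2, c=3';

# Scalar context: each evaluation continues where the previous match ended.
my @pairs;
while ($text =~ /(\w)=(\d)/g) {
    push @pairs, "$1:$2";
}

# List context: returns every captured group from every match.
my @all = $text =~ /(\w)=(\d)/g;

print "@pairs\n";   # prints "a:1 b:2 c:3"
print "@all\n";     # prints "a 1 b 2 c 3"
```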

?PATTERN?

This is just like the /PATTERN/ search, except that it matches only once between calls to the reset operator.

[ $VAR =~ ] s/PATTERN/REPLACEMENT/ [ e ][g][i][m][o][s][x]

Searches a string for a pattern, and if found, replaces that pattern with the replacement text. It returns the number of substitutions made, if any, otherwise it returns false.

Optional modifiers: g replaces all occurrences of the pattern; e evaluates the replacement string as a Perl expression; for the other modifiers, see /PATTERN/ matching. Almost any delimiter may replace the slashes; if single quotes are used, no interpolation is done on the strings between the delimiters, otherwise they are interpolated as if inside double quotes. If bracketing delimiters are used, PATTERN and REPLACEMENT may have their own delimiters, e.g. s(foo)[bar].

If PATTERN is empty, the most recent pattern from a previous match or replacement is used.
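The substitution operator and its g and e modifiers can be sketched as follows. The strings are made up, and the (my $copy = $s) =~ s/// idiom is used so that the original string is left untouched until the counting step.

```perl
use strict;
use warnings;

my $s = 'cat bat cat';

(my $one = $s) =~ s/cat/dog/;       # first occurrence only
(my $all = $s) =~ s/cat/dog/g;      # every occurrence

my $count = ($s =~ s/cat/dog/g);    # modifies $s, returns substitutions made

my $nums = '1 2 3';
$nums =~ s/(\d+)/$1 * 2/ge;         # replacement is evaluated as Perl code

print "$one | $all | $count | $nums\n";
# prints "dog bat cat | dog bat dog | 2 | 2 4 6"
```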

[ $VAR =~ ] tr/SEARCHLIST/REPLACEMENTLIST/ [ c ][d][s]

Translates all occurrences of the characters found in the search list into the corresponding character in the replacement list. It returns the number of characters replaced. y may be used instead of tr.

Optional modifiers: c complements the SEARCHLIST; d deletes all characters found in SEARCHLIST that do not have a corresponding character in REPLACEMENTLIST; s squeezes all sequences of characters that are translated into the same target character into one occurrence of this character.
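A short sketch of tr and its modifiers on made-up strings. With an empty replacement list and no d modifier, the search list is used as the replacement, which makes tr handy for counting and squeezing.

```perl
use strict;
use warnings;

my $dna = 'AACGTT';

(my $complement = $dna) =~ tr/ACGT/TGCA/;          # character-by-character map
my $count_a = ($dna =~ tr/A//);                    # counts 'A', changes nothing

(my $squeezed = 'aaabbbccc') =~ tr/a-z//s;         # s: collapse repeated runs
(my $digits = 'tel: 555-1234') =~ tr/0-9//cd;      # c + d: delete non-digits

print "$complement $count_a $squeezed $digits\n";
# prints "TTGCAA 2 abc 5551234"
```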

pos SCALAR

Returns the position where the last m//g search left off for SCALAR. May be assigned to.

study [ $VAR ]

Studies the scalar variable $VAR in anticipation of performing many pattern matches on its contents before the variable is next modified.
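A minimal sketch of pos, using a made-up string. Each m//g match in scalar context advances the position, and assigning to pos moves the starting point for the next search.

```perl
use strict;
use warnings;

my $str = 'aXbXc';

scalar($str =~ /X/g);
my $first = pos($str);     # offset just past the first 'X'

scalar($str =~ /X/g);
my $second = pos($str);    # offset just past the second 'X'

pos($str) = 0;             # assign to pos: search again from the start
scalar($str =~ /X/g);
my $again = pos($str);

print "$first $second $again\n";   # prints "2 4 2"
```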

The tables below are a reference to basic regex. While reading the rest of the site, when in doubt, you can always come back and look here. I encourage you to print the tables so you have a cheat sheet on your desk for quick reference.
The tables are not exhaustive, for two reasons. First, every regex flavor is different, and I didn't want to crowd the page with overly exotic syntax. For a full reference to the particular regex flavors you'll be using, it's always best to go straight to the source. In fact, for some regex engines (such as Perl, PCRE, Java and .NET) you may want to check once a year, as their creators often introduce new features.
The other reason the tables are not exhaustive is that I wanted them to serve as a quick introduction to regex. If you are a complete beginner, you should get a firm grasp of basic regex syntax just by reading the examples in the tables. I tried to introduce features in a logical order and to keep out oddities that I've never seen in actual use, such as the 'bell character'. With these tables as a jumping board, you will be able to advance to mastery by exploring the other pages on the site.

How to use the tables

The tables are meant to serve as an accelerated regex course, and they are meant to be read slowly, one line at a time. On each line, in the leftmost column, you will find a new element of regex syntax. The next column, 'Legend', explains what the element means (or encodes) in the regex syntax. The next two columns work hand in hand: the 'Example' column gives a valid regular expression that uses the element, and the 'Sample Match' column presents a text string that could be matched by the regular expression.
You can read the tables online, of course, but if you suffer from even the mildest case of online-ADD (attention deficit disorder), like most of us… Well then, I highly recommend you print them out. You'll be able to study them slowly, and to use them as a cheat sheet later, when you are reading the rest of the site or experimenting with your own regular expressions.
Enjoy!
If you overdose, make sure not to miss the next page, which comes back down to Earth and talks about some really cool stuff: The 1001 ways to use Regex.

Regex Accelerated Course and Cheat Sheet

Characters

Character	Legend	Example	Sample Match
\d	Most engines: one digit from 0 to 9	file_\d\d	file_25
\d	.NET, Python 3: one Unicode digit in any script	file_\d\d	file_9੩
\w	Most engines: 'word character': ASCII letter, digit or underscore	\w-\w\w\w	A-b_1
\w	Python 3: 'word character': Unicode letter, ideogram, digit, or underscore	\w-\w\w\w	字-ま_۳
\w	.NET: 'word character': Unicode letter, ideogram, digit, or connector	\w-\w\w\w	字-ま‿۳
\s	Most engines: 'whitespace character': space, tab, newline, carriage return, vertical tab	a\sb\sc	a b c
\s	.NET, Python 3, JavaScript: 'whitespace character': any Unicode separator	a\sb\sc	a b c
\D	One character that is not a digit as defined by your engine's \d	\D\D\D	ABC
\W	One character that is not a word character as defined by your engine's \w	\W\W\W\W\W	*-+=)
\S	One character that is not a whitespace character as defined by your engine's \s	\S\S\S\S	Yoyo

Quantifiers

Quantifier	Legend	Example	Sample Match
+	One or more	Version \w-\w+	Version A-b1_1
{3}	Exactly three times	\D{3}	ABC
{2,4}	Two to four times	\d{2,4}	156
{3,}	Three or more times	\w{3,}	regex_tutorial
*	Zero or more times	A*B*C*	AAACC
?	Once or none	plurals?	plural