Difference between revisions of "Programming:Regular Expressions"

From WhyAskWhy.org Wiki
m (Saving progress)
m (Added grep IP Address regex)
 
(8 intermediate revisions by the same user not shown)
Line 10: Line 10:
 
== Notepad++ ==
 
== Notepad++ ==
  
''I'm thinking of breaking the entries from the main table into separate sections to emphasize certain points.''
+
I've listed these for my use with Notepad++, but some may work well for other purposes also.
  
 
=== Match scgi directives in nginx source code ===
 
=== Match scgi directives in nginx source code ===
Line 22: Line 22:
 
|-
 
|-
 
|<code>.*(scgi.*)".*</code>
 
|<code>.*(scgi.*)".*</code>
|\1
+
|<code>\1</code>
 +
|}
 +
 
 +
 
 +
=== Indent each line 12 spaces, enclose in single quotes and add trailing comma ===
 +
 
 +
This snippet can be used to apply appropriate spacing and quoting to lines of text that will be used in a PHP array; specifically directives in a GeSHi language file.
 +
 
 +
{| class="wikitable"
 +
|-
 +
!Find what:
 +
!Replace with:
 +
|-
 +
|<code>^(.*)</code>
 +
|<syntaxhighlight lang="text" enclose="none">            '\1',</syntaxhighlight>
 +
|}
 +
 
 +
 
 +
=== Append a pipe character in front of every line ===
 +
 
 +
This snippet can be used to apply a pipe character in front of every line. You might do this if you are going to cut/paste the content into a MediaWiki table for example.
 +
 
 +
{| class="wikitable"
 +
|-
 +
!Find what:
 +
!Replace with:
 +
|-
 +
|<code>^(\w)</code>
 +
|<code><nowiki>|\1</nowiki></code>
 +
|}
 +
 
 +
 
 +
=== Match characters not used in nginx directive names ===
 +
 
 +
For example:
 +
 
 +
<syntaxhighlight lang="text">
 +
2.21 keepalive_disable
 +
</syntaxhighlight>
 +
 
 +
In this case, we want to match <code>2.21</code> and any spaces before/after the <code>keepalive_disable</code> directive. We could search for that and replace with nothing in order to cleanup the text.
 +
 
 +
{| class="wikitable"
 +
|-
 +
!Find what:
 +
!Replace with:
 +
|-
 +
|<code>[^a-zA-Z_-]</code>
 +
|
 +
|}
 +
 
 +
 
 +
=== Remove all text other than a MAC Address ===
 +
 
 +
''Still needs some work''
 +
 
 +
Match MAC Addresses with either <code>-</code> or <code>:</code> as separators using non-capturing parenthesis.
 +
 
 +
{| class="wikitable"
 +
|-
 +
!Find what:
 +
!Replace with:
 +
|-
 +
|<code><nowiki>^.*(\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}).*</nowiki></code>
 +
|<code><nowiki>\1</nowiki></code>
 
|}
 
|}
 +
 +
 +
=== MAC Addresses - Convert dashes to colons ===
 +
 +
Match MAC Addresses with <code>-</code> and replace with <code>:</code> as separators using capturing parenthesis.
 +
 +
{| class="wikitable"
 +
|-
 +
!Find what:
 +
!Replace with:
 +
|-
 +
|<code><nowiki>(\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})</nowiki></code>
 +
|<code><nowiki>\1:\2:\3:\4:\5:\6</nowiki></code>
 +
|}
 +
  
 
=== Other ===
 
=== Other ===
 
I've listed these for my use with Notepad++, but some may work well for other purposes also.
 
  
 
{| class="wikitable sortable"
 
{| class="wikitable sortable"
|+Notepad++ Regular Expressions
 
 
|-
 
|-
 
!Regular Expression
 
!Regular Expression
Line 39: Line 115:
 
|Match on C++ (or C99+) comments and all text after
 
|Match on C++ (or C99+) comments and all text after
 
|Selecting comments for removal
 
|Selecting comments for removal
|-
 
|<code>^(\w)</code>
 
|Match the first word character
 
|Notepad++ Search box
 
|-
 
|<code><nowiki>|\1</nowiki></code>
 
|Append a "pipe" in front of the first match found in an earlier search
 
|Notepad++ Replace box
 
 
|-
 
|-
 
|<code>^\s+$</code>
 
|<code>^\s+$</code>
Line 60: Line 128:
 
|Notepad++ Search box; forced text alignment
 
|Notepad++ Search box; forced text alignment
 
|-
 
|-
|<code>[^a-zA-Z_-]</code>
+
|<code><nowiki>(\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2})</nowiki></code>
|Match text prior to nginx directives (ex: <code>2.21 keepalive_disable</code>)
+
|Match MAC Addresses with either <code>-</code> or <code>:</code> as separators
|Notepad++ Search box; search & replace for code cleanup.
+
|Notepad++ Search box?
 +
|-
 +
|<code><nowiki>(\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2})</nowiki></code>
 +
|Same as above, but using non-capturing parenthesis instead
 +
|Notepad++ Search box?
 
|}
 
|}
 +
 +
== GNU grep ==
 +
 +
=== IP Addresses ===
 +
 +
<code> grep "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b" /path/to/file</code>
 +
 +
 +
 +
== Additional information ==
 +
 +
* [http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/ AddedBytes.com - Regular Expression Cheat Sheet]
 +
* [http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad removing empty lines in notepad++]
 +
* [http://stackoverflow.com/questions/283608/using-regex-to-prefix-and-append-in-notepad Using RegEX To Prefix And Append In Notepad++]
 +
* [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions How to use regular expressions in Notepad++ - Tutorial]
 +
 +
 +
=== Online Tools ===
 +
 +
* [http://rubular.com/ Rubular, a Ruby regular expression editor]
 +
* [http://gskinner.com/RegExr/ RegExr]
 +
* [http://regexpal.com/ RegexPal]
 +
* [http://www.regextester.com/ REGex TESTER]
 +
 +
 +
=== Books ===
 +
 +
* [[oreilly:0636920012337|Introducing Regular Expressions]] (highly recommended for those new to Regular Expressions)
 +
* [[apress:9781590594414|Regular Expression Recipes - Ch 1 (free sample)]]

Latest revision as of 18:13, 22 August 2014


Historically, I've avoided Regular Expressions as I've found them cryptic and very difficult to work with. Over the past few years I've had more need for them and have been using them often enough that I'm getting used to the idea of using them.

Lots of power, but still just as cryptic as ever. This page will list various tips/tricks/recipes that I've found useful.


Notepad++

I've listed these for my use with Notepad++, but some may work well for other purposes also.

Match scgi directives in nginx source code

This snippet can be used to match scgi directives in nginx source code in order to get a listing of valid directives for the scgi module.

Find what: .*(scgi.*)".*
Replace with: \1
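
Outside Notepad++, the same pattern can be tried with Python's re module. This is only a minimal sketch: the sample line is a made-up stand-in for a line of nginx source, and extract_scgi_directive is a hypothetical helper name, not something from the original notes.

```python
import re

# Pattern copied from the find/replace pair above.
SCGI_PATTERN = re.compile(r'.*(scgi.*)".*')


def extract_scgi_directive(line):
    """Return the scgi directive name captured from line, or None if absent."""
    match = SCGI_PATTERN.match(line)
    return match.group(1) if match else None


# Hypothetical sample resembling a directive definition in nginx source.
sample = '    { ngx_string("scgi_pass"),'
print(extract_scgi_directive(sample))  # prints: scgi_pass
```

Returning the capture group directly has the same effect as replacing the whole line with \1.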


Indent each line 12 spaces, enclose in single quotes and add trailing comma

This snippet can be used to apply appropriate spacing and quoting to lines of text that will be used in a PHP array; specifically directives in a GeSHi language file.

Find what: ^(.*)
Replace with: 12 spaces followed by '\1',
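
A rough Python equivalent of this replacement is sketched below. The function name quote_lines and the sample directive names are assumptions made for illustration, and the input is assumed to have no trailing newline.

```python
import re

INDENT = " " * 12  # the 12 leading spaces from the replacement text


def quote_lines(text):
    """Indent each line 12 spaces, wrap it in single quotes, add a trailing comma."""
    # re.MULTILINE makes ^ match at the start of every line, not just the first.
    # Assumption: text has no trailing newline; one may add an extra empty entry.
    return re.sub(r'^(.*)', INDENT + r"'\1',", text, flags=re.MULTILINE)


print(quote_lines("scgi_pass\nscgi_param"))
```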


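Prepend a pipe character to every line

This snippet can be used to insert a pipe character at the start of every line that begins with a word character (a letter, digit or underscore). You might do this if you are going to cut/paste the content into a MediaWiki table, for example. Lines that begin with whitespace or punctuation are not matched and are left unchanged.

Find what: ^(\w)
Replace with: |\1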
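
The behaviour of this pattern can be checked with a short Python sketch. The helper name prefix_pipe and the sample text are hypothetical.

```python
import re


def prefix_pipe(text):
    """Insert a pipe before the first character of lines starting with a word character."""
    # ^(\w) needs a letter, digit or underscore as the first character,
    # so indented lines and lines starting with punctuation are skipped.
    return re.sub(r'^(\w)', r'|\1', text, flags=re.MULTILINE)


print(prefix_pipe("alpha\n beta\n-gamma\ndelta"))
```

Only the first and last lines of the sample gain a pipe; the indented line and the line starting with a hyphen are left as they were.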


Match characters not used in nginx directive names

For example:

2.21 keepalive_disable

In this case, we want to match 2.21 and any spaces before/after the keepalive_disable directive. The character class below matches every character that is not a letter, an underscore or a hyphen, so we can search for it and replace with nothing in order to clean up the text.

Find what: [^a-zA-Z_-]
Replace with: (leave empty)
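
The same cleanup on a single line can be sketched in Python as follows. The function name strip_non_directive_chars is an assumption made for this example.

```python
import re


def strip_non_directive_chars(line):
    """Remove every character that is not a letter, underscore or hyphen."""
    # Intended for one line at a time: newlines are not in the allowed set,
    # so applying this to multi-line text would join the lines together.
    return re.sub(r'[^a-zA-Z_-]', '', line)


print(strip_non_directive_chars("2.21 keepalive_disable "))  # prints: keepalive_disable
```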


Remove all text other than a MAC Address

Still needs some work

Match MAC Addresses with either - or : as separators using non-capturing parentheses.

Find what: ^.*(\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}).*
Replace with: \1
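
One reason this entry may still need work is that \w{2} accepts any two word characters, not only hex digits. The Python sketch below applies the pattern as written and compares it with a stricter hex-only variant. The stricter pattern, the helper name keep_only_mac and the sample line are assumptions for illustration, not part of the original notes.

```python
import re

# Pattern as listed above.
PAGE_PATTERN = (r'^.*(\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)'
                r'\w{2}(?:-|:)\w{2}(?:-|:)\w{2}).*')

# Assumed stricter variant (not from the page): hex digits only.
HEX_MAC = re.compile(r'[0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5}')


def keep_only_mac(line):
    """Apply the find/replace above to one line; unmatched lines are unchanged."""
    return re.sub(PAGE_PATTERN, r'\1', line)


sample = "Physical Address . . . : 00-1A-2B-3C-4D-5E (Ethernet)"
print(keep_only_mac(sample))                               # prints: 00-1A-2B-3C-4D-5E
print(bool(re.search(PAGE_PATTERN, "zz-zz-zz-zz-zz-zz")))  # prints: True
print(bool(HEX_MAC.search("zz-zz-zz-zz-zz-zz")))           # prints: False
```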


MAC Addresses - Convert dashes to colons

Match MAC Addresses that use - as the separator and replace each - with : using capturing parentheses.

Find what: (\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})
Replace with: \1:\2:\3:\4:\5:\6
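
As a quick check of the six capture groups, the conversion can be sketched in Python. The function name dashes_to_colons and the sample addresses are hypothetical.

```python
import re

# Pattern and replacement copied from the find/replace pair above.
DASHED_MAC = r'(\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})-(\w{2})'


def dashes_to_colons(text):
    """Rewrite dash-separated MAC addresses in text with colon separators."""
    return re.sub(DASHED_MAC, r'\1:\2:\3:\4:\5:\6', text)


print(dashes_to_colons("00-1A-2B-3C-4D-5E"))  # prints: 00:1A:2B:3C:4D:5E
```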


Other

Regular Expression: //\s*.*$
Purpose: Match C++ (or C99+) comment markers and all text after them on the line
Useful for: Selecting comments for removal

Regular Expression: ^\s+$
Purpose: Match lines that consist only of whitespace
Useful for: Notepad++ Search box; search & replace for code cleanup.

Regular Expression: \s+$
Purpose: Match trailing whitespace
Useful for: Notepad++ Search box; search & replace for code cleanup.

Regular Expression: ^\s+
Purpose: Match leading whitespace
Useful for: Notepad++ Search box; forced text alignment

Regular Expression: (\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2}(-|:)\w{2})
Purpose: Match MAC Addresses with either - or : as separators
Useful for: Notepad++ Search box?

Regular Expression: (\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2}(?:-|:)\w{2})
Purpose: Same as above, but using non-capturing parentheses instead
Useful for: Notepad++ Search box?
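
The whitespace and comment patterns above can be combined into a small cleanup pass. The Python sketch below works line by line; the function names clean_source and strip_leading and the sample text are assumptions made for this example.

```python
import re


def clean_source(text):
    """Drop whitespace-only lines, then strip // comments and trailing whitespace."""
    cleaned = []
    for line in text.splitlines():
        if re.match(r'^\s+$', line):          # line is nothing but whitespace
            continue
        # Naive: this also removes // that appears inside strings or URLs.
        line = re.sub(r'//\s*.*$', '', line)
        line = re.sub(r'\s+$', '', line)      # trailing whitespace
        cleaned.append(line)
    return "\n".join(cleaned)


def strip_leading(line):
    """Remove leading whitespace from a single line."""
    return re.sub(r'^\s+', '', line)


sample = "int x = 1;  // counter  \n   \n  return x;\t"
print(clean_source(sample))
print(strip_leading("  return x;"))  # prints: return x;
```

Working one line at a time avoids a surprise with \s, which also matches newline characters and could otherwise swallow line breaks when applied to a whole file.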

GNU grep

IP Addresses

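grep -E "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b" /path/to/file

The -E option enables extended regular expressions, in which {1,3} is a repetition count; without it, the braces would have to be written as \{1,3\}.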
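
For comparison, the same pattern can be tried in Python. This is a minimal sketch with made-up log lines and a hypothetical helper name, lines_with_ip. Note that the pattern only checks the shape of an address: it does not verify that each number is in the range 0 to 255.

```python
import re

# Same pattern as the grep command above.
IP_LIKE = re.compile(r'\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b')


def lines_with_ip(lines):
    """Return the lines that contain something shaped like an IPv4 address."""
    return [line for line in lines if IP_LIKE.search(line)]


# Hypothetical sample input.
log = [
    "192.168.1.10 - GET /index.html",
    "version 1.2.3 released",
    "bogus 999.999.999.999",  # matches too: octet ranges are not validated
]
print(lines_with_ip(log))
```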


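Additional information

AddedBytes.com - Regular Expression Cheat Sheet: http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
removing empty lines in notepad++: http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad
Using RegEX To Prefix And Append In Notepad++: http://stackoverflow.com/questions/283608/using-regex-to-prefix-and-append-in-notepad
How to use regular expressions in Notepad++ - Tutorial: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions


Online Tools

Rubular, a Ruby regular expression editor: http://rubular.com/
RegExr: http://gskinner.com/RegExr/
RegexPal: http://regexpal.com/
REGex TESTER: http://www.regextester.com/


Books

Introducing Regular Expressions (highly recommended for those new to Regular Expressions)
Regular Expression Recipes - Ch 1 (free sample)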