Thursday, July 25, 2013

Regular Expressions in C# (RegEx)

While working with several code-cleanup regular expressions, I thought of putting together a collection of nifty and useful patterns that I could use later, so that neither I nor anyone else would need to rebuild or re-research them.

The list is small for now, but I will try to add to it over time. If you have a useful expression not listed here, do share!



  • {([^{]*?)}
    • This expression matches an object's contents from the left curly bracket to the right one and captures the body between the brackets as $1. Because the body may not contain another left bracket, it only matches blocks that have no nested object inside them.
    • If implemented properly in a source code bottom-up decomposer, this expression will be able to help break a string into a tree of objects (TreeNode).
  • \{((?>[^{}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!)))\} by Tim Pietzcker on StackOverflow
    • This does the same as the above, except that the match also includes nested objects. It relies on .NET balancing groups to keep the brackets paired.
    • If implemented properly in a source code top-down decomposer, this expression will be able to help break a string into a tree of objects (TreeNode).
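
As a rough sketch of how the two brace patterns above behave in C#: the class and method names below are made up for illustration, and the braces in the first pattern are escaped for readability, which is my own change and not part of the original pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BraceDemo
{
    // Blocks whose body contains no nested '{' (braces escaped for clarity).
    static readonly Regex Innermost = new Regex(@"\{([^{]*?)\}");

    // Balanced blocks: the DEPTH group is pushed on '{' and popped on '}'.
    static readonly Regex Balanced = new Regex(
        @"\{((?>[^{}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!)))\}");

    // Collects capture group 1 (the body between the brackets) of every match.
    static List<string> Bodies(Regex regex, string input)
    {
        var bodies = new List<string>();
        foreach (Match m in regex.Matches(input))
            bodies.Add(m.Groups[1].Value);
        return bodies;
    }

    public static List<string> InnermostBodies(string input) => Bodies(Innermost, input);

    public static List<string> BalancedBodies(string input) => Bodies(Balanced, input);
}
```

For the input "a { b { c } d } e", InnermostBodies returns the single body " c ", while BalancedBodies returns the single body " b { c } d ". A bottom-up decomposer would replace each innermost match with a placeholder and repeat; a top-down one would recurse into each balanced body.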

  • ^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$
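    • This expression matches common email addresses. It is a simplification: the local part allows only word characters, dots, and hyphens, and every dot-separated part after the first domain label must be 2 or 3 word characters long, so an address such as user@example.info is rejected.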
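
A small sketch of using the email pattern as a validator (the class and method names are hypothetical). The sample addresses are only there to show what the pattern accepts and rejects:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailDemo
{
    static readonly Regex Email =
        new Regex(@"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$");

    // True when the whole input matches the pattern.
    public static bool LooksLikeEmail(string candidate) => Email.IsMatch(candidate);
}
```

With this pattern, "john.doe@example.com" and "john.doe@example.co.uk" are accepted, while "user@example.info" (four-letter ending) and "not-an-email" are rejected.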

  • (?<=<body([^<]*?)>)[\s\S]*(?=</body>)
    • Extract the HTML body using look-ahead and look-behind.
  • (<script([^<]*?)>[\s\S]*?</script>)
    • Select a SCRIPT block within the body of the HTML.
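
The two HTML patterns can be combined, for example to pull out the body and then drop its script blocks. This is only a sketch with made-up names; the patterns are case-sensitive as written, so passing RegexOptions.IgnoreCase would be needed for upper-case tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlDemo
{
    static readonly Regex Body =
        new Regex(@"(?<=<body([^<]*?)>)[\s\S]*(?=</body>)");

    static readonly Regex Script =
        new Regex(@"(<script([^<]*?)>[\s\S]*?</script>)");

    // Returns everything between the opening and closing body tags.
    public static string ExtractBody(string html) => Body.Match(html).Value;

    // Removes every script block from the given markup.
    public static string StripScripts(string html) => Script.Replace(html, "");
}
```

Given `<html><body class="x"><p>Hi</p><script>var a = 1;</script></body></html>`, ExtractBody returns `<p>Hi</p><script>var a = 1;</script>`, and StripScripts on that result leaves `<p>Hi</p>`.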

  • [\n]+
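    • Collapse runs of newline characters: replacing each match with a single '\n' turns '\n\n\n' into '\n'. The same works for any other single character (a+ replaced with 'a'). Note that a class such as [ab]+ matches a mixed run like 'aaabbb' as one match; to get 'aaabbb' => 'ab', use ([ab])\1+ and replace with $1.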
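
A minimal sketch of the collapsing idea with Regex.Replace (the helper names are my own). One caveat worth showing: a character class such as [ab]+ matches a whole mixed run as one match, so collapsing each repeated character separately needs a backreference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseDemo
{
    // Replaces every run of newlines with a single newline.
    public static string CollapseNewlines(string text) =>
        Regex.Replace(text, @"[\n]+", "\n");

    // Replaces every run of a repeated 'a' or 'b' with one copy of that character.
    public static string CollapseRepeats(string text) =>
        Regex.Replace(text, @"([ab])\1+", "$1");
}
```

CollapseNewlines("a\n\n\nb") gives "a\nb", and CollapseRepeats("aaabbb") gives "ab".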

  • "[^\\"]*(?:(?:\\\\)*(?:\\"[^\\"]*)?)*" by Hatchi on StackOverflow
    • Capture a string literal that contains escaped quotes (an embedded string) - used for parsing source code. (This one took me some time to find!)
  • \"([^\"]+)\"
    • Capture every other quoted segment: each match consumes both of its quotes, so the text between one match and the next is skipped.
      Example: "my"string"has"quotes"!" → { my, has, ! } and not { string, quotes }.
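
Both quote patterns are awkward to type in C# because of the quote characters, so here is a sketch using verbatim strings, where each `""` stands for one `"`. The names are hypothetical, and the second pattern is written without the backslashes before the quotes, which is equivalent in a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // A quoted literal that may contain \" escapes.
    static readonly Regex Literal =
        new Regex(@"""[^\\""]*(?:(?:\\\\)*(?:\\""[^\\""]*)?)*""");

    // A simple quoted segment; group 1 is the text between the quotes.
    static readonly Regex Quoted = new Regex(@"""([^""]+)""");

    // Returns the first string literal in the input, quotes included.
    public static string FirstLiteral(string source) => Literal.Match(source).Value;

    // Returns the contents of every simple quoted segment.
    public static List<string> QuotedSegments(string input)
    {
        var segments = new List<string>();
        foreach (Match m in Quoted.Matches(input))
            segments.Add(m.Groups[1].Value);
        return segments;
    }
}
```

For the source line `x = "say \"hi\" now";`, FirstLiteral returns `"say \"hi\" now"` as a single match. For `"my"string"has"quotes"!"`, QuotedSegments returns my, has, and !.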

  • (?<=^)
    • Match at the beginning of a string using a 'look-behind' expression combined with the caret (^) anchor. The look-behind is zero-width, so this behaves like a plain ^.
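
One possible use of the start-of-string expression, sketched with a made-up helper: inserting a prefix at the start of every line. The RegexOptions.Multiline flag is my own addition here; it makes ^ match after each newline as well as at the start of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineStartDemo
{
    // Inserts the prefix at every zero-width match of the line-start pattern.
    // Note: '$' is special in replacement strings, so keep the prefix free of it.
    public static string PrefixLines(string text, string prefix) =>
        Regex.Replace(text, @"(?<=^)", prefix, RegexOptions.Multiline);
}
```

PrefixLines("a\nb", "// ") gives "// a\n// b".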

    I wrote a simple WinForms app to test my expressions, so I would know whether they work the way I intended. The editor can also apply an expression to the given text, a file, or a tree of files with a given file extension. There are options to Replace, Capitalize, Lowercase, and Invert-Case.
    The code in this picture is partially parsed Java code. Though I am capable of manually porting Java code to C#, it becomes a tedious chore that consumes too much time when there are over 1,200 files to port (and a simple String.Replace will not do the trick - too many variables to consider)!
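
The case options mentioned above could be sketched with a MatchEvaluator passed to Regex.Replace. This is not the app's actual code: the names are invented, and treating "Capitalize" as upper-casing the whole match is an assumption on my part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseOps
{
    // Assumption: "Capitalize" means upper-casing the entire match.
    public static string Capitalize(string input, string pattern) =>
        Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());

    public static string Lowercase(string input, string pattern) =>
        Regex.Replace(input, pattern, m => m.Value.ToLowerInvariant());

    public static string InvertCase(string input, string pattern) =>
        Regex.Replace(input, pattern, m => Invert(m.Value));

    // Swaps upper-case letters to lower-case and the reverse.
    static string Invert(string value)
    {
        char[] chars = value.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = char.IsUpper(chars[i])
                ? char.ToLowerInvariant(chars[i])
                : char.ToUpperInvariant(chars[i]);
        }
        return new string(chars);
    }
}
```

For example, InvertCase("Hello World", @"\w+") gives "hELLO wORLD", and Capitalize("int foo", "foo") gives "int FOO".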
