Draft for Information Only

Content

Regular Expression Options
 .NET Regular Expression Options
 Specifying the Options
  Determining the Options
 .NET Regular Expression Option Modes
  Default Options
   Case-Insensitive Matching
   Multiline Mode
   Single-line Mode
   Explicit Captures Only
   Compiled Regular Expressions
   Ignore White Space
   Right-to-Left Mode
   ECMAScript Matching Behavior
   Comparison Using the Invariant Culture
 Examples
 See also
 Source/Reference

Regular Expression Options

By default, the comparison of an input string with any literal characters in a regular expression pattern is case sensitive, white space in a regular expression pattern is interpreted as literal white-space characters, and capturing groups in a regular expression are named implicitly as well as explicitly. You can modify these and several other aspects of default regular expression behavior by specifying regular expression options. These options, which are listed in the following table, can be included inline as part of the regular expression pattern, or they can be supplied to a System.Text.RegularExpressions.Regex class constructor or static pattern matching method as a System.Text.RegularExpressions.RegexOptions enumeration value.

.NET Regular Expression Options

The following table lists the supported .NET regular expression options:

RegexOptions member | Inline character | Effect
None | Not available | Use default behavior. For more information, see Default Options.
IgnoreCase | i | Use case-insensitive matching. For more information, see Case-Insensitive Matching.
Multiline | m | Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string). For more information, see Multiline Mode.
Singleline | s | Use single-line mode, where the period (.) matches every character (instead of every character except \n). For more information, see Single-line Mode.
ExplicitCapture | n | Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name>subexpression). For more information, see Explicit Captures Only.
Compiled | Not available | Compile the regular expression to an assembly. For more information, see Compiled Regular Expressions.
IgnorePatternWhitespace | x | Exclude unescaped white space from the pattern, and enable comments after a number sign (#). For more information, see Ignore White Space.
RightToLeft | Not available | Change the search direction. Search moves from right to left instead of from left to right. For more information, see Right-to-Left Mode.
ECMAScript | Not available | Enable ECMAScript-compliant behavior for the expression. For more information, see ECMAScript Matching Behavior.
CultureInvariant | Not available | Ignore cultural differences in language. For more information, see Comparison Using the Invariant Culture.

Specifying the Options

You can specify options for regular expressions in one of three ways:

  • In the options parameter of a System.Text.RegularExpressions.Regex class constructor or static (Shared in Visual Basic) pattern-matching method, such as Regex.Regex(String, RegexOptions) or Regex.Match(String, String, RegexOptions). The options parameter is a bitwise OR combination of System.Text.RegularExpressions.RegexOptions enumerated values.

    When options are supplied to a Regex instance by using the options parameter of a class constructor, the options are assigned to the Regex.Options property. However, the Regex.Options property does not reflect inline options in the regular expression pattern itself.

  • By applying inline options in a regular expression pattern with the syntax (?imnsx-imnsx). The option applies to the pattern from the point that the option is defined to either the end of the pattern or to the point at which the option is undefined by another inline option. Note that the Regex.Options property of a Regex instance does not reflect these inline options. For more information, see the Miscellaneous Constructs topic.
  • By applying inline options in a particular grouping construct in a regular expression pattern with the syntax (?imnsx-imnsx:subexpression). No sign before a set of options turns the set on; a minus sign before a set of options turns the set off. (? is a fixed part of the language construct's syntax that is required whether options are enabled or disabled.) The option applies only to that group. For more information, see Grouping Constructs.

If options are specified inline, a minus sign (-) before an option or set of options turns off those options. All regular expression options are turned off by default.

If the regular expression options specified in the options parameter of a constructor or method call conflict with the options specified inline in a regular expression pattern, the inline options are used.
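The three ways of specifying options, and the precedence of inline options over the options parameter, can be sketched as follows. This is a minimal console illustration; the module name and the input string are assumptions made for the example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module OptionSourcesSketch
    Public Sub Main()
        Dim input As String = "Regex REGEX regex"

        ' Option supplied through the options parameter.
        Console.WriteLine(Regex.Matches(input, "regex", RegexOptions.IgnoreCase).Count)      ' 3

        ' Inline option that applies from this point to the end of the pattern.
        Console.WriteLine(Regex.Matches(input, "(?i)regex").Count)                           ' 3

        ' Inline option scoped to a single group: only "r" ignores case.
        Console.WriteLine(Regex.Matches(input, "(?i:r)egex").Count)                          ' 2

        ' Conflict: the inline (?-i) takes precedence over RegexOptions.IgnoreCase.
        Console.WriteLine(Regex.Matches(input, "(?-i)regex", RegexOptions.IgnoreCase).Count) ' 1
    End Sub
End Module
```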

The following five regular expression options can be set both with the options parameter and inline:

  • RegexOptions.IgnoreCase

  • RegexOptions.Multiline

  • RegexOptions.Singleline

  • RegexOptions.ExplicitCapture

  • RegexOptions.IgnorePatternWhitespace

The following five regular expression options can be set using the options parameter but cannot be set inline:

  • RegexOptions.None

  • RegexOptions.Compiled

  • RegexOptions.RightToLeft

  • RegexOptions.CultureInvariant

  • RegexOptions.ECMAScript

Determining the Options

You can determine which options were provided to a Regex object when it was instantiated by retrieving the value of the read-only Regex.Options property. This property is particularly useful for determining the options that are defined for a compiled regular expression created by the Regex.CompileToAssembly method.

To test for the presence of any option except RegexOptions.None, perform an AND operation with the value of the Regex.Options property and the RegexOptions value in which you are interested. Then test whether the result equals that RegexOptions value.
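The AND test described above might look like the following sketch. HasOption is a hypothetical helper introduced only for this illustration. Note that in Visual Basic the And expression must be parenthesized, because comparison operators bind more tightly than And.

```vb
Imports System
Imports System.Text.RegularExpressions

Module OptionsPropertySketch
    ' Hypothetical helper: True when the given option was passed to the constructor.
    Private Function HasOption(rgx As Regex, opt As RegexOptions) As Boolean
        Return (rgx.Options And opt) = opt
    End Function

    Public Sub Main()
        Dim rgx As New Regex("\bs\w+", RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)
        Console.WriteLine(HasOption(rgx, RegexOptions.IgnoreCase))          ' True
        Console.WriteLine(HasOption(rgx, RegexOptions.Multiline))           ' False

        ' RegexOptions.None is zero, so it is tested by direct equality instead.
        Console.WriteLine(rgx.Options = RegexOptions.None)                  ' False

        ' Inline options are not reflected in the Options property.
        Dim inlineOnly As New Regex("(?i)\bs\w+")
        Console.WriteLine(HasOption(inlineOnly, RegexOptions.IgnoreCase))   ' False
    End Sub
End Module
```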

.NET Regular Expression Option Modes

Default Options

The RegexOptions.None option indicates that no options have been specified, and the regular expression engine uses its default behavior. This includes the following:

  • The pattern is interpreted as a canonical rather than an ECMAScript regular expression.

  • The regular expression pattern is matched in the input string from left to right.

  • Comparisons are case-sensitive.

  • The ^ and $ language elements match the beginning and end of the input string.

  • The . language element matches every character except \n.

  • Any white space in a regular expression pattern is interpreted as a literal space character.

  • The conventions of the current culture are used when comparing the pattern to the input string.

  • Capturing groups in the regular expression pattern are implicit as well as explicit.

The RegexOptions.None option has no inline equivalent. When regular expression options are applied inline, the default behavior is restored on an option-by-option basis, by turning a particular option off. For example, (?i) turns on case-insensitive comparison, and (?-i) restores the default case-sensitive comparison.

Because the RegexOptions.None option represents the default behavior of the regular expression engine, it is rarely explicitly specified in a method call. A constructor or static pattern-matching method without an options parameter is called instead.

Case-Insensitive Matching

The IgnoreCase option, or the i inline option, provides case-insensitive matching. By default, the casing conventions of the current culture are used.

Multiline Mode

The RegexOptions.Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.

By default, $ matches only the end of the input string. If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.
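The effect of the carriage return on $ can be seen in a short sketch. The three-line input and the module name are assumptions chosen for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineSketch
    Public Sub Main()
        ' Lines are separated by carriage return/line feed pairs.
        Dim input As String = "alpha" & vbCrLf & "beta" & vbCrLf & "gamma"

        ' Default: ^ and $ anchor to the whole input string, so nothing matches.
        Console.WriteLine(Regex.Matches(input, "^\w+$").Count)                          ' 0

        ' Multiline with plain $: the \r before each \n blocks all but the last line.
        Console.WriteLine(Regex.Matches(input, "^\w+$", RegexOptions.Multiline).Count)  ' 1

        ' Multiline with \r?$: every line matches; group 1 excludes the \r.
        For Each m As Match In Regex.Matches(input, "^(\w+)\r?$", RegexOptions.Multiline)
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module
```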

Single-line Mode

The RegexOptions.Singleline option, or the s inline option, causes the regular expression engine to treat the input string as if it consists of a single line. It does this by changing the behavior of the period (.) language element so that it matches every character, instead of matching every character except for the newline character \n or \u000A.
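As a rough illustration, the sketch below compares how far .+ extends with and without single-line mode. The two-line input is an assumption for the example.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineSketch
    Public Sub Main()
        Dim input As String = "first line" & vbLf & "second line"

        ' Default: the period stops at the newline, so only "first line" is matched.
        Console.WriteLine(Regex.Match(input, "^.+").Length)                           ' 10

        ' Singleline: the period also matches \n, so the whole input is matched.
        Console.WriteLine(Regex.Match(input, "^.+", RegexOptions.Singleline).Length)  ' 22

        ' The s inline option has the same effect.
        Console.WriteLine(Regex.Match(input, "(?s)^.+").Length)                       ' 22
    End Sub
End Module
```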

Explicit Captures Only

By default, capturing groups are defined by the use of parentheses in the regular expression pattern. Named groups are assigned a name or number by the (?<name>subexpression) language option, whereas unnamed groups are accessible by index. In the GroupCollection object, unnamed groups precede named groups.

Grouping constructs are often used only to apply quantifiers to multiple language elements, and the captured substrings are of no interest.

Capturing groups that are not subsequently used can be expensive, because the regular expression engine must populate both the GroupCollection and CaptureCollection collection objects. As an alternative, you can use either the RegexOptions.ExplicitCapture option or the n inline option to specify that the only valid captures are explicitly named or numbered groups that are designated by the (?<name>subexpression) construct.
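The difference in the resulting GroupCollection can be sketched as below. The date pattern and the group name year are assumptions for illustration only.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ExplicitCaptureSketch
    Public Sub Main()
        Dim input As String = "2019-08-02"
        Dim pattern As String = "(?<year>\d{4})-(\d{2})-(\d{2})"

        ' Default: group 0, two unnamed groups, and the named group.
        Dim implicitMatch As Match = Regex.Match(input, pattern)
        Console.WriteLine(implicitMatch.Groups.Count)            ' 4

        ' ExplicitCapture: the unnamed parentheses no longer capture.
        Dim explicitMatch As Match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture)
        Console.WriteLine(explicitMatch.Groups.Count)            ' 2
        Console.WriteLine(explicitMatch.Groups("year").Value)    ' 2019

        ' The n inline option behaves the same way.
        Console.WriteLine(Regex.Match(input, "(?n)" & pattern).Groups.Count)   ' 2
    End Sub
End Module
```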

Compiled Regular Expressions

By default, regular expressions in .NET are interpreted. When a Regex object is instantiated or a static Regex method is called, the regular expression pattern is parsed into a set of custom opcodes, and an interpreter uses these opcodes to run the regular expression. This involves a tradeoff: The cost of initializing the regular expression engine is minimized at the expense of run-time performance.

You can use compiled instead of interpreted regular expressions by using the RegexOptions.Compiled option. In this case, when a pattern is passed to the regular expression engine, it is parsed into a set of opcodes and then converted to Microsoft intermediate language (MSIL), which can be passed directly to the common language runtime. Compiled regular expressions maximize run-time performance at the expense of initialization time.

A regular expression can be compiled only by supplying the RegexOptions.Compiled value to the options parameter of a Regex class constructor or a static pattern-matching method. It is not available as an inline option.

You can use compiled regular expressions in calls to both static and instance regular expressions. In static regular expressions, the RegexOptions.Compiled option is passed to the options parameter of the regular expression pattern-matching method. In instance regular expressions, it is passed to the options parameter of the Regex class constructor. In both cases, it results in enhanced performance.

However, this improvement in performance occurs only under the following conditions:

  • A Regex object that represents a particular regular expression is used in multiple calls to regular expression pattern-matching methods.

  • The Regex object is not allowed to go out of scope, so it can be reused.

  • A static regular expression is used in multiple calls to regular expression pattern-matching methods. (The performance improvement is possible because regular expressions used in static method calls are cached by the regular expression engine.)

The RegexOptions.Compiled option is unrelated to the Regex.CompileToAssembly method, which creates a special-purpose assembly that contains predefined compiled regular expressions.
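One way to meet the reuse conditions listed above is to hold a single compiled Regex instance in a module-level field, as in this sketch. The field name, pattern, and sample lines are assumptions; whether compilation pays off depends on how often the pattern is actually used.

```vb
Imports System
Imports System.Text.RegularExpressions

Module CompiledReuseSketch
    ' Built once and kept in scope, so the compilation cost is paid a single time.
    Private ReadOnly WordPattern As New Regex("\b\w+\b", RegexOptions.Compiled)

    Public Sub Main()
        Dim lines() As String = {"one two", "three four five", "six"}
        Dim total As Integer = 0

        ' The same compiled instance is reused for every call.
        For Each line As String In lines
            total += WordPattern.Matches(line).Count
        Next

        Console.WriteLine(total)   ' 6
    End Sub
End Module
```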

Ignore White Space

By default, white space in a regular expression pattern is significant; it forces the regular expression engine to match a white-space character in the input string. Because of this, the regular expressions "\b\w+\s" and "\b\w+ " are roughly equivalent. In addition, when the number sign (#) is encountered in a regular expression pattern, it is interpreted as a literal character to be matched.

The RegexOptions.IgnorePatternWhitespace option, or the x inline option, changes this default behavior as follows:

  • Unescaped white space in the regular expression pattern is ignored. To be part of a regular expression pattern, white-space characters must be escaped (for example, as \s or "\ ").

  • The number sign (#) is interpreted as the beginning of a comment, rather than as a literal character. All text in the regular expression pattern from the # character to either the next \n character or the end of the string is interpreted as a comment.

However, in the following cases, white-space characters in a regular expression aren't ignored, even if you use the RegexOptions.IgnorePatternWhitespace option:

  • White space within a character class is always interpreted literally. For example, the regular expression pattern [ .,;:] matches any single white-space character, period, comma, semicolon, or colon.

  • White space isn't allowed within a bracketed quantifier, such as {n}, {n,}, and {n,m}. For example, the regular expression pattern \d{1, 3} fails to match any sequences of digits from one to three digits because it contains a white-space character.

  • White space isn't allowed within a character sequence that introduces a language element. For example:

    • The language element (?:subexpression) represents a noncapturing group, and the (?: portion of the element can't have embedded spaces. The pattern (? :subexpression) throws an ArgumentException at run time because the regular expression engine can't parse the pattern, and the pattern ( ?:subexpression) fails to match subexpression.

    • The language element \p{name}, which represents a Unicode category or named block, can't include embedded spaces in the \p{ portion of the element. If you do include a white space, the element throws an ArgumentException at run time.

Enabling this option helps simplify regular expressions that are often difficult to parse and to understand. It improves readability, and makes it possible to document a regular expression.
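A commented pattern might be laid out as in the following sketch. The pattern, which looks for three digits, a hyphen, and four digits, and the input string are assumptions for illustration. Each comment is ended by a line feed inside the pattern string.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module IgnoreWhitespaceSketch
    Public Sub Main()
        ' Unescaped spaces are ignored and # starts a comment in x mode.
        Dim pattern As String =
            "\b (\d{3})   # three digits" & vbLf &
            "-            # a literal hyphen" & vbLf &
            "(\d{4}) \b   # four digits"
        Dim input As String = "Call 555-0199 now"

        Dim m As Match = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace)
        Console.WriteLine(m.Value)             ' 555-0199
        Console.WriteLine(m.Groups(1).Value)   ' 555
        Console.WriteLine(m.Groups(2).Value)   ' 0199

        ' Without the option, the spaces and comment text are matched literally.
        Console.WriteLine(Regex.IsMatch(input, pattern))   ' False
    End Sub
End Module
```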

Right-to-Left Mode

By default, the regular expression engine searches from left to right. You can reverse the search direction by using the RegexOptions.RightToLeft option. The search automatically begins at the last character position of the string. For pattern-matching methods that include a starting position parameter, such as Regex.Match(String, Int32), the starting position is the index of the rightmost character position at which the search is to begin.

Right-to-left pattern mode is available only by supplying the RegexOptions.RightToLeft value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.

The RegexOptions.RightToLeft option changes the search direction only; it does not interpret the regular expression pattern from right to left.
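The change in search direction can be sketched as follows. The input string is an assumption for illustration. The pattern itself is written in the usual left-to-right form.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RightToLeftSketch
    Public Sub Main()
        Dim input As String = "one two three"
        Dim pattern As String = "\b\w+\b"

        ' Default direction: the first match is the leftmost word.
        Console.WriteLine(Regex.Match(input, pattern).Value)                            ' one

        ' Right-to-left: the first match is the rightmost word.
        Console.WriteLine(Regex.Match(input, pattern, RegexOptions.RightToLeft).Value)  ' three

        ' Matches are returned in the order found: three, two, one.
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.RightToLeft)
            Console.WriteLine("{0} at index {1}", m.Value, m.Index)
        Next
    End Sub
End Module
```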

ECMAScript Matching Behavior

By default, the regular expression engine uses canonical behavior when matching a regular expression pattern to input text. However, you can instruct the regular expression engine to use ECMAScript matching behavior by specifying the RegexOptions.ECMAScript option.

ECMAScript-compliant behavior is available only by supplying the RegexOptions.ECMAScript value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.

The RegexOptions.ECMAScript option can be combined only with the RegexOptions.IgnoreCase and RegexOptions.Multiline options. The use of any other option in a regular expression results in an ArgumentOutOfRangeException.

The behavior of ECMAScript and canonical regular expressions differs in three areas: character class syntax, self-referencing capturing groups, and octal versus backreference interpretation.

  • Character class syntax. Because canonical regular expressions support Unicode whereas ECMAScript does not, character classes in ECMAScript have a more limited syntax, and some character class language elements have a different meaning. For example, ECMAScript does not support language elements such as the Unicode category or block elements \p and \P. Similarly, the \w element, which matches a word character, is equivalent to the [a-zA-Z_0-9] character class when using ECMAScript and [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}] when using canonical behavior. For more information, see Character Classes.

  • Self-referencing capturing groups. A regular expression capture class with a backreference to itself must be updated with each capture iteration.
  • Resolution of ambiguities between octal escapes and backreferences. The following table summarizes the differences in octal versus backreference interpretation by canonical and ECMAScript regular expressions.

    Regular expression | Canonical behavior | ECMAScript behavior
    \0 followed by 0 to 2 octal digits | Interpret as an octal. For example, \044 is always interpreted as an octal value and means "$". | Same behavior.
    \ followed by a digit from 1 to 9, followed by no additional decimal digits | Interpret as a backreference. For example, \9 always means backreference 9, even if a ninth capturing group does not exist. If the capturing group does not exist, the regular expression parser throws an ArgumentException. | If a single decimal digit capturing group exists, backreference to that digit. Otherwise, interpret the value as a literal.
    \ followed by a digit from 1 to 9, followed by additional decimal digits | Interpret the digits as a decimal value. If that capturing group exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits up to octal 377; that is, consider only the low 8 bits of the value. Interpret the remaining digits as literals. For example, in the expression \3000, if capturing group 300 exists, interpret as backreference 300; if capturing group 300 does not exist, interpret as octal 300 followed by 0. | Interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal by using the leading octal digits up to octal 377; interpret the remaining digits as literals.
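The character class difference and the restriction on option combinations can be sketched as follows. The input "café" (built with ChrW so the source stays ASCII) and the module name are assumptions for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module EcmaScriptSketch
    Public Sub Main()
        ' "café": the last character is U+00E9, a lowercase letter outside ASCII.
        Dim input As String = "caf" & ChrW(&HE9)

        ' Canonical \w includes Unicode letters, so all four characters match.
        Console.WriteLine(Regex.Match(input, "\w+").Length)                           ' 4

        ' ECMAScript \w is [a-zA-Z_0-9], so the match stops after "caf".
        Console.WriteLine(Regex.Match(input, "\w+", RegexOptions.ECMAScript).Length)  ' 3

        ' Combining ECMAScript with an unsupported option such as Singleline is rejected.
        Try
            Dim invalid As New Regex("\w+", RegexOptions.ECMAScript Or RegexOptions.Singleline)
        Catch ex As ArgumentOutOfRangeException
            Console.WriteLine("Option combination rejected")
        End Try
    End Sub
End Module
```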

Comparison Using the Invariant Culture

By default, when the regular expression engine performs case-insensitive comparisons, it uses the casing conventions of the current culture to determine equivalent uppercase and lowercase characters.

However, this behavior is undesirable for some types of comparisons, particularly when comparing user input to the names of system resources, such as passwords, files, or URLs. Consider code that is intended to block access to any resource whose URL is prefaced with FILE://. The code attempts a case-insensitive match with the input string by using the regular expression FILE://. However, when the current system culture is tr-TR (Turkish-Turkey), "I" is not the uppercase equivalent of "i". As a result, the call to the Regex.IsMatch method returns false, and access to the file is allowed.

For more information about string comparisons that are case-sensitive and that use the invariant culture, see Best Practices for Using Strings.

Instead of using the case-insensitive comparisons of the current culture, you can specify the RegexOptions.CultureInvariant option to ignore cultural differences in language and to use the conventions of the invariant culture.

Comparison using the invariant culture is available only by supplying the RegexOptions.CultureInvariant value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.
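The Turkish-culture scenario described above might be reproduced with a sketch like the following. The input URL and the unanchored pattern FILE:// are assumptions. The sketch also assumes that the tr-TR culture data is available on the machine, and it prints the results without asserting them, because the outcome depends on the runtime's culture support.

```vb
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports System.Threading

Module CultureInvariantSketch
    Public Sub Main()
        Dim input As String = "file://c:/Documents/MyReport.doc"
        Dim pattern As String = "FILE://"

        Dim originalCulture As CultureInfo = Thread.CurrentThread.CurrentCulture
        Try
            ' Switch the current thread to the Turkish (Turkey) culture.
            Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")

            ' Culture-sensitive: per the text above, "I" and "i" are not
            ' case equivalents in tr-TR, so this is expected to be False.
            Console.WriteLine(Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))

            ' Invariant culture: expected to be True, so access can be blocked.
            Console.WriteLine(Regex.IsMatch(input, pattern,
                RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant))
        Finally
            ' Restore the culture that was in effect before the test.
            Thread.CurrentThread.CurrentCulture = originalCulture
        End Try
    End Sub
End Module
```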

 

Examples

Examples of Regular Expression Options
ASP.NET Code Input:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
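           Sub Page_Load()
               Dim xstring As String = "zabcd fg ijaab"
               Dim xmatchstr As String = ""
               xmatchstr = xmatchstr & "Given string: """ & xstring & """<br />"
               ' Illustrative call only: this pattern and replacement are assumed, not from the original page
               xmatchstr = xmatchstr & showresult(xstring, "(?i)A+B", "X")
               ' Show the accumulated text in the label declared in the page body
               lbl01.Text = xmatchstr
           End Sub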
           Function showresult(xstring,xpattern,xreplace)
               Dim xmatch As Match
               Dim xcaptures As CaptureCollection
               Dim ycaptures As CaptureCollection
               Dim xgroups As GroupCollection
               Dim xmatchstr As String = ""
               Dim xint As Integer
               Dim yint As Integer
               Dim zint As Integer
               xmatchstr = xmatchstr & "<br />Result of Regex.Replace(string,""" & Replace(xpattern,"<","&lt;") & """,""" & xreplace & """): """
               xmatchstr = xmatchstr & Regex.Replace(xstring,xpattern,xreplace) & """<br />"
               xmatch = Regex.Match(xstring,xpattern)
               xcaptures = xmatch.Captures
               For xint = 0 to xcaptures.Count - 1
                   xgroups = xmatch.Groups
                   xmatchstr = xmatchstr & "->Result of GroupCollection.Count: """
                   xmatchstr = xmatchstr & xgroups.Count & """<br />"
                   For yint = 0 to xgroups.Count - 1
                       xmatchstr = xmatchstr & "->->Result of GroupCollection("& yint & ").Value, Index, Length: """
                       xmatchstr = xmatchstr & xgroups(yint).Value & ", " & xgroups(yint).Index & ", " & xgroups(yint).Length & """<br />"
                       ycaptures = xgroups(yint).Captures
                       xmatchstr = xmatchstr & "->->->Result of CaptureCollection.Count: """
                       xmatchstr = xmatchstr & ycaptures.Count & """<br />"
                       For zint = 0 to ycaptures.Count - 1
                           xmatchstr = xmatchstr & "->->->->Result of CaptureCollection("& zint & ").Value, Index, Length: """
                           xmatchstr = xmatchstr & ycaptures(zint).Value & ", " & ycaptures(zint).Index & ", " & ycaptures(zint).Length & """<br />"
                       Next
                   Next
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Regular Expression Options</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>

See also

  • Regular Expression Language - Quick Reference

Source/Reference

  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options

©sideway

ID: 190800002 Last Updated: 8/2/2019 Revision: 0 Ref:


Copyright © 2000-2024 Sideway . All rights reserved Disclaimers last modified on 06 September 2019