
Content

Regular Expression Character Classes
 .NET Any Character, .
 Escaped Character Class \
  .NET Unicode category or Unicode block: \p{name}
  .NET Negative Unicode category or Unicode block: \P{name}
  .NET Word Character: \w
  .NET Non-word Character: \W
  .NET Whitespace Character: \s
  .NET Non-whitespace Character: \S
  .NET Decimal Digit Character: \d
  .NET Non-decimal Digit Character: \D
 Character Group, []
  .NET Positive Character Group, []
  .NET Negative Character Group, [^]
  .NET Character Subtraction Group, [base_group-[excluded_group]]
 Supported Unicode
 See also
 Examples
 Source/Reference

Regular Expression Character Classes

A character class defines a set of characters, any one of which can match a single character in an input string, instead of naming one literal or escaped character.

  • a symbol is used to represent a set of characters, e.g. "." = any character.
  • escaped character classes represent a group of specific characters, e.g. \p{name}, \P{name}, \w, \W, \s, \S, \d, \D.
  • a pair of square brackets, [], is used to specify a set of characters, e.g. [abc] = a or b or c.
    • a hyphen character, -, is used as a range separator unless it is the first or last character of the group, e.g. [a-c] = a or b or c.
    • a leading caret character, ^, negates the set, so the group matches any single character that is not in it, e.g. [^abc] = any character other than a, b, or c.
    • a hyphen character, -, followed by a nested group excludes that group from the base group, e.g. [a-c-[b]] = a or c.

.NET Any Character, .

The period character, ., matches any single character, including the carriage return character (\r or \u000D), except the newline character (\n or \u000A).

Character Class Description Exception
. Wildcard: Matches any single character except \n.

To match a literal period character (. or \u002E), precede it with the escape character (\.).
Inside a character group, [], a period is treated as a literal period character.
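The behavior of the period can be checked with a short console sketch. It is only an illustration: the module and function names are made up here, and RegexOptions.Singleline, which makes the period match \n as well, is a .NET option that this article does not otherwise cover.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnyCharacterSketch
    ' Hypothetical helper: length of the first match, or -1 when nothing matches.
    Function FirstMatchLength(sample As String, pattern As String, options As RegexOptions) As Integer
        Dim m As Match = Regex.Match(sample, pattern, options)
        Return If(m.Success, m.Length, -1)
    End Function

    Sub Main()
        ' "ab", carriage return, newline, "cd"
        Dim sample As String = "ab" & ChrW(13) & ChrW(10) & "cd"
        ' By default "." consumes \r but stops before \n.
        Console.WriteLine(FirstMatchLength(sample, ".+", RegexOptions.None))       ' 3
        ' With Singleline (not discussed in the article) "." also matches \n.
        Console.WriteLine(FirstMatchLength(sample, ".+", RegexOptions.Singleline)) ' 6
        ' Escaped, or inside a character group, the period is a literal.
        Console.WriteLine(Regex.IsMatch("a", "\."))     ' False
        Console.WriteLine(Regex.IsMatch("a", "[.]"))    ' False
        Console.WriteLine(Regex.IsMatch("a.b", "[.]"))  ' True
    End Sub
End Module
```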

Escaped Character Class \

The backslash character, \, followed by one of the letters below denotes a predefined character class.

.NET Unicode category or Unicode block: \p{name}

The backslash character, \, followed by the character p and a name in braces matches any single character that belongs to a Unicode general category or named block. The name is either a category abbreviation (such as Ll) or a named block name (such as IsBasicLatin).

.NET Negative Unicode category or Unicode block: \P{name}

The backslash character, \, followed by the character P and a name in braces matches any single character that does not belong to the given Unicode general category or named block. The name is either a category abbreviation or a named block name.
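As a rough check of \p{name} and \P{name}, the sketch below counts matches in the first line of the test string used in the Examples section. The module and helper names are illustrative only.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module UnicodeCategorySketch
    ' Hypothetical helper: number of matches of pattern in sample.
    Function CountMatches(sample As String, pattern As String) As Integer
        Return Regex.Matches(sample, pattern).Count
    End Function

    Sub Main()
        ' 18 characters; ChrW(913) is U+0391, GREEK CAPITAL LETTER ALPHA.
        Dim sample As String = "01 345" & ChrW(913) & "67 9abc def"
        Console.WriteLine(CountMatches(sample, "\p{Ll}"))            ' 6  (a b c d e f)
        Console.WriteLine(CountMatches(sample, "\P{Ll}"))            ' 12 (everything else)
        Console.WriteLine(CountMatches(sample, "\p{Lu}"))            ' 1  (the alpha)
        Console.WriteLine(CountMatches(sample, "\p{IsGreek}"))       ' 1
        Console.WriteLine(CountMatches(sample, "\P{IsBasicLatin}"))  ' 1
    End Sub
End Module
```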

.NET Word Character: \w

The backslash character, \, followed by the character w matches any single word character. By default, the word characters are the members of a fixed set of Unicode categories: \w is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9].

.NET Non-word Character: \W

The backslash character, \, followed by the character W matches any single character that is not a word character. By default, the word characters are the members of a fixed set of Unicode categories: \W is equivalent to [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified, \W is equivalent to [^a-zA-Z_0-9].
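The difference between the default and the ECMAScript definition of a word character shows up on non-ASCII letters. The following sketch, with an illustrative helper name, tests a few single characters.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module WordCharacterSketch
    ' Hypothetical helper: True when the single character c matches \w.
    Function IsWordChar(c As Char, options As RegexOptions) As Boolean
        Return Regex.IsMatch(c.ToString(), "\w", options)
    End Function

    Sub Main()
        Dim eAcute As Char = ChrW(233) ' U+00E9, category Ll
        Console.WriteLine(IsWordChar("_"c, RegexOptions.None))         ' True  (Pc)
        Console.WriteLine(IsWordChar("-"c, RegexOptions.None))         ' False
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.None))       ' True
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.ECMAScript)) ' False
        ' \W is the complement: it matches the hyphen but not the underscore.
        Console.WriteLine(Regex.IsMatch("-", "\W"))                    ' True
        Console.WriteLine(Regex.IsMatch("_", "\W"))                    ' False
    End Sub
End Module
```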

.NET Whitespace Character: \s

The backslash character, \, followed by the character s matches any single whitespace character. By default, the whitespace characters are a fixed set of escape sequences plus a Unicode category: \s is equivalent to [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified, \s is equivalent to [ \f\n\r\t\v].

.NET Non-whitespace Character: \S

The backslash character, \, followed by the character S matches any single character that is not a whitespace character. By default, the whitespace characters are a fixed set of escape sequences plus a Unicode category: \S is equivalent to [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified, \S is equivalent to [^ \f\n\r\t\v].
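A no-break space (U+00A0, category Zs) is one character on which the two definitions of whitespace disagree. The sketch below is a minimal check with an illustrative helper name.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module WhitespaceSketch
    ' Hypothetical helper: True when the single character c matches \s.
    Function IsSpace(c As Char, options As RegexOptions) As Boolean
        Return Regex.IsMatch(c.ToString(), "\s", options)
    End Function

    Sub Main()
        Dim tab As Char = ChrW(9)
        Dim noBreakSpace As Char = ChrW(160) ' U+00A0, category Zs
        Console.WriteLine(IsSpace(tab, RegexOptions.None))                ' True
        Console.WriteLine(IsSpace(tab, RegexOptions.ECMAScript))          ' True
        Console.WriteLine(IsSpace(noBreakSpace, RegexOptions.None))       ' True
        Console.WriteLine(IsSpace(noBreakSpace, RegexOptions.ECMAScript)) ' False
        ' \S is the complement of \s.
        Console.WriteLine(Regex.IsMatch("a", "\S"))                       ' True
        Console.WriteLine(Regex.IsMatch(" ", "\S"))                       ' False
    End Sub
End Module
```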

.NET Decimal Digit Character: \d

The backslash character, \, followed by the character d matches any single decimal digit character. By default, the decimal digit characters are the members of the Unicode category Nd: \d is equivalent to [\p{Nd}]. If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9].

.NET Non-decimal Digit Character: \D

The backslash character, \, followed by the character D matches any single character that is not a decimal digit. By default, the decimal digit characters are the members of the Unicode category Nd: \D is equivalent to [\P{Nd}]. If ECMAScript-compliant behavior is specified, \D is equivalent to [^0-9].
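Because \d covers the whole Nd category by default, it also accepts digits from other scripts. The sketch below uses the ARABIC-INDIC DIGIT THREE (U+0663) as one such digit; the helper name is illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module DecimalDigitSketch
    ' Hypothetical helper: True when the single character c matches \d.
    Function IsDigit(c As Char, options As RegexOptions) As Boolean
        Return Regex.IsMatch(c.ToString(), "\d", options)
    End Function

    Sub Main()
        Dim arabicThree As Char = ChrW(&H663) ' U+0663, category Nd
        Console.WriteLine(IsDigit("7"c, RegexOptions.None))              ' True
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.None))       ' True
        Console.WriteLine(IsDigit(arabicThree, RegexOptions.ECMAScript)) ' False
        ' An explicit range restricts the match to 0-9 without changing options.
        Console.WriteLine(Regex.IsMatch(arabicThree.ToString(), "[0-9]")) ' False
        ' \D is the complement of \d.
        Console.WriteLine(Regex.IsMatch("7", "\D"))                      ' False
    End Sub
End Module
```

When only the ASCII digits are wanted, writing [0-9] avoids having to switch the whole pattern to ECMAScript behavior.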

Character Group, []

.NET Positive Character Group, []

A pair of square brackets specifies a set of characters, any one of which can match a single character in an input string. The set of characters may be specified individually, as a range, or both.

  • [character_group]: character_group can consist of any combination of one or more literal characters, escape characters, or character classes.
  • [firstCharacter-lastCharacter]: firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A range is a series of contiguous characters; two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point. A hyphen character, -, is always interpreted as the range separator unless it is the first or last character of the group.
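The range and hyphen rules above can be tried out with the sketch below. The IsValidPattern helper is made up for this illustration; it relies on the Regex constructor throwing an ArgumentException (or a subclass of it) for a pattern it cannot parse, such as a range written in reverse order.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PositiveGroupSketch
    ' Hypothetical helper: True when the Regex constructor accepts the pattern.
    Function IsValidPattern(pattern As String) As Boolean
        Try
            Dim r As New Regex(pattern)
            Return True
        Catch ex As ArgumentException
            Return False
        End Try
    End Function

    Sub Main()
        ' Hyphen between two characters: a range.
        Console.WriteLine(Regex.IsMatch("b", "[a-c]"))  ' True
        ' Hyphen first or last in the group: a literal hyphen.
        Console.WriteLine(Regex.IsMatch("b", "[-ac]"))  ' False
        Console.WriteLine(Regex.IsMatch("-", "[-ac]"))  ' True
        Console.WriteLine(Regex.IsMatch("-", "[ac-]"))  ' True
        ' Individual characters and a range can be combined.
        Console.WriteLine(Regex.IsMatch("7", "[abc0-9]")) ' True
        ' The first character of a range must have the lower code point.
        Console.WriteLine(IsValidPattern("[a-c]"))      ' True
        Console.WriteLine(IsValidPattern("[c-a]"))      ' False
    End Sub
End Module
```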

.NET Negative Character Group, [^]

A pair of square brackets with a leading caret specifies a set of characters that must not appear at the current position in an input string; the group matches any single character that is not in the set. The set of characters may be specified individually, as a range, or both.

  • [^character_group]: the leading caret, ^, indicates a negative character group. character_group can consist of any combination of one or more literal characters, escape characters, or character classes that cannot appear in an input string.
  • [^firstCharacter-lastCharacter]: the leading caret, ^, indicates a negative character group. firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point. A hyphen character, -, is always interpreted as the range separator unless it is the first or last character of the group.

A negative character group in a larger regular expression pattern is not a zero-width assertion. That is, after evaluating the negative character group, the regular expression engine advances one character in the input string.
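The point that a negative character group consumes a character is easiest to see at the end of a string. In the sketch below, q[^u] needs some character after the q, so it fails on "Iraq". The negative lookahead (?!u), which is a zero-width assertion and is not covered in this article, is included only for contrast; the helper name is illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NegativeGroupSketch
    ' Hypothetical helper: the first match as text, or "(none)" when nothing matches.
    Function FirstMatch(sample As String, pattern As String) As String
        Dim m As Match = Regex.Match(sample, pattern)
        Return If(m.Success, m.Value, "(none)")
    End Function

    Sub Main()
        ' [^u] must consume one character after the q.
        Console.WriteLine(FirstMatch("Iraqi", "q[^u]"))   ' qi
        Console.WriteLine(FirstMatch("Iraq", "q[^u]"))    ' (none)
        Console.WriteLine(FirstMatch("quit", "q[^u]"))    ' (none)
        ' A zero-width negative lookahead consumes nothing, so it matches at the end.
        Console.WriteLine(FirstMatch("Iraq", "q(?!u)"))   ' q
    End Sub
End Module
```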

.NET Character Subtraction Group, [base_group-[excluded_group]]

A character subtraction group specifies, through subtraction, a set of characters, any one of which can match a single character in an input string. The resulting set is the base character group with the characters of the excluded character group removed.

The form of a character subtraction group is [base_group-[excluded_group]]. The hyphen, -, indicates that the nested group that follows is the excluded_group.
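A few subtraction groups can be compared side by side with the sketch below. The helper name is illustrative; the patterns are small variations on the ones used elsewhere in this article.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SubtractionGroupSketch
    ' Hypothetical helper: all matches of pattern in sample, joined by commas.
    Function JoinMatches(sample As String, pattern As String) As String
        Dim parts As New List(Of String)
        For Each m As Match In Regex.Matches(sample, pattern)
            parts.Add(m.Value)
        Next
        Return String.Join(",", parts)
    End Function

    Sub Main()
        ' a to c, minus b.
        Console.WriteLine(JoinMatches("abc", "[a-c-[b]]"))       ' a,c
        ' The digits 1, 3, 5, 7, minus 5; the 9 is not in the base group.
        Console.WriteLine(JoinMatches("13579", "[1357-[5]]"))    ' 1,3,7
        ' Lowercase ASCII letters, minus the vowels.
        Console.WriteLine(JoinMatches("regex", "[a-z-[aeiou]]")) ' r,g,x
    End Sub
End Module
```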

Supported Unicode

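The tables below list the Unicode general categories and the named blocks that \p{name} and \P{name} accept.

Supported Unicode General Categories

Of these categories, \w covers Lu, Ll, Lt, Lm, Lo, Mn, Nd, and Pc; \s covers Z (Zs, Zl, and Zp); \d covers Nd.

Category Description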
Lu Letter, Uppercase    
Ll Letter, Lowercase    
Lt Letter, Titlecase    
Lm Letter, Modifier    
Lo Letter, Other    
L All letter characters. This includes the Lu, Ll, Lt, Lm, and Lo characters.      
Mn Mark, Nonspacing    
Mc Mark, Spacing Combining      
Me Mark, Enclosing      
M All diacritic marks. This includes the Mn, Mc, and Me categories.      
Nd Number, Decimal Digit    
Nl Number, Letter      
No Number, Other      
N All numbers. This includes the Nd, Nl, and No categories.      
Pc Punctuation, Connector    
Pd Punctuation, Dash      
Ps Punctuation, Open      
Pe Punctuation, Close      
Pi Punctuation, Initial quote (may behave like Ps or Pe depending on usage)      
Pf Punctuation, Final quote (may behave like Ps or Pe depending on usage)      
Po Punctuation, Other      
P All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.      
Sm Symbol, Math      
Sc Symbol, Currency      
Sk Symbol, Modifier      
So Symbol, Other      
S All symbols. This includes the Sm, Sc, Sk, and So categories.      
Zs Separator, Space      
Zl Separator, Line      
Zp Separator, Paragraph      
Z All separator characters. This includes the Zs, Zl, and Zp categories.    
Cc Other, Control      
Cf Other, Format      
Cs Other, Surrogate      
Co Other, Private Use      
Cn Other, Not Assigned (no characters have this property)      
C All other characters. This includes the Cc, Cf, Cs, Co, and Cn categories.

Supported Named Blocks

Block name Code point range
IsBasicLatin 0000 - 007F      
IsLatin-1Supplement 0080 - 00FF      
IsLatinExtended-A 0100 - 017F      
IsLatinExtended-B 0180 - 024F      
IsIPAExtensions 0250 - 02AF      
IsSpacingModifierLetters 02B0 - 02FF      
IsCombiningDiacriticalMarks 0300 - 036F      
IsGreek or IsGreekandCoptic 0370 - 03FF      
IsCyrillic 0400 - 04FF      
IsCyrillicSupplement 0500 - 052F      
IsArmenian 0530 - 058F      
IsHebrew 0590 - 05FF      
IsArabic 0600 - 06FF      
IsSyriac 0700 - 074F      
IsThaana 0780 - 07BF      
IsDevanagari 0900 - 097F      
IsBengali 0980 - 09FF      
IsGurmukhi 0A00 - 0A7F      
IsGujarati 0A80 - 0AFF      
IsOriya 0B00 - 0B7F      
IsTamil 0B80 - 0BFF      
IsTelugu 0C00 - 0C7F      
IsKannada 0C80 - 0CFF      
IsMalayalam 0D00 - 0D7F      
IsSinhala 0D80 - 0DFF      
IsThai 0E00 - 0E7F      
IsLao 0E80 - 0EFF      
IsTibetan 0F00 - 0FFF      
IsMyanmar 1000 - 109F      
IsGeorgian 10A0 - 10FF      
IsHangulJamo 1100 - 11FF      
IsEthiopic 1200 - 137F      
IsCherokee 13A0 - 13FF      
IsUnifiedCanadianAboriginalSyllabics 1400 - 167F      
IsOgham 1680 - 169F      
IsRunic 16A0 - 16FF      
IsTagalog 1700 - 171F      
IsHanunoo 1720 - 173F      
IsBuhid 1740 - 175F      
IsTagbanwa 1760 - 177F      
IsKhmer 1780 - 17FF      
IsMongolian 1800 - 18AF      
IsLimbu 1900 - 194F      
IsTaiLe 1950 - 197F      
IsKhmerSymbols 19E0 - 19FF      
IsPhoneticExtensions 1D00 - 1D7F      
IsLatinExtendedAdditional 1E00 - 1EFF      
IsGreekExtended 1F00 - 1FFF      
IsGeneralPunctuation 2000 - 206F      
IsSuperscriptsandSubscripts 2070 - 209F      
IsCurrencySymbols 20A0 - 20CF      
IsCombiningDiacriticalMarksforSymbols or IsCombiningMarksforSymbols 20D0 - 20FF      
IsLetterlikeSymbols 2100 - 214F      
IsNumberForms 2150 - 218F      
IsArrows 2190 - 21FF      
IsMathematicalOperators 2200 - 22FF      
IsMiscellaneousTechnical 2300 - 23FF      
IsControlPictures 2400 - 243F      
IsOpticalCharacterRecognition 2440 - 245F      
IsEnclosedAlphanumerics 2460 - 24FF      
IsBoxDrawing 2500 - 257F      
IsBlockElements 2580 - 259F      
IsGeometricShapes 25A0 - 25FF      
IsMiscellaneousSymbols 2600 - 26FF      
IsDingbats 2700 - 27BF      
IsMiscellaneousMathematicalSymbols-A 27C0 - 27EF      
IsSupplementalArrows-A 27F0 - 27FF      
IsBraillePatterns 2800 - 28FF      
IsSupplementalArrows-B 2900 - 297F      
IsMiscellaneousMathematicalSymbols-B 2980 - 29FF      
IsSupplementalMathematicalOperators 2A00 - 2AFF      
IsMiscellaneousSymbolsandArrows 2B00 - 2BFF      
IsCJKRadicalsSupplement 2E80 - 2EFF      
IsKangxiRadicals 2F00 - 2FDF      
IsIdeographicDescriptionCharacters 2FF0 - 2FFF      
IsCJKSymbolsandPunctuation 3000 - 303F      
IsHiragana 3040 - 309F      
IsKatakana 30A0 - 30FF      
IsBopomofo 3100 - 312F      
IsHangulCompatibilityJamo 3130 - 318F      
IsKanbun 3190 - 319F      
IsBopomofoExtended 31A0 - 31BF      
IsKatakanaPhoneticExtensions 31F0 - 31FF      
IsEnclosedCJKLettersandMonths 3200 - 32FF      
IsCJKCompatibility 3300 - 33FF      
IsCJKUnifiedIdeographsExtensionA 3400 - 4DBF      
IsYijingHexagramSymbols 4DC0 - 4DFF      
IsCJKUnifiedIdeographs 4E00 - 9FFF      
IsYiSyllables A000 - A48F      
IsYiRadicals A490 - A4CF      
IsHangulSyllables AC00 - D7AF      
IsHighSurrogates D800 - DB7F      
IsHighPrivateUseSurrogates DB80 - DBFF      
IsLowSurrogates DC00 - DFFF      
IsPrivateUse or IsPrivateUseArea E000 - F8FF      
IsCJKCompatibilityIdeographs F900 - FAFF      
IsAlphabeticPresentationForms FB00 - FB4F      
IsArabicPresentationForms-A FB50 - FDFF      
IsVariationSelectors FE00 - FE0F      
IsCombiningHalfMarks FE20 - FE2F      
IsCJKCompatibilityForms FE30 - FE4F      
IsSmallFormVariants FE50 - FE6F      
IsArabicPresentationForms-B FE70 - FEFF      
IsHalfwidthandFullwidthForms FF00 - FFEF      
IsSpecials FFF0 - FFFF      

Notes:
\s is equivalent to [\f\n\r\t\v\x85\p{Z}].
\d matches the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.

See also

  • Regular Expression Language - Quick Reference

Examples

Examples of Character Classes
ASP.NET Code Input:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
           Sub Page_Load()
               Dim xstring As String = "01 345"&ChrW(913)&"67 9abc def"&Chr(13)&Chr(10)&"7890"&Chr(13)&Chr(10)
               Dim xmatchstr As String = ""
               Dim xoption As RegexOptions = RegexOptions.Multiline
               xmatchstr = xmatchstr & "Given string: " & """01 345""&amp;ChrW(913)&amp;""67 9abc def""&amp;Chr(13)&amp;Chr(10)&amp;""7890""&amp;Chr(13)&amp;Chr(10)" & "<br />"
               xmatchstr = xmatchstr & showresult(xstring,".+",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\P{Ll}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\p{Ll}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\p{IsBasicLatin}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\P{IsBasicLatin}",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\w",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\W",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\s",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\S",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\d",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"\D",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[1357\r]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[^1357\r]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[1357\r-[5]]",RegexOptions.None)
               xmatchstr = xmatchstr & showresult(xstring,"[^1357\r-[5]]",RegexOptions.None)
               lbl01.Text = xmatchstr
           End Sub
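           ' Runs Regex.Matches and returns an HTML fragment listing each match's value, index, and length.
           Function showresult(xstring As String, xpattern As String, xoption As RegexOptions) As String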
               Dim xmatches As MatchCollection
               Dim xmatchstr As String = ""
               Dim xint As Integer
               xmatchstr = xmatchstr & "<br />Result of Regex.Matches(string,""" & xpattern & """," & xoption & "): "
               xmatches = Regex.Matches(xstring,xpattern,xoption)
               xmatchstr = xmatchstr & "<br />->Result of MatchCollection.Count: """
               xmatchstr = xmatchstr & xmatches.Count & """<br />"
               For xint = 0 to xmatches.Count - 1
                   xmatchstr = xmatchstr & "->->Result of MatchCollection("& xint & ").Value, Index, Length: """
                   xmatchstr = xmatchstr & xmatches(xint).Value & ", " & xmatches(xint).Index & ", " & xmatches(xint).Length & """<br />"
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Character Classes</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>
HTML Web Page Embedded Output:

Source/Reference

  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/character-escapes-in-regular-expressions

©sideway

ID: 190700026 Last Updated: 7/26/2019 Revision: 0 Ref:
