General API #
As a general rule, for a given a Unicode property called Unicode_Property
,
for example:
If the property is boolean valued then the implementation is called
Unicode.isUnicodeProperty
.Otherwise, the implementation is called
Unicode.getUnicodeProperty
.If the value is not of standard type (
Bool
,Char
,Nat
,Int
, etc.) or a combination thereof (e.g.Bool × Option Nat
) then the value type is defined inUnicodeBasic.Types
.If the input type needs disambiguation (e.g.
Char
vsString
) the type name may be appended to the name as inUnicode.isUnicodePropertyChar
or inUnicode.getUnicodePropertyString
.If the output type is
Option _
then the suffix?
may be appended to indicate that this is a partial function. In this case, a companion function with the suffix!
may be implemented. This function will perform the same calculation as the original but it assumes the user has checked that the input is in the domain, the function may panic if this is not the case.
There are exceptions and variations on these general rules. For example,
Unicode.isInGeneralCategory
checks whether a character belongs to a given
general category. Because of derived general categories, this makes more
sense than what the first rule above would give.
Some properties may be grouped in a namespace. The namespace
Unicode.GeneralCategory
is such. For example, Unicode.GeneralCategory.isL
checks whether a character belongs to the derived general category L
.
Name #
Get character name
When the Unicode property Name
is empty, a unique code label is returned
as described in Unicode Standard, section 4.8. These labels start with
'<'
(U+003C) and end with '>'
(U+003E) so they are distinguishable from
actual name values.
Unicode property: Name
Equations
- Unicode.getName char = Unicode.lookupName char.val
Instances For
Bidirectional Properties #
Get character bidirectional class
Unicode property: Bidi_Class
Equations
- Unicode.getBidiClass char = Unicode.lookupBidiClass char.val
Instances For
Check if bidirectional mirrored character
Unicode property: Bidi_Mirrored
Equations
- Unicode.isBidiMirrored char = Unicode.lookupBidiMirrored char.val
Instances For
Check if bidirectional control character
Unicode property: Bidi_Control
Equations
- One or more equations did not get rendered due to their size.
Instances For
General Category #
Get character general category
Caveat: This function never returns a derived general category. Use
Unicode.isInGeneralCategory
to check whether a character belongs to a
general category (derived or not).
Unicode property: General_Category
Equations
- Unicode.getGeneralCategory char = if h : char.toNat < Unicode.getGeneralCategory.table.size then Unicode.getGeneralCategory.table[char.toNat] else Unicode.lookupGeneralCategory char.val
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Check if character belongs to the general category
Unicode property: General_Category
Instances For
Check if letter character (L
)
This is a derived category (L = Lu | Ll | Lt | Lm | Lo
).
Unicode Property: General_Category=L
Instances For
Check if letter character (L
)
This is a derived category (L = Lu | Ll | Lt | Lm | Lo
).
Unicode Property: General_Category=L
Instances For
Check if lowercase letter character (Ll
)
Unicode Property: General_Category=Ll
Instances For
Check if lowercase letter character (Ll
)
Unicode Property: General_Category=Ll
Instances For
Check if titlecase letter character (Lt
)
Unicode Property: General_Category=Lt
Instances For
Check if titlecase letter character (Lt
)
Unicode Property: General_Category=Lt
Instances For
Check if uppercase letter character (Lu
)
Unicode Property: General_Category=Lu
Instances For
Check if uppercase letter character (Lu
)
Unicode Property: General_Category=Lu
Instances For
Check if cased letter character (LC
)
This is a derived category (L = Lu | Ll | Lt
).
Unicode Property: General_Category=LC
Instances For
Check if cased letter character (LC
)
This is a derived category (L = Lu | Ll | Lt
).
Unicode Property: General_Category=LC
Instances For
Check if modifier letter character (Lm
)
Unicode Property: General_Category=Lm
Instances For
Check if modifier letter character (Lm
)
Unicode Property: General_Category=Lm
Instances For
Check if other letter character (Lo
)
Unicode Property: General_Category=Lo
Instances For
Check if other letter character (Lo
)
Unicode Property: General_Category=Lo
Instances For
Check if mark character (M
)
This is a derived category (M = Mn | Mc | Me
).
Unicode Property: General_Category=M
Instances For
Check if mark character (M
)
This is a derived category (M = Mn | Mc | Me
).
Unicode Property: General_Category=M
Instances For
Check if nonspacing combining mark character (Mn
)
Unicode Property: General_Category=Mn
Instances For
Check if nonspacing combining mark character (Mn
)
Unicode Property: General_Category=Mn
Instances For
Check if spacing combining mark character (Mc
)
Unicode Property: General_Category=Mc
Instances For
Check if spacing combining mark character (Mc
)
Unicode Property: General_Category=Mc
Instances For
Check if enclosing combining mark character (Me
)
Unicode Property: General_Category=Me
Instances For
Check if enclosing combining mark character (Me
)
Unicode Property: General_Category=Me
Instances For
Check if number character (N
)
This is a derived category (N = Nd | Nl | No
).
Unicode Property: General_Category=N
Instances For
Check if number character (N
)
This is a derived category (N = Nd | Nl | No
).
Unicode Property: General_Category=N
Instances For
Check if decimal number character (Nd
)
Unicode Property: General_Category=Nd
Instances For
Check if decimal number character (Nd
)
Unicode Property: General_Category=Nd
Instances For
Check if letter number character (Nl
)
Unicode Property: General_Category=Nl
Instances For
Check if letter number character (Nl
)
Unicode Property: General_Category=Nl
Instances For
Check if other number character (No
)
Unicode Property: General_Category=No
Instances For
Check if other number character (No
)
Unicode Property: General_Category=No
Instances For
Check if punctuation character (P
)
This is a derived category (P = Pc | Pd | Ps | Pe | Pi | Pf | Po
).
Unicode Property: General_Category=P
Instances For
Check if punctuation character (P
)
This is a derived category (P = Pc | Pd | Ps | Pe | Pi | Pf | Po
).
Unicode Property: General_Category=P
Instances For
Check if connector punctuation character (Pc
)
Unicode Property: General_Category=Pc
Instances For
Check if connector punctuation character (Pc
)
Unicode Property: General_Category=Pc
Instances For
Check if dash punctuation character (Pd
)
Unicode Property: General_Category=Pd
Instances For
Check if dash punctuation character (Pd
)
Unicode Property: General_Category=Pd
Instances For
Check if grouping punctuation character (PG
)
This is a derived category (PG = Ps | Pe
).
Unicode Property: General_Category=PG
Instances For
Check if grouping punctuation character (PG
)
This is a derived category (PG = Ps | Pe
).
Unicode Property: General_Category=PG
Instances For
Check if open punctuation character (Ps
)
Unicode Property: General_Category=Ps
Instances For
Check if open punctuation character (Ps
)
Unicode Property: General_Category=Ps
Instances For
Check if close punctuation character (Pe
)
Unicode Property: General_Category=Pe
Instances For
Check if close punctuation character (Pe
)
Unicode Property: General_Category=Pe
Instances For
Check if quoting punctuation character (PQ
)
This is a derived category (PQ = Pi | Pf
).
Unicode Property: General_Category=PQ
Instances For
Check if quoting punctuation character (PQ
)
This is a derived category (PQ = Pi | Pf
).
Unicode Property: General_Category=PQ
Instances For
Check if initial punctuation character (Pi
)
Unicode Property: General_Category=Pi
Instances For
Check if initial punctuation character (Pi
)
Unicode Property: General_Category=Pi
Instances For
Check if initial punctuation character (Pf
)
Unicode Property: General_Category=Pf
Instances For
Check if initial punctuation character (Pf
)
Unicode Property: General_Category=Pf
Instances For
Check if other punctuation character (Po
)
Unicode Property: General_Category=Po
Instances For
Check if other punctuation character (Po
)
Unicode Property: General_Category=Po
Instances For
Check if symbol character (S
)
This is a derived category (S = Sm | Sc | Sk | So
).
Unicode Property: General_Category=S
Instances For
Check if symbol character (S
)
This is a derived category (S = Sm | Sc | Sk | So
).
Unicode Property: General_Category=S
Instances For
Check if math symbol character (Sm
)
Unicode Property: General_Category=Sm
Instances For
Check if math symbol character (Sm
)
Unicode Property: General_Category=Sm
Instances For
Check if currency symbol character (Sc
)
Unicode Property: General_Category=Sc
Instances For
Check if currency symbol character (Sc
)
Unicode Property: General_Category=Sc
Instances For
Check if modifier symbol character (Sk
)
Unicode Property: General_Category=Sk
Instances For
Check if modifier symbol character (Sk
)
Unicode Property: General_Category=Sk
Instances For
Check if other symbol character (So
)
Unicode Property: General_Category=So
Instances For
Check if other symbol character (So
)
Unicode Property: General_Category=So
Instances For
Check if separator character (Z
)
This is a derived property (Z = Zs | Zl | Zp
).
Unicode Property: General_Category=Z
Instances For
Check if separator character (Z
)
This is a derived property (Z = Zs | Zl | Zp
).
Unicode Property: General_Category=Z
Instances For
Check if space separator character (Zs
)
Unicode Property: General_Category=Zs
Instances For
Check if space separator character (Zs
)
Unicode Property: General_Category=Zs
Instances For
Check if line separator character (Zl
)
Unicode Property: General_Category=Zl
Instances For
Check if line separator character (Zl
)
Unicode Property: General_Category=Zl
Instances For
Check if paragraph separator character (Zp
)
Unicode Property: General_Category=Zp
Instances For
Check if paragraph separator character (Zp
)
Unicode Property: General_Category=Zp
Instances For
Check if other character (C
)
This is a derived category (C = Cc | Cf | Cs | Co | Cn
).
Unicode Property: General_Category=C
Instances For
Check if other character (C
)
This is a derived category (C = Cc | Cf | Cs | Co | Cn
).
Unicode Property: General_Category=C
Instances For
Check if control character (Cc
)
Unicode Property: General_Category=Cc
Instances For
Check if control character (Cc
)
Unicode Property: General_Category=Cc
Instances For
Check if format character (Cf
)
Unicode Property: General_Category=Cf
Instances For
Check if format character (Cf
)
Unicode Property: General_Category=Cf
Instances For
Check if surrogate character (Cs
)
Unicode Property: General_Category=Cs
Instances For
Check if surrogate character (Cs
)
Unicode Property: General_Category=Cs
Instances For
Check if private use character (Co
)
Unicode Property: General_Category=Co
Instances For
Check if private use character (Co
)
Unicode Property: General_Category=Co
Instances For
Check if unassigned character (Cn
)
Unicode Property: General_Category=Cn
Instances For
Check if unassigned character (Cn
)
Unicode Property: General_Category=Cn
Instances For
Case Type and Mapping #
Check if lowercase letter character
Generated by General_Category=Ll | Other_Lowercase
.
Unicode property: Lowercase
Equations
- Unicode.isLowercase char = if char.val < 128 then decide ('a' ≤ char) && decide (char ≤ 'z') else Unicode.lookupLowercase char.val
Instances For
Check if uppercase letter character
Generated by General_Category=Lu | Other_Uppercase
.
Unicode property: Uppercase
Equations
- Unicode.isUppercase char = if char.val < 128 then decide ('A' ≤ char) && decide (char ≤ 'Z') else Unicode.lookupUppercase char.val
Instances For
Check if cased letter character
Generated by General_Category=LC | Other_Lowercase | Other_Uppercase
.
Unicode property: Cased
Equations
Instances For
Check if character is ignorable for casing purposes
Generated from general categories Lm
, Mn
, Me
, Sk
, Cf
, and word
break properties MidLetter
, MidNumLet
, Single_Quote
.
Unicode property: Case_Ignorable
Instances For
Auxiliary data for isCaseIgnorable
Extracted from UCD auxiliary/WordBreakProperty.txt
.
Equations
- Unicode.isCaseIgnorable.other = [39, 46, 58, 183, 903, 1375, 1524, 8216, 8217, 8231, 8228, 65043, 65109, 65106, 65287, 65294, 65306]
Instances For
Map character to lowercase
This function does not handle the case where the lowercase mapping would consist of more than one character.
Unicode property: Simple_Lowercase_Mapping
Equations
- One or more equations did not get rendered due to their size.
Instances For
Map character to uppercase
This function does not handle the case where the uppercase mapping would consist of more than one character.
Unicode property: Simple_Uppercase_Mapping
Equations
- One or more equations did not get rendered due to their size.
Instances For
Map character to titlecase
This function does not handle the case where the titlecase mapping would consist of more than one character.
Unicode property: Simple_Titlecase_Mapping
Equations
- One or more equations did not get rendered due to their size.
Instances For
Decomposition Type and Mapping #
Get canonical combining class of character
Unicode property: Canonical_Combining_Class
Equations
- Unicode.getCanonicalCombiningClass char = if char.val < 128 then 0 else Unicode.lookupCanonicalCombiningClass char.val
Instances For
Get canonical decomposition of character (NFD
)
Unicode properties:
Decomposition_Mapping
Decomposition_Type=Canonical
Equations
- One or more equations did not get rendered due to their size.
Instances For
Get decomposition mapping of a character
This is used in normalization to canonical decomposition (NFD
) and compatibility
decomposition (NFKD
).
Unicode properties:
Decomposition_Type
Decomposition_Mapping
Equations
- Unicode.getDecompositionMapping? char = if char.val < 128 then none else Unicode.lookupDecompositionMapping? char.val
Instances For
Numeric Type and Value #
Check if character represents a numerical value
Unicode property: Numeric_Type=Numeric
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Check if character represents a digit in base 10
Unicode property: Numeric_Type=Digit
Equations
- One or more equations did not get rendered due to their size.
Instances For
Get value of digit
Unicode properties:
Numeric_Type=Digit
Numeric_Value
Equations
- One or more equations did not get rendered due to their size.
Instances For
Check if character represents a decimal digit
For this property, a character must be part of a contiguous sequence representing the ten decimal digits in order from 0 to 9.
Unicode property: Numeric_Type=Decimal
Equations
- One or more equations did not get rendered due to their size.
Instances For
Get decimal digit range
If the character is part of a contiguous sequence representing the ten decimal digits in order from 0 to 9, this function returns the first and last characters from this range.
Unicode properties:
Numeric_Type=Decimal
Numeric_Value
Equations
- One or more equations did not get rendered due to their size.
Instances For
Check if character represents a digit in base 16
Unicode property: Hex_Digit
Equations
- One or more equations did not get rendered due to their size.
Instances For
Get value of digit
Unicode property: Hex_Digit
Equations
- One or more equations did not get rendered due to their size.
Instances For
Other Properties #
Check if white space character
Unicode property: White_Space
Equations
Instances For
Check if mathematical symbol character
Generated by GeneralCategory=Sm | Other_Math
.
Unicode property: Math
Equations
- Unicode.isMath char = Unicode.lookupMath char.val
Instances For
Check if alphabetic character
Generated by GeneralCategory=L | GeneralCategory=Nl | Other_Alphabetic
.
Unicode property: Alphabetic
Equations
Instances For
Check if alphabetic character
Generated by GeneralCategory=L | GeneralCategory=Nl | Other_Alphabetic
.
Unicode property: Alphabetic