Unicode Punctuation and Digits


Post by DGDanforth

In BB1.6, a Component Pascal identifier is specified by

ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ".
digit 	= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
Since we are expanding Component Pascal to allow Unicode identifiers, the definitions of letter and digit have to be modified.
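For reference, here is a minimal sketch of the BB1.6 rule as a checker, so the current behaviour can be compared with whatever replaces it. It is written in Python rather than Component Pascal, and the names `IDENT_RE` and `is_bb16_ident` are made up for this illustration. The letter ranges are copied from the grammar above and written as escapes (À is U+00C0, Ö is U+00D6, Ø is U+00D8, ö is U+00F6, ø is U+00F8, ÿ is U+00FF).

```python
import re

# Letter ranges from the BB1.6 grammar: A..Z, a..z, and the Latin-1 letters.
# U+00D7 (multiplication sign) and U+00F7 (division sign) are skipped,
# exactly as in the grammar.
LETTER = r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# ident = (letter | "_") {letter | "_" | digit}.
IDENT_RE = re.compile("[" + LETTER + "_][" + LETTER + "_0-9]*")


def is_bb16_ident(s: str) -> bool:
    """Return True if s matches the BB1.6 ident production."""
    return IDENT_RE.fullmatch(s) is not None
```

With this sketch, `is_bb16_ident("Größe")` is accepted, while an identifier containing a Greek or Cyrillic letter is rejected, which is the limitation the proposal is meant to lift.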

I suggest (open for discussion) that a letter be any Unicode code point that is neither punctuation nor a digit, where punctuation and digits are (it seems to me) the following ranges:

Unicode Punctuation and Digits

(extracted from: http://www.fileformat.info/info/unicode/category/Nd/list.htm)

U+2000..U+206F	General Punctuation
U+2E00..U+2E7F	Supplemental Punctuation

U+0030..U+0039	DIGIT
U+0660..U+0669	ARABIC-INDIC DIGIT
U+06F0..U+06F9	EXTENDED ARABIC-INDIC DIGIT
U+07C0..U+07C9	NKO DIGIT
U+0966..U+096F	DEVANAGARI DIGIT
U+09E6..U+09EF	BENGALI DIGIT
U+0A66..U+0A6F	GURMUKHI DIGIT
U+0AE6..U+0AEF	GUJARATI DIGIT
U+0B66..U+0B6F	ORIYA DIGIT
U+0BE6..U+0BEF	TAMIL DIGIT
U+0C66..U+0C6F	TELUGU DIGIT
U+0CE6..U+0CEF	KANNADA DIGIT
U+0D66..U+0D6F	MALAYALAM DIGIT
U+0DE6..U+0DEF	SINHALA LITH DIGIT
U+0E50..U+0E59	THAI DIGIT
U+0ED0..U+0ED9	LAO DIGIT
U+0F20..U+0F29	TIBETAN DIGIT
U+1040..U+1049	MYANMAR DIGIT
U+1090..U+1099	MYANMAR SHAN DIGIT
U+17E0..U+17E9	KHMER DIGIT
U+1810..U+1819	MONGOLIAN DIGIT
U+1946..U+194F	LIMBU DIGIT
U+19D0..U+19D9	NEW TAI LUE DIGIT
U+1A80..U+1A89	TAI THAM HORA DIGIT
U+1A90..U+1A99	TAI THAM THAM DIGIT
U+1B50..U+1B59	BALINESE DIGIT
U+1BB0..U+1BB9	SUNDANESE DIGIT
U+1C40..U+1C49	LEPCHA DIGIT
U+1C50..U+1C59	OL CHIKI DIGIT
U+A620..U+A629	VAI DIGIT
U+A8D0..U+A8D9	SAURASHTRA DIGIT
U+A900..U+A909	KAYAH LI DIGIT
U+A9D0..U+A9D9	JAVANESE DIGIT
U+A9F0..U+A9F9	MYANMAR TAI LAING DIGIT
U+AA50..U+AA59	CHAM DIGIT
U+ABF0..U+ABF9	MEETEI MAYEK DIGIT
U+FF10..U+FF19	FULLWIDTH DIGIT
U+104A0..U+104A9	OSMANYA DIGIT
U+11066..U+1106F	BRAHMI DIGIT
U+110F0..U+110F9	SORA SOMPENG DIGIT
U+11136..U+1113F	CHAKMA DIGIT
U+111D0..U+111D9	SHARADA DIGIT
U+112F0..U+112F9	KHUDAWADI DIGIT
U+114D0..U+114D9	TIRHUTA DIGIT
U+11650..U+11659	MODI DIGIT
U+116C0..U+116C9	TAKRI DIGIT
U+118E0..U+118E9	WARANG CITI DIGIT
U+16A60..U+16A69	MRO DIGIT
U+16B50..U+16B59	PAHAWH HMONG DIGIT
U+1D7CE..U+1D7D7	MATHEMATICAL BOLD DIGIT
U+1D7D8..U+1D7E1	MATHEMATICAL DOUBLE-STRUCK DIGIT
U+1D7E2..U+1D7EB	MATHEMATICAL SANS-SERIF DIGIT
U+1D7EC..U+1D7F5	MATHEMATICAL SANS-SERIF BOLD DIGIT
U+1D7F6..U+1D7FF	MATHEMATICAL MONOSPACE DIGIT
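The proposed rule can be tried out with a short sketch. This one is in Python (not Component Pascal) and leans on the standard `unicodedata` module instead of a hand-maintained range table. Two assumptions are baked in: "digit" is taken to mean general category Nd, which is what the list above was extracted from, and "punctuation" is taken to mean any general category starting with P, which is not the same thing as the General Punctuation and Supplemental Punctuation blocks. The function names are invented for illustration only.

```python
import unicodedata


def is_digit(ch: str) -> bool:
    # Assumption: a digit is any code point with general category Nd.
    return unicodedata.category(ch) == "Nd"


def is_punctuation(ch: str) -> bool:
    # Assumption: punctuation is any general category Pc, Pd, Ps, Pe, Pi, Pf, Po.
    return unicodedata.category(ch).startswith("P")


def is_letter(ch: str) -> bool:
    # The proposal read literally: neither punctuation nor digit.
    return not (is_digit(ch) or is_punctuation(ch))


def is_ident(s: str) -> bool:
    # ident = (letter | "_") {letter | "_" | digit}.
    # "_" is handled explicitly because it is itself punctuation (category Pc).
    if not s:
        return False
    if not (s[0] == "_" or is_letter(s[0])):
        return False
    return all(c == "_" or is_letter(c) or is_digit(c) for c in s[1:])
```

Read literally, the rule also admits symbols and white space as letters: `is_ident("a+b")` is accepted here because "+" has category Sm. A real scanner would presumably have to exclude those categories as well, which may be worth settling in the discussion.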