Issue #19: Unicode for Component Pascal identifiers
Posted: Mon Dec 08, 2014 6:20 am
by DGDanforth
Shall the Center adopt Unicode for Component Pascal identifiers?
Here, "Unicode" means that any null terminated string of CHAR can be used as an identifier.
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Mon Dec 08, 2014 9:45 am
by Bernhard
DGDanforth wrote:any null terminated string of CHAR can be used as an identifier.
Hmm, do you really mean any?
I think it should still obey the restrictions of Language Report 3.1:
1. Identifiers are sequences of letters, digits, and underscores. The first character must not be a digit.
ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ".
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
with only the range of letters there extended to the letters of the original Unicode specification (similar to how it is specified for Java:
http://docs.oracle.com/javase/7/docs/ap ... acter.html).
--
Bernhard
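For illustration, the Language Report 3.1 rule quoted above can be written as a small check. This is only a sketch in Python, not anything taken from the BlackBox compiler; the names LATIN1_LETTER, IDENT_RE and is_latin1_ident are made up for this example. The letter ranges are copied from the grammar, so the Latin-1 characters "×" and "÷" are not letters.

```python
import re

# Letter ranges copied from the quoted grammar:
# "A".."Z" | "a".."z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ"
LATIN1_LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# ident = (letter | "_") {letter | "_" | digit}.
IDENT_RE = re.compile(rf"[{LATIN1_LETTER}_][{LATIN1_LETTER}_0-9]*")


def is_latin1_ident(s: str) -> bool:
    """Return True if s matches the Latin-1 identifier grammar above."""
    return IDENT_RE.fullmatch(s) is not None
```

With this rule "Größe" and "_x1" are accepted, while "2abc" (leading digit), "a×b" (multiplication sign) and any identifier with non-Latin-1 letters are rejected.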
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Mon Dec 08, 2014 11:00 am
by Josef Templ
identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
- Josef
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Mon Dec 08, 2014 11:33 am
by Bernhard
Josef Templ wrote:identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
Thanks, I had interpreted it this way, but I was just a bit puzzled by Doug's formulation.
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Tue Dec 09, 2014 2:21 am
by DGDanforth
bernhard wrote:Josef Templ wrote:identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
Thanks, I had interpreted it this way, but I was just a bit puzzled by Doug's formulation.
Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Tue Dec 09, 2014 8:43 am
by Josef Templ
> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.
No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters, such as punctuation marks, etc. Those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.
- Josef
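The distinction between "any CHAR value" and "a CHAR value that is a letter" can be made concrete with the Unicode character database. The sketch below is in Python and assumes that "alphabetical Unicode character" means a character whose general category starts with "L" (Lu, Ll, Lt, Lm, Lo); the thread does not pin down the exact definition, so treat that as an assumption. The function name is_unicode_letter is made up for this example.

```python
import unicodedata


def is_unicode_letter(ch: str) -> bool:
    """Return True if ch is one character in a Unicode letter category.

    Assumption: "letter" means general category L* (Lu, Ll, Lt, Lm, Lo).
    """
    return len(ch) == 1 and unicodedata.category(ch).startswith("L")
```

Under this assumption "ж", "é" and "中" count as letters, while "!", "«", "×" and "5" do not, just as ISO-Latin 1 already contains characters that are not allowed within an identifier.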
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Tue Dec 09, 2014 9:01 am
by Ivan Denisov
The voting is over, so we can apply the changes. We also need to think about what to do with the Language Report, because the letter definition is changing.
letter = ???
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Tue Dec 09, 2014 9:40 am
by Josef Templ
Ivan Denisov wrote:The voting is over, so we can apply the changes. We also need to think about what to do with the Language Report, because the letter definition is changing.
letter = ???
Please have a look at Docu/CP-Lang.odc
3. Vocabulary and Representation
in topic branch #19
- Josef
Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Tue Dec 09, 2014 9:57 am
by Ivan Denisov
Josef Templ wrote:Please have a look at Docu/CP-Lang.odc
3. Vocabulary and Representation
in topic branch #19
ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | UnicodeLetter.
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
This looks reasonable
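As a rough sketch of how the revised grammar might be checked, here is a small Python version. It is not the compiler's implementation. It assumes that UnicodeLetter means "general category L*" (which is what Python's str.isalpha() tests) and that a CHAR is a 16-bit value, so code points above 0xFFFF are rejected. The names ASCII_DIGITS, is_letter and is_ident are made up for this example.

```python
ASCII_DIGITS = "0123456789"  # digit = "0" | "1" | ... | "9" (ASCII only)


def is_letter(ch: str) -> bool:
    # Assumption: UnicodeLetter = Unicode general category L*,
    # restricted to values that fit into a 16-bit CHAR.
    return ord(ch) <= 0xFFFF and ch.isalpha()


def is_ident(s: str) -> bool:
    """ident = (letter | "_") {letter | "_" | digit}."""
    if not s:
        return False
    first, rest = s[0], s[1:]
    if not (first == "_" or is_letter(first)):
        return False
    return all(c == "_" or is_letter(c) or c in ASCII_DIGITS for c in rest)
```

Note that the digit rule stays ASCII-only, so a non-ASCII decimal digit such as "٣" is neither a letter nor a digit here and makes the identifier invalid.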

Re: Issue #19: Unicode for Component Pascal identifiers
Posted: Wed Dec 10, 2014 5:48 am
by DGDanforth
You are right again. I am oversimplifying the problem. So now I need to go and look up the definition of
"alphabetical Unicode character."
-Doug
Josef Templ wrote:> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.
No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters, such as punctuation marks, etc. Those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.
- Josef