Issue #19: Unicode for Component Pascal identifiers
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Shall the Center adopt Unicode for Component Pascal identifiers?
Here "Unicode" means that any null-terminated string of CHAR can be used as an identifier.
Re: Issue #19: Unicode for Component Pascal identifiers
DGDanforth wrote: any null-terminated string of CHAR can be used as an identifier.
Hmm, do you really mean any?
I think it should still obey the restrictions of Language Report 3.1:
1. Identifiers are sequences of letters, digits, and underscores. The first character must not be a digit.
ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ".
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
Only the range of letters there should be extended, to the letters of the original Unicode specification (similar to how it is specified for Java: http://docs.oracle.com/javase/7/docs/ap ... acter.html).
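For reference, the Language Report 3.1 rule quoted above can be sketched as a small checker. This is only an illustration in Python, not part of the report or of BlackBox; the names LATIN1_LETTER, IDENT_RE and is_latin1_ident are made up for this sketch.

```python
import re

# Letter ranges copied from the Language Report 3.1 grammar:
# "A".."Z" | "a".."z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ"
# U+00D7 (multiplication sign) and U+00F7 (division sign) fall outside them.
LATIN1_LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# ident = (letter | "_") {letter | "_" | digit}.
IDENT_RE = re.compile(rf"[{LATIN1_LETTER}_][{LATIN1_LETTER}_0-9]*")


def is_latin1_ident(text: str) -> bool:
    """Return True if text matches the Latin-1 identifier grammar."""
    return IDENT_RE.fullmatch(text) is not None
```

With this sketch, "Größe" and "_x1" are accepted, while "1x" (leading digit), "a×b" (multiplication sign) and "Ω" (a letter outside Latin-1) are rejected.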
--
Bernhard
- Josef Templ
- Posts: 2048
- Joined: Tue Sep 17, 2013 6:50 am
Re: Issue #19: Unicode for Component Pascal identifiers
Identifiers still follow the usual lexical rules but may include any Unicode letter
(alphabetical Unicode character), not only those from ISO-Latin 1.
- Josef
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote: identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
Thanks, I had interpreted it this way, but I was just a bit puzzled by Doug's formulation.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote: identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
bernhard wrote: Thanks, I had interpreted it this way, but I was just a bit puzzled by Doug's formulation.
Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the term 'Unicode' and instead refer to letters as 16-bit values.
- Josef Templ
- Posts: 2048
- Joined: Tue Sep 17, 2013 6:50 am
Re: Issue #19: Unicode for Component Pascal identifiers
> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.
No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters, such as punctuation marks. Those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.
- Josef
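The distinction between "any CHAR value" and "a CHAR value that is a letter" can be illustrated with the Unicode general category. The sketch below is Python, and it assumes that "alphabetical Unicode character" means general category L* (Lu, Ll, Lt, Lm, Lo), which is also what Java's Character.isLetter uses. The thread itself does not fix a definition, and is_unicode_letter is a hypothetical helper name.

```python
import unicodedata


def is_unicode_letter(ch: str) -> bool:
    """True if the single character ch has a Unicode letter category.

    Assumption: "letter" means general category Lu, Ll, Lt, Lm or Lo.
    """
    return len(ch) == 1 and unicodedata.category(ch).startswith("L")
```

Under that assumption, "Ж" and "é" count as letters, while "!", "¿", "«", "€" and "5" do not, even though all of them are valid CHAR values.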
- Ivan Denisov
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
The voting is over, so we can apply the changes. We also need to think about what to do with the Language Report, because the letter definition is changing.
letter = ???
- Josef Templ
- Posts: 2048
- Joined: Tue Sep 17, 2013 6:50 am
Re: Issue #19: Unicode for Component Pascal identifiers
Ivan Denisov wrote: The voting is over, so we can apply changes. Also we need to think about what to do with the Language Report, because the letter definition is changing.
letter = ???
Please have a look at Docu/CP-Lang.odc
3. Vocabulary and Representation
in topic branch #19
- Josef
- Ivan Denisov
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote:Please have a look at Docu/CP-Lang.odc
3. Vocabulary and Representation
in topic branch #19
ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | UnicodeLetter.
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
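A rough way to try out the extended grammar is sketched below in Python. It rests on two assumptions that the grammar does not spell out: UnicodeLetter is taken to mean any character of general category L*, and CHAR is taken to be a 16-bit value, so code points above 0xFFFF are rejected. The function names are hypothetical and are not taken from the compiler sources.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Assumption: UnicodeLetter = general category L* within 16-bit CHAR.
    return ord(ch) <= 0xFFFF and unicodedata.category(ch).startswith("L")


def is_digit(ch: str) -> bool:
    # digit = "0" | ... | "9"; other Unicode digits are not accepted.
    return "0" <= ch <= "9"


def is_ident(text: str) -> bool:
    """ident = (letter | "_") {letter | "_" | digit}."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    if not (is_letter(first) or first == "_"):
        return False
    return all(is_letter(c) or c == "_" or is_digit(c) for c in rest)
```

In this sketch "Länge", "переменная" and "x_1" are identifiers, while "1x", "a-b" and a name ending in an Arabic-Indic digit are not, since only the ASCII digits are listed under digit.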

- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19: Unicode for Component Pascal identifiers
You are right again. I am oversimplifying the problem. So now I need to go and look up the definition of "alphabetical Unicode character."
-Doug
Josef Templ wrote:> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.
No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters, such as punctuation marks. Those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.
- Josef