Issue #19: Unicode for Component Pascal identifiers

Adopt Unicode for Component Pascal identifiers?

ABSTAIN: 0 (no votes)
YES: 7 (78%)
NO: 2 (22%)

Total votes: 9

DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Issue #19: Unicode for Component Pascal identifiers

Post by DGDanforth »

Shall the Center adopt Unicode for Component Pascal identifiers?
Here "Unicode" means that any null-terminated string of CHAR can be used as an identifier.
Bernhard
Posts: 68
Joined: Tue Sep 17, 2013 6:56 am
Location: Munich, Germany

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Bernhard »

DGDanforth wrote: any null terminated string of CHAR can be used as an identifier.
Hmm, do you really mean any?

I think it should still obey the restrictions of Language Report 3.1:

1. Identifiers are sequences of letters, digits, and underscores. The first character must not be a digit.

ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ".
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".

and only the range of letters there should be extended to the letters of the original Unicode specification (similar to how it is specified for Java: http://docs.oracle.com/javase/7/docs/ap ... acter.html).
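
For illustration only, here is a minimal Python sketch of the Language Report 3.1 rule quoted above, before any Unicode extension. The names `LETTER`, `IDENT`, and `is_latin1_ident` are hypothetical and not part of the report or of any compiler; the character ranges are copied from the quoted `letter` production.

```python
import re

# Letter ranges taken from the quoted grammar:
# "A".."Z" | "a".."z" | "À".."Ö" | "Ø".."ö" | "ø".."ÿ"
LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# ident = (letter | "_") {letter | "_" | digit}.
IDENT = re.compile(f"[{LETTER}_][{LETTER}_0-9]*")

def is_latin1_ident(s: str) -> bool:
    """Return True if s matches the ISO Latin-1 identifier rule."""
    return IDENT.fullmatch(s) is not None

print(is_latin1_ident("Größe"))  # True: ö and ß fall inside the letter ranges
print(is_latin1_ident("1x"))     # False: the first character is a digit
print(is_latin1_ident("Δx"))     # False: Greek letters are outside ISO Latin-1
```

Note that the gaps in the ranges exclude "×" (U+00D7) and "÷" (U+00F7), which are symbols rather than letters.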
--
Bernhard
Josef Templ
Posts: 2048
Joined: Tue Sep 17, 2013 6:50 am

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Josef Templ »

Identifiers still follow the usual lexical rules but may include any Unicode letter
(alphabetical Unicode character), not only those from ISO-Latin 1.

- Josef
Bernhard
Posts: 68
Joined: Tue Sep 17, 2013 6:56 am
Location: Munich, Germany

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Bernhard »

Josef Templ wrote:identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
Thanks, I had interpreted it this way, but I was just a bit puzzled by Doug's formulation.
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Issue #19: Unicode for Component Pascal identifiers

Post by DGDanforth »

bernhard wrote:
Josef Templ wrote:identifiers still follow the usual lexical rules but allow for inclusion of any Unicode letter
(alphabetical Unicode characters) not only those from ISO-Latin 1.
thanks, I had interpreted it this way, but I just was a bit puzzled by Doug's formulation.
Thank you, Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the term 'Unicode' and instead refer to letters as 16-bit values.
Josef Templ
Posts: 2048
Joined: Tue Sep 17, 2013 6:50 am

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Josef Templ »

> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.

No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters, such as punctuation marks, and those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.
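
The distinction between "any CHAR value" and "a CHAR value that is a letter" can be sketched in Python as follows. This assumes that "alphabetical Unicode character" corresponds to the Unicode general categories beginning with `L` (Lu, Ll, Lt, Lm, Lo); that mapping is an assumption for illustration, not something the thread specifies, and `is_unicode_letter` is a hypothetical helper.

```python
import unicodedata

def is_unicode_letter(ch: str) -> bool:
    """True if ch is one character whose general category is a letter (L*)."""
    return len(ch) == 1 and unicodedata.category(ch).startswith("L")

# A mix of letters and non-letters from several scripts and symbol blocks.
samples = ["a", "é", "Ж", "中", "¿", "×", "€", "!"]

letters = [c for c in samples if is_unicode_letter(c)]
others = [c for c in samples if not is_unicode_letter(c)]

print(letters)  # ['a', 'é', 'Ж', '中']
print(others)   # ['¿', '×', '€', '!']
```

Both "¿" and "×" are ISO-Latin 1 characters, yet neither is a letter, which matches the point that the same restriction already applies within ISO-Latin 1.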

- Josef
Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Ivan Denisov »

The voting is over, so we can apply the changes. We also need to think about what to do with the Language Report, because the definition of letter is changing.
letter = ???
Josef Templ
Posts: 2048
Joined: Tue Sep 17, 2013 6:50 am

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Josef Templ »

Ivan Denisov wrote: The voting is over, so we can apply changes. Also we need to think what to do with Language Report, because letter definition is changing.
letter = ???
Please have a look at Docu/CP-Lang.odc, section 3 (Vocabulary and Representation), in topic branch #19.

- Josef
Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Re: Issue #19: Unicode for Component Pascal identifiers

Post by Ivan Denisov »

Josef Templ wrote:Please have a look at Docu/CP-Lang.odc
3. Vocabulary and Representation

in topic branch #19

ident = (letter | "_") {letter | "_" | digit}.
letter = "A" .. "Z" | "a" .. "z" | UnicodeLetter.
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
This looks reasonable :)
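
As a rough sketch of how the revised grammar could be checked, the following Python function validates an identifier character by character. It rests on two assumptions that the thread does not confirm: that `UnicodeLetter` means a character in a Unicode general category starting with `L`, and that only 16-bit CHAR values (code points up to 0xFFFF) are eligible. The helper names are hypothetical.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    # Assumption: UnicodeLetter = general category L*, limited to 16-bit CHAR.
    return ord(ch) <= 0xFFFF and unicodedata.category(ch).startswith("L")

def is_digit(ch: str) -> bool:
    # digit = "0" | "1" | ... | "9" (ASCII digits only, as in the grammar).
    return "0" <= ch <= "9"

def is_ident(s: str) -> bool:
    """ident = (letter | "_") {letter | "_" | digit}."""
    if not s:
        return False
    first, rest = s[0], s[1:]
    if not (is_letter(first) or first == "_"):
        return False
    return all(is_letter(c) or c == "_" or is_digit(c) for c in rest)

print(is_ident("длина"))  # True: Cyrillic letters are now accepted
print(is_ident("Δx1"))    # True: letter, letter, digit
print(is_ident("2x"))     # False: starts with a digit
print(is_ident("a€"))     # False: the currency sign is not a letter
```

Digits stay restricted to ASCII in this sketch because the `digit` production is unchanged, so a name containing, say, an Arabic-Indic digit would be rejected.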
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Issue #19: Unicode for Component Pascal identifiers

Post by DGDanforth »

You are right again. I am oversimplifying the problem. So now I need to go and look up the definition of "alphabetical Unicode character."

-Doug
Josef Templ wrote:> Thank you Bernhard and Josef. Yes, the identifier syntax must still hold. The 'letter' can be any CHAR value. Notice that I prefer to avoid the use of the term 'Unicode' and instead refer to letters as 16 bit values.

No, a letter in this context is not any CHAR value. It is any CHAR value that is a letter, i.e. an alphabetical Unicode character.
There are many more Unicode characters such as punctuation marks, etc. Those are not allowed within an identifier.
This is exactly the same as with ISO-Latin 1, which also contains more characters than those allowed within an identifier.

- Josef