Re: Feature #9: adding module Characters
Posted: Tue Sep 23, 2014 11:05 am
Josef,
thanks a lot for the clarification.
I have a slight problem understanding the difference between ARRAY OF BYTE and ARRAY OF SHORTCHAR. I do not remember whether the special role of a formal parameter of type ARRAY OF BYTE, namely being compatible with any argument, is retained in Component Pascal, if that is what you mean.
But the whole discussion throws us into the problems of a universal character set. When re-reading the language report I realized that CHAR is also limited to the 16-bit range of UCS-2, which cannot represent the complete Unicode character set. As far as I know, A2/Aos stores UTF-8 on disk but maps it to UCS-4 (32 bits per character) in memory, thereby avoiding the mismatch between a string's length and its allocated size. I personally dislike UTF-8 encoding in memory, since the storage a string requires can be much larger than its number of characters.
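To make the size argument concrete, here is a small sketch in Python (chosen only because it is easy to run; it is not Component Pascal and says nothing about how A2/Aos is actually implemented). The helper name `describe` is made up for this illustration. It compares the number of characters in a string with the storage needed in UTF-8, in 16-bit units, and in a fixed 32-bit encoding.

```python
def describe(text: str) -> dict:
    """Return character count and storage sizes for several encodings.

    Hypothetical helper for illustration only.
    """
    return {
        # Number of Unicode code points, i.e. what a user calls "characters".
        "code_points": len(text),
        # Bytes needed when the string is held as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Number of 16-bit units needed (a 16-bit CHAR holds one unit).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Bytes needed with a fixed 32 bits per character.
        "utf32_bytes": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    samples = {
        "ASCII": "abc",
        "CJK": "\u65e5\u672c\u8a9e",          # three characters inside the 16-bit range
        "beyond 16 bit": "\U0001D11E",        # one character outside the 16-bit range
    }
    for label, text in samples.items():
        print(label, describe(text))
```

For the three CJK characters, UTF-8 needs 9 bytes for 3 characters, while the 16-bit form needs 3 units. The last sample is a single character that does not fit into one 16-bit unit at all: it needs two units, which is exactly the case a 16-bit CHAR cannot cover. With 32 bits per character, the size is always four bytes times the number of characters.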
Allowing identifiers to be a UTF-8 encoded ARRAY OF SHORTCHAR seems to be a solution.
I fear people from the Far East could expect us to support their characters as well, although a Chinese colleague assured me that the difficulty and ambiguity of entering such characters outweigh anything gained from it. So what should we do?
--
Bernhard