Issue #19: Unicode for Component Pascal identifiers
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
We need to vote here:
- We need to validate the input in Utf8ToString according to the UTF-8 standard; then it can be marked for public use.
- We do not need validation in Utf8ToString; it will be used only internally, or marked as unsafe in the documentation.
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote:
> Ivan, please keep it simple. You are inventing problems that don't exist.
> Our UTF-8 converters convert any 'valid' Component Pascal string into UTF-8 AND back.
> In Component Pascal a string is valid if it is 0X-terminated.
> Since Component Pascal does not restrict the character codes, why should the UTF-8 converter?
> With your approach, some strings have a legal StringToUtf8 conversion
> that cannot be converted back by Utf8ToString. This is really strange,
> and the alternative is so obvious and so simple.
It is possible that we will use such a converter internally in Kernel without validity checking, for better speed, while Strings provides Alexander's full version of the converter. However, I am sure that the overall difference in speed will be very small.
Also, Josef, you said "The hackers are very smart." (12 Aug 2014).
Please read this: http://www.unicode.org/reports/tr36/#UTF-8_Exploit
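The TR36 exploit comes down to overlong encodings. Here is a minimal illustration in Python rather than Component Pascal (`decode_two_byte` and the sample input are hypothetical). A decoder that checks only the lead and continuation bit patterns turns the overlong pair `C0 AF` into U+002F `/`, so a filter that scans the raw bytes for `/` before decoding is bypassed:

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Hypothetical lax decoder: checks the bit patterns only, not the value range."""
    if (lead & 0xE0) != 0xC0 or (cont & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


raw = b"..\xc0\xafetc"  # hypothetical input: it contains no 0x2F byte at all
print(b"/" in raw)                       # False: a byte-level filter sees no slash
print(chr(decode_two_byte(0xC0, 0xAF)))  # '/': the overlong form still decodes to a slash
```

A decoder that follows Table 3-7 of the Unicode standard rejects `C0` as a lead byte outright, because two-byte sequences start at `C2`. Python's own `bytes.decode("utf-8")` raises `UnicodeDecodeError` on this input.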
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Zinn and Josef, please look at Table 3-7 ("Well-Formed UTF-8 Byte Sequences") here:
http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf
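Table 3-7 is small enough to write down as data. The following Python sketch stores each row as a list of allowed byte ranges and accepts input only if it is a sequence of such rows. The names `TABLE_3_7` and `is_well_formed` are hypothetical, and this is not the BlackBox code. The restricted second-byte ranges after `E0`, `ED`, `F0` and `F4` are what rule out overlong forms, surrogates and code points above U+10FFFF.

```python
# Rows of Unicode Table 3-7 (Well-Formed UTF-8 Byte Sequences):
# each row lists an inclusive (low, high) range for every byte of the sequence.
TABLE_3_7 = [
    [(0x00, 0x7F)],                                            # U+0000..U+007F
    [(0xC2, 0xDF), (0x80, 0xBF)],                              # U+0080..U+07FF
    [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],                # U+0800..U+0FFF
    [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],                # U+1000..U+CFFF
    [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],                # U+D000..U+D7FF
    [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],                # U+E000..U+FFFF
    [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+10000..U+3FFFF
    [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+40000..U+FFFFF
    [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)],  # U+100000..U+10FFFF
]


def is_well_formed(data: bytes) -> bool:
    """Return True if data is a concatenation of rows of Table 3-7."""
    i = 0
    while i < len(data):
        # The lead byte alone selects the row: no two rows share a lead byte.
        for row in TABLE_3_7:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            return False  # 80..C1 and F5..FF never start a sequence
        if i + len(row) > len(data):
            return False  # sequence truncated by the end of input
        for k in range(1, len(row)):
            lo, hi = row[k]
            if not lo <= data[i + k] <= hi:
                return False
        i += len(row)
    return True
```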
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19: Unicode for Component Pascal identifiers
Ivan,
It appears that a simple case statement would implement that table, right?
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
DGDanforth wrote:
> Ivan,
> It appears that a simple case statement would implement that table, right?
Case statements with a simple "state machine":
http://forum.oberoncore.ru/viewtopic.ph ... 92b#p89571
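Doug's case-statement idea can be sketched as a byte-at-a-time state machine. This is a hypothetical Python version, not the code behind the linked post. An if/elif chain stands in for a Component Pascal CASE on the lead byte. The only state is the number of continuation bytes still expected and the allowed range for the next one. The output is a list of 16-bit code units, as a CHAR array would hold. Splitting characters above U+FFFF into surrogate pairs is an assumption of this sketch.

```python
def decode_utf8_to_utf16(data: bytes):
    """Strict UTF-8 decoder (Table 3-7) written as a small state machine.

    Returns a list of 16-bit code units, or None if the input is not well-formed.
    """
    units = []
    need = 0              # continuation bytes still expected
    cp = 0                # code point being assembled
    lo, hi = 0x80, 0xBF   # allowed range for the next continuation byte
    for b in data:
        if need == 0:
            # "CASE lead byte OF": set up the state for the continuation bytes
            if b <= 0x7F:
                units.append(b)
                continue
            elif 0xC2 <= b <= 0xDF:
                need, cp = 1, b & 0x1F
            elif 0xE0 <= b <= 0xEF:
                need, cp = 2, b & 0x0F
                if b == 0xE0:
                    lo = 0xA0     # reject overlong three-byte forms
                elif b == 0xED:
                    hi = 0x9F     # reject surrogates U+D800..U+DFFF
            elif 0xF0 <= b <= 0xF4:
                need, cp = 3, b & 0x07
                if b == 0xF0:
                    lo = 0x90     # reject overlong four-byte forms
                elif b == 0xF4:
                    hi = 0x8F     # reject code points above U+10FFFF
            else:
                return None       # 80..C1 and F5..FF cannot start a sequence
        else:
            if not lo <= b <= hi:
                return None
            lo, hi = 0x80, 0xBF   # only the first continuation byte is restricted
            cp = (cp << 6) | (b & 0x3F)
            need -= 1
            if need == 0:
                if cp > 0xFFFF:   # emit a surrogate pair
                    cp -= 0x10000
                    units.append(0xD800 | (cp >> 10))
                    units.append(0xDC00 | (cp & 0x3FF))
                else:
                    units.append(cp)
    return units if need == 0 else None  # None if input ends mid-sequence
```

Each byte is examined exactly once, which is why a CASE on the lead byte plus two small counters is enough to implement the whole table.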
- Josef Templ
- Posts: 2047
- Joined: Tue Sep 17, 2013 6:50 am
Re: Issue #19: Unicode for Component Pascal identifiers
> - Format errors inside identifiers do not occur
If you decode a string that BB 1.6 has written into a symbol or object file,
you may get a format error. Such a situation is possible and must be handled by the decoder.
Helmut, your approach is too simple; Ivan's approach is too complex.
My approach is exactly in between.
That approach detects format errors but does not care about the contents,
i.e. it treats all 16-bit character codes as legal, in the same way Component Pascal does.
It is so obvious that it is hard for me to believe there is any discussion about it.
- Josef
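The "in between" approach Josef describes can be sketched in Python as follows. The names are hypothetical, Component Pascal strings are modeled as lists of 16-bit code units, and 0X termination is ignored. The encoder writes every 16-bit value in one to three bytes. The decoder rejects malformed byte structure but accepts any 16-bit result, surrogate halves included. Treating overlong forms as format errors is an assumption of this sketch, not something stated in the thread.

```python
def string_to_utf8(units):
    """Encode 16-bit code units (like Component Pascal CHARs) as 1..3 UTF-8 bytes each.
    Every value 0..0xFFFF is encoded, lone surrogates included."""
    out = bytearray()
    for u in units:
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out.append(0xC0 | (u >> 6))
            out.append(0x80 | (u & 0x3F))
        else:
            out.append(0xE0 | (u >> 12))
            out.append(0x80 | ((u >> 6) & 0x3F))
            out.append(0x80 | (u & 0x3F))
    return bytes(out)


def utf8_to_string(data):
    """Decode 1..3-byte UTF-8 into 16-bit code units.
    Rejects format errors (bad lead or continuation byte, truncation, overlong form)
    but accepts every 16-bit value. Returns None on error."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, u, minimum = 0, b, 0
        elif 0xC0 <= b <= 0xDF:
            n, u, minimum = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            n, u, minimum = 2, b & 0x0F, 0x800
        else:
            return None  # stray continuation byte, or a 4-byte lead beyond 16 bits
        if i + n >= len(data):
            return None  # truncated sequence
        for k in range(1, n + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                return None  # not a continuation byte
            u = (u << 6) | (c & 0x3F)
        if u < minimum:
            return None  # overlong form (assumed to count as a format error)
        units.append(u)
        i += n + 1
    return units
```

With these two functions every 16-bit value round-trips. The flip side is Ivan's objection: `string_to_utf8([0xD800])` yields `ED A0 80`, which Table 3-7 does not list as well-formed. The output is therefore not standard UTF-8 for such strings, and a strict Utf8ToString could not read it back.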
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
I have incorporated Helmut's latest bugfix into branch #19.
This also eliminated the build-pipeline bug with appendingProperties in System/Strings.odc and in the About dialog.
You can test the new version here:
blackbox-1.7-a1.025.zip
blackbox-1.7-a1.025-setup.exe
I also removed the forced SHORTCHAR-to-CHAR conversion in the case of return value 2 ("decode incomplete or error").
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Josef, Helmut, Doug and others, can we make the vote now?
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote:
> > - Format errors inside identifiers do not occur
> If you decode a string that BB 1.6 has written into a symbol or object file,
> you may get a format error. Such a situation is possible and must be handled by the decoder.
> Helmut, your approach is too simple; Ivan's approach is too complex.
> My approach is exactly in between.
> That approach detects format errors but does not care about the contents,
> i.e. it treats all 16-bit character codes as legal, in the same way Component Pascal does.
> It is so obvious that it is hard for me to believe there is any discussion about it.
> - Josef
Josef, if you use it only internally, there will be no problem. But you want to move this into Strings, and people will use it for many purposes.
Again, we should follow the Unicode standard. I have found the table of well-formed UTF-8 byte sequences, and Alexander has put effort into this. I do not understand why you do not like having a single working algorithm.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19: Unicode for Component Pascal identifiers
Ivan Denisov wrote:
> Josef, Helmut, Doug and others, can we make the vote now?
I'd like to hear more from Josef on the issue before we vote.