Tool that generates special characters using Unicode

- Zinn
Re: Tool that generates special characters using Unicode
DGDanforth wrote: So are we kludging and just hoping that no one will need a Unicode value that exceeds 16 bits?
The BlackBox 1.6 and 1.7 editors work with 16-bit Unicode only. There are limitations everywhere; you can't have it all. If you need more, it is your task to change it; I will not do it.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Tool that generates special characters using Unicode
Zinn wrote: The BlackBox 1.6 and 1.7 editors work with 16-bit Unicode only. There are limitations everywhere; you can't have it all. If you need more, it is your task to change it; I will not do it.
Hence, what we need to say in any documentation about multiple-language support is that we support a subset of Unicode: those code points that can be represented in 16 bits.
=========
Since your work falls within the 16 bits, why are you using UTF-8?
Is that simply to map 'externally' supplied UTF-8 to 16-bit 'internal' codes?
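For concreteness, the mapping Doug asks about can be sketched as follows. This is an illustrative Python sketch, not BlackBox code (BlackBox is written in Component Pascal); the function name is hypothetical. It decodes externally supplied UTF-8 and accepts only code points that fit in a 16-bit internal CHAR:

```python
def utf8_to_16bit(data: bytes) -> list[int]:
    """Decode externally supplied UTF-8 into 16-bit internal codes.

    Raises ValueError for any code point above U+FFFF, since the
    editor's internal CHAR type is only 16 bits wide.
    """
    codes = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:06X} does not fit in 16 bits")
        codes.append(cp)
    return codes

# 'é' (U+00E9) fits in 16 bits; an Egyptian hieroglyph (U+13000) does not.
print(utf8_to_16bit("café".encode("utf-8")))  # [99, 97, 102, 233]
```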
- Josef Templ
- Posts: 2047
- Joined: Tue Sep 17, 2013 6:50 am
Re: Tool that generates special characters using Unicode
Everything above 16-bit Unicode is meaningless for a programmer.
It is only used for ancient Egyptian or Maya hieroglyphs, etc.
For a programmer, 16-bit Unicode is the full Unicode.
Nevertheless, I have changed the issue topic to express that it supports 16-bit Unicode.
- Josef
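As a quick sanity check of Josef's claim: the characters he mentions do lie outside the 16-bit range (the Basic Multilingual Plane), while ordinary programming text does not. A small Python illustration:

```python
# Code points up to U+FFFF form the Basic Multilingual Plane (BMP),
# which is what "16-bit Unicode" covers.
BMP_MAX = 0xFFFF

for ch in ["A", "ä", "Ж", "中"]:          # typical programming-text characters
    assert ord(ch) <= BMP_MAX

for ch in ["\U00013000", "\U0001D11E"]:   # EGYPTIAN HIEROGLYPH A001, MUSICAL SYMBOL G CLEF
    assert ord(ch) > BMP_MAX

print("all BMP checks passed")
```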
- DGDanforth
Re: Tool that generates special characters using Unicode
Josef Templ wrote: Everything above 16-bit Unicode is meaningless for a programmer. It is only used for ancient Egyptian or Maya hieroglyphs, etc. For a programmer, 16-bit Unicode is the full Unicode.
I agree that for programmers 16-bit Unicode is sufficient.
But it would be nice to say that the editor can create documents for any language.
I think I see (vaguely) a way to do that. If all future odc files used only 8-bit SHORTCHARs, then UTF-8 encoding would be easily handled. Legacy odc files could (?) be detected and automatically converted to UTF-8. Conversion from UTF-8 back to the legacy CHAR encoding would not (?) work for codes exceeding 16 bits, but that too could be detected and the user notified.
Using UTF-8 odc files would then be compatible with Helmut's 1.7 compiler.
The only thing that would need to change is the editor (not trivial).
-Doug
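The detection step Doug proposes can be sketched roughly as below. This is a Python illustration only: real odc files are a binary document format, and `looks_like_utf8` is a hypothetical helper, not part of BlackBox. The heuristic relies on the fact that legacy 8-bit text containing high bytes usually fails strict UTF-8 decoding (note that pure-ASCII legacy text is indistinguishable, which is harmless since ASCII is identical in both encodings):

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: an already-converted file decodes cleanly as UTF-8;
    legacy 8-bit text with accented characters usually does not."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A UTF-8 file passes; the same text in a legacy 8-bit encoding does not,
# because the lone high byte 0xFC is not valid UTF-8.
assert looks_like_utf8("MODULE Früh;".encode("utf-8"))
assert not looks_like_utf8("Früh".encode("latin-1"))
print("detection heuristic works on this sample")
```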
Re: Tool that generates special characters using Unicode
DGDanforth wrote: Using UTF-8 odc files would then be compatible with Helmut's 1.7 compiler.
Sorry Doug, your concept does not work. The 1.7 compiler expects 16-bit Unicode, not UTF-8.
It is the same as with the 1.6 compiler; I have not changed this behaviour.
- DGDanforth
Re: Tool that generates special characters using Unicode
Helmut,
Zinn wrote: Sorry Doug, your concept does not work. The 1.7 compiler expects 16-bit Unicode, not UTF-8. It is the same as with the 1.6 compiler; I have not changed this behaviour.
Yes, I misspoke. After looking at Ivan's diff files for Analyzer I can see that.
Now I need to go back and re-read why UTF-8 was initially mentioned.
- DGDanforth
Re: Tool that generates special characters using Unicode
So, Helmut, you say you don't use UTF-8 for the compiler. How is that reconciled with Josef's comments?
Josef Templ wrote: A second attempt (by Helmut) has been made to support full Unicode in identifiers, very much like it has been done by Ominc for string constants. Identifiers are stored in symbol and object files as
UTF-8 encoded SHORTCHARs. For reasons of simplicity, the compiler also uses UTF-8 encoded SHORTCHARs
for representing identifiers in memory. This also proved to work, with a small exception: the
detection of the identifier length limit; but this can be fixed easily.
The advantage of this approach is that it avoids using code pages and it creates fully portable files.
If only ASCII characters are used, there is no change in the symbol or object file format.
The obvious alternative to the second attempt is to use CHARs instead of SHORTCHARs in the compiler
for representing identifiers. Since this also applies to parts of the runtime system that use the
data type Kernel.Name, it looks simpler to stay with Helmut's solution.
Switching from SHORTCHARs to CHARs internally can be done later if there is any need for it.
- Josef
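The "identifier length limit" exception Josef mentions comes from UTF-8's variable-width encoding: a limit checked in SHORTCHAR bytes triggers earlier for non-ASCII names than one checked in characters. A small Python illustration (the identifier is a made-up example):

```python
# With UTF-8-encoded SHORTCHARs, an identifier's byte length can exceed
# its character count, so a byte-counted length limit is stricter for
# non-ASCII identifiers.
ident = "größe"                       # 5 characters
encoded = ident.encode("utf-8")       # 'ö' and 'ß' each take 2 bytes
print(len(ident), len(encoded))       # 5 7

# A pure-ASCII identifier has equal byte and character counts, which is
# why the symbol-file format is unchanged for ASCII-only code.
assert len("size") == len("size".encode("utf-8"))
```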
Re: Tool that generates special characters using Unicode
I use UTF-8 for the representation of identifiers inside symbol tables, not for the source text of compilations.
- DGDanforth
Re: Tool that generates special characters using Unicode
OK, good. So input is 16-bit and output is UTF-8.
Now all of that makes sense.
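The round trip the thread converges on can be summarized in a few lines of Python (illustrative only; the identifier is a made-up example): the compiler works on 16-bit characters in memory, and identifiers go into the symbol file as UTF-8 bytes, from which the original characters are recoverable:

```python
# Compiler side: identifiers are 16-bit CHARs in memory.
ident_in_memory = "Früh"

# Symbol-file side: the same identifier stored as UTF-8 SHORTCHAR bytes.
symbol_file_bytes = ident_in_memory.encode("utf-8")
print(list(symbol_file_bytes))  # [70, 114, 195, 188, 104]

# Reading the symbol file back recovers the identifier exactly.
assert symbol_file_bytes.decode("utf-8") == ident_in_memory
```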
Re: Tool that generates special characters using Unicode
I'm sorry, but I do not understand the topic of non-Latin characters in source code. I mean, I can understand that someone wants to have comments in his native language and script, but having names in different scripts? What the heck for?
I just imagine that there is a module that does some things that I need.
I download it, only to find out that I do not know how to type the exported name because it is in Kanji.
Wouldn't that be madness?
I mean, one of the main principles of Wirthian languages is reusability of code.
How reusable is it if I cannot even type it? Wouldn't this of necessity lead to the same module (with slight variations) once written in Kanji, once in Hanja, once in KuNom, once in Cyrillic and once in Latin? Who would benefit from such nonsense?
Going down that road, we could also demand that every module of the Framework be written in English, Russian, Czech, Hungarian, Italian, Spanish, Serbian, Mongolian, Chinese, ... I cannot fathom what this would be good for.
For what it is worth, Wirth was Swiss, and AFAIK his mother tongue was German. He didn't choose German as the language he took words from to form Oberon, but English. Why? Because English is the lingua franca of our day. Go to China, and you will find people there, especially at university, who speak English, at least in a basic form. Go to Italy, and it's the same. But try your luck with German, and all bets are off.
IMHO, documentation needs to be multilingual. Menus need to be multilingual. Dialogs need to be multilingual. It must be possible to have comments in various languages, and possibly in various scripts. But names and keywords must be in English only.
What am I missing?