Tool that generates special characters using Unicode

Post by Zinn »

DGDanforth wrote: So are we kludging and just hoping that no one will need a Unicode value that exceeds 16 bits?
The BlackBox 1.6 and 1.7 editors work with 16-bit Unicode only. There are limitations everywhere; you can't have it all. If you need more, it is your task to change it; I won't do it.

Post by DGDanforth »

Zinn wrote:
DGDanforth wrote: So are we kludging and just hoping that no one will need a Unicode value that exceeds 16 bits?
The BlackBox 1.6 and 1.7 editors work with 16-bit Unicode only. There are limitations everywhere; you can't have it all. If you need more, it is your task to change it; I won't do it.
Hence, what we need to say in any documentation about multiple-language support is that we support a
subset of Unicode: the code points that can be represented in 16 bits.


=========
Since your work falls within 16 bits, why are you using UTF-8?
Is that simply to map 'externally' supplied UTF-8 to 16-bit 'internal' codes?
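
For illustration, a minimal sketch of such a mapping, assuming the internal representation is plain
16-bit CHARs (the module and procedure names below are invented, not taken from the BlackBox sources):

MODULE SketchUtf8Decode;
(* Sketch only: map externally supplied UTF-8 (held in SHORTCHARs) to internal 16-bit CHARs.
   Sequences that encode code points above 0FFFFH are flagged as unsupported.
   src is assumed to be 0X-terminated. *)

	PROCEDURE Decode* (IN src: ARRAY OF SHORTCHAR; OUT dst: ARRAY OF CHAR; OUT ok: BOOLEAN);
		VAR i, j, b, val, n: INTEGER;
	BEGIN
		i := 0; j := 0; ok := TRUE;
		WHILE (src[i] # 0X) & ok & (j < LEN(dst) - 1) DO
			b := ORD(src[i]); INC(i);
			IF b < 128 THEN val := b; n := 0	(* 0xxxxxxx: plain ASCII *)
			ELSIF b DIV 32 = 6 THEN val := b MOD 32; n := 1	(* 110xxxxx: 2-byte sequence *)
			ELSIF b DIV 16 = 14 THEN val := b MOD 16; n := 2	(* 1110xxxx: 3-byte sequence *)
			ELSE ok := FALSE; val := 0; n := 0	(* 11110xxx or invalid: does not fit into 16 bits *)
			END;
			WHILE ok & (n > 0) DO	(* collect the 10xxxxxx continuation bytes *)
				b := ORD(src[i]);
				IF b DIV 64 = 2 THEN val := val * 64 + b MOD 64; INC(i); DEC(n)
				ELSE ok := FALSE
				END
			END;
			IF ok THEN dst[j] := CHR(val); INC(j) END
		END;
		dst[j] := 0X
	END Decode;

END SketchUtf8Decode.

The last ELSE branch of the lead-byte test is exactly the case discussed here: a 4-byte UTF-8 sequence
encodes a code point that cannot be stored in a single 16-bit CHAR, so ok is set to FALSE.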

Post by Josef Templ »

Everything above 16-bit Unicode is meaningless for a programmer.
It is only used for ancient Egyptian or Maya hieroglyphs, etc.
For a programmer, 16-bit Unicode is the full Unicode.
Nevertheless, I have changed the issue topic to say that it supports 16-bit Unicode.

- Josef
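
To make concrete what lies above 16 bits: such a code point does not fit into a single 16-bit CHAR; in
UTF-16 it would occupy a surrogate pair, i.e. two 16-bit units, which is beyond the 16-bit-only handling
Helmut describes above. A small sketch of the arithmetic (names invented; StdLog is used only for output):

MODULE SketchSurrogate;
(* Sketch only: show whether a Unicode code point fits into one 16-bit CHAR
   or would need a UTF-16 surrogate pair (two 16-bit units). *)

	IMPORT StdLog;

	PROCEDURE Show* (cp: INTEGER);
		VAR hi, lo: INTEGER;
	BEGIN
		IF (cp >= 0) & (cp <= 0FFFFH) THEN
			StdLog.String("fits into one 16-bit CHAR: "); StdLog.Int(cp); StdLog.Ln
		ELSE	(* e.g. Egyptian hieroglyphs at 13000H..1342FH *)
			hi := 0D800H + (cp - 10000H) DIV 400H;	(* high surrogate *)
			lo := 0DC00H + (cp - 10000H) MOD 400H;	(* low surrogate *)
			StdLog.String("needs two 16-bit units: ");
			StdLog.Int(hi); StdLog.Int(lo); StdLog.Ln
		END
	END Show;

END SketchSurrogate.

Show(13000H), for example, reports the pair 0D80CH, 0DC00H.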

Post by DGDanforth »

Josef Templ wrote: Everything above 16-bit Unicode is meaningless for a programmer.
It is only used for ancient Egyptian or Maya hieroglyphs, etc.
For a programmer, 16-bit Unicode is the full Unicode.
Nevertheless, I have changed the issue topic to say that it supports 16-bit Unicode.

- Josef
I agree that for programmers 16-bit Unicode is sufficient.
But it would be nice to say that the editor can create documents for any language.

I think I see (vaguely) a way to do that. If all future odc files use only 8-bit SHORTCHARs, then UTF-8 encoding is easily handled. Legacy odc files could (?) be detected and automatically converted to UTF-8. UTF-8 to legacy CHAR encoding would not (?) work for codes exceeding 16 bits, but that again could be detected and the user notified.

Using UTF-8 odc files would then be compatible with Helmut's 1.7 compiler.

The only thing that would need to change is the editor (not trivial).

-Doug
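
At the character level, the legacy-to-UTF-8 direction always succeeds, since every 16-bit CHAR fits into
at most three UTF-8 bytes; only the reverse direction can fail, as noted above. A minimal sketch of the
encoder (names invented, not from the BlackBox sources; dst is assumed to be large enough):

MODULE SketchUtf8Encode;
(* Sketch only: encode 16-bit CHAR text as UTF-8 held in SHORTCHARs.
   dst needs room for at most three bytes per CHAR plus the terminating 0X. *)

	PROCEDURE Encode* (IN src: ARRAY OF CHAR; OUT dst: ARRAY OF SHORTCHAR);
		VAR i, j, val: INTEGER;
	BEGIN
		i := 0; j := 0;
		WHILE src[i] # 0X DO
			val := ORD(src[i]); INC(i);
			IF val < 128 THEN	(* one byte: ASCII *)
				dst[j] := SHORT(CHR(val)); INC(j)
			ELSIF val < 2048 THEN	(* two bytes: 110xxxxx 10xxxxxx *)
				dst[j] := SHORT(CHR(192 + val DIV 64)); INC(j);
				dst[j] := SHORT(CHR(128 + val MOD 64)); INC(j)
			ELSE	(* three bytes: 1110xxxx 10xxxxxx 10xxxxxx; covers every 16-bit value *)
				dst[j] := SHORT(CHR(224 + val DIV 4096)); INC(j);
				dst[j] := SHORT(CHR(128 + val DIV 64 MOD 64)); INC(j);
				dst[j] := SHORT(CHR(128 + val MOD 64)); INC(j)
			END
		END;
		dst[j] := 0X
	END Encode;

END SketchUtf8Encode.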

Post by Zinn »

DGDanforth wrote: Using UTF-8 odc files would then be compatible with Helmut's 1.7 compiler.
-Doug
Sorry Doug, your concept does not work. The 1.7 compiler expects 16-bit Unicode, not UTF-8.
It is the same as with the 1.6 compiler; I have not changed this behaviour.

Post by DGDanforth »

Zinn wrote:
DGDanforth wrote: Using UTF-8 odc files would then be compatible with Helmut's 1.7 compiler.
-Doug
Sorry Doug, your concept does not work. The 1.7 compiler expects 16-bit Unicode, not UTF-8.
It is the same as with the 1.6 compiler; I have not changed this behaviour.
Helmut,
Yes, I misspoke. After looking at Ivan's diff files for Analyzer, I can see that.
Now I need to go back and reread why UTF-8 was initially mentioned.

Post by DGDanforth »

Josef Templ wrote: A second attempt (by Helmut) has been made to support full Unicode in identifiers,
very much like it has been done by Ominc for string constants. Identifiers are stored in symbol and
object files as UTF-8 encoded SHORTCHARs. For reasons of simplicity, the compiler also uses UTF-8
encoded SHORTCHARs for representing identifiers in memory. This also proved to work, with one small
exception: the detection of the identifier length limit, but that can be fixed easily.
The advantage of this approach is that it avoids code pages and creates fully portable files.
If only ASCII characters are used, there is no change in the symbol or object file format.

The obvious alternative to the second attempt is to use CHARs instead of SHORTCHARs in the compiler
for representing identifiers. Since this also applies to parts of the runtime system that use the
data type Kernel.Name, it looks simpler to stay with Helmut's solution.
Switching from SHORTCHARs to CHARs internally can be done later if there is any need for it.

- Josef
So, Helmut, you say you don't use UTF-8 for the compiler. How is that reconciled with Josef's comments?
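
Regarding the length-limit wrinkle Josef mentions: with UTF-8 encoded SHORTCHARs the byte count of an
identifier can exceed its character count, so a fixed limit has to be checked against characters rather
than bytes. A minimal sketch of such a count (names invented, not from the compiler sources):

MODULE SketchIdentLen;
(* Sketch only: count the characters of a UTF-8 encoded identifier held in SHORTCHARs.
   Continuation bytes (10xxxxxx) do not start a new character, so the character count
   can be smaller than the byte count. *)

	PROCEDURE CharCount* (IN id: ARRAY OF SHORTCHAR): INTEGER;
		VAR i, n: INTEGER;
	BEGIN
		i := 0; n := 0;
		WHILE id[i] # 0X DO
			IF ORD(id[i]) DIV 64 # 2 THEN INC(n) END;	(* count lead bytes only *)
			INC(i)
		END;
		RETURN n
	END CharCount;

END SketchIdentLen.

A check against the raw byte count would reject identifiers whose non-ASCII characters push the encoded
length over the limit even though the number of characters is still within it.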

Post by Zinn »

I use UTF-8 for the representation of identifiers inside symbol tables, not for the source text being compiled.

Post by DGDanforth »

OK, good. So the compiler's input is 16-bit Unicode and the symbol-table output is UTF-8.
Now all of that makes sense.

Post by ReneK »

I'm sorry, but I do not understand the topic of non-Latin characters in source code. I mean, I can understand that someone wants to have comments in his native language and script, but having names in different scripts? What the heck for?

I just imagine that there is a module that does some things that I need.
I download it, only to find out that I do not know how to type the exported name because it is in Kanji.

Wouldn't that be madness?

I mean, one of the main principles of Wirthian languages is reusability of code.

How reusable is it if I cannot even type it? Wouldn't this of necessity lead to the same module (with slight variations) being written once in Kanji, once in Hanja, once in KuNom, once in Cyrillic and once in Latin? Who would benefit from such nonsense?

If going down that road, we could also demand that every module of the Framework should be written in English, Russian, Czech, Hungarian, Italian, Spanish, Serbian, Mongolian, Chinese, ... I cannot fathom what this would be good for.

For what it is worth, Wirth was Swiss, and AFAIK his mother tongue was German. He didn't choose German as the language he took words from to form Oberon, but English. Why? Because English is the lingua franca of our day. Go to China, and you will find people there, especially at university, who speak English, at least in a basic form. Go to Italy, and it's the same. But try your luck with German, and all bets are off.

IMHO, documentation needs to be multilingual. Menus need to be multilingual. Dialogs need to be multilingual. It must be possible to have comments in various languages, and possibly in various scripts. But names and keywords must be in English only.

What am I missing?