Tool that generates special characters using Unicode
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Tool that generates special characters using Unicode
Editor
If the goal is to provide the infrastructure needed to support multiple languages then document editing needs to be considered. There are (at least) two aspects.
(1) keyboard mapping
(2) text ordering
I do not know how Russian programmers get their Cyrillic alphabet entered. Is it just a font change?
What about oriental languages? Are there fonts for Kanji, katakana?
If all the languages are handled by fonts then one simply needs to learn the keystrokes that result in the desired image.
What about the order of characters entered? The English convention supported by the BB document editor is left to right, top to bottom (→↓).
I believe Arabic uses right to left, top to bottom (←↓) and Chinese uses top to bottom, right to left (↓←).
The sequence of the characters should be independent of the language so the compiler would not notice.
Such considerations indicate the difficulties in multiple language support.
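On the text-ordering point: Unicode text is normally stored in logical (typing) order, and each character carries a directionality class that a renderer can use for layout. The following is a minimal Python sketch using only the standard `unicodedata` module; the sample characters are illustrative choices, not taken from the thread.

```python
import unicodedata

# Sample characters (illustrative): Latin a, Cyrillic Zhe, Hebrew Alef, Arabic Alef.
samples = ["a", "\u0416", "\u05d0", "\u0627"]

def direction_class(ch):
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

for ch in samples:
    # 'L' marks left-to-right characters; 'R' and 'AL' mark right-to-left ones.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {direction_class(ch)}")

# The stored sequence does not depend on the direction in which it is drawn.
word = "\u05d0\u05d1"        # Alef typed first, then Bet
assert word[0] == "\u05d0"   # index 0 is still the first character typed
```

Under this model a compiler reading the stored sequence would not need to know the display direction; only the editor's layout code would.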
- Josef Templ
- Posts: 2047
- Joined: Tue Sep 17, 2013 6:50 am
Re: Menus files and multiple languages support
Doug, you enter Cyrillic or Kanji characters by entering the Unicode assigned to them.
All you need is a keyboard (or auxiliary tool) that generates such codes.
There is no font change involved.
It is clear that we have to restrict ourselves to top-to-bottom and left-to-right languages.
For the Cyrillic community this is sufficient as far as I know.
- Josef
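To illustrate entering characters by their Unicode values, here is a small Python sketch. The three characters are arbitrary examples, and `from_code` is a hypothetical helper name introduced only for this illustration.

```python
def from_code(code):
    """Return the character assigned to a Unicode code point."""
    return chr(code)

# A character is identified by its code point, independent of any font.
for ch in ["\u042f", "\u6f22", "\u03b1"]:   # Cyrillic Ya, a CJK ideograph, Greek alpha
    print(f"{ch} -> U+{ord(ch):04X}")

# Going the other way: a tool that emits the code produces the character.
print(from_code(0x042F))
```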
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Menus files and multiple languages support
Josef Templ wrote:
Doug, you enter Cyrillic or Kanji characters by entering the Unicode assigned to them.
All you need is a keyboard (or auxiliary tool) that generates such codes.
There is no font change involved.
It is clear that we have to restrict ourselves to top-to-bottom and left-to-right languages.
For the Cyrillic community this is sufficient as far as I know.
- Josef
I have a tool that generates special characters using Unicode, for example
αβγδεζηθικλμνξοπρςστυφχψω
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
∂∇·⧠∫∮∏∑±√≤≥≠≡≈∞ℏ⊥∩∪∀∃†
←→↑↓
but that entails clicking on a button for each symbol desired.
You are saying that a hardware fix (special keyboard) is needed.
What does Ivan do in order to post English text? Does Ivan have a Cyrillic keyboard?
Does Ivan have to switch back and forth between two different keyboards or is there
a software switch that allows him to use one keyboard that can generate both language
codes (code points)?
I thought I read on Wikipedia that it was up to software to determine what a Unicode looks like,
how a code point is rendered. "Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data."
It still is not clear to me that specifying a Unicode code point will display anything without a further mapping between that code point and a glyph. I also assume that the mapping must be a function of the font being used so that the letter A can also appear under, say, MarriageScript.
Hence my current view is that Unicode is not sufficient (necessary but not sufficient) to display a desired character.
-Doug
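The distinction between a code point and its glyph can be sketched as a lookup table per font. This is a toy model in Python, not how any real font file is read: the font tables and glyph names are invented, `PlainSans` is a hypothetical font, and `MarriageScript` is simply the name used in the post above.

```python
import unicodedata

# Hypothetical fonts: each maps a code point to a glyph identifier.
FONTS = {
    "PlainSans":      {0x41: "A.sans", 0x3B1: "alpha.sans"},
    "MarriageScript": {0x41: "A.script"},   # assumed to have no Greek coverage
}

def glyph_for(font, ch):
    """Look up the glyph for a character; fall back to '.notdef' if the font lacks it."""
    return FONTS[font].get(ord(ch), ".notdef")

print(unicodedata.name("A"))                 # the abstract character
print(glyph_for("PlainSans", "A"))           # one rendering of it
print(glyph_for("MarriageScript", "A"))      # another rendering, same code point
print(glyph_for("MarriageScript", "\u03b1")) # code point is valid, but no glyph here
```

In this model the code point fixes which character is meant, while the font decides how (and whether) it is drawn.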
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Tool that generates special characters using Unicode
Doug, for Russian I simply change the keyboard layout with Alt+Shift.
Russian keyboard
For Greek characters on Ubuntu I generally use .XCompose, but it does not work in BlackBox, so your tool can help. To be honest, I do not need this tool often, only in rare cases when writing documentation. From my point of view, the code should be written in Latin-1.
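A software layout switch can be pictured as swapping the table that maps physical keys to characters. The Python sketch below is a simplified assumption about how such a switch behaves, with partial layout tables and hypothetical names (`LAYOUTS`, `type_keys`); it does not describe any particular operating system's implementation.

```python
# Partial, illustrative layout tables: physical key -> character produced.
LAYOUTS = {
    "en": {"q": "q", "w": "w", "e": "e", "r": "r", "t": "t", "y": "y"},
    "ru": {"q": "й", "w": "ц", "e": "у", "r": "к", "t": "е", "y": "н"},
}

def type_keys(keys, layout):
    """Return the text produced by pressing `keys` while `layout` is active."""
    return "".join(LAYOUTS[layout][k] for k in keys)

# The same six physical keys give different code points per layout.
print(type_keys("qwerty", "en"))
print(type_keys("qwerty", "ru"))
```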
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Tool that generates special characters using Unicode
I frequently write physics programs where the variables are best represented
by their Greek names. To this point I simply change the font to "Symbol" which
works just fine for me. Josef says to be cautious about doing that. I am not sure
why since I have had no problems. When we have Unicode support I will use my
Pallet tool to enter the Greek symbols directly. By the way I have defined Ctrl-G
to change the font to Symbol which I think of as Greek.
Oh, I like your keyboard!
- Josef Templ
- Posts: 2047
- Joined: Tue Sep 17, 2013 6:50 am
Re: Tool that generates special characters using Unicode
Doug, in order to see what you get when using the Symbol font
simply change the font to Arial or whatever you use for the normal program text.
This is what the compiler sees. The compiler does not look at the
visual appearance of the character glyphs. It only looks at the character codes
and assumes they refer to the ISO Latin-1 character set.
- Josef
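The point about the Symbol font can be sketched in Python as follows. The table covers only four letters of the Symbol font's Latin-to-Greek correspondence, and `SYMBOL_GLYPHS` and `shown_in_symbol` are hypothetical names used for illustration.

```python
# Partial table of how the Symbol font draws some Latin letters.
SYMBOL_GLYPHS = {"a": "α", "b": "β", "g": "γ", "d": "δ"}

def shown_in_symbol(text):
    """Return what the eye sees when `text` is displayed in the Symbol font."""
    return "".join(SYMBOL_GLYPHS.get(ch, ch) for ch in text)

identifier = "ab"                            # what is stored in the document
print(shown_in_symbol(identifier))           # drawn with Greek glyphs
print([hex(ord(ch)) for ch in identifier])   # the codes the compiler reads
```

The stored codes stay within Latin-1 regardless of the font, so switching the display back to Arial would show the plain Latin letters again.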
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Tool that generates special characters using Unicode
Josef Templ wrote:
Doug, in order to see what you get when using the Symbol font
simply change the font to Arial or whatever you use for the normal program text.
This is what the compiler sees. The compiler does not look at the
visual appearance of the character glyphs. It only looks at the character codes
and assumes they refer to the ISO Latin-1 character set.
- Josef
OK, I usually start with the default font, highlight the identifier, and type Ctrl-G to change it to "Greek".
So I believe that is why I never have any problems with the Symbol font. I always work within the ISO Latin-1 character set.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Tool that generates special characters using Unicode
Editor considerations
Since we use the default BB editor to create Component Pascal programs, and (I assume) CHAR (16 bits) is the internal representation of a character processed by the compiler, I ask: what about the editor's handling of 3-byte UTF-8 characters?
"East Asian legacy encodings generally used two bytes per character yet take three bytes per character in UTF-8."
So, here I am, sitting with my specially designed Chinese keyboard that generates 3 bytes for every keystroke. How does the editor handle that?
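The size difference mentioned in the quotation can be checked with a short Python sketch. The character used is an arbitrary CJK ideograph chosen for illustration.

```python
ch = "\u6f22"   # a CJK ideograph, picked as an example

utf8 = ch.encode("utf-8")        # variable-length byte encoding
utf16 = ch.encode("utf-16-be")   # 16-bit code units, big-endian, no byte-order mark

print(len(utf8), utf8.hex())
print(len(utf16), utf16.hex())

# Decoding the UTF-8 bytes gives back a single code point that fits in 16 bits.
decoded = utf8.decode("utf-8")
print(f"U+{ord(decoded):04X}")
```

So three bytes on the wire and one 16-bit character in memory are two encodings of the same code point; which one the editor receives depends on the layer that delivers keystrokes to it.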
Re: Tool that generates special characters using Unicode
DGDanforth wrote:
Editor considerations
So, here I am, sitting with my specially designed Chinese keyboard that generates 3 bytes for every keystroke. How does the editor handle that?
You should generate 16 bit Unicode with your designed Chinese keyboard and not UTF-8 code. All documents for compilation are written in 16 bit Unicode.
BlackBox 1.6 internally translates the 16 bit Unicode into 8 bit ASCII code during compilation.
BlackBox 1.7 CPC Edition internally translates the 16 bit Unicode into UTF-8 code during compilation.
I hope that answered your questions.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Tool that generates special characters using Unicode
But there are more Unicode values than can be encoded with 16 bits (65,536).
"The Unicode Standard, the latest version of Unicode contains a repertoire of more than 110,000"
So are we kludging and just hoping that no one will need a Unicode value that exceeds 16 bits?
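To make the 16-bit limit concrete, the Python sketch below shows how UTF-16 represents a code point above 65,535 with two 16-bit units (a surrogate pair). The two characters are illustrative choices and `utf16_units` is a hypothetical helper name.

```python
def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = "\u03b1"           # Greek small alpha, fits in 16 bits
astral = "\U0001d6fc"    # a mathematical alpha above the 16-bit range

print([hex(u) for u in utf16_units(bmp)])      # one unit
print([hex(u) for u in utf16_units(astral)])   # two units (surrogate pair)
```

A system that treats each 16-bit value as one complete character would therefore see such a code point as two separate values.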