Tool that generates special characters using Unicode

DGDanforth · Post by **DGDanforth** » Tue Oct 21, 2014 7:37 pm

Editor
If the goal is to provide the infrastructure needed to support multiple languages then document editing needs to be considered. There are (at least) two aspects.
(1) keyboard mapping
(2) text ordering

I do not know how Russian programmers get their Cyrillic alphabet entered. Is it just a font change?
What about oriental languages? Are there fonts for Kanji, katakana?
If all the languages are handled by fonts then one simply needs to learn the keystrokes that result in the desired image.

What about the order of characters entered? The English convention supported by the BB document editor is left to right, top to bottom (→↓).

I believe Arabic uses right to left, top to bottom (←↓), and Chinese uses top to bottom, right to left (↓←).

The sequence of the characters should be independent of the language so the compiler would not notice.

Such considerations indicate the difficulties in multiple language support.
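
A minimal sketch of the text-ordering point, assuming a Unicode-aware renderer: the characters are stored in the order they are typed, and each one carries a bidirectional class that tells a renderer which way to draw it. The helper name `direction_classes` is made up for illustration; `unicodedata.bidirectional` is from the Python standard library.

```python
import unicodedata

def direction_classes(text):
    """Return the Unicode bidirectional class of each character, in storage order."""
    return [unicodedata.bidirectional(ch) for ch in text]

latin = "abc"
arabic = "\u0627\u0628\u062c"  # alef, beh, jeem, stored in the order they are typed

print(direction_classes(latin))   # classes of the left-to-right letters
print(direction_classes(arabic))  # classes of the right-to-left letters

# The stored sequence is "first typed, first stored" in both cases.
# Only a renderer that honours the classes draws the second string right to left.
print([f"U+{ord(ch):04X}" for ch in arabic])
```

A compiler that reads the stored sequence would therefore not notice the display direction, which matches the expectation stated above.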

Josef Templ · Post by **Josef Templ** » Wed Oct 22, 2014 7:36 am

Doug, you enter Cyrillic or Kanji characters by entering the Unicode assigned to them.
All you need is a keyboard (or auxiliary tool) that generates such codes.
There is no font change involved.

It is clear that we have to restrict ourselves to top-to-bottom and left-to-right languages.
For the Cyrillic community this is sufficient as far as I know.

- Josef

DGDanforth · Post by **DGDanforth** » Thu Oct 23, 2014 4:44 am

Josef Templ wrote:Doug, you enter Cyrillic or Kanji characters by entering the Unicode assigned to them.
All you need is a keyboard (or auxiliary tool) that generates such codes.
There is no font change involved.

It is clear that we have to restrict ourselves to top-to-bottom and left-to-right languages.
For the Cyrillic community this is sufficient as far as I know.

- Josef

I have a tool that generates special characters using Unicode for example

αβγδεζηθικλμνξοπρςστυφχψω
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
∂∇·⧠∫∮∏∑±√≤≥≠≡≈∞ℏ⊥∩∪∀∃†
←→↑↓

but that entails clicking on a button for each symbol desired.
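
As a rough illustration of how such a palette could be filled, the Greek rows can be generated from their code point ranges rather than typed by hand. This is only a sketch, not the actual tool; the function name `assigned_range` is hypothetical. Note that U+03A2, which sits between capital rho and capital sigma, is an unassigned code point, so the sketch skips anything without a Unicode name.

```python
import unicodedata

def assigned_range(first, last):
    """Collect the characters in [first, last] that have an assigned Unicode name."""
    chars = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.name(ch, ""):  # an empty name means the code point is unassigned
            chars.append(ch)
    return "".join(chars)

lower = assigned_range(0x03B1, 0x03C9)  # GREEK SMALL LETTER ALPHA .. OMEGA
upper = assigned_range(0x0391, 0x03A9)  # GREEK CAPITAL LETTER ALPHA .. OMEGA

print(lower)
print(upper)
print(len(lower), len(upper))
```

The lowercase range has one more character than the uppercase one because it includes the final sigma (ς), which has no capital counterpart.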

You are saying that a hardware fix (special keyboard) is needed.

What does Ivan do in order to post English text? Does Ivan have a Cyrillic keyboard?
Does Ivan have to switch back and forth between two different keyboards or is there
a software switch that allows him to use one keyboard that can generate both language
codes (code points)?

I thought I read on Wikipedia that it is up to the software to determine how a Unicode code point is rendered:
"Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data."

It still is not clear to me that specifying a Unicode code point will display anything without a further mapping between that code point and a glyph. I also assume that the mapping must be a function of the font being used so that the letter A can also appear under, say, MarriageScript.

Hence my current view is that Unicode is not sufficient (necessary but not sufficient) to display a desired character.

-Doug

Ivan Denisov · Post by **Ivan Denisov** » Thu Oct 23, 2014 5:05 am

Doug, for Russian I simply change the keyboard layout with Alt+Shift.
[Image: Russian keyboard]

For Greek characters on Ubuntu I generally use .XCompose, but it does not work in BlackBox, so your tool can help. To be honest, I do not need this tool often, only in rare cases when writing documentation. From my point of view, code should be written in Latin-1.
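
The Alt+Shift switch described above can be pictured as one physical keyboard with two software tables. The following toy model is only an illustration: the `Keyboard` class and the `LAYOUTS` excerpt are hypothetical, and a real operating system does this in its input driver.

```python
# Hypothetical excerpt of two layouts: physical key -> character produced.
LAYOUTS = {
    "en": {"q": "q", "w": "w", "e": "e", "r": "r", "t": "t", "y": "y"},
    "ru": {"q": "й", "w": "ц", "e": "у", "r": "к", "t": "е", "y": "н"},
}

class Keyboard:
    """Toy model of one physical keyboard with a software layout switch."""

    def __init__(self):
        self.order = ["en", "ru"]
        self.active = 0

    def toggle(self):
        """Stand-in for the Alt+Shift shortcut: cycle to the next layout."""
        self.active = (self.active + 1) % len(self.order)

    def press(self, key):
        """Return the character the active layout assigns to a physical key."""
        return LAYOUTS[self.order[self.active]][key]

kb = Keyboard()
before = "".join(kb.press(k) for k in "qwerty")
kb.toggle()
after = "".join(kb.press(k) for k in "qwerty")

print(before, after)
print([f"U+{ord(c):04X}" for c in after])  # the same keys now yield Cyrillic code points
```

The same six keys yield Latin code points before the toggle and Cyrillic code points after it, with no font change and no second keyboard.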

DGDanforth · Post by **DGDanforth** » Fri Oct 24, 2014 7:01 am

I frequently write physics programs where the variables are best represented
by their Greek names. To this point I simply change the font to "Symbol" which
works just fine for me. Josef says to be cautious about doing that. I am not sure
why since I have had no problems. When we have Unicode support I will use my
Pallet tool to enter the Greek symbols directly. By the way I have defined Ctrl-G
to change the font to Symbol which I think of as Greek.

Oh, I like your keyboard!

Josef Templ · Post by **Josef Templ** » Fri Oct 24, 2014 8:35 am

Doug, in order to see what you get when using the Symbol font
simply change the font to Arial or whatever you use for the normal program text.
This is what the compiler sees. The compiler does not look at the
visual appearance of the character glyphs. It only looks at the character codes
and assumes they refer to the ISO Latin-1 character set.

- Josef
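
A small sketch of the distinction made above, under the assumption that the Symbol font draws Greek glyphs for ordinary Latin letter codes. The mapping table is a partial, illustrative excerpt, and the function names `as_displayed` and `as_compiled` are hypothetical.

```python
# Partial, illustrative mapping of Latin letter codes to the glyphs a Symbol-style font draws.
SYMBOL_GLYPHS = {"a": "α", "b": "β", "g": "γ", "d": "δ", "p": "π"}

def as_displayed(text, font):
    """What the reader sees: the font decides which glyph each code gets."""
    if font == "Symbol":
        return "".join(SYMBOL_GLYPHS.get(ch, ch) for ch in text)
    return text

def as_compiled(text):
    """What the compiler sees: the stored character codes, whatever the font."""
    return [ord(ch) for ch in text]

identifier = "ab"
print(as_displayed(identifier, "Symbol"))  # Greek glyphs on screen
print(as_displayed(identifier, "Arial"))   # plain Latin letters on screen
print(as_compiled(identifier))             # identical codes in both cases
```

In this model an identifier shown as Greek in Symbol and an identifier shown as Latin in Arial are the same identifier to the compiler, which is the caution being raised.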

DGDanforth · Post by **DGDanforth** » Sat Oct 25, 2014 5:11 am

Josef Templ wrote:Doug, in order to see what you get when using the Symbol font
simply change the font to Arial or whatever you use for the normal program text.
This is what the compiler sees. The compiler does not look at the
visual appearance of the character glyphs. It only looks at the character codes
and assumes they refer to the ISO Latin-1 character set.

- Josef

OK, I usually start with the default font, highlight the identifier, and type Ctrl-G to change it to "Greek".
So I believe that is why I never have any problems with the Symbol font. I always work within the ISO Latin-1 character set.

DGDanforth · Post by **DGDanforth** » Sat Oct 25, 2014 5:23 am

Editor considerations

Since we use the default BB editor to create Component Pascal programs, and (I assume) CHAR (16 bits) is the internal representation of a character processed by the compiler, I ask: what about the editor's handling of 3-byte UTF-8 characters?

"East Asian legacy encodings generally used two bytes per character yet take three bytes per character in UTF-8."

So, here I am, sitting with my specially designed Chinese keyboard that generates 3 bytes for every keystroke. How does the editor handle that?

Zinn · Post by **Zinn** » Sat Oct 25, 2014 5:33 pm

DGDanforth wrote:Editor considerations
So, here I am, sitting with my specially designed Chinese keyboard that generates 3 bytes for every keystroke. How does the editor handle that?

You should generate 16-bit Unicode with your specially designed Chinese keyboard, not UTF-8. All documents for compilation are written in 16-bit Unicode.
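
To make the byte counts concrete, here is a sketch that encodes one Chinese character both ways and converts 3-byte UTF-8 input into a 16-bit unit. It says nothing about how BlackBox itself receives keystrokes; the helper `utf8_to_units` is hypothetical.

```python
ch = "中"  # U+4E2D

utf8 = ch.encode("utf-8")       # three bytes
utf16 = ch.encode("utf-16-be")  # one 16-bit unit, i.e. two bytes

print(f"U+{ord(ch):04X}")
print(utf8.hex(" "), len(utf8))
print(utf16.hex(" "), len(utf16))

def utf8_to_units(data):
    """Decode UTF-8 bytes and return the equivalent list of 16-bit code units."""
    raw = data.decode("utf-8").encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print([f"{u:04X}" for u in utf8_to_units(utf8)])
```

The three UTF-8 bytes and the single 16-bit unit are two encodings of the same code point, so the conversion loses nothing.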

BlackBox 1.6 internally translates the 16-bit Unicode into 8-bit ASCII code during compilation.
BlackBox 1.7 CPC Edition internally translates the 16-bit Unicode to UTF-8 during compilation.

I hope that answered your questions.

DGDanforth · Post by **DGDanforth** » Sat Oct 25, 2014 10:42 pm

But there are more Unicode values than can be encoded with 16 bits (65,536).
"The Unicode Standard, the latest version of Unicode contains a repertoire of more than 110,000 [characters]"

So are we kludging and just hoping that no one will need a Unicode value that exceeds 16 bits?
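
For reference, UTF-16 represents code points above U+FFFF as a pair of 16-bit units (a surrogate pair). The sketch below shows the arithmetic; whether and how BlackBox handles such pairs is not stated in this thread, and the function name `to_utf16_units` is hypothetical.

```python
def to_utf16_units(code_point):
    """Return the 16-bit UTF-16 code units for a single Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]            # fits in one 16-bit unit
    offset = code_point - 0x10000      # 20 bits remain after removing the base
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return [high, low]

for cp in (0x03B1, 0x4E2D, 0x1D6FC):   # Greek alpha, a CJK ideograph, mathematical italic alpha
    units = to_utf16_units(cp)
    print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

A 16-bit CHAR can hold any single unit, but a character outside the first 65,536 code points occupies two of them, so code that assumes one CHAR per character would miscount those.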
