cfbsoftware wrote:I am finding it increasingly difficult to follow exactly what is going on here. A change as fundamental and as significant as this requires a more analytical / scientific planned approach than appears to be happening. The requirements need to be firmly established and a technical design document that covers all ramifications of the changes is needed so that a proposed solution can be agreed on. Then, and only then, should any significant development be started.
Here is a summary of this discussion:
The issue started with a request to support national character sets in Component Pascal identifiers.
This request comes from the Cyrillic user community, where BB is used for educational purposes.
In BB 1.6 it is possible to use extended ASCII characters in identifiers only for the ISO Latin-1 character set.
A first attempt (by Helmut) was made to generalize this to the 'current' character set installed on a machine.
Thus, the interpretation of extended characters depends on the standard character set of the machine.
This is simple and has proved to work. There is no change in any data structures or file formats.
The drawback of this approach is that it requires code page support and it creates files
that are not portable across different code pages if extended ASCII characters are used.
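To illustrate the portability drawback, here is a small sketch in Python (not BB code; the byte value and the two code pages are chosen only as an example). The same extended ASCII byte becomes a different character depending on the code page of the machine that reads the file:

```python
# One extended-ASCII byte, as it would be stored in a source file.
raw = bytes([0xE4])

# The same byte interpreted under two different code pages.
as_latin1 = raw.decode("latin-1")  # ISO Latin-1: U+00E4, 'a' with diaeresis
as_cp1251 = raw.decode("cp1251")   # Windows Cyrillic: U+0434, Cyrillic 'de'

# The byte is identical, but the identifier character is not.
print(repr(as_latin1), repr(as_cp1251))
```

An identifier written on a Cyrillic machine would therefore read as a different identifier on a Latin-1 machine, even though the file itself is unchanged.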
In a first design, a new module 'Characters' was used. It was later merged into the modules 'Strings' and 'Kernel'.
A second attempt (by Helmut) has been made to support full Unicode in identifiers, much as was
done by Ominc for string constants. Identifiers are stored in symbol and object files as
UTF-8 encoded SHORTCHARs. For reasons of simplicity, the compiler also uses UTF-8 encoded SHORTCHARs
for representing identifiers in memory. This has also proved to work, with one small exception: the
detection of the identifier length limit, but this can be fixed easily.
The advantage of this approach is that it avoids using code pages and it creates fully portable files.
If only ASCII characters are used, there is no change in the symbol or object file format.
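A rough sketch in Python of the two properties mentioned above (again, not BB code; the helper names, the example identifiers and the limit of 8 are made up for illustration and are not the compiler's actual values). An ASCII identifier is byte-for-byte the same in UTF-8, while a Cyrillic identifier takes more bytes than it has characters, which is where a length check on the SHORTCHAR representation can go wrong:

```python
def utf8_name(ident: str) -> bytes:
    """Encode an identifier the way it would be stored: as UTF-8 bytes."""
    return ident.encode("utf-8")


def fits_in_chars(ident: str, limit: int) -> bool:
    """Length check counted in characters, as the user sees the identifier."""
    return len(ident) <= limit


def fits_in_bytes(ident: str, limit: int) -> bool:
    """Length check counted in UTF-8 bytes, as the stored name sees it."""
    return len(utf8_name(ident)) <= limit


ascii_ident = "Counter"
cyrillic_ident = "\u0414\u043b\u0438\u043d\u0430"  # Cyrillic word, 5 letters

# ASCII identifiers are unchanged by UTF-8 encoding.
assert utf8_name(ascii_ident) == ascii_ident.encode("ascii")

# Each Cyrillic letter occupies two bytes in UTF-8.
print(len(cyrillic_ident), len(utf8_name(cyrillic_ident)))

# With a hypothetical limit of 8, the two checks disagree.
HYPOTHETICAL_LIMIT = 8
print(fits_in_chars(cyrillic_ident, HYPOTHETICAL_LIMIT),
      fits_in_bytes(cyrillic_ident, HYPOTHETICAL_LIMIT))
```

Whether the limit should be counted in characters or in encoded bytes is a design decision; the sketch only shows that the two counts differ as soon as non-ASCII characters are used.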
The obvious alternative to the second attempt is to use CHARs instead of SHORTCHARs in the compiler
for representing identifiers. Since this also applies to parts of the runtime system that use the
data type Kernel.Name, it looks simpler to stay with Helmut's solution.
Switching from SHORTCHARs to CHARs internally can be done later if there is any need for it.
- Josef