issue-#9: adding module Characters → #19

Merged to the master branch
cfbsoftware
Posts: 204
Joined: Wed Sep 18, 2013 10:06 pm

Re: Feature #9: adding module Characters

Post by cfbsoftware »

DGDanforth wrote: SIZE cannot be used in constant expressions because its value depends on the actual compiler implementation.
That no longer seems to apply. A program which includes the following code snippet compiles and runs on 1.6 Final:

Code: Select all

TYPE
  Rec = RECORD
    a: INTEGER;
    b: INTEGER;
    chars: ARRAY 10 OF CHAR
  END;

CONST
  c = 10 + SIZE(Rec);
or have I misinterpreted the rule?
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: Feature #9: adding module Characters

Post by Josef Templ »

cfbsoftware wrote: I am finding it increasingly difficult to follow exactly what is going on here. A change as fundamental and as significant as this requires a more analytical / scientific planned approach than appears to be happening. The requirements need to be firmly established and a technical design document that covers all ramifications of the changes is needed so that a proposed solution can be agreed on. Then, and only then, should any significant development be started.
Here is a summary of this discussion:

The issue started from the request to support national character sets in Component Pascal identifiers.
This request comes from the Cyrillic user community, where BB is used for educational purposes.

In BB 1.6 it is possible to use extended ASCII characters in identifiers only for the ISO Latin-1 character set.
A first attempt (by Helmut) was made to generalize this to the 'current' character set installed on a machine.
Thus, the interpretation of extended characters depends on the standard character set of the machine.
This is simple and has proved to work. There is no change in any data structures or file formats.
The drawback of this approach is that it requires code page support and it creates files
that are not portable across different code pages if extended ASCII characters are used.
In a first design a new module 'Characters' was used. Later it was merged into modules 'Strings' and 'Kernel'.
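
To illustrate the portability drawback of the code page approach, here is a minimal sketch in Python (not Component Pascal, and not taken from Helmut's code). The helper `decode_identifier` and the sample identifier are hypothetical; the sketch only assumes that an identifier is stored as one byte per character and interpreted with whatever code page the reading machine uses.

```python
# Hypothetical illustration: the same stored bytes read under two code pages.
def decode_identifier(raw: bytes, code_page: str) -> str:
    """Interpret stored identifier bytes under the given code page."""
    return raw.decode(code_page)

# Identifier bytes as they might be written on an ISO Latin-1 machine.
# Byte 0xE4 is the only extended (non-ASCII) character.
stored = b"z\xe4hler"

on_latin1 = decode_identifier(stored, "latin-1")  # ISO Latin-1 reader
on_cp1251 = decode_identifier(stored, "cp1251")   # Windows Cyrillic reader

# ascii() keeps the output printable on any console.
print(ascii(on_latin1))
print(ascii(on_cp1251))
print("same identifier on both machines:", on_latin1 == on_cp1251)
```

The bytes in the file never change, but the identifier they denote does, which is why such files are only portable when they stick to plain ASCII.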

A second attempt (by Helmut) was made to support full Unicode in identifiers, very much as
Ominc has done for string constants. Identifiers are stored in symbol and object files as
UTF-8 encoded SHORTCHARs. For reasons of simplicity, the compiler also uses UTF-8 encoded SHORTCHARs
for representing identifiers in memory. This has also proved to work, with one small exception: the
detection of the identifier length limit, but this can be fixed easily.
The advantage of this approach is that it avoids using code pages and it creates fully portable files.
If only ASCII characters are used, there is no change in the symbol or object file format.
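
The two properties mentioned above (ASCII identifiers are unchanged, and the length limit needs care) can be sketched in Python as follows. This is an illustration only, not the compiler's code: `NAME_LIMIT` is a made-up value, the function names are hypothetical, and the sketch assumes the limit applies to the number of stored SHORTCHARs, that is, to UTF-8 bytes.

```python
NAME_LIMIT = 10  # hypothetical limit, NOT the real BlackBox value

def utf8_name(ident: str) -> bytes:
    """Encode an identifier the way the UTF-8 approach stores it."""
    return ident.encode("utf-8")

def fits_by_chars(ident: str) -> bool:
    """Naive check: counts characters as typed in the source text."""
    return len(ident) <= NAME_LIMIT

def fits_by_bytes(ident: str) -> bool:
    """Check against the stored form: counts UTF-8 encoded bytes."""
    return len(utf8_name(ident)) <= NAME_LIMIT

ascii_ident = "Counter"
cyrillic_ident = "Счётчик"  # 7 Cyrillic letters, 2 bytes each in UTF-8

# ASCII identifiers encode to exactly the same bytes as before.
print(utf8_name(ascii_ident) == ascii_ident.encode("ascii"))

# Non-ASCII identifiers take more bytes than characters.
print(len(cyrillic_ident), len(utf8_name(cyrillic_ident)))

# The two length checks can disagree for non-ASCII identifiers.
print(fits_by_chars(cyrillic_ident), fits_by_bytes(cyrillic_ident))
```

Under these assumptions, a limit check that counts characters accepts an identifier whose encoded form no longer fits, which is the kind of mismatch a length-limit check has to account for.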

The obvious alternative to the second attempt is to use CHARs instead of SHORTCHARs in the compiler
for representing identifiers. Since this also applies to parts of the runtime system that use the
data type Kernel.Name, it looks simpler to stay with Helmut's solution.
Switching from SHORTCHARs to CHARs internally can be done later if there is any need for it.

- Josef
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Feature #9: adding module Characters

Post by DGDanforth »

Would someone please create a Redmine diff of Helmut's modules that use UTF-8 encoding vs. BB 1.6?
-Doug
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Feature #9: adding module Characters

Post by DGDanforth »

We need to move forward on this issue.
I, for one, cannot vote for it now because I do not see all of the ramifications.
I formally request that Helmut (with Ivan's help) put together a report specifying
exactly where BB 1.6 code is modified in order to support UTF-8 in the compiler.

Helmut, do you agree to generate that report (with diffs)?
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: Feature #9: adding module Characters

Post by Josef Templ »

The process should be to start with an issue first. This gives us an issue number to refer to.
If there is general agreement that we take Helmut's latest approach,
the issue may be something like
"Adding support for Unicode characters in Component Pascal identifiers".

The historical steps of introducing the module Characters and code page
support etc. will not show up in this issue and will not be in the repository.

The discussion should be moved to a new topic named after the new issue number.

A topic branch will be created and the source changes will be visible within this
topic branch.

Again, please note that this is a complex topic in the sense that it affects a lot of files.
So we cannot work on any other topics in parallel for some time without
risking merge conflicts.

You may have noticed that this style of development has also been applied to the
already closed issues. With the final step of voting on inclusion in master,
there is always the following CANONICAL ISSUE QUADRUPLE
(Redmine issue, phpBB issue discussion, GitHub topic branch, phpBB issue voting)
for any issue. The issue number bundles the related pieces.

- Josef