issue-#19 showing unicode identifiers with show sources

DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: issue-#19 showing unicode identifiers with show sources

Post by DGDanforth »

If I scan the Master Mod files for the string "A", I find 23 files. Many of those files are not relevant to Unicode: they deal with GUID or number conversion, or the occurrence is inside a comment. But for completeness I list all of them here.

Location
c:\Program Files\BlackBox Master/Dev/Mod/Browser.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/ComDebug.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/CPE.odc
"A" [4]
c:\Program Files\BlackBox Master/Dev/Mod/CPM.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/CPS.odc
"A" [4]
c:\Program Files\BlackBox Master/Dev/Mod/CPT.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/Profiler.odc
"A" [2]
c:\Program Files\BlackBox Master/Dev/Mod/References.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Dialog.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Files.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Menus.odc
"A" [13]
c:\Program Files\BlackBox Master/Host/Mod/Registry.odc
"A" [2]
c:\Program Files\BlackBox Master/Host/Mod/TextConv.odc
"A" [4]
c:\Program Files\BlackBox Master/Obx/Mod/BlackBox.odc
"A" [1]
c:\Program Files\BlackBox Master/Obx/Mod/Excel.odc
"A" [1]
c:\Program Files\BlackBox Master/Obx/Mod/WordEdit.odc
"A" [1]
c:\Program Files\BlackBox Master/Std/Mod/Coder.odc
"A" [1]
c:\Program Files\BlackBox Master/Std/Mod/Interpreter.odc
"A" [1]
c:\Program Files\BlackBox Master/System/Mod/Kernel.odc
"A" [3]
c:\Program Files\BlackBox Master/System/Mod/Printing.odc
"A" [1]
c:\Program Files\BlackBox Master/System/Mod/Strings.odc
"A" [20]
c:\Program Files\BlackBox Master/Text/Mod/Mappers.odc
"A" [1]
c:\Program Files\BlackBox Master/Text/Mod/Setters.odc
"A" [1]
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Post by Josef Templ »

Which of these modules need to be changed AFTER looking into the details?
DGDanforth

Post by DGDanforth »

Josef Templ wrote: Which of these modules need to be changed AFTER looking into the details?
OK, you forced me into making decisions and you will need to suffer the consequences of that :o).
I have narrowed the list down to 15 files:

Location
Dev/Mod/Profiler.odc
"A" [2]
Dev/Mod/References.odc
"A" [1]
Host/Mod/Dialog.odc
"A" [1]
Host/Mod/Menus.odc
"A" [13]
Host/Mod/Registry.odc
"A" [2]
Host/Mod/TextConv.odc
"A" [4]
Obx/Mod/BlackBox.odc
"A" [1]
Obx/Mod/Excel.odc
"A" [1]
Std/Mod/Coder.odc
"A" [1]
System/Mod/Printing.odc
"A" [1]

Arabic numerals?
For a counterexample to Arabic numerals, see https://en.wikipedia.org/wiki/Chinese_numerals

System/Mod/Kernel.odc
"A" [3]
Dev/Mod/CPS.odc
"A" [5]
System/Mod/Strings.odc
"A" [20]
Text/Mod/Mappers.odc
"A" [1]
Text/Mod/Setters.odc
"A" [1]
Josef Templ

Post by Josef Templ »

First of all, you should search for "Z", not "A".
Then you skip all hex conversions automatically.

Then search for "z"; the files that contain both "Z" and "z" are candidates
for a detailed inspection. I would expect only 3 or 4 to remain.
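Why the heuristic works: hex conversions only use the ranges "A".."F" and "a".."f", so they never mention "Z", whereas an ASCII-only letter test such as "a".."z", "A".."Z" mentions both. A minimal sketch of the filter follows, assuming the module source has already been extracted from its .odc document into a string; the module name PrivZzSearch is hypothetical and not part of BlackBox. Like the manual search, it cannot catch a test such as CAP(ch) > "Z" that never mentions "z".

```componentpascal
MODULE PrivZzSearch;
	(* Hypothetical helper, not part of BlackBox: applies the "Z" and "z" heuristic
	   to module source text that has already been extracted from its .odc file. *)

	IMPORT Strings;

	(* TRUE if src contains the quoted character literal lit, e.g. '"Z"' *)
	PROCEDURE ContainsLiteral (IN src, lit: ARRAY OF CHAR): BOOLEAN;
		VAR pos: INTEGER;
	BEGIN
		Strings.Find(src, lit, 0, pos);	(* pos = -1 if lit does not occur *)
		RETURN pos >= 0
	END ContainsLiteral;

	(* A module is a candidate for detailed inspection only if it mentions both "Z" and "z";
	   hex conversions ("A".."F", "a".."f") mention neither, so they drop out. *)
	PROCEDURE IsCandidate* (IN src: ARRAY OF CHAR): BOOLEAN;
	BEGIN
		RETURN ContainsLiteral(src, '"Z"') & ContainsLiteral(src, '"z"')
	END IsCandidate;

END PrivZzSearch.
```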

- Josef
DGDanforth

Post by DGDanforth »

Josef Templ wrote: First of all you should search for "Z", not "A".
Then you skip all Hex conversions automatically.

Then search for "z" and the files that contain both "Z" and "z" are candidates
for a detail inspection. I would expect only 3 or 4 to remain.

- Josef
Josef,
That is easy to do with GftSearchFiles where one can simultaneously search for both "Z" and "z".

Below are my annotated search results. I would like feedback on my comments, please.

Location
Dev/Mod/CPS.odc
"Z" [1]
"z" [1]
Get ... | "G".."H", "J", "K", "Q", "S", "X".."Z", "a".."z", "_": Identifier(s)
Identifiers ... Strings.IsIdent(ch)
PROCEDURE IsIdent* (ch: CHAR): BOOLEAN;
BEGIN
	(* returns IsIdentStart(ch) OR IsNumeric(ch); optimized because heavily used in the compiler *)
	CASE ch OF
		"a".."z", "A".."Z", "_", "0".."9": RETURN TRUE
	ELSE
		IF ch > 7FX THEN RETURN Kernel.IsAlpha(ch) ELSE RETURN FALSE END
	END
END IsIdent;
This is not using the UTF-8 encoding of identifiers.

Dev/Mod/References.odc
"Z" [1]
"z" [1]
PROCEDURE SearchIdent ...
	WHILE ~r.eot & ((ch < "A") OR (ch > "Z") & (ch # "_") & (ch < "a") OR (ch > "z")) DO
		INC(beg); r.ReadChar(ch)
	END;
This is not using the UTF-8 encoding of identifiers.
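For comparison, here is a minimal sketch of a Unicode-aware version of this skip loop. It assumes the Strings.IsIdentStart procedure of BlackBox 1.7 and a TextModels reader as in the excerpt; the module and procedure names are hypothetical, and this is not necessarily how DevReferences was actually fixed.

```componentpascal
MODULE PrivIdentScan;
	(* Hypothetical sketch, not the committed DevReferences fix: skip characters that
	   cannot start an identifier, using Strings.IsIdentStart instead of ASCII ranges. *)

	IMPORT TextModels, Strings;

	(* Returns the position of the first identifier-start character at or after pos,
	   or t.Length() if there is none. Precondition: 0 <= pos <= t.Length(). *)
	PROCEDURE NextIdentStart* (t: TextModels.Model; pos: INTEGER): INTEGER;
		VAR r: TextModels.Reader; ch: CHAR; beg: INTEGER;
	BEGIN
		r := t.NewReader(NIL); r.SetPos(pos); r.ReadChar(ch);
		beg := pos;
		WHILE ~r.eot & ~Strings.IsIdentStart(ch) DO	(* also accepts non-ASCII letters *)
			INC(beg); r.ReadChar(ch)
		END;
		RETURN beg
	END NextIdentStart;

END PrivIdentScan.
```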

Host/Mod/Menus.odc
"Z" [5]
"z" [2]
PROCEDURE SetShortcut ...
Should shortcuts be localizable?
PROCEDURE NextWord ...
What is the definition of a word in an arbitrary language?
PROCEDURE GetHotkey ...
Limiting hotkeys to "A".."Z" looks OK, BUT that would also depend on the keyboard being used.
PROCEDURE FirstFree ...
Same issue as with NextWord
PROCEDURE SetHotkeys ...
Same issue as with GetHotkey

Host/Mod/TextConv.odc
"Z" [2]
"z" [3]
PROCEDURE ParseRichText...
Rich text can conceivably be localized (e.g. Russian), and hence parsing would need to be done for each language.
So rather than operating on a CHAR, I believe parsing needs to operate on Unicode "code points", i.e. UTF-8.
I am assuming that BlackBox documents support language localization.

Std/Mod/Coder.odc
"Z" [1]
"z" [1]
PROCEDURE InitCodes ...
This may be OK if StdCoder operates only on bytes.

System/Mod/Kernel.odc
"Z" [2]
"z" [2]
PROCEDURE Upper...
Is the concept of character "case" universal to all languages? Probably not, so I assume these calls are included
for those languages that have case, e.g. the ones covered by Latin-1.

System/Mod/Strings.odc
"Z" [2]
"z" [1]
PROCEDURE IsIdent...
Again, this is not Unicode compliant.

-Doug
Josef Templ

Post by Josef Templ »

Doug, thanks for the list.
In general, this is not related to UTF-8 encoding.
It is related to using Unicode.

Here are my comments:

Dev/Mod/CPS.odc
No need to change.

Dev/Mod/References.odc
Already fixed.

Host/Mod/Menus.odc
We already know that this may need to be changed but we don't know yet HOW.

Host/Mod/TextConv.odc
No need to change. The RTF commands use only ASCII.

Std/Mod/Coder.odc
No need to change.

System/Mod/Kernel.odc
No need to change. See the ELSIF part.

System/Mod/Strings.odc
No need to change. See the ELSE part.

In addition, DevProfiler needs to be changed.
It doesn't use "z" but tests CAP(ch) > "Z", so searching for both "z" and "Z" is not enough.
It may also be a good idea to try searching for "A" and "Z".
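To illustrate the difference, here is a minimal sketch (hypothetical module name, not the DevProfiler fix) contrasting an ASCII-only letter test built on CAP with one that delegates to Strings.IsAlpha. The ASCII test rejects non-ASCII letters such as "ä" (0E4X), and because its source mentions "Z" but never "z", a search that requires both literals does not find it.

```componentpascal
MODULE PrivLetterTest;
	(* Hypothetical sketch contrasting an ASCII-only letter test built on CAP
	   with the Unicode-aware Strings.IsAlpha; it is not the DevProfiler fix. *)

	IMPORT Strings;

	(* ASCII-only: the source contains "Z" but no "z", so a "Z" & "z" search misses it;
	   non-ASCII letters such as 0E4X fall outside "A".."Z" and are rejected. *)
	PROCEDURE IsLetterAscii* (ch: CHAR): BOOLEAN;
	BEGIN
		RETURN (CAP(ch) >= "A") & (CAP(ch) <= "Z")
	END IsLetterAscii;

	(* Unicode-aware: delegates to Strings.IsAlpha *)
	PROCEDURE IsLetter* (ch: CHAR): BOOLEAN;
	BEGIN
		RETURN Strings.IsAlpha(ch)
	END IsLetter;

END PrivLetterTest.
```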

To summarize, the modules found so far are:
HostMenus
DevProfiler

For DevProfiler I have provided a fix.
It is of marginal importance, however, because this code is not used anywhere in BlackBox.

As Strings.IsIdentStart and Strings.IsAlpha are quite heavily used by now,
I have optimized their implementation. Instead of delegating all calls to Kernel.IsAlpha,
the ASCII case is handled explicitly. This is much faster because Kernel.IsAlpha makes
a Windows system call that imposes some overhead (in addition to its own execution).
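The fast-path idea can be sketched as follows. It mirrors the structure of Strings.IsIdent quoted earlier in the thread, but it is a hypothetical module, not the committed implementation, and it assumes Kernel.IsAlpha is exported (it appears in the 1.7-rc1 interface of Kernel).

```componentpascal
MODULE PrivFastAlpha;
	(* Hypothetical sketch of an ASCII fast path in front of the slower
	   Kernel.IsAlpha call; not the committed Strings implementation. *)

	IMPORT Kernel;

	PROCEDURE IsAlpha* (ch: CHAR): BOOLEAN;
	BEGIN
		CASE ch OF
			"a".."z", "A".."Z": RETURN TRUE	(* ASCII letters: no system call *)
		ELSE
			IF ch > 7FX THEN RETURN Kernel.IsAlpha(ch)	(* non-ASCII: delegate *)
			ELSE RETURN FALSE	(* remaining ASCII characters are not letters *)
			END
		END
	END IsAlpha;

END PrivFastAlpha.
```

Only characters above 7FX reach Kernel.IsAlpha, so pure ASCII text never pays for the system call.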

For the changes see http://redmine.blackboxframework.org/pr ... 965e433087.

- Josef
DGDanforth

Post by DGDanforth »

Searching for "A" and "Z" turned up no new files.
Josef Templ

Post by Josef Templ »

Thanks, Doug.
We must not forget about the CAP in DevSearch, which I mentioned in issue-#88.
It is not strictly related to issue-#19 but rather issue-#37, which fixed a number of bugs in DevSearch.

- Josef
Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Post by Ivan Denisov »

I think that this issue is ready for voting.

The changes:
http://redmine.blackboxframework.org/pr ... d4a5a7ffc8
DGDanforth

Post by DGDanforth »

Ivan Denisov wrote: I think that this issue is ready for voting.

The changes:
http://redmine.blackboxframework.org/pr ... d4a5a7ffc8
In version 1.7-rc1 Build 571 there is no procedure Kernel.IsAlpha in the Kernel.odc source file, but that procedure does appear in the interface
(Ctrl-D on Kernel). That tells me the source is out of step with the interface.