
Re: issue-#19 showing unicode identifiers with show sources

Posted: Mon Jul 04, 2016 10:32 pm
by DGDanforth
If I scan the Master Mod files for the string "A", I find 23 files. Many of the matches are not relevant to Unicode: they occur in GUID or number conversion code, or inside comments. For completeness, I list all of them here.

Location
c:\Program Files\BlackBox Master/Dev/Mod/Browser.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/ComDebug.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/CPE.odc
"A" [4]
c:\Program Files\BlackBox Master/Dev/Mod/CPM.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/CPS.odc
"A" [4]
c:\Program Files\BlackBox Master/Dev/Mod/CPT.odc
"A" [1]
c:\Program Files\BlackBox Master/Dev/Mod/Profiler.odc
"A" [2]
c:\Program Files\BlackBox Master/Dev/Mod/References.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Dialog.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Files.odc
"A" [1]
c:\Program Files\BlackBox Master/Host/Mod/Menus.odc
"A" [13]
c:\Program Files\BlackBox Master/Host/Mod/Registry.odc
"A" [2]
c:\Program Files\BlackBox Master/Host/Mod/TextConv.odc
"A" [4]
c:\Program Files\BlackBox Master/Obx/Mod/BlackBox.odc
"A" [1]
c:\Program Files\BlackBox Master/Obx/Mod/Excel.odc
"A" [1]
c:\Program Files\BlackBox Master/Obx/Mod/WordEdit.odc
"A" [1]
c:\Program Files\BlackBox Master/Std/Mod/Coder.odc
"A" [1]
c:\Program Files\BlackBox Master/Std/Mod/Interpreter.odc
"A" [1]
c:\Program Files\BlackBox Master/System/Mod/Kernel.odc
"A" [3]
c:\Program Files\BlackBox Master/System/Mod/Printing.odc
"A" [1]
c:\Program Files\BlackBox Master/System/Mod/Strings.odc
"A" [20]
c:\Program Files\BlackBox Master/Text/Mod/Mappers.odc
"A" [1]
c:\Program Files\BlackBox Master/Text/Mod/Setters.odc
"A" [1]

Re: issue-#19 showing unicode identifiers with show sources

Posted: Tue Jul 05, 2016 5:49 am
by Josef Templ
Which of these modules need to be changed AFTER looking into the details?

Re: issue-#19 showing unicode identifiers with show sources

Posted: Wed Jul 06, 2016 6:14 am
by DGDanforth
Josef Templ wrote: Which of these modules need to be changed AFTER looking into the details?
OK, you forced me into making decisions and you will need to suffer the consequences of that :o).
I have narrowed the list down to 15 files:

Location
Dev/Mod/Profiler.odc
"A" [2]
Dev/Mod/References.odc
"A" [1]
Host/Mod/Dialog.odc
"A" [1]
Host/Mod/Menus.odc
"A" [13]
Host/Mod/Registry.odc
"A" [2]
Host/Mod/TextConv.odc
"A" [4]
Obx/Mod/BlackBox.odc
"A" [1]
Obx/Mod/Excel.odc
"A" [1]
Std/Mod/Coder.odc
"A" [1]
System/Mod/Printing.odc
"A" [1]

Arabic numerals?
For a counterexample to Arabic numerals, see https://en.wikipedia.org/wiki/Chinese_numerals

System/Mod/Kernel.odc
"A" [3]
Dev/Mod/CPS.odc
"A" [5]
System/Mod/Strings.odc
"A" [20]
Text/Mod/Mappers.odc
"A" [1]
Text/Mod/Setters.odc
"A" [1]

Re: issue-#19 showing unicode identifiers with show sources

Posted: Thu Jul 07, 2016 12:19 pm
by Josef Templ
First of all you should search for "Z", not "A".
Then you skip all Hex conversions automatically.

Then search for "z" and the files that contain both "Z" and "z" are candidates
for a detail inspection. I would expect only 3 or 4 to remain.
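
The two-step search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: .odc files are binary BlackBox documents, so the sketch assumes the module sources have been exported to plain-text files first, and `candidate_files`, the `*.txt` pattern, and the quoted needles are hypothetical names chosen for this example, not part of BlackBox or GftSearchFiles.

```python
from pathlib import Path


def candidate_files(root, needles=('"Z"', '"z"'), pattern="*.txt"):
    """Return the files under root whose text contains every needle.

    Assumption: sources were exported to plain text beforehand;
    .odc files themselves are not searched by this sketch.
    """
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # A file is a candidate only if all quoted literals occur in it.
        if all(needle in text for needle in needles):
            hits.append(path)
    return hits
```

Note that a file containing only one of the two literals is filtered out by this rule, so the result is a list of candidates for manual inspection, not a complete list of affected modules.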

- Josef

Re: issue-#19 showing unicode identifiers with show sources

Posted: Fri Jul 08, 2016 1:31 am
by DGDanforth
Josef Templ wrote: First of all you should search for "Z", not "A".
Then you skip all Hex conversions automatically.

Then search for "z" and the files that contain both "Z" and "z" are candidates
for a detail inspection. I would expect only 3 or 4 to remain.

- Josef
Josef,
That is easy to do with GftSearchFiles where one can simultaneously search for both "Z" and "z".

Below are my annotated search results. I would like feedback on my comments, please.

Location
Dev/Mod/CPS.odc
"Z" [1]
"z" [1]
Get ... | "G".."H", "J", "K", "Q", "S", "X".."Z", "a".."z", "_": Identifier(s)
Identifiers ... Strings.IsIdent(ch)
PROCEDURE IsIdent* (ch: CHAR): BOOLEAN;
BEGIN
  (* returns IsIdentStart(ch) OR IsNumeric(ch); optimized because heavily used in the compiler *)
  CASE ch OF
    "a".."z", "A".."Z", "_", "0".."9": RETURN TRUE
  ELSE
    IF ch > 7FX THEN RETURN Kernel.IsAlpha(ch) ELSE RETURN FALSE END
  END
END IsIdent;
This is not using the UTF-8 encoding of identifiers.

Dev/Mod/References.odc
"Z" [1]
"z" [1]
PROCEDURE SearchIdent ...
  WHILE ~r.eot & ((ch < "A") OR (ch > "Z") & (ch # "_") & (ch < "a") OR (ch > "z")) DO
    INC(beg); r.ReadChar(ch)
  END;
This is not using the UTF-8 encoding of identifiers.
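
To make the concern concrete, here is a small Python sketch of what the quoted loop condition does. In Component Pascal "&" binds tighter than "OR", so the condition skips every character below "A", every character between "Z" and "a" except "_", and every character above "z", which includes all non-ASCII letters. The function names are hypothetical, and Python's `str.isalpha` is only a stand-in for whatever Unicode classification BlackBox would use.

```python
def skip_to_ident_ascii(text, pos=0):
    """Mirror of the quoted loop: stop only at an ASCII letter or "_"."""
    while pos < len(text):
        ch = text[pos]
        is_skipped = ch < "A" or ("Z" < ch < "a" and ch != "_") or ch > "z"
        if not is_skipped:
            break
        pos += 1
    return pos


def skip_to_ident_unicode(text, pos=0):
    """Variant that also stops at non-ASCII letters (assumed classifier)."""
    while pos < len(text) and not (text[pos] == "_" or text[pos].isalpha()):
        pos += 1
    return pos
```

For the text `12 + Δx`, the ASCII version steps over the Greek letter and stops at `x`, while the Unicode-aware variant stops at `Δ`.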

Host/Mod/Menus.odc
"Z" [5]
"z" [2]
PROCEDURE SetShortcut ...
Should shortcuts be localizable?
PROCEDURE NextWord ...
What is the definition of a word in an arbitrary language?
PROCEDURE GetHotkey ...
It looks OK to limit hotkeys to "A" to "Z", BUT that would also depend on the keyboard being used.
PROCEDURE FirstFree ...
Same issue as with NextWord
PROCEDURE SetHotkeys ...
Same issue as with GetHotkey

Host/Mod/TextConv.odc
"Z" [2]
"z" [3]
PROCEDURE ParseRichText...
Rich text is conceivably localized to a language (e.g. Russian), and hence parsing needs to be done for each language.
So rather than operating on a CHAR I believe parsing needs to operate on Unicode "code points", i.e. UTF-8.
I am assuming that BlackBox documents support language localization.

Std/Mod/Coder.odc
"Z" [1]
"z" [1]
PROCEDURE InitCodes ...
This may be OK if StdCoder is operating only on bytes.

System/Mod/Kernel.odc
"Z" [2]
"z" [2]
PROCEDURE Upper...
Is the concept of character "case" universal to all languages? Probably not, so I assume these calls are included
for those languages, e.g. Latin-1, that have case.
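
As a quick illustration of the question (using Python's built-in Unicode case mapping, not Kernel.Upper, whose behavior may differ), the hypothetical helper below reports whether a character has a case distinction at all:

```python
def has_case(ch):
    """True if upper- or lower-casing changes the character."""
    return ch.upper() != ch or ch.lower() != ch


# Latin and Cyrillic letters have case; CJK ideographs and digits do not.
samples = {ch: has_case(ch) for ch in ("a", "Ж", "中", "7")}
```

So case is a property of some scripts only, and an upper-casing routine has to leave caseless characters unchanged.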

System/Mod/Strings.odc
"Z" [2]
"z" [1]
PROCEDURE IsIdent...
Again this is not Unicode compliant.

-Doug

Re: issue-#19 showing unicode identifiers with show sources

Posted: Fri Jul 08, 2016 4:57 am
by Josef Templ
Doug, thanks for the list.
In general, this is not related to UTF-8 encoding.
It is related to using Unicode.
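
The distinction can be illustrated in Python (a sketch only; `describe` is a hypothetical helper). A character is classified by its Unicode code point, which is what a 16-bit Component Pascal CHAR holds; UTF-8 is just one byte encoding of that code point and plays no role in the classification itself.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes, is-letter) for one character."""
    return ord(ch), ch.encode("utf-8"), ch.isalpha()


cyrillic = describe("Ж")   # one code point, two bytes in UTF-8
latin = describe("z")      # one code point, one byte in UTF-8
```

Both characters are letters as code points; only their byte representations differ in length.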

Here are my comments:

Dev/Mod/CPS.odc
No need to change.

Dev/Mod/References.odc
Already fixed.

Host/Mod/Menus.odc
We already know that this may need to be changed but we don't know yet HOW.

Host/Mod/TextConv.odc
No need to change. The RTF commands use only ASCII.

Std/Mod/Coder.odc
No need to change.

System/Mod/Kernel.odc
No need to change. See the ELSIF part.

System/Mod/Strings.odc
No need to change. See the ELSE part.

In addition, DevProfiler needs to be changed.
It doesn't use "z" but CAP(ch) > "Z", so searching for "z" & "Z" is not enough.
It may also be a good idea to try searching for "A" & "Z".

To summarize, the modules found so far are:
HostMenus
DevProfiler

For DevProfiler I have provided a fix.
It is of marginal importance, however, because this code is not used anywhere in BlackBox.

As Strings.IsIdentStart and Strings.IsAlpha are quite heavily used by now,
I have optimized the implementation. Instead of delegating all calls to Kernel.IsAlpha,
the ASCII case is handled explicitly. This is much faster because Kernel.IsAlpha makes
a Windows system call that imposes some overhead (in addition to its own execution).
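
The idea of the optimization, an explicit ASCII fast path with delegation only for characters above 7FX, can be sketched in Python as follows. This is not the BlackBox code: `is_alpha_slow` is a hypothetical stand-in for the expensive general classifier (Kernel.IsAlpha in the thread), and Python's `str.isalpha` may classify some characters differently than Windows does.

```python
def is_alpha_slow(ch):
    """Stand-in for the costly general classifier (assumption)."""
    return ch.isalpha()


def is_ident_start(ch):
    """Letter or "_"; ASCII is decided locally without delegating."""
    if ch < "\x80":
        return "a" <= ch <= "z" or "A" <= ch <= "Z" or ch == "_"
    return is_alpha_slow(ch)


def is_ident(ch):
    """Identifier character: identifier start or, in ASCII, a digit."""
    if ch < "\x80":
        return is_ident_start(ch) or "0" <= ch <= "9"
    return is_alpha_slow(ch)
```

Since most source text is ASCII, the slow path is taken only for the comparatively rare non-ASCII characters.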

For the changes see http://redmine.blackboxframework.org/pr ... 965e433087.

- Josef

Re: issue-#19 showing unicode identifiers with show sources

Posted: Fri Jul 08, 2016 11:19 am
by DGDanforth
Searching for "A" and "Z" turned up no new files.

Re: issue-#19 showing unicode identifiers with show sources

Posted: Sat Jul 09, 2016 4:59 pm
by Josef Templ
Thanks, Doug.
We must not forget about the CAP in DevSearch which I mentioned in Issue-#88.
It is not strictly related to issue-#19 but rather issue-#37, which fixed a number of bugs in DevSearch.

- Josef

Re: issue-#19 showing unicode identifiers with show sources

Posted: Tue Jul 12, 2016 5:39 pm
by Ivan Denisov
I think that this issue is ready for voting.

The changes:
http://redmine.blackboxframework.org/pr ... d4a5a7ffc8

Re: issue-#19 showing unicode identifiers with show sources

Posted: Wed Jul 13, 2016 9:29 pm
by DGDanforth
Ivan Denisov wrote: I think that this issue is ready for voting.

The changes:
http://redmine.blackboxframework.org/pr ... d4a5a7ffc8
In version 1.7-rc1 Build 571 there is no procedure Kernel.IsAlpha within the Kernel.odc file, but that procedure does appear in the interface
(Ctrl-D of Kernel). That suggests to me that the source is out of step with the interface.