conversion of identifiers between Unicode and Utf8

Josef Templ · Post by **Josef Templ** » Tue Jun 09, 2015 7:27 am

This vote is about whether the conversion of identifiers between Unicode and Utf8
should be treated separately from the conversion of other strings between Unicode and Utf8.

The background of this question is the discussion in issue-#57 (DevBrowser fixes)
where identifiers could be converted between Unicode and Utf8 by using Strings functions or
by using Kernel functions.

This vote is related to the previous vote (usage of undocumented low-level functions),
where it was agreed that undocumented low-level functions should only be used
if there are strong arguments justifying their usage. Now the question is whether identifier
conversion is such an exception or just the normal and intended usage of Strings.

Gentlemen, please cast your vote and feel free to add a comment to your vote.

- Josef

DGDanforth · Post by **DGDanforth** » Tue Jun 09, 2015 7:41 am

Again, in my opinion, text is text and should be treated uniformly. So wherever and however conversion is performed it should be done in the same way for source code and documents.

Josef Templ · Post by **Josef Templ** » Tue Jun 09, 2015 7:45 am

In general, there is no reason to treat identifiers in a separate way.
In fact, the Utf8 conversion functions have been added to Strings for exactly this purpose.
Ivan's argument (I hope I understood it correctly) is that it is an advantage to be able to exchange the
Utf8 conversion in Strings for some other form of conversion without affecting the Utf8 conversion of identifiers
in the compiler and other development tools. I find this very strange: it would no longer be Utf8 conversion
but would still be named Utf8ToString and StringToUtf8.
So why the heck would one want to do that? It is close to shooting yourself in the foot.

The only exception I have found so far is in HostPackedFiles. But this is really an exception,
not the general case. See http://forum.blackboxframework.org/view ... =250#p2236
for the details.

- Josef

Josef Templ · Post by **Josef Templ** » Tue Jun 09, 2015 8:06 am

I should have added that in HostPackedFiles it is not about Utf8 conversion but about
the Upper function. But the question is the same. Should it be Kernel.Upper or Strings.Upper?
Because HostPackedFiles has a hidden and subtle connection with
DevDependencies, the exceptional usage of Kernel.Upper seems justified to me.

- Josef

Josef Templ · Post by **Josef Templ** » Wed Jun 10, 2015 7:21 am

Those who voted for the second option (Ivan, Helmut), can you please explain WHY
you think that identifiers are different from normal strings regarding Utf8 conversion?

What is the strong reason for treating them differently?
If there is no strong reason, your vote is inconsistent with the previous vote
about "usage of undocumented low-level functions".

Please explain.

- Josef

Ivan Denisov · Post by **Ivan Denisov** » Wed Jun 10, 2015 10:32 am

I accepted the simple current conversion method only for identifiers.
The Strings module should be changed as proposed by LuoWe.
http://forum.blackboxframework.org/view ... =253#p2274
The possibility of exchanging Strings in this way is a serious reason for using Kernel.
The argument about the benefits of existing documentation is groundless in this case.
The second argument is the likely decrease in performance.
Are you sure that the wrapping does not influence performance?

Josef Templ · Post by **Josef Templ** » Wed Jun 10, 2015 11:10 am

Ivan Denisov wrote: Are you sure that the wrapping does not influence performance?

I am sure that it influences the performance in the range of 1 or 2 nanoseconds per call.
This may sum up to 1 or 2 microseconds per module and to less than 1 millisecond
per rebuild of BB. With the standard Windows 10 ms timer you will not be able to measure
the difference and you will certainly not be able to feel the difference.
I am addicted to speed at least as much as any other center member and therefore
I would not propose anything that slows down the system.
Please note that there is a loop inside the Utf8 conversion where most of the time is spent
and the parameters are all passed by reference. So there is no copying going on in the call.
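
As a back-of-envelope check of these figures, the sketch below multiplies the per-call overhead up to a full rebuild. It is written in Python purely for illustration. The call count per module and the module count are assumptions, not measurements: 1,000 calls per module is what 1 or 2 microseconds per module implies at 1 or 2 ns per call, and 400 modules is a hypothetical figure.

```python
NS = 1e-9  # seconds per nanosecond


def rebuild_overhead(ns_per_call: float, calls_per_module: int, modules: int) -> float:
    """Total extra time in seconds spent in wrapper calls during one rebuild."""
    return ns_per_call * NS * calls_per_module * modules


# Assumptions (hypothetical, for illustration only):
CALLS_PER_MODULE = 1_000   # implied by 1-2 microseconds per module at 1-2 ns per call
MODULES = 400              # made-up module count for one rebuild
TIMER_RESOLUTION = 10e-3   # the 10 ms Windows timer mentioned above

for ns in (1, 2):
    total = rebuild_overhead(ns, CALLS_PER_MODULE, MODULES)
    measurable = total >= TIMER_RESOLUTION
    print(f"{ns} ns/call: {total * 1e3:.1f} ms per rebuild, measurable: {measurable}")
```

Under these assumptions the total stays below one millisecond, an order of magnitude under the timer resolution.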

Regarding luowy's 4 byte Utf-8 converter: we don't need it because we only have 2 byte CHARs
and even Java and C# don't have anything else. If we need a 4 byte Utf8 converter we can
always add it to Strings later. This is not written in stone. Everybody can also write his
own application-specific Utf8 converter in his own module. There is more than Strings.
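
To make the 2 byte CHAR argument concrete: every code point that fits in a single 16-bit CHAR (up to U+FFFF) encodes to at most 3 bytes in Utf8, and only code points above U+FFFF need the 4 byte form. A minimal sketch in Python (not BlackBox code; the helper names are made up for this illustration):

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes in the UTF-8 encoding of one code point."""
    return len(chr(code_point).encode("utf-8"))


def utf16_units(code_point: int) -> int:
    """Number of 16-bit units needed to store one code point as UTF-16."""
    return len(chr(code_point).encode("utf-16-le")) // 2


# Sample code points: ASCII, Latin-1, a 3-byte BMP character,
# the last BMP code point, and one code point beyond the BMP.
for cp in (0x41, 0xE9, 0x20AC, 0xFFFF, 0x1F600):
    print(f"U+{cp:04X}: {utf8_len(cp)} UTF-8 byte(s), {utf16_units(cp)} 16-bit unit(s)")
```

Only the last sample, which lies outside the 16-bit range, needs 4 bytes in Utf8, and it also needs two 16-bit units (a surrogate pair) on the Unicode side.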

- Josef

Ivan Denisov · Post by **Ivan Denisov** » Wed Jun 10, 2015 11:54 am

Josef, if we are not changing it now, it may still be changed in the future. The current method does not check for malformed Utf8.
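
As an illustration of what checking for malformed Utf8 means, here is a minimal sketch in Python (not the BlackBox implementation; `is_well_formed_utf8` is a hypothetical helper). It relies on Python's strict decoder, which rejects truncated sequences, stray continuation bytes, overlong encodings and encoded surrogates:

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "plain ASCII": b"Ident",
    "2-byte sequence": b"\xc3\xa9",
    "truncated sequence": b"\xc3",
    "stray continuation byte": b"\x80",
    "overlong encoding": b"\xc0\x80",
    "encoded surrogate": b"\xed\xa0\x80",
}

for name, raw in samples.items():
    print(f"{name}: {is_well_formed_utf8(raw)}")
```

A converter without such checks would silently turn the last four samples into some character instead of reporting an error.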

About performance. The difference is not big, but it exists and can be measured (test attached).

Strings:
70.5 ms
14.0 ms

Kernel:
68.0 ms
12.3 ms

However, performance is not the main issue here.
I realized this because of the current mess in #issue-19: in some places Kernel is used, in others Strings.
Overloading Strings with this compiler stuff has a few negative sides (in addition to performance):
1. Fixation of the conversion method in an exported* module (impossible to change Strings in the future without the risk of breaking compatibility)
2. More imports in total because of Strings (in some modules Strings is imported only for these conversion procedures)

* - documented and suggested for users

Josef Templ · Post by **Josef Templ** » Wed Jun 10, 2015 1:08 pm

Ivan Denisov wrote: About performance. The difference is not big, but it exists and can be measured (test attached).

Thanks, the runtime overhead per call is even below 1 nanosecond on your machine.
So you are well equipped.
When I wrote about the ability to measure it, I was referring to a single rebuild of BB,
of course, not to a timing loop.

Ivan Denisov wrote: 1. Fixation of the conversion method in an exported* module (impossible to change Strings in the future without the risk of breaking compatibility)

You should not even think about exchanging this in an incompatible way. It's a waste of time.

Ivan Denisov wrote: 2. More imports in total because of Strings (in some modules Strings is imported only for these conversion procedures)

This depends on the module. There may be other examples as well (e.g. DevCPP, DevCPS, etc.).
I was always referring to DevBrowser only. Leave the other modules as they are.
We need to move on.

- Josef

Bernhard · Post by **Bernhard** » Wed Jun 10, 2015 1:39 pm

To avoid code duplication, it might be a solution to put the common procedures in yet another module, which is imported by Kernel and Strings. This would not solve the problem that all instances of the link command list must be changed.

BlackBox Framework Center

Should we use Kernel or Strings for the Utf8 conversion of identifiers
