issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 09, 2019 3:04 pm
by Josef Templ
I have created an issue for the proposal of Helmut, see https://redmine.blackboxframework.org/issues/201.

One detail question is:
What is the effect of specifying {Converters.importAll}? I cannot see any difference in the behavior.

- Josef

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 09, 2019 5:09 pm
by luowy
Josef Templ wrote: What is the effect of specifying {Converters.importAll}? I cannot see any difference in the behavior.

Read the code of Converters.Import.

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 09, 2019 8:03 pm
by Josef Templ
OK, in the file open dialog there is no difference because this flag is not used outside Converters.

Inside Converters it is only used to find a converter in the rare case that no
converter is specified and no converter is registered for the file extension.
In that case the very first converter that has this option is chosen.

As such, does it make sense to have multiple converters marked as importAll?
Only the very first one in the list is ever taken.
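
The lookup order described above can be sketched as follows. This is a Python model for illustration only, not the source of Converters.Import. The `Converter` record, the `pick_importer` function and the two `Hypothetical.*` importer names are assumptions made up for the example; only `DevBrowser.ImportSymFile` and the `importAll` flag come from this thread.

```python
from dataclasses import dataclass

IMPORT_ALL = "importAll"  # stands in for Converters.importAll


@dataclass
class Converter:
    importer: str                  # e.g. "DevBrowser.ImportSymFile"
    file_type: str                 # file extension, "" if none
    opts: frozenset = frozenset()  # option flags, e.g. {IMPORT_ALL}


def pick_importer(registry, file_type, explicit=None):
    """Return the converter to use, or None if nothing applies."""
    # 1. An explicitly specified converter always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise take the first converter registered for the extension.
    for conv in registry:
        if conv.file_type and conv.file_type == file_type:
            return conv
    # 3. Last resort: the first converter flagged importAll,
    #    in registration order. Later importAll entries are never reached.
    for conv in registry:
        if IMPORT_ALL in conv.opts:
            return conv
    return None


# Registration order matters: both hypothetical text importers carry
# importAll, but only the first one can ever be the fallback.
registry = [
    Converter("DevBrowser.ImportSymFile", "osf"),
    Converter("Hypothetical.ImportUtf8", "utf8", frozenset({IMPORT_ALL})),
    Converter("Hypothetical.ImportText", "txt", frozenset({IMPORT_ALL})),
]
```

With this registry, a file with an unknown extension is routed to the first importAll entry, while `osf` and `txt` files still go to the converters registered for them. Under this model a second importAll flag has no observable effect.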

I am asking this because I want a clear picture of possible side effects from the proposed change in the module Config.

- Josef

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Sat Aug 10, 2019 3:45 am
by luowy
Only the first registered converter with {importAll} is used, so the patch for Config should be placed at the top of "Setup" to take effect.


Please also check this code in the Config file:
Code:
      Converters.Register("DevBrowser.ImportSymFile", "", "TextViews.View", "osf", {});
      Converters.Register("DevBrowser.ImportCodeFile", "", "TextViews.View", "ocf", {});

The "TextViews.View" should be changed to "Documents.Document", I think.

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Sat Aug 10, 2019 8:46 pm
by Josef Templ
> The "TextViews.View" should be changed to "Documents.Document", I think;

Documents.Document is, ironically, undocumented.
How can it be used if it does not exist officially?
In addition, since TextViews.View works well, why should we change that?
Furthermore, why only for ImportSymFile and ImportCodeFile?

- Josef

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Thu Aug 15, 2019 8:46 am
by Josef Templ
A first draft version is in the branch.
For diffs see https://redmine.blackboxframework.org/projects/blackbox/repository/diff?utf8=%E2%9C%93&rev=2d2cb269c073b96429fd823701d764dcf5bb0d83&rev_to=cd3c020e77648cc9b37ad4cea7677cec172f4dc6

It is largely based on Helmut's proposal with some refinements in ImportUtf8:

- if the optional byte order mark (BOM) is found at the beginning of the file, it is skipped.
This is because the BlackBox text editor gets confused when the BOM is imported.

- characters that do not fit into 16 bits are reported as '?'.
This is expected to be very rare but has been added for the sake of completeness.

- if an illegal encoding is found, it falls back to importing the file as a Windows text.
This is an experimental feature that would, I think, also allow us to use the Utf8 importer for XML or HTML texts.
HTML and XML are often encoded in UTF-8, but sometimes it is not known in advance whether a given file is.
Please think about this and give feedback if this is a good idea or not.
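
The three refinements listed above can be sketched roughly as follows. This is a Python model, not the Component Pascal code in the branch. The function name `import_utf8` is made up for the example, and using code page 1252 for the "Windows text" fallback is an assumption; the real importer may use a different code page.

```python
import codecs


def import_utf8(data: bytes) -> str:
    """Decode file contents the way the refinements above describe."""
    # Skip the optional byte order mark at the beginning of the file.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    try:
        # Strict decoding: any illegal sequence raises an error.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: treat the file as Windows text.
        # cp1252 is an assumption made for this sketch.
        return data.decode("cp1252", errors="replace")
    # Characters that do not fit into 16 bits are reported as '?'.
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)
```

For example, `import_utf8(b"caf\xe9")` hits the fallback, because a lone `E9` byte is not legal UTF-8, and yields "café" via the assumed code page. One consequence of the design is that the fallback is all or nothing: a single illegal byte switches the whole file to the Windows interpretation.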

- Josef

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 16, 2019 6:16 am
by Zinn
Thank you, Josef. Your improvement is a perfect solution. Falling back to the text format on error is a great idea, so I do not have to open the same file a second time when it is not UTF-8.
- Helmut

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 16, 2019 2:59 pm
by luowy
In ExportUtf8, how about skipping views?
Code:
  IF (ch # view) & (ch # para) THEN
    IF ch = CR THEN WriteChar(wr, LF) ELSE WriteChar(wr, ch) END;
  END;
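
The same filter, modelled in Python so that it can be tried outside BlackBox. The numeric values chosen for `VIEW` and `PARA` are assumptions standing in for TextModels.viewcode and TextModels.para, and `export_utf8` is a name made up for this sketch.

```python
VIEW = "\x02"  # assumed code for an embedded view (TextModels.viewcode)
PARA = "\x0e"  # assumed code for a paragraph separator (TextModels.para)
CR = "\r"
LF = "\n"


def export_utf8(chars: str) -> bytes:
    """Drop view and paragraph codes, map CR to LF, encode as UTF-8."""
    out = []
    for ch in chars:
        if ch == VIEW or ch == PARA:
            continue  # skipped, as in the IF above
        out.append(LF if ch == CR else ch)
    return "".join(out).encode("utf-8")
```

With these assumed codes, `export_utf8("a\x02b\rc")` gives `b"ab\nc"`: the view placeholder disappears and the carriage return becomes a line feed.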

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Fri Aug 16, 2019 6:51 pm
by Josef Templ
Good point. It should be included. I have to look at how this is handled in the other text importers/exporters.

- Josef

Re: issue-#201 Importer and Exporter for UTF-8 texts

Posted: Sat Aug 17, 2019 7:15 pm
by Josef Templ
ExportText also treats TextModels.viewcode and TextModels.para specially, i.e. both are skipped.
But why is TextModels.para skipped? Wouldn't it be better to treat it as a newline character, or even two?

- Josef