issue-#201 Importer and Exporter for UTF-8 texts

Merged to the master branch
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Post by Josef Templ »

I have created an issue for Helmut's proposal, see https://redmine.blackboxframework.org/issues/201.

One detail question is:
What is the effect of specifying {Converters.importAll}? I cannot see any difference in the behavior.

- Josef
luowy
Posts: 234
Joined: Mon Oct 20, 2014 12:52 pm

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by luowy »

Josef Templ wrote:
> What is the effect of specifying {Converters.importAll}? I cannot see any difference in the behavior.

Read the code of Converters.Import.
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Josef Templ »

OK, in the file open dialog there is no difference because this flag is not used outside Converters.

Inside Converters it is only used for getting a converter for the rare case that there is no
converter specified and no converter registered for the file extension.
Then the very first converter that has this option is chosen.

As such, does it make sense to have multiple converters marked as importAll?
Only the very first in the list is ever taken.
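The lookup order described above can be sketched as follows. This is only an illustration in Python, not the code of Converters.Import; the `Converter` record, `pick_importer` and the converter names in the usage note are hypothetical, and the extension match is simplified to plain string equality.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Converter:
    imp: str                  # name of the import command (hypothetical)
    file_type: str            # registered file extension, "" if none
    import_all: bool = False  # stands in for the {importAll} option


def pick_importer(converters: List[Converter], file_type: str,
                  explicit: Optional[Converter] = None) -> Optional[Converter]:
    """Choose an importer in the order described in the post above."""
    # 1. A converter that was specified explicitly always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise use the converter registered for the file extension.
    for c in converters:
        if c.file_type != "" and c.file_type == file_type:
            return c
    # 3. Rare case: nothing matched, so take the first converter in
    #    registration order that carries the importAll flag.
    for c in converters:
        if c.import_all:
            return c
    return None
```

With a list such as `[Converter("X.ImportUtf8", "utf8", True), Converter("X.ImportText", "txt", True)]`, an unknown extension always resolves to `X.ImportUtf8`; the second importAll entry is never reached through step 3.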

I am asking this because I want a clear picture of possible side effects from the proposed change in the module Config.

- Josef
luowy
Posts: 234
Joined: Mon Oct 20, 2014 12:52 pm

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by luowy »

Only the first registered converter with {importAll} takes effect, so the patch for Config should be placed at the top of "Setup" to have any effect.


Please also check this code in the Config file:

		Converters.Register("DevBrowser.ImportSymFile", "", "TextViews.View", "osf", {});
		Converters.Register("DevBrowser.ImportCodeFile", "", "TextViews.View", "ocf", {});

I think "TextViews.View" should be changed to "Documents.Document".
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Josef Templ »

> The "TextViews.View" should changed to "Documents.Document", I think;

Documents.Document is, ironically, undocumented.
How can it be used if it does not exist officially?
In addition, since TextViews.View works well, why should we change that?
Furthermore, why only for ImportSymFile and ImportCodeFile?

- Josef
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Josef Templ »

A first draft version is in the branch.
For diffs see https://redmine.blackboxframework.org/p ... ec172f4dc6

It is largely based on Helmut's proposal with some refinements in ImportUtf8:

- If the optional byte order mark (BOM) is found at the beginning of the file, it is skipped.
This is because the BlackBox text editor gets confused when the BOM is imported.

- Characters that do not fit into 16 bits are reported as '?'.
This is expected to be very rare but has been added for the sake of completeness.

- If an illegal encoding is found, the importer falls back to importing the file as a Windows text.
This is an experimental feature that would, I think, allow us to use the Utf8 importer for XML or HTML texts as well.
HTML and XML are often encoded in UTF-8, but sometimes it is not known in advance whether a given file is.
Please think about this and give feedback on whether it is a good idea or not.
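The three refinements can be sketched like this. It is a Python illustration only, not the ImportUtf8 code from the branch; `import_utf8` is a hypothetical name, "Windows text" is assumed to mean code page 1252, and the fallback is assumed to re-read the whole file including any leading BOM bytes.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of the byte order mark U+FEFF


def import_utf8(data: bytes) -> str:
    """Decode file contents as UTF-8 with the three refinements above."""
    # Skip the optional BOM at the beginning of the file.
    body = data[len(BOM):] if data.startswith(BOM) else data
    try:
        text = body.decode("utf-8")  # strict: raises on an illegal encoding
    except UnicodeDecodeError:
        # Fallback: treat the file as a Windows text (assumed cp1252).
        # Bytes that cp1252 leaves undefined become U+FFFD.
        return data.decode("cp1252", errors="replace")
    # Characters that do not fit into 16 bits are reported as '?'.
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)
```

One consequence of the fallback worth keeping in mind: a single stray byte anywhere in the file switches the whole text to the Windows interpretation, so a mostly valid UTF-8 file with one defect will show its multi-byte characters as garbage.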

- Josef
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Zinn »

Thank you, Josef. Your improvement is a perfect solution. Falling back to the text format on error is a great idea, so I do not have to open the same file again when it is not UTF-8.
- Helmut
luowy
Posts: 234
Joined: Mon Oct 20, 2014 12:52 pm

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by luowy »

In ExportUtf8, how about skipping views?

  IF (ch # view) & (ch # para) THEN
    IF ch = CR THEN WriteChar(wr, LF) ELSE WriteChar(wr, ch) END;
  END;
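
To make the effect of this filter concrete, here is a small Python sketch of the same rule followed by UTF-8 encoding. It is not the ExportUtf8 code; `export_utf8` is a hypothetical name, and the character codes used for `view` (2X) and `para` (0EX) are assumptions about the corresponding TextModels constants.

```python
VIEW = "\x02"  # assumed code of an embedded view (TextModels.viewcode)
PARA = "\x0e"  # assumed code of a paragraph character (TextModels.para)
CR = "\r"
LF = "\n"


def export_utf8(text: str) -> bytes:
    """Drop view and paragraph characters, map CR to LF, encode as UTF-8."""
    out = []
    for ch in text:
        if ch == VIEW or ch == PARA:
            continue  # skipped entirely, as in the snippet above
        out.append(LF if ch == CR else ch)
    return "".join(out).encode("utf-8")
```

Note that with this rule two paragraphs separated only by a `para` character are joined without any line break in the exported file.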
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Josef Templ »

Good point. It should be included. I have to look at how this is handled in the other text importers/exporters.

- Josef
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#201 Importer and Exporter for UTF-8 texts

Post by Josef Templ »

ExportText also treats TextModels.viewcode and TextModels.para specially, i.e. they are skipped.
But why is TextModels.para skipped? Shouldn't it rather be treated as a newline character, or even two?

- Josef