issue-#182 fixing code page conversion in RTF import

Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Re: issue-#86 improvements in RTF import

Post by Ivan Denisov »

Oleg N. Cher found a bug with encoding during RTF import. I am attaching an example document.
BlackBox wrongly detects CP1252 where it should detect CP1251.
Attachments
encodingBug.rtf
(51.19 KiB) Downloaded 272 times
Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Re: issue-#86 improvements in RTF import

Post by Ivan Denisov »

I found a fix that solves the problem.
The \ansicpg tag must be parsed correctly to set the default font code page.
Also, Unicode was written incorrectly. It should not be encoded; it should be written directly.
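
To illustrate the idea only (this is a Python sketch, not the Component Pascal code in the attached TextConv module), the following reads the \ansicpg parameter and uses it to decode \'hh escapes. The function names and the 1252 fallback are assumptions made for the example, and a real importer would also have to honor per-font settings such as \fcharset, which this sketch ignores.

```python
import re

DEFAULT_CODEPAGE = 1252  # assumed fallback when \ansicpg is absent


def ansi_codepage(rtf):
    """Return the code page named by \\ansicpgN, or the assumed default."""
    match = re.search(r"\\ansicpg(\d+)", rtf)
    return int(match.group(1)) if match else DEFAULT_CODEPAGE


def decode_hex_escapes(rtf):
    """Decode runs of \\'hh escapes using the document's ANSI code page."""
    codec = f"cp{ansi_codepage(rtf)}"

    def replace(match):
        # Strip the \' prefixes, leaving only the hex digits of each byte.
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(codec, errors="replace")

    return re.sub(r"(?:\\'[0-9a-fA-F]{2})+", replace, rtf)
```

With the header \ansicpg1251, the escapes \'cd\'e0 decode to the Cyrillic letters "На"; if the parameter is ignored and CP1252 is assumed, the same bytes come out as "Íà".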
Attachments
TextConv.odc.zip
(11.18 KiB) Downloaded 251 times
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main

Re: issue-#86 improvements in RTF import

Post by Zinn »

The StdCoder-encoded text below contains the minimal change needed to solve the problem (the added line is shown in green):
- Helmut
StdCoder.Decode ..,, ..an....3Qw7uP5PRPPNR9Rbf9b8R79FTvMf1GomCrlAy2xhX,Cb2x
hXhC6FU1xhiZiVBhihgmRiioedhgrZcZRiXFfaqmSrtuGfa4700zdGrr8rmCLLCJuyKtYcZRiX
7.2.s,sON.,k,5TWyql.bnayKmKKqGomC5XzET1.PuP.MHT9N9ntumaU2,CJuyKtQC98P9PP7O
NbXmb.2.Ar9k2kMj.,6.,U08J99SdfJHPNjvQCJuGKfaqmY6MwdONl1QCh0708T,U..w.Qf9U.
2U18J99SqorGqmQCbWBxhYFWUl1UnNHEWUmr.6.ov7k5Efb.,6.,.5O,cU.ktAcoZimBhWhioh
gnZcZRCY.2.Q42U.EBU.U,U.I3l6w1.0E65.w26.6ixg.0sR8U0Cy2hgqRcjhhhBgiZgZJinpZ
HZCh0E.4TWKKv.Uio8.,cw5.0.,,,.B.0UJUv.,.l3Umr,6.222.o.6.K,K10E.E6Gz.2kvI.4
zdGLmOormKmCLLOormKmGomCrl.2.CM.61MRE.k.2U1Ky9U.2.UDULV.2.cBE,9z4U...p.0.4
.,EJ.6.V2w0wzU2djRioxZBxhYxZIhgsZi1xhipiiwhYRgUM.M.3gwP.,6.o36.I16.M.,.JFc
CJ,U.Yr0.,.emuKFeWQ1Iklb8IepZhZJinpZHFdKLq6F9vQ59.XDJ.QiiIepZhZ7F6.Z50E.6i
,U5UXW.2.5Qw.sQRtIQeoBjghg2hgn7.X5.u0n9PU.Iy5.,.60cKE.cUX5.umUGLu.Y62.7.,.
.Y22U,.,.,tcp00EUmx.2cTl.k.E.0.1.0xs5UEKq0cUXzIGKaaKriqtuGaaKriKWKqtk,U.ko
U.kbU.M.D07uPbPGR9R9vNZPMdPOTfPRtI99RFt7FuPbnHmGESmayKmSGK0WIhgstE.SmIiHEG
orC5.UcxhrN.fU.AU.E.0B.Zj3E.c3E.606..4k4E,8Mtr.0E.s76.,UO.,.16.c8,6TxR.E.0
7,Ut4.CE,9z4E.0.n00.p.0U.460.J,U.2Gk,0.363k4E,i0Ck.UpV,Y,IUuW1o.IU1.9,7cUZ
T16.,U.3.,.e,0E.yT.6.,Ue.E.07M.,U0Y961c.z05c9c.N09cUZT16.,UnU.2.e,0E.yD2.e
WME.E.07M2E.UegCWA8z6U.Erk1kgU.sUZz26.,.600Uz.,U0KyJ.,..G00k.0.0.006.,6Et4
.4.0E..2Wo1cwV.,.E4k6U.M.4.,Y..IU.2.zjH9vRp767eCTdENPM5vO3uPlfE99R1v9PsHN9
Np767eCIclwal2ajgVBIUWYcjRi7,.gA3OMbPNFt77eCTdEN1.wYg2YbYcuEV.DN83N13c6H9Q
fPOLeAn7ARtI99RZuPT9RFt758O1fPDPNP7HHvQdv7Hd6PcAZd9XNARdAVNAj76VdCVtAp7BXN
13663c.FuPb9RT,T,.c.PM9XN9HMAlNBHMAll4ak2akVyKqyKtaIrOqr0mS0GcyoYuIeKId0Ge
yIE8pWCob8JW0moGKR0mYuIeKoXKIdiHECKR0GcyKtGrtumVyKqyav2YihgsZiu2Y1xhgxBH0K
IbGoRqk2akVyIbCJe0GuKKwGLEqHE0nR0GnyKrGLu4Kl0mS0mMiHECqrm4dPM396Iav2YnRhd3
iUgbUQav2YaBhZZhYBhiRioJiUg5dtCPM0OpU8JECKoeHECIY4IdiHEGrk8qdGrwcC,tI98JrN
1AVaBgXZig2YqBggZZUYgZpg4xhi79,7NQcjZgZ3eVxgZZZUYgZRioZZUAhY3jg2Yaphphhg2Y
Xphphhg6Q1fQ19ITvQN76HfC,NGR0M1HM0whiZinZZUogjpBp76BuPR9RHePBvPr765vPsQp76
5uPMGk4ak22hVRiChihZ3bvOHHWKqtGLR0GVyobmoW4Ibin4a.BfC,dFTfPCLLOYitC,tMTPPe
HE4Id8pUaJECHN0mbOIECIYk4qGNqm2CHQ8n2CHQ8n4ak2ak2EnuquqKEenS0Gv4KqiHEaKmWL
EenS0GMiXChcL3ZaxhiN8r76BvPoZdZgUIbxcREnUihgsZiUUaxhitQr76y40GRqHEUBAV7AV7
kWmodKIEqk2aU7UaxB0GRqX4xhi78hPMNP8r76jfQRtI9nUGLu8LIGpmWLuq2m4RONj9Jn9Q9f
N1vM998jfQRNMd9RmGEOoru4ELOqIam4ak2A,KIbGoRU7EnELCqrGqmUUIbx6N9fN.gV7A,HkW
mY7pcUQgjhB,ND,t7BvMFPMZvQ99RD76d8G9eHPM0Hk2MGB86FPMbfH2Ya2YcYgZRC,NDOqr6R
AZUYe6h6k2kV4odKIEOrk2YDpcBgZngZ7gapAb7gap2bBAV7M0K2.g537PdfQ59O376d8G90A,
90.EtGLqCKo8GE.M0K2.UVphnBhX3ibJYUYe6,A,a2ku6JFWUYgZpA.00O5KIbGYUgV7A,HkW.
.2hdRgcJYU.P.HkW..YgWRgcJ2.A,Fd8,NPHvQ596JN8PM9P761fQ9967POBfN9fQ9fPdP13c.
,N13QwdONQ6GLtyKqmqm8rtumdYg1ZimZh2hgnRg.ASW5.ELCoruKu.GYnRg.sEMMgA.Z1...b
f9.EWE.8T0E.E82.,.HE.0U..U,IEX.0.11D.n0,6.C66.,.60sD0.,U002..676.16.,6.0.0
mFF.0E.02w92.4.0E.cUqU.E..UO.,.1.eG..2UEC.6..mEw7169rwKiEw3c0Cy2xBq4sE..M4
E..U6U.k.Y,0E.0ohC5ngc0MyfU.aT0QjWa.anL0ud,...
--- end of encoding ---
Last edited by Zinn on Tue Nov 21, 2017 11:17 pm, edited 1 time in total.
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#86 improvements in RTF import

Post by Josef Templ »

As far as I can see, this has nothing to do with issue-#86, has it?
New improvements or bug fixes should be treated in a new issue.
Can anybody explain what the findings are about?
Is this an issue with exporting RTF or importing it?

- Josef
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main

Re: issue-#86 improvements in RTF import

Post by Zinn »

Just download the file "encodingBug.rtf" on the last page above and load it into BlackBox.
Part of the text is unreadable. After applying the changes from the last page and loading the file again, the text is completely readable.
The minimal change needed (the added line shown in green) is the StdCoder-encoded text in my previous post above.
- Helmut
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#86 improvements in RTF import

Post by Josef Templ »

Zinn wrote: Just download the file "encodingBug.rtf" on the last page above and load it into BlackBox.
Part of the text is unreadable. After applying the changes from the last page and loading the file again, the text is completely readable.
The behavior is exactly the same when opening the file "encodingBug.rtf" with WordPad or OpenOffice.
If this is a bug in BlackBox, it must be the same bug everywhere.

- Josef
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main

Re: issue-#86 improvements in RTF import

Post by Zinn »

Without the change the first line is unreadable and looks like
Íà Îáåðîíå ñ êîìôîðòîì,
With the change the first line is readable and looks like
На Обероне с комфортом,
LibreOffice Writer displays this sample correctly.
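
The two lines above are the same bytes read with two different code pages. A small Python sketch (not BlackBox code) that reproduces the effect:

```python
readable = "На Обероне с комфортом,"

# The bytes as they would be stored for CP1251 text.
raw = readable.encode("cp1251")

# Misreading the same bytes as CP1252 gives the unreadable line.
garbled = raw.decode("cp1252")
print(garbled)  # Íà Îáåðîíå ñ êîìôîðòîì,

# Decoding with the intended code page restores the text.
print(raw.decode("cp1251"))  # На Обероне с комфортом,
```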
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

Re: issue-#86 improvements in RTF import

Post by Josef Templ »

Thanks, I see. What BlackBox shows is not Cyrillic but undefined.

- Josef
Josef Templ
Posts: 2047
Joined: Tue Sep 17, 2013 6:50 am

issue-#182 fixing code page conversion in RTF import

Post by Josef Templ »

Following the bug report in viewtopic.php?f=50&t=324&start=10#p6642 by Ivan on behalf of Oleg N. Cher,
I have created issue https://redmine.blackboxframework.org/issues/182.

The bug fix proposed by Ivan looks good.
It essentially handles the parameter of the \ansicpg command, which has been ignored so far.
The code page specified by this command is used for converting the extended ASCII characters.

Just one question: can't "PROCEDURE WriteUnicode" be slightly simplified?
It is only used for writing a Unicode character in a \u command. Why does it need Windows character conversion?
I have tried to simplify it. Please let me know if it is too simple now.
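
To make the question concrete: the parameter of \u is already a Unicode code point, written as a signed 16-bit decimal, so no code page is involved. The following is a Python sketch under that reading of the RTF format, not the BlackBox procedure itself; the function name and the fallback_len parameter (standing in for the \ucN setting, default 1) are hypothetical, and surrogate pairs are not handled.

```python
import re


def decode_unicode_escapes(text, fallback_len=1):
    """Replace \\uN escapes with the character itself, skipping the fallback."""
    # One fallback unit is either a \'hh escape or a single plain character.
    fallback = r"(?:\\'[0-9a-fA-F]{2}|[^\\{}])" * fallback_len
    pattern = re.compile(r"\\u(-?\d+) ?" + fallback)

    def replace(match):
        value = int(match.group(1))
        if value < 0:
            # Code points above 32767 are written as negative numbers.
            value += 65536
        return chr(value)  # written directly, no code page conversion

    return pattern.sub(replace, text)
```

For example, the input \u1053?\u1072? yields "На" regardless of which ANSI code page the document declares.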

- Josef

Edit by Robert: Link to bug report above is now obsolete as the bug report has been moved to the start of this topic.
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main

Re: issue-#182 fixing code page conversion in RTF import

Post by Zinn »

Josef Templ wrote: Just one question: can't "PROCEDURE WriteUnicode" be slightly simplified?
- Josef
Yes, it can be deleted. WriteUnicode is a partial copy of Write. You don't need WriteUnicode; just use Write instead.
- Helmut