issue-#19: Unicode for Component Pascal identifiers
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Issue #19: Unicode for Component Pascal identifiers
Naming
Strings.CharsToSChars(IN x: ARRAY OF CHAR; OUT y: ARRAY OF SHORTCHAR);
Strings.SCharsToChars(IN x: ARRAY OF SHORTCHAR; OUT y: ARRAY OF CHAR; OUT res: INTEGER);
Only in the documentation of these functions is there reference to Utf8.
CharsToSChars always succeeds.
SCharsToChars needs a validity checker to notify the user via res that the short character sequence is valid.
It may not be if generated by some process other than Strings.
By using "Chars" one knows that the standard 16-bit CHAR is being used.
-Doug
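A minimal sketch of what such a validity check could look like, not the code of any version discussed in this thread. The module name ObxUtf8Sketch and the res values (0 = ok, 1 = ill-formed input, 2 = y too short) are assumptions made for illustration only. It also assumes that x is 0X-terminated and that LEN(y) >= 1.

```
MODULE ObxUtf8Sketch;
	(* Hypothetical illustration: decode Utf8 bytes held in SHORTCHARs into 16-bit CHARs
		and report ill-formed input via res. Assumed codes: 0 = ok, 1 = ill-formed, 2 = truncated. *)

	PROCEDURE SCharsToChars* (IN x: ARRAY OF SHORTCHAR; OUT y: ARRAY OF CHAR; OUT res: INTEGER);
		VAR i, j, k, n, c, b, min: INTEGER;
	BEGIN
		i := 0; j := 0; res := 0;
		WHILE (res = 0) & (x[i] # 0X) DO
			c := ORD(x[i]); INC(i);
			(* n = number of continuation bytes expected, min = smallest value that needs this length *)
			IF c < 80H THEN n := 0; min := 0
			ELSIF (c >= 0C0H) & (c < 0E0H) THEN n := 1; min := 80H; c := c - 0C0H
			ELSIF (c >= 0E0H) & (c < 0F0H) THEN n := 2; min := 800H; c := c - 0E0H
			ELSE
				(* stray continuation byte, or a lead byte of a sequence that a 16-bit CHAR cannot hold *)
				n := 0; min := 0; res := 1
			END;
			k := 0;
			WHILE (res = 0) & (k < n) DO
				b := ORD(x[i]);
				IF (b >= 80H) & (b < 0C0H) THEN c := c * 64 + (b - 80H); INC(i); INC(k)
				ELSE res := 1	(* missing continuation byte; this also catches a sequence cut off by 0X *)
				END
			END;
			IF res = 0 THEN
				IF (c < min) OR ((c >= 0D800H) & (c < 0E000H)) THEN res := 1	(* overlong form or surrogate *)
				ELSIF j >= LEN(y) - 1 THEN res := 2	(* no room left for the character plus 0X *)
				ELSE y[j] := CHR(c); INC(j)
				END
			END
		END;
		y[j] := 0X
	END SCharsToChars;

END ObxUtf8Sketch.
```

The overlong and surrogate tests are the checks that go beyond looking only at the lead and continuation bit patterns.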
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
I made some measurements for demonstration.
First, during startup BlackBox calls Kernel.Utf8ToString 7046 times.
Second, I made a test to evaluate the conversion speed for a difficult string and for a simple ASCII string. The numbers show how long it takes to convert each of the two model strings 100000 times (1st string: "test string ïöíúüáä это строка для проверки هیروگلیفsd", 2nd string: "simple ASCII string").
You can see that the difference between the Josef Templ version and the LuoWy version is 30% for the simple ASCII string and 50% for the difficult one.
Third, as you can see, Josef's method fails to detect ill-formed sequences as defined by the Unicode 7 standard.
Josef Templ version:
difficult string: 68.7 ms
simple ASCII string: 12.3 ms
Incorrect input 1: $TRUE
Incorrect input 2: $FALSE
Truncated: $TRUE
Alexander Shiryaev version:
difficult string: 80.1 ms
simple ASCII string: 15.9 ms
Incorrect input 1: $TRUE
Incorrect input 2: $TRUE
Truncated: $TRUE
LuoWy version:
difficult string: 89.2 ms
simple ASCII string: 15.4 ms
Incorrect input 1: $TRUE
Incorrect input 2: $TRUE
Truncated: $TRUE
This is the test I was using:
StdCoder.Decode ..,, ..k40...3Qw7uP5PRPPNR9Rbf9b8R79FTvMf1GomCrlAy2xhX,Cb2x
hXhC6FU1xhiZiVBhihgmRiioedhgrZcZRiXFfaqmSrtuGfa4700zdGrr8rmCLLCJuyKtYcZRiX
7.2.s,MB1,0k,5TWyql.bnayKmKKqGomC5XzET1.PuP.MHT9N9ntumaU2,CJuyKtQC98P9PP7O
NbXmb.2.QkUk2kx00,6.cUGpmWLuOpoKqvCbHZiYpedhA704TeKKw.bHfEWUmL.6..D.9762U.
sUXDJ99SqorGqmQCbWBxhYFWUl1UnNHEWUmr.6.wZSk5kHu,,E.M.Vd.cU.ktAcoZimBhWhioh
gnZcZRCY.2.wY.E.0.p.,6.M.,.JFyuv.U.2m,.7z.AU0KyB.,U2X.UO.,.1.e0.,6j3.y.wU7
k.y.Ac,k.82QHAUJ2.A1AU,E.0kO8.1M.kI9.O,2,A.E71U.AU2U.QX,k.G.0U26.ezzzzzB.6
xzzzz5cy.6wzzzz5.we,E.2.uFq8Ua5V0cUXDF9fR5uPPPP1fP7PNZvQRtIdHf.2UlbcZpC.c9
h0E.8z,E.0.,V.2.o.6.K,yU.E.ED.cwR.0.,,,.B.0UJ.E.ED..6.222.o.6.K,0,,6.ED..6
.222.o.6.K,0,,6.ED..6..E1U.M3c1szPuH7OJNOF,7J9vQdPJdfNltCPM1H68J76Je9,7J9P
PV9PN761e9,tIFPOZPS1PNh99,NGR767ONRPObvPh99,tJR76NORT96BvPZ96HuQbPR9965NAn
76JN8PM1HMGP8ITeId86b8RZPORvNb99,7HTvNN76LONZfP99PrN1HM1HcJ1eIPM0H6Qp76VeI
TuE98FfeI988HeH,NORfC,NEZeI1OK,tHB86b8GTeIduEFOEZuC,tHf8J,tPf9Rp761eI.CIY4
2UmhgnJbUAdCZe3xc3JedQbBgV72eGxd1,0GeK4Xd8rN1HcJ1GE8rmC5.in4ak2aKreHE.8HM0
mbOIEC3.q.TPRdfC..C2EVKoXaIbqk2ako2YugbUIYW2Yf2Ykgc23fUQZU2a,3aM3YfEgin4ak
20LIaKrmGEyquYZUIiZN8rN1HM0NuPDf9b8RZPO2ZWAdiRgjJimhgXZiUAhi3ipZiUAau2YWAZ
v2YAxhbpZ0xhjZhcIiZRiUgbUIadQbU.NePrN1HMFR8F,7J9vQQbBg,..IaeQbBAVK,I5.k2aa
u..sI..T1..C2EV.kh0ni0GRqHE0nWGYvg,a4XNL,dCvV,BaMRbBA,cAv86pNDWnVWpRqk2Ung
fUIbx6A90.dNL,dCcE98KrN1UpgfUI56BluC.hNL,dC6Kr,V1.kIUAVH,UX,UkVmIbUIYd.I6.
,,.m2g6.in4qk2..sAJtCPM0h0cC.U7ABp,.kd..y4.66TeF,tE.30kdGLta4bf9b0Yejheopg
s2ZWYhjphb3YnZimVUogjFuKaoJYg2YdphgcQ9nIUk,y4.Eakd.FFe8ruuql4KuKKmeHE8mI.8
2UUkMamR0GaEaU3,sArN1PM0.keGLnYejReoJidlUCJJ0GIaIb00.w7C3.00wBe1.Q6.ICe1.H
l4aEfEmmGE8KK2jg2YdZZUIhg2YhBgsJbUAdC,i130Em0GRqHEQbUAhUIbx.J96pNDkq4Kw0GR
qXAhcC3ZjhioBZUgZUAavgV7AVL3d7Zd33YcAhUYbUYd3p7HfPHN8,d7,78Hnhaqi0mF0GMWpI
0GH0WY3YygbU2ad2Y2xdU2jUIbxsHZ8FFNOR1HtCPM0akV4odKIEGKEyIX2augV7AV7AV1,l96
TeFoZiwa43dugV7k2Ad43Ye3Yw2YhBAGJYKIb0mrKLuiJpqJEe158GZ88lP8r76HeH588JP8PM
0HM0HkWmodKIEGaug5PdAPM0Hk2gcCN1HM0HM0t96VtEZ7GRd9V7FB8Gp76396pNDWLEqobGIE
8HMWoR0Gm0GRMAPM0Hk2m598AFe9R7A9eFFeC,,2DkM0HYiHE.ZN1HM0akWm2.PNAPM0Hk2KIb
.tnMen4ak2AV1,w7WHMWILuW0pc6JbBU7MG....C28KEeGEGHMWIEiGEWLEq2GHMWoIiHE.iHE
GKEEMqk2aU7MFN0UhI4M0KIb.HMFN0.X,akWu26T8HRqk2aU7Q6..aHXWIRq.aU7Fl0mS0GM0G
eW2GKEe1.HkWm28KEe1396J,00.sC,,A4M0KIb.H6T0nU0HYuWk.k2A7.78G,7J.M9U7MFN0I5
8ae.,,.Q5Ul.HkWu2M098H6NkK.HMFR0UvgV7kYuoVWmoam4ak2K2krsK.V7KkYOYY3Yx2Yk22
EtKaUIbx22,78J76TvO,d8HN16HaIX0GmkK8HEGJYK2.4HEWGJ0Gu8ru.0GJam4.9GtKqtUmQb
U2Ze2YYhgXxhYhgUAhiRAP9QNPNdPN,tPZnm8LtyKt0GJam4Ebg,9eH0meGLn..rN1HM103...
eIeeGEWmYu20GR0mU.sI.Q5kr66p7610U1,qEE0GE0GE..kbKJeICeX7,A30GEOpU8JEaKK00O
rkmKK0mq44p76H0sC,tMF96p76b8G.gVU2YB2YU2eGx7.BuPZPP19R9eQZvPWmIin40W0hc5B7
,tPkSAhiZYvcQ9vQ,,Z76Fd8BvPZ1,NNZfQIZdgVU2Y3pd23Y4xhmV3,rN1gV7Ic3VB2YU2YX3
hUMDaaP3aRRbUAhUUlQbUIhUUkQ5Ux6HsPA3k40WUwe6BdAVX3hUQYU24Ue3YwMPAZUY6P66,F
E0mYOIECKo0GS0GQ0Hg0GeW2P6666whptK.59Or76HeHIBP660WUgcARe7llUkg6l86d0UUEv4
aUIbxsHsMFP8,N9AbmQbB2222a2hPMNHS0WUY82Y4xhm,UUIe3ZeJJeC3Y3pd2tCP66,VUklcC
ABaav2Y7VdtC,dR19Pe1hPMNHJ0GPGHEiGEyIdG2.Z7CrN1,,,V7FICKo0GS25aWDJeU2ZXFTq
198AaGEGJYK2.....,Vj,U1VqBggBZv22cO.,NF.Y5VdFV7K,7Jk40GE0WUEv66pVD7FM8qWmI
aoQbB2222a2h1tVU.O2....66QAe1H1.a4UuEvEJ..Ul.EEMG230GS25sH00o5.G3.....22C4
pVd,..I5O5e0.kIg3.66A7EEm1kb.u1.d0.....00T1.50M8.J1669W3hVU2200BuP..UB2YUE
EKIb.665Xuko.UdN1,76K2rFEsP.6Al0A7C4v76V7K,,IC.,VjRheAZUgcAR8,,kM00dfQffPU
eAZUgcCZcB.K3.UvgV7g,.ke..NuJJ76F,f9RB9Cp7610cFC3UG76T0b9RZfC,NE.Q6.ICe1.H
V7sETeHb8J,NON9P9vN19P3OSdPN,ND,dArFu8ru.0mS0mMiHIeGE8rmgigZiUI3aU1,Oqo8rt
GLEqXkQ5dPMH9P,ND,NAktGrkGrmemIqk4AVKB6WLK00kqcC.QbBAV7AVX3hu2YH,UBUnZZUQi
mI5HeH.,78QC66JN8M0dfC,NG.UoBgdFlaLuKqt0GJa0whfZZUQipJimJbUIcDxdAhc,pdvAVU
2Ze2YnhimtPDPMdPN,,M1HM0h0Fd8VOFVuAltAJN8PM0aElKLneHE42sA,tHB86b0.ErmGEqKR
0mYu2k4k4Ic3,HEtKqt000nRqk2ako0GRq10Gp.gB00m2CLu8rI0mK0Wv2Yn3Yug5BPOZXv2YB
AV7w8UpZiatKHPL,t601O0.a00m4aU7QA,dCgiopAsCPM0akYOYn3Yx2Ya,,7JF0PM0Hk284r8
Av86sMcP,dCvlMi1Z76pVkktKLt2Yug5BOENuI9uC,76PM0Hk2CoUklWKEyIXgV7k2mbl2fcIZ
k2feAZioZrocMJbUQioJiPJhR32QArlYEpQbBU7YDVtEWJLuGMGYMJbU2jUI5yY2VdM9A50mt0
0Grkaav2Yo3YukMgV7k2m59WioZkg6leC,N1HU76S,dCw7.I42YBU7MGQAqXkg600sQZ76MAU7
MFNmY.EW2YI,.ZtCPM0A,9eHiX7k2QiUI5G5.I4k2m5BWio3B8BleC,N1M0W5w7.6BVtC,N1M0
a2.B8A00.sAr76PM0A,98H.Uo66.UogV7A,HkWu2k2QC6R.kNsQf16JZOJ9uCPM0AV3Z7986Fd
8CqruqtKrqKKEOqoYinZiU6PUUIA22U7sQdfQr0sE6A7uEV7AF86L,UdQ5H066ZPN00aKqm4I6
QbBA,akWu2UAVBA,akr2Yug5d00m4a.HsEkt8XDpcBAV7AV7YDeHEaIX0GI66tFMWHMWZDJecQ
gc3Yy2YkIc43fd2YI,TvOe1B0iX3pd2R5M0tnMeHEaIX.tV,7KHtHUy.....U7YDZFEaIX.tVs
.UykQOIgaGE....M0tnNeHEaIX.tVt2aMB3UyEV.....aEyYau2Y7,Y5W1.u1ldFlO8,,...U7
gcCFEU7A7yqpYe6h6q.aU2hc13ZoBZv2Ys3YuEwIZU..aGEMAZVUg,A,HWo3YxEEG3.HU7ltK5
GJYEIeGEC5y4.M06F6SN76X7AV7AV7GHV7k2kt.kV.l7AV7G,Vs7FHeJ,7BV7AFGIemayIW00V
FJamIiHE.g,A,QC.EdkVUfUB,M80mY.0GIemq4qw8qm0Gpunq4Kw0GEemIU7kWm2PM0Hk2kt..
2D.k4k2MFRWBU7kt00O4bnR.HkWm2,dMffNrePv86pVXV7ViBZv2YnFR6A.HkWu2k2KIa68J76
H9PN100QC.HEXyId0mq0GREEwdUohUgZUAaUYcD,a.HWe7Dv76PPMl96d0CLu.sE6A.w7IgppA
PPLHN8r76EpkWEEWGJ0GWCIgWJE0GJ.HkWu2P.HEt.aKq.30r767OF588HP8rlt00O4.0GryKu
0mlyKr.H9PUUYiVBB,,00k2K2..aIbMOg,K2kYOYcQiUg5dPMAZa2Zi3Yy2YkAZUY868KLr8rm
CrrmKvKKm0Gla5bf8HN1cF.24..k2a2J1.kt.kV....MGcO.00G2.MFEt.a4EVkRq.90PU7luG
566EE.Z16RZPRUYNFRWU23GLt.c8kYcOuHEqqkW5...QbUIBMP1nR0mWu2PUnZCe46AluCakW6
6EeQ8a4QbBA,0Jd.sEFPN5vOb8Q9PN796F7R9vQCJu85eHE4IdUDlVkIi1h03PM5vOp7610Aam
66T0sC,NRdfNeX,,8nOOHEyIX0md.k4cQYZUAhgcPp76HeHK2UolU.X7A,tHBGayIbSoYuIein
4A,VdC,7HT0U0,sG9fQRXiQeoJidFe6R2ZohA.mGEKLuOag2YmlIk2qU7o6ZGr0GREEEaKIbYi
d2Yh.qk2ak2GbUIbxsG9fQEeaqqKKIamRq.O2aKRqHE01Aak240HEGobq.aEsgio,3PM5HK0Gt
K4PM0akWsCA,dvKRPL,,b8GTWc2Z9hgm,.,N9GLMaGEeGE4HM0XUMGQdZJC6RHPP9fI9vQTnuG
royKram4aU3,Yik.VtCPU4Vi,0Ge6H.A470a.Eu0HEiGEGrhuqiqk2a.UAVGhgVZh4xhm78d9A
T7H9eHFVg2Yloag2YkYZUgZlYZU2b439r76N0b023,NPbf6HtC,7H6Hqk2KIb2Y13hZ,sCPM1P
M0VeI.Y8YaeQbBAVKVnZCX79,tQdHNeHE42EN.50M12eG,EWyKeKqtGLIamRq.3OF.HsEFPN51
68b9RZPAHtCPU7..ENamRq.GpmCLu41.81a.sAAV7g6YcjZ8QbBk4I6.N0HePd98LONUpZCGrr
CJuUChihBZv2YAVAtCU7QC4HEenSIYohgn76bXdFEySvrSxnzkHSEqI09I0vH01mUGMVGMUGMT
GMRGMMGMEGH0jH0zI01mTGMUGMTIa2ia24c2Kb24b24Y5pkArklok6pkjqk2pkArk,pknZgaJY
vgV7sQIaUI5QidhhkZhZ3Y,Re1Bd73YnZimVWQbBA,PUA,2YAxBC368eorCrmOKEGpm2CoiZJi
nBhjphuIYdQbU.NWBEs0GRqHEKJu6JC3HX8V76FT9J9XvUB.UU.b02318P99S1fP7PNZ96b8OH
1OLEOrm85.UAl46Q.kd.10Y6d0k4.EEUH,WWAhijxeZ3YqhA..m2PUk,.b0Y7A,7Ge.g,9eHYe
ZRCdtCPM1KIbG2MJc9PM136J9vQ.dONbnMqE,G3K3.ZN136J9XJ,kNqk48Eeke.6BPM136JMJ.
ga0CyIhACoruKu8rrmKqKKtCLLCZYRcoJigZcZRiX3Ulb8..umVyKrG5EWQiX3.501POELUm,.
.Unp3.6F6.ZD,2U.UIU.U76.0E..k.8ssH38pumqm8rtumdcIf9PY62Ulb8.CLL8pumqmY62Um
T.2U.kJ3.D6.0kFF.0U10.bf9bWHZitZhZZcZtM,Mw.ELMSN12Umz.,6.0.E2Eh2U.2U.E,,.R
NEd1K5GomCrl0U2U...G00k.0.0.0mFf3,E.mLT5UTyB,M.,U,U.2.8Mtr.2..c4,.,.1.e06.
2UEC.6..mEw3UAUgQnPt0lLU8ssHorMP9fPsET1.UG2U.E..U6U..HE.6aLuQ0mHCe.az86Utj
0WlbWaUKZM4.Co0...
--- end of encoding ---
- Josef Templ
- Posts: 2047
- Joined: Tue Sep 17, 2013 6:50 am
Re: Issue #19: Unicode for Component Pascal identifiers
DGDanforth wrote:
Naming
Strings.CharsToSChars(IN x: ARRAY OF CHAR; OUT y: ARRAY OF SHORTCHAR);
Strings.SCharsToChars(IN x: ARRAY OF SHORTCHAR; OUT y: ARRAY OF CHAR; OUT res: INTEGER);
-Doug
Sorry, Doug, but this is also wrong.
A short string and a Utf8 encoded string are NOT the same.
A short string is a string that consists of SHORTCHARs only. No encoding, plain ISO-Latin1, 1 byte per character.
A Utf8 string is a sequence of bytes that encodes a (long) string. In order not to use SYSTEM.BYTE
as the element type, it uses SHORTCHAR as an alternative that avoids importing SYSTEM.
The true nature of a Utf-8 string is actually ARRAY OF SYSTEM.BYTE.
For strings consisting of plain ASCII characters only, short strings and Utf8
strings have the same representation.
- Josef
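The distinction can be illustrated with a single non-ASCII character. The following is only a sketch; the module name ObxShortVsUtf8 is hypothetical. It stores "ä" (U+00E4) once as a short string and once as a Utf8 string, both in arrays of SHORTCHAR.

```
MODULE ObxShortVsUtf8;
	(* Hypothetical illustration: the same character as a short string and as a Utf8 string. *)
	IMPORT StdLog;

	PROCEDURE Do*;
		VAR latin1: ARRAY 2 OF SHORTCHAR; utf8: ARRAY 3 OF SHORTCHAR;
	BEGIN
		(* short string: plain ISO-Latin1, one byte per character *)
		latin1[0] := 0E4X; latin1[1] := 0X;
		(* Utf8 string: U+00E4 is encoded as the two bytes C3 A4 *)
		utf8[0] := 0C3X; utf8[1] := 0A4X; utf8[2] := 0X;
		(* the element type is the same, the number of elements differs *)
		StdLog.Int(LEN(latin1$)); StdLog.Int(LEN(utf8$)); StdLog.Ln
	END Do;

END ObxShortVsUtf8.
```

Both variables have the same element type, so the compiler cannot tell them apart. Only the name and the documentation of a procedure can say which of the two it expects.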
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote:
The voting should be about which kinds of checks are performed.
no checks, format checks, content checks.
Josef, your version does not perform a format check in the sense you describe. A format check means detecting whether the sequence is well-formed according to the Unicode 7 standard, and your version does not do this. So what you are talking about is a "wiki format check".
- no checks
- wiki format checks
- format checks
Josef Templ wrote:
If you assign a character code to a CHAR variable in Component Pascal (ch := 0yyyX;), there is no limitation regarding the
possible values of the assigned character code. As long as there is no such limitation, there is no value in
checking the contents of a Utf-8 string. You can always introduce illegal characters into a string
by means of an assignment of character codes, by means of reading in a two byte Unicode from a file, from the clipboard, etc.
Checking the contents of a CHAR or string is an independent issue that is much broader than
doing it only in the Utf8 conversion. If there is any need for doing such checks, it can be discussed in a separate issue.
Now we are blocking issue-#19 by mixing it up with a different issue.
That is OK, if we rename the function and do not use it in the Strings module.
Josef Templ wrote:
Also there is a change in the README file committed by Ivan. This change is not related in any way with issue-#19.
Ivan, it seems that you have not understood the concept of a topic branch. The changes done for a topic branch should
all be related with that topic. That's why it is called a 'topic branch'. For somebody not experienced in
software engineering techniques this does not make a big difference, however, in the long term it
is a must in order not to get a complete mess in the repository and its history.
I do not think that we should vote on changes that are aimed at supporting repository management. At the moment Josef and I are the two people responsible for this, and we can make any changes that do not touch BlackBox without voting. This is done so that people can work with our repository easily. So I fixed the build pipeline (you liked this, although it also has nothing in common with #19) and fixed the README.
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
I want to draw everybody's attention to the fact that two people from outside the Center have sent us their solutions. So it matters to people how we resolve this issue. It is important to them that we use correct code.
Re: Issue #19: Unicode for Component Pascal identifiers
Josef's solution is also correct code. It does its task very well. There exists more than one solution.
The solution from Alexander and the solution from LuoWy can be put into a library
and used with other projects where you need this kind of error behavior.
Feel free to use it in your own programs.
There are other places inside BlackBox
where an "inline procedure" is used for translating Utf8 to String without any error checking
(e.g. string constants).
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19: Unicode for Component Pascal identifiers
Helmut and Josef, can we put LuoWy's solution into Strings but keep Josef's faster version in Kernel?
I also like how LuoWy returns errors. It allows detecting truncation and wrong-character errors at the same time.
res is 1 in case of a wrong character, 10 in case of truncation and 11 in case of both. So:
(res MOD 10 = 1) indicates a wrong-character error
(res DIV 10 = 1) indicates a truncation error
You can see this in the original LuoWy version.
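A caller could evaluate such a combined result as sketched below. This is only an illustration of the 0 / 1 / 10 / 11 scheme described in this post. The module and procedure names are hypothetical and are not taken from LuoWy's code.

```
MODULE ObxResDecode;
	(* Hypothetical illustration of the decimal res encoding:
		ones digit = wrong character, tens digit = truncation. *)
	IMPORT StdLog;

	PROCEDURE Report* (res: INTEGER);
	BEGIN
		IF res = 0 THEN StdLog.String("ok") END;
		IF res MOD 10 = 1 THEN StdLog.String("wrong character ") END;
		IF res DIV 10 = 1 THEN StdLog.String("truncated") END;
		StdLog.Ln
	END Report;

END ObxResDecode.
```

With res = 11 both tests succeed, so a single INTEGER reports the two conditions independently.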
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Issue #19: Unicode for Component Pascal identifiers
Josef Templ wrote:
For strings consisting of plain ASCII characters only, short strings and Utf8
strings have the same representation.
- Josef
That is all that I was claiming. Any other input by CHAR would not necessarily generate ISO bytes. But that is fine. The SHORTCHARs are just an encoding of the CHAR array. I simply wanted to avoid the use of Utf8 in the name because 16-bit Unicode does not include all of the possible Utf8 encodings. Hence to call it Utf8 is misleading. It is a subset of Utf8. Call it partial Utf8 or something like that. This is a different issue from that of valid Utf8 sequences.
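The subset argument can be made concrete with the length of a Utf8 sequence as a function of the code point. The sketch below uses hypothetical names. It assumes that each 16-bit CHAR is treated as one code point (no surrogate pairing) and that codePoint is not negative.

```
MODULE ObxUtf8Len;
	(* Hypothetical illustration: number of Utf8 bytes needed for one code point. *)

	PROCEDURE Utf8Len* (codePoint: INTEGER): INTEGER;
		VAR n: INTEGER;
	BEGIN
		IF codePoint < 80H THEN n := 1
		ELSIF codePoint < 800H THEN n := 2
		ELSIF codePoint < 10000H THEN n := 3	(* ORD(MAX(CHAR)) = 0FFFFH still falls here *)
		ELSE n := 4	(* never reached for a value taken from a single 16-bit CHAR *)
		END;
		RETURN n
	END Utf8Len;

END ObxUtf8Len.
```

Under these assumptions a conversion from ARRAY OF CHAR produces only sequences of one to three bytes. The four-byte sequences of Utf8 can never appear.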
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Issue #19: Unicode for Component Pascal identifiers
Is the use of Utf8 simply to avoid changing SHORTCHAR to CHAR within all of the compiler modules?
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
- Contact:
Re: Issue #19: Unicode for Component Pascal identifiers
I just searched my version of BB1.6 and was shocked to find that the compiler modules of that version do use Utf8!
I was under the impression that Helmut was the first to use Utf8 with BlackBox.
Has my version of BB1.6 been corrupted or does it really use Utf8?