Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Wed Dec 03, 2014 5:20 am
by Ivan Denisov
DGDanforth wrote:
English translation: Those who have not voted yet.
Thank you, Doug, I have fixed my script:
http://forum.blackboxframework.org/whod ... php?id=154
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Wed Dec 03, 2014 8:44 am
by OberonCore
There is one more solution. It is more general and can be simplified or modified as needed. It is used, for example, in omcUtf8Conv (http://forum.oberoncore.ru/viewtopic.php?t=4633).
Code:
PROCEDURE Utf8ToUcs2* (IN inBuf: ARRAY OF SHORTCHAR; VAR inPos: INTEGER; inLen: INTEGER;
		OUT outBuf: ARRAY OF CHAR; VAR outPos: INTEGER; outLen: INTEGER);
	(* Decodes the UTF-8 bytes inBuf[inPos .. inPos + inLen - 1] into outBuf, starting at outBuf[outPos]
		and writing at most outLen characters. Ill-formed sequences and code points above 0FFFFH become "?".
		On return, inPos and outPos are advanced past the consumed input and the produced output. *)
	VAR inp, outp: INTEGER; st, ch, c: INTEGER; char: CHAR;
	(* States (st):
		8	expect a lead byte
		1..3	that many continuation bytes (80H..0BFH) still expected
		4..7	restricted second byte after 0E0H, 0EDH, 0F0H, 0F4H
			(rejects overlong forms, surrogates and values above 10FFFFH)
		9, 10	ill-formed sequence: skip 11 - st more bytes, then emit "?"
		0	emit ch
		11	emit "?"
		31	done *)
BEGIN
	ASSERT((0 <= inLen) & (0 <= inPos) & (inPos + inLen <= LEN(inBuf)), 20);
	ASSERT((0 <= outLen) & (0 <= outPos) & (outPos + outLen <= LEN(outBuf)), 21);
	inp := inPos; outp := outPos;
	IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END;
	LOOP
		IF (st IN {1..10}) & (0 < inLen) (* & (0 < outLen) *) THEN
			c := ORD(inBuf[inp]); INC(inp); DEC(inLen);
			CASE st OF
			| 8:
				IF c <= 07FH THEN	(* ASCII *)
					ch := c; st := 0
				ELSIF c < 0C0H THEN	(* stray continuation byte *)
					st := 11
				ELSIF c < 0C2H THEN	(* 0C0H, 0C1H: always overlong *)
					st := 10
				ELSIF c <= 0DFH THEN	(* 2-byte sequence *)
					ch := ORD(BITS(c) * {0..5}); st := 1
				ELSIF c = 0E0H THEN	(* 3-byte sequences *)
					ch := ORD(BITS(c) * {0..4}); st := 4
				ELSIF c <= 0ECH THEN
					ch := ORD(BITS(c) * {0..4}); st := 2
				ELSIF c = 0EDH THEN
					ch := ORD(BITS(c) * {0..4}); st := 5
				ELSIF c <= 0EFH THEN
					ch := ORD(BITS(c) * {0..4}); st := 2
				ELSIF c = 0F0H THEN	(* 4-byte sequences *)
					ch := ORD(BITS(c) * {0..3}); st := 6
				ELSIF c <= 0F3H THEN
					ch := ORD(BITS(c) * {0..3}); st := 3
				ELSIF c = 0F4H THEN
					ch := ORD(BITS(c) * {0..3}); st := 7
				ELSE	(* 0F5H..0FFH never occur in UTF-8 *)
					st := 11
				END
			| 4:	(* after 0E0H: 0A0H..0BFH *)
				IF (0A0H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
				ELSE
					st := 10
				END
			| 5:	(* after 0EDH: 80H..9FH *)
				IF (080H <= c) & (c <= 09FH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
				ELSE
					st := 10
				END
			| 6:	(* after 0F0H: 90H..0BFH *)
				IF (090H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
				ELSE
					st := 9
				END
			| 7:	(* after 0F4H: 80H..8FH *)
				IF (080H <= c) & (c <= 08FH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
				ELSE
					st := 9
				END
			| 1..3:
				IF (080H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6});
					DEC(st)
				ELSE	(* skip the rest of the declared sequence *)
					st := 12 - st
				END
			| 9..10:
				INC(st)
			END
		ELSIF st IN {0, 11} (* & (0 < outLen) *) THEN
			IF (st = 0) & (0 <= ch) & (ch <= 0FFFFH) THEN
				char := CHR(ch)
			ELSE	(* ill-formed sequence, or code point outside the BMP *)
				char := "?"
			END;
			outBuf[outp] := char; INC(outp); DEC(outLen);
			IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END
		ELSE
			EXIT
		END
	END;
	ASSERT((st IN {1..7, 9..10, 31}) & ((inLen = 0) OR (outLen = 0)));
	IF st IN {1..7, 9..10} THEN	(* input ended inside a sequence *)
		outBuf[outp] := "?"; INC(outp); DEC(outLen)
	END;
	inPos := inp; outPos := outp
END Utf8ToUcs2;
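For checking what Utf8ToUcs2 produces for particular byte sequences, the following is a hypothetical Python model of the same state machine. It is not part of omcUtf8Conv. The name `utf8_to_ucs2` and its whole-buffer interface are assumptions: the original works on caller-supplied buffers with inPos/inLen/outPos/outLen. The model mirrors the posted second-byte ranges for 0E0H, 0EDH, 0F0H and 0F4H. It also reproduces the fixed-length recovery: after a bad byte, the rest of the declared sequence length is skipped and a single "?" is emitted. As far as I can tell, this differs from the "maximal subpart" replacement that the Unicode Standard recommends.

```python
# Restricted second-byte ranges used by states 4..7 of the posted procedure.
SECOND_BYTE = {
    0xE0: (0xA0, 0xBF),  # state 4: rejects overlong 3-byte forms
    0xED: (0x80, 0x9F),  # state 5: rejects UTF-16 surrogates
    0xF0: (0x90, 0xBF),  # state 6: rejects overlong 4-byte forms
    0xF4: (0x80, 0x8F),  # state 7: rejects values above U+10FFFF
}


def utf8_to_ucs2(data: bytes) -> str:
    """Hypothetical model of Utf8ToUcs2 applied to a whole buffer."""
    out = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        i += 1
        if c <= 0x7F:                        # ASCII
            out.append(chr(c))
            continue
        if c < 0xC2 or c > 0xF4:             # stray continuation, C0/C1, F5..FF
            if 0xC0 <= c <= 0xC1:            # overlong lead: next byte is skipped too
                i += 1
            out.append("?")
            continue
        if c <= 0xDF:
            length, cp = 2, c & 0x1F
        elif c <= 0xEF:
            length, cp = 3, c & 0x0F
        else:
            length, cp = 4, c & 0x07
        lo, hi = SECOND_BYTE.get(c, (0x80, 0xBF))
        ok = True
        for k in range(1, length):
            if i >= n:                       # input ended inside the sequence
                ok = False
                break
            b = data[i]
            i += 1
            if not (lo <= b <= hi):
                ok = False
                i += length - 1 - k          # skip the rest of the declared length
                break
            cp = (cp << 6) | (b & 0x3F)
            lo, hi = 0x80, 0xBF              # later bytes: any continuation byte
        # Code points above U+FFFF do not fit in a UCS-2 CHAR.
        out.append(chr(cp) if ok and cp <= 0xFFFF else "?")
    return "".join(out)


print(utf8_to_ucs2("Модуль".encode("utf-8")))  # Модуль
print(utf8_to_ucs2(b"\xe4\x41\x42C"))          # ?C
```

The second call shows the trade-off of fixed-length recovery. After the lead byte E4H, the byte 41H ("A") is rejected and 42H ("B") is skipped as part of the declared 3-byte sequence, so both ASCII characters are lost.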
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Wed Dec 03, 2014 10:41 am
by Ivan Denisov
Now the quorum is reached. So we can stop voting and apply Josef's solution.
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Thu Dec 04, 2014 5:48 am
by DGDanforth
Excuse me for being dense, but I don't believe we ever decided that reaching a quorum stops the vote.
It is my understanding that a quorum of members having voted is a necessary condition for a valid vote.
That doesn't mean the voting is over.
For the current vote, if the last member were to vote for luowy's solution, then we would have a tie.
If at any time the remaining nonvoting members cannot change the result of a vote, then voting stops, whether or not a quorum has been reached (short-circuit rule).
So my interpretation is that the voting has not stopped and that we need one more vote.
Ivan Denisov wrote: Now the quorum is reached. So we can stop voting and apply Josef's solution.
-Doug
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Thu Dec 04, 2014 7:14 am
by Ivan Denisov
Doug, I agree with you. We can wait for warnersoft's vote, or some of the members who abstained may change their opinion.
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Thu Dec 04, 2014 2:35 pm
by warnersoft
I've been trying to follow this discussion, but sadly most of it is over my head. Is the concern that a malicious developer could craft an identifier (such as a procedure name) containing invalid UTF-8 sequences, which could then be passed as a procedure parameter and possibly cause execution to branch to malicious code? Or is this simply about allowing development with, for example, Cyrillic characters in identifiers? If the former, I would vote for the extra code that catches invalid sequences; if the latter, I would choose the most efficient (fastest) solution.
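To illustrate what "invalid sequences" means here, consider the following Python sketch. It is not BlackBox code, and the helper names `naive_decode` and `strictly_valid` are hypothetical. A decoder that only masks off the payload bits, without range checks like those in the posted Utf8ToUcs2, turns the overlong pair C0H AFH into "/" and the bytes ED A0 80 into the surrogate D800H. Python's built-in UTF-8 codec rejects both sequences. Overlong forms are the classic way to slip a character past a check that inspects the raw bytes. This thread does not establish whether that risk applies to identifiers in the compiler.

```python
def naive_decode(seq: bytes) -> int:
    """Hypothetical decoder: masks payload bits only, with no range checks."""
    lead = seq[0]
    if len(seq) == 1:
        return lead
    if len(seq) == 2:
        cp = lead & 0x1F
    elif len(seq) == 3:
        cp = lead & 0x0F
    else:
        cp = lead & 0x07
    for b in seq[1:]:
        cp = (cp << 6) | (b & 0x3F)     # append 6 payload bits per byte
    return cp


def strictly_valid(seq: bytes) -> bool:
    """True if Python's built-in (strict) UTF-8 codec accepts the bytes."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_slash = bytes([0xC0, 0xAF])    # overlong 2-byte form of "/"
surrogate = bytes([0xED, 0xA0, 0x80])   # would encode the surrogate U+D800

print(hex(naive_decode(overlong_slash)), strictly_valid(overlong_slash))  # 0x2f False
print(hex(naive_decode(surrogate)), strictly_valid(surrogate))            # 0xd800 False
```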
Re: Issue #19 : Unicode for Component Pascal identifiers
Posted: Thu Dec 04, 2014 11:38 pm
by DGDanforth
As Josef noted, the short-circuit rule now applies, so the poll is closed and Josef's solution is the one chosen.