Thank you, Doug, I have fixed my script: http://forum.blackboxframework.org/whod ... php?id=154
DGDanforth wrote: English translation: Those who have not voted yet.
Ivan Denisov wrote: http://forum.blackboxframework.org/whod ... php?id=154
Did not vote
OberonCore
ReneK
akastargazer
warnersoft
Utf8ToString converter for Issue #19
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19 : Unicode for Component Pascal identifiers
-
- Posts: 31
- Joined: Tue Sep 17, 2013 10:30 am
- Location: Russia, Orel
Re: Issue #19 : Unicode for Component Pascal identifiers
There is one more solution. It's more general and can be simplified or modified as needed. It's used, for example, in omcUtf8Conv (http://forum.oberoncore.ru/viewtopic.php?t=4633).
Code:
PROCEDURE Utf8ToUcs2* (IN inBuf: ARRAY OF SHORTCHAR; VAR inPos: INTEGER; inLen: INTEGER; OUT outBuf: ARRAY OF CHAR; VAR outPos: INTEGER; outLen: INTEGER);
  (* st: 8 = expect a lead byte; 1..3 = continuation bytes still expected;
     4..7 = restricted second byte after 0E0H, 0EDH, 0F0H, 0F4H;
     9..10 = discard two / one more byte, then emit "?";
     0 = emit ch; 11 = emit "?"; 31 = stop *)
  VAR inp, outp: INTEGER; st, ch, c: INTEGER; char: CHAR;
BEGIN
  ASSERT((0 <= inLen) & (0 <= inPos) & (inPos + inLen <= LEN(inBuf)), 20);
  ASSERT((0 <= outLen) & (0 <= outPos) & (outPos + outLen <= LEN(outBuf)), 21);
  inp := inPos; outp := outPos;
  IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END;
  LOOP
    IF (st IN {1..10}) & (0 < inLen) (*& (0 < outLen) *)THEN
      c := ORD(inBuf[inp]); INC(inp); DEC(inLen);
      CASE st OF
      | 8: (* lead byte *)
        IF c <= 07FH THEN
          ch := c; st := 0
        ELSIF c < 0C0H THEN (* stray continuation byte *)
          st := 11
        ELSIF c < 0C2H THEN (* 0C0H, 0C1H: overlong two-byte lead *)
          st := 10
        ELSIF c <= 0DFH THEN
          ch := ORD(BITS(c) * {0..5}); st := 1
        ELSIF c = 0E0H THEN
          ch := ORD(BITS(c) * {0..4}); st := 4
        ELSIF c <= 0ECH THEN
          ch := ORD(BITS(c) * {0..4}); st := 2
        ELSIF c = 0EDH THEN
          ch := ORD(BITS(c) * {0..4}); st := 5
        ELSIF c <= 0EFH THEN
          ch := ORD(BITS(c) * {0..4}); st := 2
        ELSIF c = 0F0H THEN
          ch := ORD(BITS(c) * {0..3}); st := 6
        ELSIF c <= 0F3H THEN
          ch := ORD(BITS(c) * {0..3}); st := 3
        ELSIF c = 0F4H THEN
          ch := ORD(BITS(c) * {0..3}); st := 7
        ELSE
          st := 11
        END
      | 4: (* after 0E0H: only 0A0H..0BFH, rejects overlong forms *)
        IF (0A0H <= c) & (c <= 0BFH) THEN
          ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
        ELSE
          st := 10
        END
      | 5: (* after 0EDH: only 080H..09FH, rejects surrogates *)
        IF (080H <= c) & (c <= 09FH) THEN
          ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
        ELSE
          st := 10
        END
      | 6: (* after 0F0H: only 090H..0BFH, rejects overlong forms *)
        IF (090H <= c) & (c <= 0BFH) THEN
          ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
        ELSE
          st := 9
        END
      | 7: (* after 0F4H: only 080H..08FH, rejects values above 10FFFFH *)
        IF (080H <= c) & (c <= 08FH) THEN
          ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
        ELSE
          st := 9
        END
      | 1..3: (* ordinary continuation byte *)
        IF (080H <= c) & (c <= 0BFH) THEN
          ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6});
          DEC(st)
        ELSE
          st := 12 - st
        END
      | 9..10: (* discard this byte of a bad sequence *)
        INC(st)
      END
    ELSIF st IN {0, 11} (*& (0 < outLen) *)THEN
      (* code points above 0FFFFH do not fit into a CHAR and become "?" *)
      IF (st = 0) & (0 <= ch) & (ch <= 0FFFFH) THEN
        char := CHR(ch)
      ELSE
        char := "?"
      END;
      outBuf[outp] := char; INC(outp); DEC(outLen);
      IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END
    ELSE
      EXIT
    END
  END;
  ASSERT((st IN {1..7, 9..10, 31}) & ((inLen = 0) OR (outLen = 0)));
  (* input ended in the middle of a sequence *)
  IF st IN {1..7, 9..10} THEN outBuf[outp] := "?"; INC(outp); DEC(outLen) END;
  inPos := inp; outPos := outp
END Utf8ToUcs2;
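As a rough usage sketch, the procedure above could be exercised as shown here. The module name DemoUtf8 and the command Do are hypothetical, and the sketch assumes Utf8ToUcs2 has been pasted into the same module at the marked place; StdLog is used only to show the results. The input holds one valid two-byte sequence and one overlong encoding of "/".

```
MODULE DemoUtf8; (* hypothetical module name *)

  IMPORT StdLog;

  (* assumption: paste PROCEDURE Utf8ToUcs2 from the post above here *)

  PROCEDURE Do*;
    VAR src: ARRAY 8 OF SHORTCHAR; dst: ARRAY 8 OF CHAR; inPos, outPos: INTEGER;
  BEGIN
    (* 0D0H 0AFH is the valid UTF-8 encoding of U+042F *)
    src[0] := SHORT(CHR(0D0H)); src[1] := SHORT(CHR(0AFH));
    (* 0C0H 0AFH is an overlong (invalid) encoding of "/" *)
    src[2] := SHORT(CHR(0C0H)); src[3] := SHORT(CHR(0AFH));
    inPos := 0; outPos := 0;
    (* keep one CHAR in reserve for the terminating 0X *)
    Utf8ToUcs2(src, inPos, 4, dst, outPos, LEN(dst) - 1);
    dst[outPos] := 0X;
    StdLog.Int(inPos); StdLog.Int(outPos); StdLog.Ln;
    StdLog.String(dst); StdLog.Ln
  END Do;

END DemoUtf8.
```

Tracing the state machine by hand, the first pair should decode to U+042F, and the overlong pair should be consumed as a whole and replaced by a single "?", so inPos ends at 4 and outPos at 2.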
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19 : Unicode for Component Pascal identifiers
Now the quorum is reached. So we can stop voting and apply Josef's solution.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19 : Unicode for Component Pascal identifiers
Excuse me for being dense, but I don't believe we ever decided that having a quorum stops the vote.
It is my understanding that a necessary condition for a valid vote is that a quorum of the members has voted.
That doesn't mean the voting is over.
For the current vote if the last member were to vote for luowy's solution then we would have a tie.
If at any time the number of nonvoting members cannot change the result of a vote, then the voting is stopped whether or not a quorum was reached (short circuit rule).
So it is my interpretation that the voting has not stopped and that we need one more vote.
-Doug
Ivan Denisov wrote: Now the quorum is reached. So we can stop voting and apply Josef's solution.
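The short circuit rule described in this post can be sketched as a single comparison. This is only an illustration: the module name DemoVote, the procedure Decided and the tallies passed in Do are hypothetical, and the sketch assumes that members who abstained do not change their vote.

```
MODULE DemoVote; (* hypothetical module name *)

  IMPORT StdLog;

  (* TRUE if the leading option stays strictly ahead even when every
     member who has not voted yet picks the runner-up *)
  PROCEDURE Decided* (leader, runnerUp, notVoted: INTEGER): BOOLEAN;
  BEGIN
    RETURN leader - runnerUp > notVoted
  END Decided;

  PROCEDURE Do*;
  BEGIN
    (* hypothetical tallies: lead of 1 with 1 vote outstanding, a tie is still possible *)
    StdLog.Bool(Decided(4, 3, 1)); StdLog.Ln;
    (* hypothetical tallies: lead of 2 with 1 vote outstanding, the result is fixed *)
    StdLog.Bool(Decided(5, 3, 1)); StdLog.Ln
  END Do;

END DemoVote.
```

With a lead of one vote and one member still to vote, Decided returns FALSE, which matches the tie scenario described above; the quorum is a separate condition and is not checked here.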
-
- Posts: 1700
- Joined: Tue Sep 17, 2013 12:21 am
- Location: Russia
Re: Issue #19 : Unicode for Component Pascal identifiers
Doug, I agree with you. We can wait for warnersoft's vote. Or some members who abstained may change their opinion.
-
- Posts: 3
- Joined: Thu Sep 26, 2013 7:35 pm
Re: Issue #19 : Unicode for Component Pascal identifiers
I've been trying to follow this discussion, but sadly most of it is over my head. Is the concern that a malicious developer could craft an identifier (such as a procedure name) with invalid UTF-8 sequences that could be passed as a procedure parameter and possibly allow a branch in execution to malicious code? Or is this simply about allowing development using, for example, Cyrillic characters in identifiers? If the former, I would vote for the extra code to catch the invalid sequences; if the latter, I would choose the most efficient (fastest) solution.
- DGDanforth
- Posts: 1061
- Joined: Tue Sep 17, 2013 1:16 am
- Location: Palo Alto, California, USA
Re: Issue #19 : Unicode for Component Pascal identifiers
As Josef noted, we now have a short circuit vote, so the poll is stopped with Josef's solution as the chosen one.