Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Wed Dec 03, 2014 5:20 am
by Ivan Denisov
DGDanforth wrote:
Ivan Denisov wrote: http://forum.blackboxframework.org/whod ... php?id=154

Did not vote
OberonCore
ReneK
akastargazer
warnersoft
English translation: Those who have not voted yet.
Thank you, Doug. I have fixed my script: http://forum.blackboxframework.org/whod ... php?id=154

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Wed Dec 03, 2014 8:44 am
by OberonCore
There is one more solution. It is more general and can be simplified or modified as needed. It is used, for example, in omcUtf8Conv (http://forum.oberoncore.ru/viewtopic.php?t=4633).

Code: Select all

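	PROCEDURE Utf8ToUcs2* (IN inBuf: ARRAY OF SHORTCHAR; VAR inPos: INTEGER; inLen: INTEGER; OUT outBuf: ARRAY OF CHAR; VAR outPos: INTEGER; outLen: INTEGER);
		(* State machine over st:
			8: expect a lead byte; 1..3: that many continuation bytes (80H..0BFH) still expected;
			4..7: range-check the second byte after lead 0E0H, 0EDH, 0F0H, 0F4H respectively
			(rejects overlong forms, surrogates, and code points above 10FFFFH);
			9..10: malformed sequence, skip 2 or 1 more input bytes, then go to 11;
			0: code point complete in ch; 11: emit "?"; 31: input or output exhausted. *)
		VAR	inp, outp: INTEGER; st, ch, c: INTEGER; char: CHAR;
	BEGIN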
		ASSERT((0 <= inLen) & (0 <= inPos) & (inPos + inLen <= LEN(inBuf)), 20);
		ASSERT((0 <= outLen) & (0 <= outPos) & (outPos + outLen <= LEN(outBuf)), 21);
		inp := inPos; outp := outPos;
		IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END;
		LOOP IF (st IN {1..10}) & (0 < inLen) (*& (0 < outLen) *)THEN
			c := ORD(inBuf[inp]); INC(inp); DEC(inLen);
			CASE st OF
			| 8:
				IF c <= 07FH THEN
					ch := c; st := 0
				ELSIF c < 0C0H THEN
					st := 11
				ELSIF c < 0C2H THEN
					st := 10
				ELSIF c <= 0DFH THEN
					ch := ORD(BITS(c) * {0..5}); st := 1
				ELSIF c = 0E0H THEN
					ch := ORD(BITS(c) * {0..4}); st := 4
				ELSIF c <= 0ECH THEN
					ch := ORD(BITS(c) * {0..4}); st := 2
				ELSIF c = 0EDH THEN
					ch := ORD(BITS(c) * {0..4}); st := 5
				ELSIF c <= 0EFH THEN
					ch := ORD(BITS(c) * {0..4}); st := 2
				ELSIF c = 0F0H THEN
					ch := ORD(BITS(c) * {0..3}); st := 6
				ELSIF c <= 0F3H THEN
					ch := ORD(BITS(c) * {0..3}); st := 3
				ELSIF c = 0F4H THEN
					ch := ORD(BITS(c) * {0..3}); st := 7
				ELSE
					st := 11
				END
			| 4:
				IF (0A0H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
				ELSE
					st := 10
				END
			| 5:
				IF (080H <= c) & (c <= 09FH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 1
				ELSE
					st := 10
				END
			| 6:
				IF (090H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
				ELSE
					st := 9
				END
			| 7:
				IF (080H <= c) & (c <= 08FH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6}); st := 2
				ELSE
					st := 9
				END
			| 1..3:
				IF (080H <= c) & (c <= 0BFH) THEN
					ch := ORD(BITS(ASH(ch, 6)) + BITS(c) * {0..6});
					DEC(st)
				ELSE
					st := 12 - st
				END
			| 9..10:
				INC(st)
			END
		ELSIF st IN {0, 11} (*& (0 < outLen) *)THEN
			IF (st = 0) & (0 <= ch) & (ch <= 0FFFFH) THEN
				char := CHR(ch)
			ELSE
				char := "?"
			END;
			outBuf[outp] := char; INC(outp); DEC(outLen);
			IF (0 < inLen) & (0 < outLen) THEN st := 8 ELSE st := 31 END
		ELSE EXIT END END;
		ASSERT((st IN {1..7, 9..10, 31}) & ((inLen = 0) OR (outLen = 0)));
		IF st IN {1..7, 9..10} THEN outBuf[outp] := "?"; INC(outp); DEC(outLen) END;
		inPos := inp; outPos := outp
	END Utf8ToUcs2;
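
For readers who do not read Component Pascal, the lead-byte branch of the procedure above (the `| 8:` case) can be summarised as a lookup from the first byte to the sequence length and the allowed range of the second byte. The following is only a rough Python sketch of that one branch; the function name `classify_lead` and the tuple layout are made up for illustration and are not part of omcUtf8Conv.

```python
def classify_lead(c: int):
    """Mirror the lead-byte tests of state 8 in Utf8ToUcs2 (illustrative only).

    Returns (sequence_length, second_byte_low, second_byte_high),
    or None if the byte cannot start a well-formed sequence.
    """
    if c <= 0x7F:
        return (1, None, None)      # plain ASCII, no second byte
    if c < 0xC2:
        return None                 # stray continuation (80..BF) or overlong lead (C0, C1)
    if c <= 0xDF:
        return (2, 0x80, 0xBF)
    if c == 0xE0:
        return (3, 0xA0, 0xBF)      # narrower range rejects overlong 3-byte forms
    if c == 0xED:
        return (3, 0x80, 0x9F)      # narrower range rejects surrogates D800..DFFF
    if c <= 0xEF:
        return (3, 0x80, 0xBF)
    if c == 0xF0:
        return (4, 0x90, 0xBF)      # narrower range rejects overlong 4-byte forms
    if c <= 0xF3:
        return (4, 0x80, 0xBF)
    if c == 0xF4:
        return (4, 0x80, 0x8F)      # narrower range rejects code points above 10FFFF
    return None                     # F5..FF never appear in well-formed UTF-8


print(classify_lead(0xD0))  # lead byte of a two-byte Cyrillic letter
print(classify_lead(0xC0))  # overlong lead, rejected
```

Note that the original procedure still consumes four-byte sequences but replaces them with "?", since the resulting code point does not fit into a 16-bit CHAR.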

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Wed Dec 03, 2014 10:41 am
by Ivan Denisov
Now the quorum is reached. So we can stop voting and apply Josef's solution.

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Thu Dec 04, 2014 5:48 am
by DGDanforth
Excuse me for being dense, but I don't believe we ever decided that having a quorum stops the vote.
It is my understanding that a necessary condition for a valid vote is that a quorum of the members has voted.
That doesn't mean the voting is over.

For the current vote, if the last member were to vote for luowy's solution, then we would have a tie.
If at any time the remaining nonvoting members cannot change the result of a vote, then the voting is stopped whether or not a quorum was reached (short-circuit rule).

So it is my interpretation that the voting has not stopped and that we need one more vote.
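
The short-circuit rule described above can be written down as a small check. This is only a sketch in Python under stated assumptions: the function name `is_short_circuited` is hypothetical, a tie is treated as a changed result (as in the tie example above), abstentions are ignored, and the tallies in the usage lines are made-up numbers, not the actual poll counts.

```python
def is_short_circuited(tallies: dict[str, int], remaining_voters: int) -> bool:
    """Return True when the outstanding votes can no longer change the outcome.

    tallies maps each option to the votes it has received so far;
    remaining_voters is the number of members who have not voted yet.
    """
    if not tallies:
        return False  # nothing has been voted on yet
    ranked = sorted(tallies.values(), reverse=True)
    leader = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0
    # A tie counts as a changed result, so the lead must strictly
    # exceed the number of votes that are still outstanding.
    return leader - runner_up > remaining_voters


# Hypothetical tallies for illustration only.
print(is_short_circuited({"Josef": 4, "luowy": 3}, remaining_voters=1))  # False: one vote could force a tie
print(is_short_circuited({"Josef": 5, "luowy": 3}, remaining_voters=1))  # True: the result is settled
```

Under this reading, reaching a quorum and reaching a short circuit are independent conditions: the first makes a vote valid, the second allows it to end early.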
Ivan Denisov wrote: Now the quorum is reached. So we can stop voting and apply Josef's solution.
-Doug

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Thu Dec 04, 2014 7:14 am
by Ivan Denisov
Doug, I agree with you. We can wait for warnersoft's vote, or some members who abstained can change their opinion.

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Thu Dec 04, 2014 2:35 pm
by warnersoft
I've been trying to follow this discussion, but sadly most of it is over my head. Is the concern that a malicious developer could craft an identifier (such as a procedure name) containing invalid UTF-8 sequences which, when passed as a procedure parameter, could allow execution to branch to malicious code? Or is this simply about allowing development with, for example, Cyrillic characters in identifiers? If the former, I would vote for the extra code to catch the invalid sequences; if the latter, I would choose the most efficient (fastest) solution.
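
To make "invalid UTF-8 sequences" concrete, here is a small Python sketch that uses Python's built-in strict decoder as a reference. It says nothing about how either proposed Component Pascal solution behaves; the helper name `is_well_formed_utf8` and the sample byte strings are chosen purely for illustration.

```python
def is_well_formed_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 according to Python's strict decoder."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "Cyrillic identifier": "Имя".encode("utf-8"),
    "overlong '/' (C0 AF)": b"\xc0\xaf",
    "overlong 3-byte form (E0 80 AF)": b"\xe0\x80\xaf",
    "UTF-16 surrogate (ED A0 80)": b"\xed\xa0\x80",
    "above U+10FFFF (F4 90 80 80)": b"\xf4\x90\x80\x80",
}

for label, raw in samples.items():
    print(f"{label}: {is_well_formed_utf8(raw)}")
```

Only the first sample is well-formed. The other four are the kinds of input that a decoder without validation would silently turn into some character, and that a validating decoder replaces or rejects.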

Re: Issue #19 : Unicode for Component Pascal identifiers

Posted: Thu Dec 04, 2014 11:38 pm
by DGDanforth
As Josef noted, we now have a short-circuit vote, so the poll is stopped with Josef's solution as the chosen one.