Line breaks in strings

DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Line breaks in strings

Post by DGDanforth »

When I first encountered Oberon and then Component Pascal, I was appalled that line breaks were not allowed in strings. Before that I had spent 14+ years using the languages SAIL and MAINSAIL, in which one could include line breaks in strings. That was very nice, since a whole paragraph of text could be encoded in a single string.

Recently I encountered the need to initialize a matrix given a textual representation of its data (C++ code). To do that in Component Pascal it is necessary to break the lines of the matrix into separate strings. That makes the code cumbersome and ugly.

I propose to allow line breaks in Component Pascal strings.

Doing so would be backward compatible with all existing code.
Here is what the CP documentation says about strings:
The following lexical rules must be observed: Blanks and line breaks must not occur within symbols (except in comments, and blanks in strings).

4. Strings are sequences of characters enclosed in single (') or double (") quote marks. The opening quote must be the same as the closing quote and must not occur within the string. The number of characters in a string is called its length. A string of length 1 can be used wherever a character constant is allowed and vice versa. Characters in string constants are allowed to be Unicode (16 bit) characters.

string = '"' {char} '"' | "'" {char} "'".

Examples: "Component Pascal" "Don't worry!" "x" "αβ"

6.6 String Types

Values of a string type are sequences of characters terminated by a null character (0X). The length of a string is the number of characters it contains excluding the null character.
Strings are either constants or stored in an array of character type. There are no predeclared identifiers for string types because there is no need to use them in a declaration.
Constant strings which consist solely of characters in the range 0X..0FFX and strings stored in an array of SHORTCHAR are of type Shortstring, all others are of type String.
Comments please.
-Doug
Zinn
Posts: 476
Joined: Tue Mar 25, 2014 5:56 pm
Location: Frankfurt am Main
Contact:

Re: Line breaks in strings

Post by Zinn »

Doug,
you can add line break characters to strings:

MODULE TestLineBreak;

IMPORT StdLog;

CONST 
	cr = 0DX; lf = 0AX;
	textline = "1st line" + cr + lf;
	
	PROCEDURE Do*;
	BEGIN
		StdLog.String(textline);
	END Do;
	
END TestLineBreak.

TestLineBreak.Do
but you get other problems, because the line break is platform dependent.
On Linux you have to add only LF.
I have forgotten whether Windows needs CR or CR + LF.
And the output does not do what you expect.
- Helmut
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Line breaks in strings

Post by DGDanforth »

Helmut,
Thank you for that.
Yes, I realized after posting that the issue is not really line breaks in a string but
rather how the compiler assigns a quoted string in a document (module) to a
character array (which is what I want).

For example:
MODULE TestString;

PROCEDURE Hello*;
VAR x: ARRAY 256 OF CHAR;
BEGIN
x :="Hello, World!
This is a separate line.
Do you like it?"
END Hello;

END TestString.
which gets marked with compiler errors at "X" as
MODULE TestString;

PROCEDURE Hello*;
VAR x: ARRAY 256 OF CHAR;
BEGIN
x :=Χ"Hello, World!
Χ This is aΧ separate lineΧ.
Do youΧ like it?"
END Hello;

END TestString.
So it is the compiler and not the string that is claiming there is an illegal character in the string
and yet the Component Pascal documentation says all Latin-1 characters are legal.
-Doug
Robert
Posts: 1024
Joined: Sat Sep 28, 2013 11:04 am
Location: Edinburgh, Scotland

Re: Line breaks in strings

Post by Robert »

DGDanforth wrote:Here is what the CP documentation says about strings.
The following lexical rules must be observed: Blanks and line breaks must not occur within symbols (except in comments, and blanks in strings).
This seems to say that the language does not allow line breaks in strings, and the compiler is following that rule.
So the problem (if it is a problem) is in the language definition, not the compiler.

We would want a pretty strong argument to consider changing the language.

Can you give a simple example of the matrix assignment format you want to use?
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Line breaks in strings

Post by DGDanforth »

Robert wrote: Can you give a simple example of the matrix assignment format you want to use?
Here is what you can do in C++:
// training data
int train_X[6][6] = {
{1, 1, 1, 0, 0, 0},
{1, 0, 1, 0, 0, 0},
{1, 1, 1, 0, 0, 0},
{0, 0, 1, 1, 1, 0},
{0, 0, 1, 0, 1, 0},
{0, 0, 1, 1, 1, 0}
};
I would like to do something similar.
Robert
Posts: 1024
Joined: Sat Sep 28, 2013 11:04 am
Location: Edinburgh, Scotland

Re: Line breaks in strings

Post by Robert »

Here is what I can do using the Lib modules:

MODULE  RdcMatAssign;
IMPORT  Fmtrs := LibFmtrs, IVec := LibIVectors, IMat := LibIMatrices;
VAR
  f  :  Fmtrs.Fmtr;
PROCEDURE  Do*;
  VAR
    mat  :  IMat.Matrix;
  BEGIN
    f.SetToEnd;
    NEW (mat, 5);
    mat [0]  :=  IVec.SetStr ('1, 1, 1, 0, 0, 0');
    mat [1]  :=  IVec.SetStr ('1, 0, 1, 0, 0 +0');
    mat [2]  :=  IVec.SetStr ('1  1  1  0  0 -0');
    mat [3]  :=  IVec.SetStr ('0  0  1  1  1');
    mat [4]  :=  IVec.SetStr ('0  0 -1  0 +1, 0');
    IMat.Wrt (f, mat, 8, 0)
  END  Do;

BEGIN
  f  :=  Fmtrs.Log ({})
END  RdcMatAssign.
The output is

(       1,       1,       1,       0,       0,       0)
(       1,       0,       1,       0,       0,       0)
(       1,       1,       1,       0,       0,       0)
(       0,       0,       1,       1,       1)
(       0,       0,      −1,       0,       1,       0)
This shows: IVec.SetStr uses spaces, commas, or both as separators & IMat supports jagged arrays.
DGDanforth wrote:To do that in Component Pascal it is necessary to break the lines of the matrix into separate strings. That makes the code cumbersome and ugly.
I'm not sure this is so cumbersome & ugly that it justifies a language change.
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Line breaks in strings

Post by DGDanforth »

Robert wrote:I'm not sure this is so cumbersome & ugly that it justifies a language change.
You are probably right.

So let me shift the discussion to "why did Wirth think it necessary to exclude line breaks?"

Helmut's comment about CR vs CR + LF may be the reason, but each host would have its own
convention, which could easily be incorporated into the compiler.
Robert
Posts: 1024
Joined: Sat Sep 28, 2013 11:04 am
Location: Edinburgh, Scotland

Re: Line breaks in strings

Post by Robert »

DGDanforth wrote:So let me shift the discussion to "why did Wirth think it necessary to exclude line breaks?"
Simplicity?

They are only excluded from source code, not excluded in general.

Would they add much value in source code? Your C++ matrix example would only be an argument if we could already do the simpler thing:

vec := "1, 2, 3, 4";
where vec is an ARRAY OF INTEGERs.
We can't, so that is a conversation to have first.
DGDanforth
Posts: 1061
Joined: Tue Sep 17, 2013 1:16 am
Location: Palo Alto, California, USA

Re: Line breaks in strings

Post by DGDanforth »

Robert wrote: Would they add much value in source code? Your C++ matrix example would only be an argument if we could already do the simpler thing:

vec := "1, 2, 3, 4";
where vec is an ARRAY OF INTEGERs.
We can't, so that is a conversation to have first.
Your IVec function does that, and I would not like to see the implicit conversions that C/C++ performs, so I will
just have to withdraw my dissatisfaction with source code strings excluding line breaks.

-Doug
Ivan Denisov
Posts: 1700
Joined: Tue Sep 17, 2013 12:21 am
Location: Russia

Re: Line breaks in strings

Post by Ivan Denisov »

Doug, you can do something like this:

mat := StrToMat(
"1, 0, 0, 0, 0" + 0DX +
"1, 1, 0, 0, 0" + 0DX +
"1, 0, 1, 0, 0" + 0DX +
"1, 0, 0, 1, 0" + 0DX +
"1, 0, 0, 0, 1"
)
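
StrToMat in the snippet above is not an existing library procedure. As a rough illustration only, a minimal sketch of one could look like the module below. Everything in it is an assumption made for the example: the module name TestStrToMat, the Matrix type, the Measure helper, rows separated by 0DX, decimal integers with an optional leading minus sign, and every row holding the same number of values.

```
MODULE TestStrToMat;

	IMPORT StdLog;

	CONST cr = 0DX;	(* row separator, as in the snippet above *)

	TYPE Matrix* = POINTER TO ARRAY OF ARRAY OF INTEGER;	(* hypothetical type *)

	(* Hypothetical helper: counts the cr-separated rows and the
		largest number of integers found in any single row. *)
	PROCEDURE Measure (IN s: ARRAY OF CHAR; OUT rows, cols: INTEGER);
		VAR i, n: INTEGER; inNum: BOOLEAN;
	BEGIN
		rows := 1; cols := 0; n := 0; inNum := FALSE; i := 0;
		WHILE s[i] # 0X DO
			IF (s[i] >= "0") & (s[i] <= "9") THEN
				IF ~inNum THEN INC(n); inNum := TRUE END
			ELSE
				inNum := FALSE;
				IF s[i] = cr THEN
					INC(rows);
					IF n > cols THEN cols := n END;
					n := 0
				END
			END;
			INC(i)
		END;
		IF n > cols THEN cols := n END
	END Measure;

	(* Converts text such as "1, 0" + cr + "0, 1" into a matrix.
		Anything that is not a digit acts as a separator; a "-" directly
		before a number negates it. Returns NIL if s contains no number. *)
	PROCEDURE StrToMat* (IN s: ARRAY OF CHAR): Matrix;
		VAR m: Matrix; i, r, c, v, rows, cols: INTEGER; neg: BOOLEAN;
	BEGIN
		m := NIL;
		Measure(s, rows, cols);
		IF cols > 0 THEN
			NEW(m, rows, cols);
			i := 0; r := 0; c := 0;
			WHILE s[i] # 0X DO
				IF (s[i] >= "0") & (s[i] <= "9") THEN
					neg := (i > 0) & (s[i - 1] = "-");
					v := 0;
					WHILE (s[i] >= "0") & (s[i] <= "9") DO
						v := 10 * v + (ORD(s[i]) - ORD("0")); INC(i)
					END;
					IF neg THEN v := -v END;
					m[r, c] := v; INC(c)
				ELSE
					IF s[i] = cr THEN INC(r); c := 0 END;
					INC(i)
				END
			END
		END;
		RETURN m
	END StrToMat;

	(* Usage example: writes a 3 x 3 matrix to the log, one row per line. *)
	PROCEDURE Do*;
		VAR m: Matrix; r, c: INTEGER;
	BEGIN
		m := StrToMat(
			"1, 0, 0" + cr +
			"1, 1, 0" + cr +
			"1, 0, 1");
		FOR r := 0 TO LEN(m, 0) - 1 DO
			FOR c := 0 TO LEN(m, 1) - 1 DO StdLog.Int(m[r, c]) END;
			StdLog.Ln
		END
	END Do;

END TestStrToMat.
```

The sketch scans the string twice, once to size the matrix and once to fill it, which keeps the code simple at the cost of a second pass. It does not check for overflow or for rows of unequal length.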