brainstorming Strings extensions

Posted: Tue Mar 20, 2018 1:26 pm
by Josef Templ
The BlackBox Strings module is rather minimal as it is now and there has been some discussion
about possible extensions from time to time.
This topic is intended for collecting ideas and proposals for possible extensions
and to discuss the pros and cons of such extensions.
The goal is not to implement any such extension immediately but rather to trigger some discussion.

- Josef

Re: brainstorming Strings extensions

Posted: Thu Mar 22, 2018 9:31 am
by Zinn
We have already added some extensions:

Code:
   PROCEDURE IsAlpha (ch: CHAR): BOOLEAN;
   PROCEDURE IsAlphaNumeric (ch: CHAR): BOOLEAN;
   PROCEDURE IsIdent (ch: CHAR): BOOLEAN;
   PROCEDURE IsIdentStart (ch: CHAR): BOOLEAN;
   PROCEDURE IsLower (ch: CHAR): BOOLEAN;
   PROCEDURE IsNumeric (ch: CHAR): BOOLEAN;
   PROCEDURE IsUpper (ch: CHAR): BOOLEAN;

   PROCEDURE SetToString (x: SET; OUT str: ARRAY OF CHAR);
   PROCEDURE StringToSet (IN s: ARRAY OF CHAR; OUT x: SET; OUT res: INTEGER);

   PROCEDURE StringToUtf8 (IN in: ARRAY OF CHAR; OUT out: ARRAY OF SHORTCHAR; OUT res: INTEGER);
   PROCEDURE Utf8ToString (IN in: ARRAY OF SHORTCHAR; OUT out: ARRAY OF CHAR; OUT res: INTEGER);


and we should keep further extensions to a minimum. Currently I miss the following:

Code:
   PROCEDURE IndexOf (IN s: ARRAY OF CHAR; c: CHAR): INTEGER;
   PROCEDURE LastIndexOf (IN s: ARRAY OF CHAR; c: CHAR): INTEGER;

   PROCEDURE Trim (VAR s: ARRAY OF CHAR);
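
A minimal sketch of how these three procedures could look, written against the signatures above. The module name StringsSketch is hypothetical, and several points are assumptions rather than anything decided in this thread: strings are 0X-terminated, -1 means "not found", and every character up to and including the blank counts as white space for Trim.

```componentpascal
MODULE StringsSketch;
   (* Hypothetical module; a sketch only, not part of BlackBox. *)

   (* Position of the first c in s, or -1 if c does not occur (assumed convention). *)
   PROCEDURE IndexOf* (IN s: ARRAY OF CHAR; c: CHAR): INTEGER;
      VAR i: INTEGER;
   BEGIN
      i := 0;
      WHILE (s[i] # 0X) & (s[i] # c) DO INC(i) END;
      IF s[i] = 0X THEN i := -1 END;
      RETURN i
   END IndexOf;

   (* Position of the last c in s, or -1 if c does not occur (assumed convention). *)
   PROCEDURE LastIndexOf* (IN s: ARRAY OF CHAR; c: CHAR): INTEGER;
      VAR i, last: INTEGER;
   BEGIN
      i := 0; last := -1;
      WHILE s[i] # 0X DO
         IF s[i] = c THEN last := i END;
         INC(i)
      END;
      RETURN last
   END LastIndexOf;

   (* Removes leading and trailing characters <= " " in place. *)
   PROCEDURE Trim* (VAR s: ARRAY OF CHAR);
      VAR first, last, i: INTEGER;
   BEGIN
      first := 0;
      WHILE (s[first] # 0X) & (s[first] <= " ") DO INC(first) END;
      last := first;
      WHILE s[last] # 0X DO INC(last) END;
      (* last is now the index of the terminating 0X *)
      WHILE (last > first) & (s[last - 1] <= " ") DO DEC(last) END;
      i := 0;
      WHILE first < last DO s[i] := s[first]; INC(i); INC(first) END;
      s[i] := 0X
   END Trim;

END StringsSketch.
```

With this sketch, LastIndexOf("a/b/c", "/") yields 3, and Trim turns "  ab  " into "ab".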

- Helmut

Re: brainstorming Strings extensions

Posted: Thu Mar 22, 2018 11:59 am
by Josef Templ
I had a look at the latest ETH Oberon A2 version of the Strings module
(see https://trac.inf.ethz.ch/trac/lecturers/a2/browser/trunk/source/Strings.Mod).

It also has Trim, IndexOf, and LastIndexOf operations.
In addition, it has TrimLeft, TrimRight and much more.

It has, for example, a very nice and small "Match" function that can do string comparison
with wildcards "*" (a sequence of arbitrary characters) and "?" (a single arbitrary character).
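
To illustrate the idea, here is a minimal sketch of a wildcard matcher in Component Pascal. It is not the A2 code; it is the common iterative technique that remembers the last "*" and backtracks to it on a mismatch. The module name MatchSketch and the parameter order (pattern first) are assumptions, and both strings are assumed to be 0X-terminated.

```componentpascal
MODULE MatchSketch;
   (* Hypothetical module; a sketch only, not the A2 implementation. *)

   (* TRUE if s matches pat, where "*" stands for any sequence of
      characters (including none) and "?" for exactly one character. *)
   PROCEDURE Match* (IN pat, s: ARRAY OF CHAR): BOOLEAN;
      VAR p, i, starP, starI: INTEGER;
   BEGIN
      p := 0; i := 0; starP := -1; starI := 0;
      WHILE s[i] # 0X DO
         IF pat[p] = "*" THEN
            (* remember the star; first try to let it match nothing *)
            starP := p; starI := i; INC(p)
         ELSIF (pat[p] = "?") OR (pat[p] = s[i]) THEN
            INC(p); INC(i)
         ELSIF starP >= 0 THEN
            (* mismatch: let the last star swallow one more character *)
            p := starP + 1; INC(starI); i := starI
         ELSE
            RETURN FALSE
         END
      END;
      (* s is exhausted; only trailing stars may remain in pat *)
      WHILE pat[p] = "*" DO INC(p) END;
      RETURN pat[p] = 0X
   END Match;

END MatchSketch.
```

For example, Match("*.txt", "readme.txt") and Match("a*c", "abcbc") hold, while Match("?at", "at") does not.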

In addition to Strings, it has a module named DynamicStrings which provides
dynamically growing string buffers and hash table based string pools.
DynamicStrings is used e.g. for XML and HTML parsing.

- Josef

Re: brainstorming Strings extensions

Posted: Tue May 01, 2018 4:05 am
by DGDanforth
TYPE
   SString = POINTER TO ARRAY OF SHORTCHAR;
   String = POINTER TO ARRAY OF CHAR;

PROCEDURE New (IN x: ARRAY OF CHAR): String;
PROCEDURE SNew (IN x: ARRAY OF SHORTCHAR): SString;


Re: brainstorming Strings extensions

Posted: Wed May 02, 2018 5:04 am
by Josef Templ
DGDanforth wrote:
TYPE
SString = POINTER TO ARRAY OF SHORTCHAR;
String = POINTER TO ARRAY OF CHAR;
PROCEDURE New (IN x: ARRAY OF CHAR): String;
PROCEDURE SNew (IN x: ARRAY OF SHORTCHAR): SString;



I agree. The types Strings.String and Strings.SString would be good candidates.
The constructor function is named 'NewString' in A2 ('New' + type name).
This naming convention would also give a good name for the SString
constructor (NewSString).

If String (SString) is introduced then the question is if there are any meaningful
operations on it. The only one that comes to my mind is a substring function like

PROCEDURE Substring* (IN s: ARRAY OF CHAR; offset, len: INTEGER): String;

similar to Strings.Extract but returning a String object.
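
A minimal sketch of the two constructors' common idea and of Substring, assuming the names proposed above. The module name DynStrSketch is hypothetical, and clipping offset and len to the actual string length (instead of trapping) is an assumption, not something settled in this thread.

```componentpascal
MODULE DynStrSketch;
   (* Hypothetical module; a sketch only, not part of BlackBox. *)

   TYPE
      String* = POINTER TO ARRAY OF CHAR;

   (* Allocates a heap string that holds exactly x plus the terminating 0X. *)
   PROCEDURE NewString* (IN x: ARRAY OF CHAR): String;
      VAR s: String;
   BEGIN
      NEW(s, LEN(x$) + 1);
      s^ := x$;
      RETURN s
   END NewString;

   (* Returns a new String with up to len characters of s starting at offset.
      Assumption: offset and len are clipped to the length of s. *)
   PROCEDURE Substring* (IN s: ARRAY OF CHAR; offset, len: INTEGER): String;
      VAR res: String; n, i: INTEGER;
   BEGIN
      ASSERT(offset >= 0, 20); ASSERT(len >= 0, 21);
      n := LEN(s$);
      IF offset > n THEN offset := n END;
      IF len > n - offset THEN len := n - offset END;
      NEW(res, len + 1);
      i := 0;
      WHILE i < len DO res[i] := s[offset + i]; INC(i) END;
      res[len] := 0X;
      RETURN res
   END Substring;

END DynStrSketch.
```

With this sketch, Substring("BlackBox", 5, 3) returns a String containing "Box", and the result can be returned from a function without the caller having to provide a buffer.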

- Josef

Re: brainstorming Strings extensions

Posted: Wed May 02, 2018 6:42 pm
by DGDanforth
Josef Templ wrote:
If String (SString) is introduced then the question is if there are any meaningful
operations on it.

I frequently return a string from a function.

Re: brainstorming Strings extensions

Posted: Thu May 03, 2018 8:28 am
by Ivan Denisov
There is a Strings module by Ivan Goryachev:

https://blackbox.obertone.ru/extension/Strings

Code:
DEFINITION StringsXml;

   CONST
      dataSize = 64;
      strSize = 256;

   TYPE
      String = POINTER TO LIMITED RECORD
         len-: INTEGER
      END;

   VAR
      null-: StrPtr;

   PROCEDURE AppendChar (s: String; ch: CHAR);
   PROCEDURE AppendStr (s: String; IN str: ARRAY OF CHAR);
   PROCEDURE CloneStr (s: String): StrPtr;
   PROCEDURE Create (IN str: ARRAY OF CHAR): String;
   PROCEDURE ExtractStr (s: String; start, len: INTEGER): StrPtr;
   PROCEDURE GetChar (s: String; pos: INTEGER): CHAR;
   PROCEDURE GetStr (s: String; VAR str: ARRAY OF CHAR);
   PROCEDURE SetLength (s: String; len: INTEGER);

END StringsXml.


and its object-oriented wrapper:

Code:
DEFINITION StringsDyn;

   TYPE
      DynString = POINTER TO RECORD
         (ds: DynString) AddChar (ch: CHAR), NEW;
         (ds: DynString) AddString (str: ARRAY OF CHAR), NEW;
         (ds: DynString) Char (idx: INTEGER): CHAR, NEW;
         (ds: DynString) Clear, NEW;
         (ds: DynString) Length (): INTEGER, NEW;
         (ds: DynString) PartAsString (from, to: INTEGER): String, NEW;
         (ds: DynString) SetLength (len: INTEGER), NEW;
         (ds: DynString) String (): String, NEW
      END;

      String = POINTER TO ARRAY OF CHAR;

   PROCEDURE Create (str: ARRAY OF CHAR): DynString;

END StringsDyn.


It is good to keep the original Strings module as simple as possible, but to develop this or a similar Strings extension with all your experience.

Re: brainstorming Strings extensions

Posted: Fri May 04, 2018 2:09 pm
by Josef Templ
In my Xml subsystem I also use a similar module for dynamically growing strings.
I called it XmlDStrings. It has dynamically growing strings, string pools, and a string splitter.

All of this is certainly not appropriate for the base module Strings, but nevertheless
I agree that it is very general purpose. The foundation actually comes from ETH Oberon (Aos, A2),
but it needed to be adapted to BlackBox. I am trying to gather some experience in using it in
the Xml subsystem.

Any extensions to Strings must be more basic,
working on 'ARRAY OF CHAR', not on objects.

My current favorites are:
TYPEs String and SString, with NewString/NewSString, and Substring; very cheap.
PROCEDUREs Trim, TrimLeft, TrimRight on ARRAY OF CHAR; common in most other string libraries.
PROCEDUREs StartsWith, EndsWith on ARRAY OF CHAR; common in most other string libraries.
PROCEDURE Match for wildcard comparison; because there is a very compact and efficient solution that is
extremely hard to come up with yourself.
Something like LastIndexOf (for searching backwards) would be useful for extracting path and file names, for example,
but under Windows there are two separator characters (/ and \), which complicates a single-character search.
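
A minimal sketch of StartsWith and EndsWith on ARRAY OF CHAR, together with one possible way around the two-separator problem: a backwards search that accepts a set of characters. The module name TestSketch and the procedure LastIndexOfAny are hypothetical, strings are assumed to be 0X-terminated, and -1 is assumed to mean "not found".

```componentpascal
MODULE TestSketch;
   (* Hypothetical module; a sketch only, not part of BlackBox. *)

   PROCEDURE StartsWith* (IN s, prefix: ARRAY OF CHAR): BOOLEAN;
      VAR i: INTEGER;
   BEGIN
      i := 0;
      WHILE (prefix[i] # 0X) & (s[i] = prefix[i]) DO INC(i) END;
      RETURN prefix[i] = 0X
   END StartsWith;

   PROCEDURE EndsWith* (IN s, suffix: ARRAY OF CHAR): BOOLEAN;
      VAR i, d: INTEGER;
   BEGIN
      d := LEN(s$) - LEN(suffix$);   (* where suffix would have to start in s *)
      IF d < 0 THEN RETURN FALSE END;
      i := 0;
      WHILE (suffix[i] # 0X) & (s[d + i] = suffix[i]) DO INC(i) END;
      RETURN suffix[i] = 0X
   END EndsWith;

   (* Position of the last character of s that occurs in chars, or -1. *)
   PROCEDURE LastIndexOfAny* (IN s, chars: ARRAY OF CHAR): INTEGER;
      VAR i, j, last: INTEGER;
   BEGIN
      i := 0; last := -1;
      WHILE s[i] # 0X DO
         j := 0;
         WHILE (chars[j] # 0X) & (chars[j] # s[i]) DO INC(j) END;
         IF chars[j] # 0X THEN last := i END;
         INC(i)
      END;
      RETURN last
   END LastIndexOfAny;

END TestSketch.
```

Since Component Pascal strings have no escape sequences, both Windows separators can be passed as the two-character string "/\"; LastIndexOfAny("C:\dir/file.txt", "/\") then yields 6, the position of the last separator.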

- Josef

Re: brainstorming Strings extensions

Posted: Mon May 14, 2018 8:06 pm
by Josef Templ
I just read in an Oracle newsletter about new string functions in Java.
They seem to be required because the existing functions are not
defined precisely enough for all Unicode characters.

http://app.response.oracle-mail.com/e/er?elq_mid=113114&sh=171282221722141115150114224&cmid=WWMK170418P00047&s=1973398186&lid=292124&elqTrackId=b638bb6c745448dd98e6d68ccb85ef05&elq=847fbd82be214cfa902f1e884b664ce1&elqaid=113114&elqat=1.


New Methods on Java String with JDK 11
It appears likely that Java's String class will be gaining some new methods with JDK 11, expected to be released in September 2018.

BUG #: BUG TITLE, followed by each NEW String METHOD and its DESCRIPTION

JDK-8200425: String::lines
   lines(): "String instance method that uses a specialized Spliterator to lazily provide lines from the source string."
JDK-8200378: String::strip, String::stripLeading, String::stripTrailing
   strip(): "Unicode-aware" evolution of trim()
   stripLeading(): "removal of Unicode white space from the beginning"
   stripTrailing(): "removal of Unicode white space from the end"
JDK-8200437: String::isBlank
   isBlank(): "instance method that returns true if the string is empty or contains only white space"