Learning functional programming #3: Combining parsers

Matthijs groenPosted by Matthijs Groen on 5-3-2021

This post is part 3 of the series “Learning Functional Programming using JavaScript”. Already read part 3? Part 4 of this Blog Post series is out now!

Earlier parts: part 1, part 2.

At this point, we are able to parse a single character from a string or array of characters. We have applied a lot of eslint rules to force us to take a more and more functional approach. In this post, we will start combining the parsers to create more semantical constructions.

Since I have added a ‘single statement’ rule, my implementation will start to deviate from the one in the parser combinators blogpost from Scott Wlashin. For more background on parser combinators, please checkout his blogpost.

We established a signature for our parser:

1
2
3
4
type Stream = string[];
type Success<ResultType> = [ResultType, Stream];
type Failure = [Symbol, string, string];
type Parser<ResultType> = (stream: Stream) => Success<ResultType> | Failure;

andThen parser

To actually parse more than one character, we want to create structure how the parser should continue when successful or when it fails, by creating 2 more parsers:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
const FAILED = Symbol("Failed");
const PARSED = 0;
const REMAINING = 1;

const parseError = (target) => (error) => [
  FAILED,
  `Error parsing: '${target}'`,
  error,
];

const characterParser = (character) => ([head, ...tail]) =>
  head
    ? head === character
      ? [head, tail]
      : parseError(character)(`Unexpected '${head}'`)
    : parseError(character)("Unexpected EOF");

const aParser = characterParser("a"); // see 'first-parser' post
const bParser = characterParser("b");

const andThen = (parserA) => (parserB) => (stream) => "???";

const abParser = andThen(aParser)(bParser);
console.log(abParser("abcd")); // "???"

We are now just testing out the signature. We want to use 2 parsers, and that would produce a new parser that accepts the stream.

Just starting on running parserA:

1
2
3
4
5
6
const andThen = (parserA) => (parserB) => (stream) => parserA(stream);

const abParser = andThen(aParser)(bParser);
console.log(abParser("abcd")); // ["a", ["b", "c", "d"]]
console.log(abParser("iabcd")); // [ Symbol(Failed), "Error parsing 'a':", "Unexpected 'i'" ]
console.log(abParser("aibcd")); // ["a", ["i", "b", "c", "d"]]

Now we need to chain parserB to it if successful:

1
2
3
const PARSED = 0;
const onSuccess = (result) => (next) =>
  result[PARSED] !== FAILED ? next(result) : result;

Using this function, I can determine if we had a success. If it is successful, the parser will execute the next step, providing the result.

And now in action:

1
2
3
4
5
6
7
8
const REMAINING = 1;
const andThen = (parserA) => (parserB) => (stream) =>
  onSuccess(parserA(stream))((result) => parserB(result[REMAINING]));

const abParser = andThen(aParser)(bParser);
console.log(abParser("abcd")); // [ 'b', [ 'c', 'd' ] ]
console.log(abParser("iabcd")); // [ Symbol(Failed), "Error parsing 'a':", "Unexpected 'i'" ]
console.log(abParser("aibcd")); // [ Symbol(Failed), "Error parsing 'b':", "Unexpected 'i'" ]

Ok, the last 2 lines are as expected now. But the first one is incorrect. The result of the parse should not be 'b', but ['a', 'b'].

We need to combine the result from parser A with parser B:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
const combineResult = (resultA) => (resultB) =>
  onSuccess(resultB)(() => [
    [resultA[PARSED], resultB[PARSED]],
    resultB[REMAINING],
  ]);

const andThen = (parserA) => (parserB) => (stream) =>
  onSuccess(parserA(stream))((result) =>
    combineResult(result)(parserB(result[REMAINING]))
  );

console.log(abParser("abcd")); // [ [ 'a', 'b' ], [ 'c', 'd' ] ]
console.log(abParser("iabcd")); // [ Symbol(Failed), "Error parsing 'a':", "Unexpected 'i'" ]
console.log(abParser("aibcd")); // [ Symbol(Failed), "Error parsing 'b':", "Unexpected 'i'" ]

We are now already reusing building blocks! (onSuccess) And we have a correct result!

orElse parser

1
2
3
4
5
6
7
const aParser = characterParser("a");
const bParser = characterParser("b");

const orElse = (parserA) => (parserB) => (stream) => "???";

const aOrBParser = orElse(aParser)(bParser);
console.log(aOrBParser("abcd")); // "???"

It will follow the same pattern. But the implementation of this one will already be a lot faster:

1
2
3
4
5
6
7
8
9
10
const onFailure = (result) => (next) =>
  result[PARSED] === FAILED ? next(result) : result;

const orElse = (parserA) => (parserB) => (stream) =>
  onFailure(parserA(stream))(() => parserB(stream));

const aOrBParser = orElse(aParser)(bParser);
console.log(aOrBParser("abcd")); // [ 'a', [ 'b', 'c', 'd' ] ]
console.log(aOrBParser("bcde")); // [ 'b', [ 'c', 'd', 'e' ] ]
console.log(aOrBParser("iabcd")); // [ Symbol(Failed), "Error parsing 'b':", "Unexpected 'i'" ]

Wow this one worked immediately! This is because we never modify the stream. We create a new stream every time. So parserB is able to retry it on exactly the same data.

All well so far…

The implementation so far:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
const FAILED = Symbol("Failed");
const PARSED = 0;
const REMAINING = 1;

const parseError = (target) => (error) => [
  FAILED,
  `Error parsing: '${target}'`,
  error,
];

const characterParser = (character) => ([head, ...tail]) =>
  head
    ? head === character
      ? [head, tail]
      : parseError(character)(`Unexpected '${head}'`)
    : parseError(character)("Unexpected EOF");

const onSuccess = (result) => (next) =>
  result[PARSED] !== FAILED ? next(result) : result;

const onFailure = (result) => (next) =>
  result[PARSED] === FAILED ? next(result) : result;

const combineResult = (resultA) => (resultB) =>
  onSuccess(resultB)(() => [
    [resultA[PARSED], resultB[PARSED]],
    resultB[REMAINING],
  ]);

const andThen = (parserA) => (parserB) => (stream) =>
  onSuccess(parserA(stream))((result) =>
    combineResult(result)(parserB(result[REMAINING]))
  );

const orElse = (parserA) => (parserB) => (stream) =>
  onFailure(parserA(stream))(() => parserB(stream));

Working towards parsing our JSON structure

The challenge was parsing the JSON structure at the top of our file.

An EBNF notation of JSON looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
value = object | array | number | string | "true" | "false" | "null";
string = '"',
  { ? Any unicode character except " or \ or control character ?
  | "\", (
    '"' | "\" | "/" | "b" | "f" | "n" | "r" | "t" |
    "u", 4 * ? hexadeximal digit ?
  ) }, '"';

number = [ "-" ],
  ( "0" | ? digit 1-9 ?, { ? digit ? } ),
  [ ".", ? digit ?, { ? digit ? } ],
  [ ( "e" | "E" ), [ "+" | "-" ], ? digit ?, { ? digit ? } ];

array = "[", [ value, { ",", value } ], "]";
object = "{", [ string, ":", value, { ",", string, ":", value } ], "}";

The most simple type in this format would be the null. So we should start with parsing that one.

1
2
3
4
5
6
7
8
9
10
const nParser = characterParser("n");
const uParser = characterParser("u");
const lParser = characterParser("l");

const nullParser = andThen(nParser)(
  andThen(uParser)(andThen(lParser)(lParser))
);

console.log(nullParser("null")); // [ [ 'n', [ 'u', [Array] ] ], [] ]
console.log(nullParser("text"));

We can now verify that the string starts with "null". But we also want the result of parsing not be a nested construction, we want the literal null as result. In this case, we do not need to really process the result. We only need to verify that the string was "null" so that we can return the literal null.

Let’s create a parser that would return a specific value as result:

1
2
3
4
const resultParser = (result) => (stream) => [result, stream];

const returnNullParser = resultParser(null);
console.log(returnNullParser("test")); // [ null, 'test' ]

Now we need the characterParsers to verify that the string “null” is in the stream, and then return the result:

1
2
3
4
5
6
7
8
9
10
11
12
const resultParser = (result) => (stream) => [result, stream];

const nParser = characterParser("n");
const uParser = characterParser("u");
const lParser = characterParser("l");

const nullParser = andThen(
  andThen(nParser)(andThen(uParser)(andThen(lParser)(lParser)))
)(resultParser(null));

console.log(nullParser("null")); // [ [ [ 'n', [Array] ], null ], [] ]
console.log(nullParser("text")); // [ Symbol(Failed), "Error parsing 'n':", "Unexpected 't'" ]

We can now verify that the string contains "null" but the result now contains our converted result, but also the parsed text, that is not needed anymore. We will create a andThenRight function. it will require to parse both results (like andThen) but will only return the result of the right completed parser.

1
2
3
4
5
6
7
8
9
10
11
12
const andThenRight = (parserA) => (parserB) => (stream) =>
  onSuccess(andThen(parserA)(parserB)(stream))((result) => [
    result[PARSED][1],
    result[REMAINING],
  ]);

const nullParser = andThenRight(
  andThen(nParser)(andThen(uParser)(andThen(lParser)(lParser)))
)(resultParser(null));

console.log(nullParser("null")); // [ null, [] ]
console.log(nullParser("text")); // [ Symbol(Failed), "Error parsing 'n':", "Unexpected 't'" ]

Yes! We did a lot of actions here and it would be great to allow reuse by refactoring. First: Changing the result of a parser (what andThenRight does)

1
2
3
4
5
6
7
8
const mapResult = (transform) => (parser) => (stream) =>
  onSuccess(parser(stream))((result) => [
    transform(result[PARSED]),
    result[REMAINING],
  ]);

const andThenRight = (parserA) => (parserB) =>
  mapResult((result) => result[1])(andThen(parserA)(parserB));

And we could replace the nullParser with the 3 character parsers with this:

1
2
3
4
5
6
const stringParser = (string) =>
  [...string]
    .map((char) => characterParser(char))
    .reduce((a, b) => andThen(a)(b));

const nullParser = andThenRight(stringParser("null"))(resultParser(null));

Much more readable! But the stringParser is now still doing 2 tasks:

  1. convert string characters to a list of character parsers
  2. chain the characters parsers using andThen

We could split this up as well.

1
2
3
4
5
const chain = (parsers) => parsers.reduce((a, b) => andThen(a)(b));
const stringParser = (string) =>
  chain([...string].map((char) => characterParser(char)));

const nullParser = andThenRight(stringParser("null"))(resultParser(null));

If we make this one step more generic, we can create the boolean parser as well!

1
2
3
4
5
6
7
8
9
10
11
12
13
const parseStringResult = (string) => (result) =>
  andThenRight(stringParser(string))(returnResult(result));

const nullParser = parseStringResult("null")(null);

const boolParser = orElse(parseStringResult("true")(true))(
  parseStringResult("false")(false)
);

console.log(nullParser("null rest")); // [ null, [ ' ', 'r', 'e', 's', 't' ] ]
console.log(boolParser("true rest")); // [ true, [ ' ', 'r', 'e', 's', 't' ] ]
console.log(boolParser("false rest")); // [ false, [ ' ', 'r', 'e', 's', 't' ] ]
console.log(boolParser("fase rest")); // [ Symbol(Failed), "Error parsing 'l':", "Unexpected 's'" ]

It works as expected. The problem I have now is the error message is too vague to know it was from parsing a null or a boolean.

Time to improve the message in an error occurs. Fortunately, this is really easy. We can use the onFailure we created before to update the error with a custom message.

1
2
const addLabel = (label) => (parser) => (stream) =>
  onFailure(parser(stream))(([, , error]) => parseError(label)(error));

And now we can use this function to wrap it around the stringParser

1
2
3
4
const stringParser = (string) =>
  addLabel(string)(chain([...string].map((char) => characterParser(char))));

console.log(boolParser("fase rest")); // [ Symbol(Failed), "Error parsing 'false':", "Unexpected 's'" ]

So we have nice messages as well. The composability of the functions really seems to pay off. The JSON parser seems to come to shape nicely. There are some things still bothering me about this implementation though and that is that using JavaScripts own .map and .reduce seem out of place. But we will first try to get the JSON parser complete, and then look into those functions.

The implementation so far:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
const FAILED = Symbol("Failed");
const PARSED = 0;
const REMAINING = 1;

const parseError = (target) => (error) => [
  FAILED,
  `Error parsing: '${target}'`,
  error,
];

const characterParser = (character) => ([head, ...tail]) =>
  head
    ? head === character
      ? [head, tail]
      : parseError(character)(`Unexpected '${head}'`)
    : parseError(character)("Unexpected EOF");

const onSuccess = (result) => (next) =>
  result[PARSED] !== FAILED ? next(result) : result;

const onFailure = (result) => (next) =>
  result[PARSED] === FAILED ? next(result) : result;

const combineResult = (resultA) => (resultB) =>
  onSuccess(resultB)(() => [
    [resultA[PARSED], resultB[PARSED]],
    resultB[REMAINING],
  ]);

const andThen = (parserA) => (parserB) => (stream) =>
  onSuccess(parserA(stream))((result) =>
    combineResult(result)(parserB(result[REMAINING]))
  );

const orElse = (parserA) => (parserB) => (stream) =>
  onFailure(parserA(stream))(() => parserB(stream));

const resultParser = (result) => (stream) => [result, stream];

const mapResult = (transform) => (parser) => (stream) =>
  onSuccess(parser(stream))((result) => [
    transform(result[PARSED]),
    result[REMAINING],
  ]);

const andThenRight = (parserA) => (parserB) =>
  mapResult((result) => result[1])(andThen(parserA)(parserB));

const addLabel = (label) => (parser) => (stream) =>
  onFailure(parser(stream))(([, , error]) => parseError(label)(error));

const chain = (parsers) => parsers.reduce((a, b) => andThen(a)(b));

const stringParser = (string) =>
  addLabel(string)(chain([...string].map((char) => characterParser(char))));

const parseStringResult = (string) => (result) =>
  andThenRight(stringParser(string))(resultParser(result));

const nullParser = parseStringResult("null")(null);

const boolParser = orElse(parseStringResult("true")(true))(
  parseStringResult("false")(false)
);

Conclusion

The code so far:

What I learned so far:

What did I like:

What did I not like:

Next time:

Matthijs groen

Matthijs Groen

Full Stack Web Developer • Hobby Indie Game Developer Github: matthijsgroen • Twitter: @matthijsgroen

Bij Kabisa staat privacy hoog in het vaandel. Wij vinden het belangrijk dat er zorgvuldig wordt omgegaan met de data die onze bezoekers achterlaten. Zo zult u op onze website geen tracking-cookies vinden van third-parties zoals Facebook, Hotjar of Hubspot. Er worden alleen cookies geplaatst van Google en Vimeo. Deze worden gebruikt voor analyses, om zo de gebruikerservaring van onze websitebezoekers te kunnen verbeteren. Tevens zorgen deze cookies ervoor dat er relevante advertenties worden getoond. Lees meer over het gebruik van cookies in ons privacy statement.