(Talk at Parsing@SLE 2014, 20 mins)

Dr Tom Ridge

2014-09-14

briefly discuss mechanized verification, and what verified parsing might mean

what has been done so far

the real world?

future plans

Informal (eg Wikipedia): For all x, if x is even, then it can be expressed as the sum of two primes.

Formal (eg FOL): ∀ x. even(x) → ∃ y z. prime(y) ∧ prime(z) ∧ (x = y + z)

- Mechanized (eg HOL): as formal, but the computer checks everything (including proofs)
- what is a proof like? a bit like a normal proof, but much more stylized and in incredible detail

- Mechanization can give perfect proofs that talk about clearly defined platonic objects (such as numbers)
- if we believe in the proof system, then the proved statement is true
- caveat: the real world isn't a platonic object

General logics such as HOL and ZFC can express most formal systems, in particular, you can define programs, and reason about them

- It turns out that some programs are very much easier to reason about than others
- Pure functional programming is easiest to reason about
- Things that make reasoning harder: state, exceptions, concurrency etc
- Languages like C are effectively not feasible to reason about at the moment

- For code, mechanized verification implies that the code is bug-free in some sense
- "if the running code behaves in this way..."
- but the compiler may have bugs
- or there may be runtime bugs (e.g. out of memory, or stack space exhaustion, are not typically considered; ints are often assumed to hold infinitely large numbers etc.)
- with much effort the impact of these issues can probably be reduced or eliminated altogether; for example, if we maintain many copies of main memory, we can probably reduce random bit mutations to an arbitrarily small probability
- the real world is extraordinarily complex

Many verifiers have the same dream

The dream is: a perfect world, or at least, a world where computers function perfectly, code never goes wrong etc.

This is unrealistic, but it provides a strong motivation: building a paradise on earth etc

Each system component must be verified. People talk about the "verified stack", consisting of verified OS, verified networking and file systems, verified compilers, verified runtimes etc.

- How is progress? Slow, but there is a sense that the pace is picking up
- the seL4 verified operating system kernel
- the CompCert verified compiler
- both these examples involved quite a few people over an extended period of time
- some large mathematical proofs have also been formalized, eg the four colour theorem

Why? Parsing is an integral part of many systems.

Example application: system security often depends on checking inputs to the system, which could be done in a general way using parsing.

So a verified parser could be a key component of a lot of verified systems.

Verified parsing consists of using a machine to prove "some theorem" about "some code" (a parser or parser generator)

The question is: what is the theorem, and what is the code?

For the code: could be a parser generator, or generated parser, or parser for a particular language, or ...

- For the theorem: many different variations are possible; some are much stronger than others; stronger statements typically require more involved proofs
- it is often hard to identify exactly what theorem has been proved, and exactly what it means; this is a bad thing.

Words such as "correct", "sound" and "complete" are used a lot, and with different meanings

- We assume that a grammar and an input give rise (mathematically) to a set of "well-formed" parse trees; the input is parseable if there is at least one such parse tree
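To make "well-formed" concrete, here is a minimal sketch in OCaml. These are my own illustrative definitions, not the paper's: a tree is well-formed wrt a grammar if every internal node matches a rule, and it is a parse of input `i` if, additionally, its leaves concatenate to `i`.

```ocaml
(* Illustrative definitions only (not the mechanized ones). *)
type symbol = NT of string | TM of string

type tree =
  | Node of string * tree list   (* Node (nonterminal, children) *)
  | Leaf of string * string      (* Leaf (terminal, matched text) *)

type grammar = (string * symbol list) list  (* rules: nt -> rhs *)

(* the string of terminals a tree yields *)
let rec yield t = match t with
  | Leaf (_, s) -> s
  | Node (_, cs) -> String.concat "" (List.map yield cs)

(* well-formedness: every internal node corresponds to a grammar rule *)
let rec wf (g : grammar) t = match t with
  | Leaf (_, _) -> true
  | Node (nt, cs) ->
      let rhs =
        List.map (function Node (n, _) -> NT n | Leaf (tm, _) -> TM tm) cs
      in
      List.mem (nt, rhs) g && List.for_all (wf g) cs

(* t is a well-formed parse of input i iff: wf g t && yield t = i *)
```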

A weak statement:

- For grammar G, all parse trees produced by the parser are well-formed
- trivially satisfiable: a parser that never terminates produces no parse trees
- equally trivial: a parser that always terminates but never returns any parse trees

A stronger statement(?):

- For grammar G, if the input is parseable, the parser (always terminates and) produces exactly one well-formed parse tree (soundness and a weak form of completeness).
- what does parseable mean? if it means "the parser terminates and produces exactly one well-formed parse tree" then this is just "P implies P", which is also easy
- what happens if the input is not parseable?

A strong statement:

- The parser generator, given any context-free grammar as input, (always terminates and) produces a parser that can take any input, (always terminates and) outputs a set of parse trees. This set is exactly the set of well-formed (wrt. the mathematical notion for CFGs) parse trees for the grammar and input (soundness and completeness).

- Versions of parsers and parser generators based on LR(1), SLR, CYK and PEG parsing have had something proved about them in a theorem prover
- The theorems often have holes in them eg termination is not proved for arbitrary inputs

- (Ridge, 2011) A novel version of combinator parsers has a mechanized proof of the "strong" property from the previous slide
- completeness is with respect to all parse trees for the given CFG
- left-recursive grammars result in poor performance using these combinator parsers

- Paper here (local copy)
- defn of `grammar_to_parser` in Fig. 3; example parser below Fig. 3
- defn of sound in Sect. 7
- termination: all functions are terminating and total

- HOL proofs available from github, via here

OCaml code is available in the P1 module from the P3 distribution; the definitions are sufficiently short (~40 lines) that the code can be ported to your favourite functional language in a very short time
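As a quick illustration of how small such a port can be, here is a naive sketch of the combinator interface in OCaml. The operator names (`a`, `***>`, `||||`, `>>>>`) follow the P3 examples, but the bodies are my own: this is plain recursive descent with none of the paper's termination machinery, so unlike (Ridge, 2011) it loops on left-recursive grammars.

```ocaml
(* A parser takes an input string and returns all (result, remainder)
   splits. Naive sketch only; not the verified P1 definitions. *)
type 'a parser_ = string -> ('a * string) list

(* a lit: recognize the literal string lit at the front of the input *)
let a (lit : string) : string parser_ = fun i ->
  let n = String.length lit in
  if String.length i >= n && String.sub i 0 n = lit
  then [ (lit, String.sub i n (String.length i - n)) ]
  else []

(* sequencing: run p, then q on each remainder *)
let ( ***> ) p q = fun i ->
  List.concat_map
    (fun (x, rest) -> List.map (fun (y, rest') -> ((x, y), rest')) (q rest))
    (p i)

(* alternation: union of the two result sets *)
let ( |||| ) p q = fun i -> p i @ q i

(* semantic action: apply f to each result *)
let ( >>>> ) p f = fun i -> List.map (fun (x, rest) -> (f x, rest)) (p i)
```

For example, `(a "1" ***> a "1") >>>> (fun (x, y) -> x ^ y)` applied to `"11"` returns the single split `[("11", "")]`.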

As a starting point, we should really aim for: a verified deterministic parser (that is actually useful, say, LALR(1) or LR(1)) and a verified general parser (again, that is actually useful, say, Earley)

(Ridge, 2011) is obviously great (!), but it isn't either of these.

For general parsing: CYK has been verified, but it is not really useful

What about deterministic parsers?

"TRX: A Formally Verified Parser Interpreter" is described here. This is for PEGs. Not publicly available (?)

Availability also seems to be a problem with the similar Ynot packrat PEG parser.

Also of interest is the work on verifying a parser for a C compiler by Jourdan, Pottier and Leroy. This is for an LR(1) parser generator. Distributed as part of the CompCert verified compiler distribution, but specialized to C (?)

So it seems that actually using these to parse your favourite grammar is difficult

And as mentioned, the theorems are often not what you might want

A bit disappointing

A verified parser generator for deterministic parsing (eg for LR(1)), and a verified parser for general CFGs (eg Earley; I am currently working on this)

For both, strong correctness theorems

Usable real-world implementations with the expected performance characteristics

Verified performance wrt some abstract machine

The real world is a complex place!

Performance (theoretical, and real-world), memory consumption, user interface, debugging information, interaction with input (eg does the input string need to be completely in memory?), memory access patterns, interaction with GC, etc etc

In particular, performance (theoretical and real-world) is very important

Performance is essentially the same as with traditional combinator parsing (on the grammars where traditional combinators terminate)

Can handle all grammars (good)... but left-recursive grammars are slow to parse (bad)

One of the aims of the CPP'11 work (beyond verification) was to allow the user to write a parser for whatever grammar they want, without having to worry about the implementation (no need to fiddle with the grammar etc)

But if performance is poor on some grammars, then this goal hasn't been met - users do worry about whether the grammar will result in poor performance

Combine combinator parsing interface with an Earley parser backend. This is the subject of the SLE talk.

Basic approach: allow the user to write combinator parsers as before. Extract a grammar from this, perform the parse using an Earley parser, and then use the Earley parse results to guide the action phase.

```
(* Grammar: E -> E E E | "1" | eps *)
let rec parse_E = (fun i -> mkntparser "E" (
((parse_E ***> parse_E ***> parse_E) >>>> (fun (x,(y,z)) -> `Node(x,y,z)))
|||| ((a "1") >>>> (fun _ -> `LF("1")))
|||| ((a "") >>>> (fun _ -> `LF(""))))
i)
let results = run_parser parse_E "1111111111"
```

This approach combines the flexibility of combinator parsing with the performance of Earley parsing. Easy to port to other languages.

What about the real world?

Real-world performance is hard to achieve. For example, CYK is `O(n^3)` theoretically:

- if arbitrary sized arrays can be allocated in constant time
- and access to each element of the array occurs in constant time
- for an input of `10^6` bytes (~ 1 MB), we may need an array of (the order of) `10^12` bytes (~ 1000 GB); this is a lot of main memory for a relatively small input
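The arithmetic behind that claim is a simple back-of-envelope check (the one-byte-per-cell figure below is a deliberately optimistic lower bound, not a measurement):

```ocaml
(* Back-of-envelope check of the CYK memory claim: an O(n^2) chart
   over a 10^6-byte input has ~10^12 cells, so even at one byte per
   cell the chart needs ~10^12 bytes (~1000 GB). *)
let n = 1_000_000                      (* input size: ~1 MB *)
let cells = n * n                      (* O(n^2) chart cells *)
let bytes_per_cell = 1                 (* optimistic lower bound *)
let chart_bytes = cells * bytes_per_cell

let () =
  Printf.printf "chart needs at least %d bytes (~%d GB)\n"
    chart_bytes (chart_bytes / 1_000_000_000)
```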

My observations when engineering Earley for real-world performance:

- Even getting a formal proof that Earley meets the `O(n^3)` bound was difficult.
- Arrays are necessary for `O(n^3)` performance, but aren't feasible for large inputs; we must accept `O(n^3 ln n)` performance in the worst case, and smoothly transition from one bound to the other.
- `n^3` runtime is not acceptable for grammars which users expect to be parsed in linear time (another reason CYK is not suitable for the real world). We have to make the parser perform well on all the different subclasses. This is a difficult thing to achieve.
- We are forced to deal with things that are difficult to verify: state; hashtables and memoization; arrays; mapping everything to integers etc.
- the mechanized verification presumably must avoid treating these, but there needs to be a story about why this is OK, and a clear path to incorporating these aspects

- A huge amount of engineering is involved even getting to a prototype; getting to a product that others can use takes even longer.

Comparison with Happy on 5 small sample grammars

On 2, P3 is much much better

```
E_EEE: E -> E E E | "1" | ""

Grammar | Input size | Happy/s | P3/s
E_EEE   |  20        | 0.14    | 0.10
E_EEE   |  40        | 6.87    | 0.13
E_EEE   | 100        | 2535.88 | 0.50
```

On 2, P3 is better, but only by a factor of 10 or so. On one, Happy appeared to loop for non-empty inputs.

See here for comparison with Happy

I think these are very good results; getting these results took an immense amount of time

This talk is really just a very brief overview of the kind of things that have been done, some things I'm working on, and the places we might want to go

Disclaimer: I am not a parsing person, I am a formal methods person, so I need enlightening!

Please come and tell me lots of interesting stuff about parsing, what the problems are, and where *you* think we need to go!