Modern C

Posts

Producing a Go scanner in 1,219 bytes of code

- May 14, 2023

modernc.org/rec is a regexp-to-Go-code compiler. It is still a bit rough around the edges. For example, it converts the regexps to a DFA, but it does not yet support intersecting character classes that end up in the same DFA state. Anyway, rec can already handle some nontrivial tasks, like generating a usable, working Go scanner. Here are those 1,219 bytes, in scanner.sh. The shell script is used in the generate target of the Makefile. Note the Perl Unicode character classes in the regexp for Go identifiers. The respective part of the EBNF lexical grammar in the Go specification is identifier = letter { letter | unicode_digit } . This production expands and translates to the regexp (\pL|_)(\pL|_|\p{Nd})* . As mentioned above, constructing DFAs from regexps that use character classes is a bit challenging per se once Unicode is considered. A similar program, lx(1), part of libfsm, seems not to support Unicode so far, possibly because it faces similar difficulties. The resulting sca
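The identifier regexp quoted in the excerpt can be tried out with Go's standard regexp package, which accepts the same Perl-style Unicode classes. The following is only a minimal sketch: it does not use rec or its generated scanner, and the helper name isIdentifier and the added ^ and $ anchors are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRE is the regexp from the post, anchored (an addition for this sketch)
// so that the whole input has to match.
var identRE = regexp.MustCompile(`^(\pL|_)(\pL|_|\p{Nd})*$`)

// isIdentifier is a hypothetical helper: it reports whether s has the lexical
// shape of a Go identifier. Keywords such as "func" also match, because the
// lexical grammar distinguishes them separately.
func isIdentifier(s string) bool {
	return identRE.MatchString(s)
}

func main() {
	for _, s := range []string{"x", "_tmp1", "αβγ", "9lives", "a-b", ""} {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```

A backtracking-free matcher like this says nothing about how rec builds its DFA; it only shows which strings the regexp accepts.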

Producing a Go parser from the language specification mechanically, mostly

- November 25, 2022

There's yet another, possibly never-to-be-completed Go front end, modernc.org/gc/v3. This time I'm trying something new with respect to the Go parser. It takes three main steps, initially.

1. Extract the EBNF specification from the language specs. The unmodified EBNF grammar is not a well-formed PEG grammar:

    $ go test -v -run Spec
    === RUN   TestSpecEBNF
        all_test.go:68: left recursive: Expression
        all_test.go:68: left recursive: PrimaryExpr
    --- PASS: TestSpecEBNF (0.01s)
    PASS
    ok      modernc.org/gc/v3/internal/ebnf 0.011s
    $

2. Manually rewrite spec.ebnf to peg.ebnf. The goals are to:

- Remove the left recursion.
- Reorder the terms to obtain the correct parse tree.
- Rewrite selected parts of the grammar to get the backtracking on a large corpus of Go code down to something like, say, an acceptable* 10% on average. (* - acceptable as a starting point)

For the last part a PEG interpreter, actually in this case an EBNF interpreter, is needed. To clarify, a particular PEG grammar can be used to g
