Posts

Image
Taming the Flat AST: Ergonomics in the Age of Zero Allocations Last week, I introduced egg , a modern LL(1) parser generator for Go. The response from the community was insightful, particularly regarding egg 's most distinct feature: the flat AST. To achieve high performance and low garbage collection pressure, egg encodes the entire Abstract Syntax Tree (AST) into a single []int32 slice. While this is excellent for the CPU cache, it can be intimidating for the developer. One commenter on the Gophers Slack put it perfectly: "The 'Traversing the AST' section really dissuades me from using this tool... I think coming up with a clean, well-typed API might be harder than I think." They were right. In my previous post, I showed a raw recursive walker that involved manual index arithmetic. It was fast, but it wasn't pretty. However, shortly after that post, I implemented ImportEBNF —a feature allowing egg to bootstrap itself by reading EBNF from ...

Building a High-Performance JSON Parser in Go with Egg

Image
Egg: A Modern LL(1) Parser Generator for Go Parsing is a staple of computer science, yet finding the right tool for the job in Go often feels like a choice between heavy, complex frameworks or writing a brittle parser by hand. Enter egg (Expression Grammar Generator). egg is a new parser generator that prioritizes simplicity, performance, and Go-native conventions. It takes an EBNF grammar—very similar to the one used in the Go language specification—and generates a fast, dependency-light recursive descent parser. In this post, we’ll build a fully functional JSON parser from scratch to demonstrate how egg works, and we'll dive deep into its most controversial and powerful feature: its flat AST encoding. The Grammar: Familiar and Concise One of egg 's strengths is that it doesn't invent a complex new syntax. If you've read the Go language spec, you already know how to write an egg grammar. It combines lexical definitions (regex and literals) and syntax de...

ccgo/v4 experiment: Trying the new runtime.Pinner

tl;dr: Looking forward future Pinner.Pin  performance improvements. The upcoming Go version 1.21, scheduled for release next month, is currently available for download as Go 1.21rc2 in the "Unstable version" section here . Go 1.21 introduces a new runtime type, Pinner . ccgo/v4, the next, also not yet released version of the C to Go transpiler, uses pinning to "freeze" addresses of local  Go variables, addresses of which are passed around in the original C code. ccgo produces Go code where any C pointer points to memory not managed by the Go runtime. So ccgo simply puts such "escaping" variables in the memory not visible to the garbage collector, with stable, immovable addresses. Those are provided by the modernc.org/memory package. Otherwise a goroutine stack resizing can change the address of a local variable. Another problem ccgo has to solve are the runtime pointer validity checks. I'm not aware of the details being documented somewhere outside of...

Producing a Go scanner in 1,219 bytes of code

modernc.org/rec is a regexp to Go code compiler tool. It is still a bit rough around the edges. For example, it converts the regexps to a DFA, but it does not yet support intersecting character classes ending up in the same DFA state. Anyway, rec can already handle some nontrivial tasks, like generating a usable, working Go scanner. Here are those 1,219 bytes - in scanner.sh . The shell script is used in the generate target of the Makefile . Note the Perl Unicode character classes in the regexp for Go indentifiers. The respective EBNF  lexical grammar   part of Go specification is identifier = letter { letter | unicode_digit } . The above production expands and translates to the regexp (\pL|_)(\pL|_|\p{Nd})* .  As mentioned above, constructing DFAs from regexps using character classes is a bit challenging per se when considering Unicode. A similar program, lx(1) , part of libfsm , seems to not support Unicode so far, possibly because facing similar difficulties. The...

Producing a Go parser from the language specification mechanically, mostly

There's yet another, possibly never-to-be-completed Go front end, modernc.org/gc/v3 . This time I'm trying something new with respect to the Go parser. It takes three main steps, initially. 1. Extract the EBNF specification from the language specs . The unmodified EBNF grammar is not a well-formed PEG grammar : $ go test -v -run Spec === RUN   TestSpecEBNF     all_test.go:68: left recursive: Expression     all_test.go:68: left recursive: PrimaryExpr --- PASS: TestSpecEBNF (0.01s) PASS ok  modernc.org/gc/v3/internal/ebnf 0.011s $  2. Manually rewrite spec.ebnf to peg.ebnf The goals are to Remove the left recursion Reorder the terms to obtain the correct parse tree. Rewrite selected parts of the grammar to get the backtracing on a large corpus of Go code to something like, say acceptable* 10% on average. (* - acceptable as a starting point) For the last part a PEG, actually in this case an  EBNF interpreter  is needed . To clarify, ...