ccgo/v4 experiment: Trying the new runtime.Pinner

tl;dr: Looking forward to future Pinner.Pin performance improvements.

The upcoming Go version 1.21, scheduled for release next month, is currently available for download as Go 1.21rc2 in the "Unstable version" section here. Go 1.21 introduces a new runtime type, Pinner.

ccgo/v4, the next and also not yet released version of the C to Go transpiler, uses pinning to "freeze" the addresses of local Go variables whose addresses are passed around in the original C code. ccgo produces Go code where every C pointer points to memory not managed by the Go runtime. So ccgo simply puts such "escaping" variables in memory that is not visible to the garbage collector and has stable, immovable addresses. That memory is provided by the modernc.org/memory package. Otherwise, goroutine stack resizing could change the address of a local variable.
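
To illustrate the stack resizing concern, here is a minimal sketch, assuming the current gc toolchain and its copying goroutine stacks. The names grow and addrAcrossGrowth are made up for this example and are not part of ccgo. The address of a local is saved as a plain uintptr, the stack is forced to grow, and the address is taken again. The two values typically differ, but that is not guaranteed, so the program only reports what it observes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses with a sizable frame so that the goroutine stack has to be
// reallocated, and therefore copied, along the way.
//
//go:noinline
func grow(n int) byte {
	var pad [1024]byte
	pad[n%len(pad)] = byte(n)
	if n == 0 {
		return pad[0]
	}
	return grow(n-1) + pad[n%len(pad)]
}

// addrAcrossGrowth records the address of a local variable as an integer
// before and after the stack grows. A uintptr is invisible to the runtime,
// so the saved integer is not adjusted when the stack is copied.
func addrAcrossGrowth() (before, after uintptr, value int32) {
	var x int32 = 42
	before = uintptr(unsafe.Pointer(&x))
	grow(4096) // roughly 4 MiB of frames, far above the initial stack size
	after = uintptr(unsafe.Pointer(&x))
	return before, after, x
}

func main() {
	before, after, value := addrAcrossGrowth()
	fmt.Printf("before=%#x after=%#x moved=%v value=%d\n",
		before, after, before != after, value)
}
```

If moved is reported as true, any C-style pointer derived from the first address would now be dangling, which is exactly what placing the variable outside the goroutine stack avoids.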

Another problem ccgo has to solve is the runtime pointer validity checks. I'm not aware of the details being documented anywhere outside of the runtime source code, but long ago I learned that, for example, the value of the function pointer SQLITE_TRANSIENT, when stored in an unsafe.Pointer, is caught and rejected, and the program is aborted. Hence ccgo transpiles all C pointers to the Go uintptr type. That avoids the pointer check problem, but the cost is the need to produce worse-looking Go code when accessing the "pinned" local variables. Let's take some C code:

    int f(int *p) {
        return *p;
    }

    int main() {
        int i = 42;
        // f(&i);
    }

Using ccgo/v4@master [currently commit 9d0f7450] and issuing $ ccgo main.c -o main.go, the transpilation is:

    func f(tls *libc.TLS, p uintptr) (r int32) {
        return *(*int32)(unsafe.Pointer(p))
    }

    func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
        var i int32
        i = int32(42)
        _ = i
        // f(&i);
        return r
    }

Doing the same, while uncommenting the call to f in the C code, produces this version of main:

    func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
        bp := tls.Alloc(8) /* tlsAllocs 8 maxValist 0 */
        defer tls.Free(8)
        var _ /* i at bp+0 */ int32
        *(*int32)(unsafe.Pointer(bp)) = int32(42)
        f(tls, bp)
        return r
    }
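
The stack discipline behind bp can be sketched with a toy allocator. This is only an illustration under stated assumptions: toyTLS, arena and roundUp8 are hypothetical names, and nothing here reflects how libc.TLS or modernc.org/memory are actually implemented. The sketch backs the frames with a package-level array, which the Go runtime does not move, instead of memory obtained outside the Go heap.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arena backs the toy frames. Package-level variables are not moved by the
// Go runtime, so addresses into it stay stable. Using uint64 elements keeps
// every frame 8-byte aligned.
var arena [1 << 10]uint64

// toyTLS is a hypothetical stand-in for *libc.TLS. Only the stack-like
// behavior of Alloc/Free is modeled.
type toyTLS struct {
	top uintptr // number of arena bytes currently in use
}

// roundUp8 rounds n up to a multiple of 8.
func roundUp8(n int) uintptr { return (uintptr(n) + 7) &^ 7 }

// Alloc reserves n bytes, rounded up to 8, and returns their address.
func (t *toyTLS) Alloc(n int) uintptr {
	size := roundUp8(n)
	if n <= 0 || t.top+size > uintptr(len(arena))*8 {
		panic("toyTLS: bad size or out of frame memory")
	}
	p := uintptr(unsafe.Pointer(&arena[0])) + t.top
	t.top += size
	return p
}

// Free releases the most recently allocated n bytes.
func (t *toyTLS) Free(n int) { t.top -= roundUp8(n) }

// f mirrors the transpiled C function: it dereferences a C-style pointer.
func f(tls *toyTLS, p uintptr) int32 { return *(*int32)(unsafe.Pointer(p)) }

// main1 follows the shape of the transpiled main above.
func main1(tls *toyTLS) int32 {
	bp := tls.Alloc(8)
	defer tls.Free(8)
	*(*int32)(unsafe.Pointer(bp)) = int32(42) // i lives at bp+0
	return f(tls, bp)
}

func main() {
	tls := &toyTLS{}
	r := main1(tls)
	fmt.Println(r, tls.top)
}
```

Because frames are released in reverse order of allocation, Free only needs the size, which matches the paired tls.Alloc(8) and tls.Free(8) calls in the transpiled code.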

The mechanism works. It obviously has some additional runtime cost, but we will not discuss that part here today. I wanted to try the new Pinner, in the hope of eventually producing code like:

    var pinner runtime.Pinner
    var i int32
    ip := &i
    pinner.Pin(ip)
    defer pinner.Unpin()
    i = int32(42)
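
For reference, a self-contained sketch of that idea, requiring Go 1.21 or later. The function names f and pinnedLocal are invented for this example, and this is not code that ccgo emits today. While the variable is pinned, the garbage collector will neither move nor free it, so its address can be carried around as a uintptr.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// f stands in for a transpiled C function that takes an int* as a uintptr.
func f(p uintptr) int32 { return *(*int32)(unsafe.Pointer(p)) }

// pinnedLocal pins a local variable for the duration of the call, then
// passes its address around as an integer, C style.
func pinnedLocal() int32 {
	var pinner runtime.Pinner
	var i int32
	ip := &i
	pinner.Pin(ip)       // handing ip to Pin is expected to move i to the heap
	defer pinner.Unpin() // unpins everything pinned through this Pinner
	i = int32(42)
	return f(uintptr(unsafe.Pointer(ip)))
}

func main() {
	fmt.Println(pinnedLocal())
}
```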

Looking at the source code of (*Pinner).Pin, I was worried about its performance. The code is complex, and naturally so: it has to coordinate with a concurrent garbage collector, and that's not a simple task.

The easy way to evaluate the performance impact, if any, is not to pin individual variables, but just to update the bp mechanism so that it no longer uses tls.Alloc/Free and instead pins equivalent memory acquired from the Go runtime. That does not improve the ugliness of the bp-relative code, but that's less important than the runtime costs, and evaluating exactly those is the initial step of the investigation. Issuing $ ccgo -experiment-pin=1 main.c -o main.go, the result is now:

    func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
        var pinner runtime.Pinner
        bpp := &[1]int64{}
        pinner.Pin(bpp)
        defer pinner.Unpin()
        bp := uintptr(unsafe.Pointer(&bpp[0]))
        var _ /* i at bp+0 */ int32
        *(*int32)(unsafe.Pointer(bp)) = int32(42)
        f(tls, bp)
        return r
    }
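
To get a rough feel for the per-call cost of this shape in isolation, a micro-benchmark along the following lines can be used. It is only a sketch: pinnedFrame, unpinnedFrame, read, keep and sink are names made up here, and the unpinned baseline relies on the current Go heap not moving objects, so it is suitable for measurement only. The numbers it prints depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
	"unsafe"
)

var (
	sink int32     // keeps results observable so calls are not optimized away
	keep *[1]int64 // forces the unpinned frame onto the heap and keeps it alive
)

//go:noinline
func read(p uintptr) int32 { return *(*int32)(unsafe.Pointer(p)) }

// pinnedFrame mirrors the -experiment-pin=1 shape shown above.
func pinnedFrame() int32 {
	var pinner runtime.Pinner
	bpp := &[1]int64{}
	pinner.Pin(bpp)
	defer pinner.Unpin()
	bp := uintptr(unsafe.Pointer(&bpp[0]))
	*(*int32)(unsafe.Pointer(bp)) = int32(42)
	return read(bp)
}

// unpinnedFrame does the same work with a heap allocation but no pinning.
func unpinnedFrame() int32 {
	bpp := &[1]int64{}
	keep = bpp // escape to the heap; the global reference keeps it reachable
	bp := uintptr(unsafe.Pointer(&bpp[0]))
	*(*int32)(unsafe.Pointer(bp)) = int32(42)
	return read(bp)
}

func main() {
	pinned := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = pinnedFrame()
		}
	})
	plain := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = unpinnedFrame()
		}
	})
	fmt.Println("pinned:  ", pinned)
	fmt.Println("unpinned:", plain)
}
```

A micro-benchmark like this says nothing about the interaction with a busy garbage collector, which is why the whole-program numbers below are the more relevant measurement.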

The good news is that all previously passing ccgo/v4/lib tests still pass with the added -experiment-pin=1 option on the linux/amd64 target. And here are some comparative numbers for realistically complex code:

jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go version
go version go1.21rc2 linux/amd64
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ uname -a
Linux 3900x 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go test -v -run TestSQLite/speedtest1
=== RUN   TestSQLite
=== RUN   TestSQLite/speedtest1
execute /usr/local/go/bin/go ["mod" "init" "example.com/ccgo/v4/lib/sqlite"] in /tmp/ccgo-test-1916976164
go: creating new go.mod: module example.com/ccgo/v4/lib/sqlite
execute /usr/local/go/bin/go ["get" "modernc.org/libc"] in /tmp/ccgo-test-1916976164
go: added github.com/dustin/go-humanize v1.0.1
go: added github.com/google/uuid v1.3.0
go: added github.com/mattn/go-isatty v0.0.16
go: added github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec
go: added golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab
go: added modernc.org/libc v1.24.1
go: added modernc.org/mathutil v1.5.0
go: added modernc.org/memory v1.6.0
    all_test.go:1312: 
        -- Speedtest1 for SQLite 3.42.0 2023-05-16 12:36:15 831d0fb2836b71c9bc51067c49fe
         100 - 50000 INSERTs into table with no index......................    0.083s
         110 - 50000 ordered INSERTS with one index/PK.....................    0.112s
         120 - 50000 unordered INSERTS with one index/PK...................    0.131s
         130 - 25 SELECTS, numeric BETWEEN, unindexed......................    0.084s
         140 - 10 SELECTS, LIKE, unindexed.................................    0.202s
         142 - 10 SELECTS w/ORDER BY, unindexed............................    0.255s
         145 - 10 SELECTS w/ORDER BY and LIMIT, unindexed..................    0.177s
         150 - CREATE INDEX five times.....................................    0.168s
         160 - 10000 SELECTS, numeric BETWEEN, indexed.....................    0.079s
         161 - 10000 SELECTS, numeric BETWEEN, PK..........................    0.078s
         170 - 10000 SELECTS, text BETWEEN, indexed........................    0.175s
         180 - 50000 INSERTS with three indexes............................    0.194s
         190 - DELETE and REFILL one table.................................    0.198s
         200 - VACUUM......................................................    0.170s
         210 - ALTER TABLE ADD COLUMN, and query...........................    0.005s
         230 - 10000 UPDATES, numeric BETWEEN, indexed.....................    0.091s
         240 - 50000 UPDATES of individual rows............................    0.138s
         250 - One big UPDATE of the whole 50000-row table.................    0.023s
         260 - Query added column after filling............................    0.004s
         270 - 10000 DELETEs, numeric BETWEEN, indexed.....................    0.259s
         280 - 50000 DELETEs of individual rows............................    0.175s
         290 - Refill two 50000-row tables using REPLACE...................    0.428s
         300 - Refill a 50000-row table using (b&1)==(a&1).................    0.192s
         310 - 10000 four-ways joins.......................................    0.388s
         320 - subquery in result set......................................    0.517s
         400 - 70000 REPLACE ops on an IPK.................................    0.123s
         410 - 70000 SELECTS on an IPK.....................................    0.075s
         500 - 70000 REPLACE on TEXT PK....................................    0.146s
         510 - 70000 SELECTS on a TEXT PK..................................    0.149s
         520 - 70000 SELECT DISTINCT.......................................    0.081s
         980 - PRAGMA integrity_check......................................    0.413s
         990 - ANALYZE.....................................................    0.044s
               TOTAL.......................................................    5.357s
--- PASS: TestSQLite (25.25s)
    --- PASS: TestSQLite/speedtest1 (25.25s)
PASS
ok          modernc.org/ccgo/v4/lib 25.280s
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go test -v -run TestSQLite/speedtest1 -pin=1
=== RUN   TestSQLite
=== RUN   TestSQLite/speedtest1
execute /usr/local/go/bin/go ["mod" "init" "example.com/ccgo/v4/lib/sqlite"] in /tmp/ccgo-test-2951855829
go: creating new go.mod: module example.com/ccgo/v4/lib/sqlite
execute /usr/local/go/bin/go ["get" "modernc.org/libc"] in /tmp/ccgo-test-2951855829
go: added github.com/dustin/go-humanize v1.0.1
go: added github.com/google/uuid v1.3.0
go: added github.com/mattn/go-isatty v0.0.16
go: added github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec
go: added golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab
go: added modernc.org/libc v1.24.1
go: added modernc.org/mathutil v1.5.0
go: added modernc.org/memory v1.6.0
    all_test.go:1312: 
        -- Speedtest1 for SQLite 3.42.0 2023-05-16 12:36:15 831d0fb2836b71c9bc51067c49fe
         100 - 50000 INSERTs into table with no index......................    0.561s
         110 - 50000 ordered INSERTS with one index/PK.....................    0.799s
         120 - 50000 unordered INSERTS with one index/PK...................    0.822s
         130 - 25 SELECTS, numeric BETWEEN, unindexed......................    0.199s
         140 - 10 SELECTS, LIKE, unindexed.................................    0.898s
         142 - 10 SELECTS w/ORDER BY, unindexed............................    1.365s
         145 - 10 SELECTS w/ORDER BY and LIMIT, unindexed..................    1.072s
         150 - CREATE INDEX five times.....................................    1.302s
         160 - 10000 SELECTS, numeric BETWEEN, indexed.....................    0.542s
         161 - 10000 SELECTS, numeric BETWEEN, PK..........................    0.560s
         170 - 10000 SELECTS, text BETWEEN, indexed........................    0.920s
         180 - 50000 INSERTS with three indexes............................    0.807s
         190 - DELETE and REFILL one table.................................    0.808s
         200 - VACUUM......................................................    1.352s
         210 - ALTER TABLE ADD COLUMN, and query...........................    0.010s
         230 - 10000 UPDATES, numeric BETWEEN, indexed.....................    0.593s
         240 - 50000 UPDATES of individual rows............................    0.543s
         250 - One big UPDATE of the whole 50000-row table.................    0.046s
         260 - Query added column after filling............................    0.009s
         270 - 10000 DELETEs, numeric BETWEEN, indexed.....................    0.878s
         280 - 50000 DELETEs of individual rows............................    0.634s
         290 - Refill two 50000-row tables using REPLACE...................    2.016s
         300 - Refill a 50000-row table using (b&1)==(a&1).................    0.858s
         310 - 10000 four-ways joins.......................................    1.932s
         320 - subquery in result set......................................    1.121s
         400 - 70000 REPLACE ops on an IPK.................................    0.888s
         410 - 70000 SELECTS on an IPK.....................................    0.516s
         500 - 70000 REPLACE on TEXT PK....................................    0.979s
         510 - 70000 SELECTS on a TEXT PK..................................    1.089s
         520 - 70000 SELECT DISTINCT.......................................    0.745s
         980 - PRAGMA integrity_check......................................    2.645s
         990 - ANALYZE.....................................................    0.066s
               TOTAL.......................................................   27.575s
--- PASS: TestSQLite (47.63s)
    --- PASS: TestSQLite/speedtest1 (47.63s)
PASS
ok          modernc.org/ccgo/v4/lib 47.661s
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ 

In total, speedtest1 takes 27.575s with pinning versus 5.357s without, which is roughly 5 times slower.

----

If you want to try to reproduce the numbers on your machine, you need to use Go 1.21rc2, and the target needs to be one of linux/{386,amd64,arm,ppc64le,s390x}. The ccgo/v4 SQLite tests at tip do not pass elsewhere yet.
