tl;dr: Looking forward to future Pinner.Pin performance improvements.
The upcoming Go 1.21, scheduled for release next month, is currently available for download as Go 1.21rc2 in the "Unstable version" section of the Go download page. Go 1.21 introduces a new runtime type, runtime.Pinner.
ccgo/v4, the next, also not yet released, version of the C-to-Go transpiler uses a form of pinning to "freeze" the addresses of local Go variables whose addresses are passed around in the original C code. This is needed because resizing a goroutine stack can change the address of a local variable living on it. ccgo produces Go code in which every C pointer points to memory not managed by the Go runtime, so ccgo simply puts such "escaping" variables in memory that is invisible to the garbage collector and has stable, immovable addresses. That memory is provided by the modernc.org/memory package.
Another problem ccgo has to solve is the runtime's pointer validity checks. I'm not aware of the details being documented anywhere outside the runtime source code, but long ago I learned, for example, that when the value of the function pointer SQLITE_TRANSIENT is stored in an unsafe.Pointer, it is caught and rejected, and the program is aborted. Hence ccgo transpiles all C pointers to the Go uintptr type. That avoids the pointer check problem, but at the cost of producing worse-looking Go code when accessing the "pinned" local variables. Consider this C code:
int f(int *p) {
    return *p;
}

int main() {
    int i = 42;
    // f(&i);
}
Using ccgo/v4@master (currently commit 9d0f7450) and issuing $ ccgo main.c -o main.go, the transpilation is:
func f(tls *libc.TLS, p uintptr) (r int32) {
	return *(*int32)(unsafe.Pointer(p))
}

func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
	var i int32
	i = int32(42)
	_ = i
	// f(&i);
	return r
}
Doing the same after uncommenting the call to f in the C code produces this version of main1:
func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
	bp := tls.Alloc(8) /* tlsAllocs 8 maxValist 0 */
	defer tls.Free(8)
	var _ /* i at bp+0 */ int32
	*(*int32)(unsafe.Pointer(bp)) = int32(42)
	f(tls, bp)
	return r
}
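To make the bp mechanism concrete, here is a minimal, hypothetical stand-in for what tls.Alloc and tls.Free do in main1: a LIFO bump allocator over memory outside the Go heap (again obtained with syscall.Mmap, so Unix-only). It is not modernc.org/libc's implementation; the type stackArena, its methods and the 8-byte rounding are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// stackArena is a hypothetical LIFO bump allocator over mmap'ed memory that
// the Go GC neither scans nor moves, so every address it hands out is stable.
type stackArena struct {
	mem []byte  // backing memory outside the Go heap
	sp  uintptr // offset of the first free byte
}

func newStackArena(size int) (*stackArena, error) {
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &stackArena{mem: mem}, nil
}

// Alloc reserves n bytes (rounded up to a multiple of 8) and returns their address.
func (a *stackArena) Alloc(n int) uintptr {
	sz := uintptr(n+7) &^ 7
	if a.sp+sz > uintptr(len(a.mem)) {
		panic("stackArena: out of memory")
	}
	p := uintptr(unsafe.Pointer(&a.mem[0])) + a.sp
	a.sp += sz
	return p
}

// Free releases the n bytes most recently allocated. Calls must mirror Alloc
// in LIFO order; pairing each Alloc with a deferred Free in the same function does that.
func (a *stackArena) Free(n int) {
	a.sp -= uintptr(n+7) &^ 7
}

// f is the transpiled C function: the pointer arrives as a uintptr.
func f(p uintptr) int32 {
	return *(*int32)(unsafe.Pointer(p))
}

// main1 mirrors the transpiled main: i lives at bp+0 in arena memory.
func main1(a *stackArena) int32 {
	bp := a.Alloc(8)
	defer a.Free(8)
	*(*int32)(unsafe.Pointer(bp)) = 42 // i = 42
	return f(bp)
}

func main() {
	a, err := newStackArena(1 << 16)
	if err != nil {
		panic(err)
	}
	v := main1(a)
	fmt.Println(v, a.sp) // 42 0
}
```

Because Free only moves the offset back, allocating a frame costs a few arithmetic operations, at the price of the bp-relative access style.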
The mechanism works. It obviously has some additional runtime cost, but we will not discuss that here today. I wanted to try the new Pinner in the hope of eventually producing code like:
var pinner runtime.Pinner
var i int32
ip := &i
pinner.Pin(ip)
defer pinner.Unpin()
i = int32(42)
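A self-contained version of that idea already runs with Go 1.21 or later. It is a sketch under a few assumptions: f and main1 drop the tls parameter, and the pointer is still passed as a uintptr, as ccgo does. Run it without -race; the race detector enables the compiler's checkptr instrumentation, which may reject converting such a uintptr back into a pointer into the Go heap.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// f mirrors the transpiled C function: the C pointer arrives as a uintptr.
func f(p uintptr) int32 {
	return *(*int32)(unsafe.Pointer(p))
}

// main1 keeps i as an ordinary Go variable and pins it for the duration of
// the call, instead of placing it in tls.Alloc'ed memory.
func main1() int32 {
	var pinner runtime.Pinner
	var i int32
	ip := &i // passing &i to Pin makes i escape to the heap
	pinner.Pin(ip)
	defer pinner.Unpin() // release the pin when main1 returns

	i = 42
	return f(uintptr(unsafe.Pointer(ip)))
}

func main() {
	fmt.Println(main1()) // 42
}
```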
Looking at the source code of (*Pinner).Pin, I was worried about its performance. The code is, naturally, complex: it has to coordinate with a concurrent garbage collector, which is not a simple task.
An easy way to evaluate the performance impact, if any, is not to pin individual variables but to change the bp mechanism: instead of using tls.Alloc/Free, acquire the same amount of memory from the Go runtime and pin it. That does not remove the ugliness of the bp-relative code, but that is less important than the runtime cost, and the runtime cost is exactly what we want to evaluate as the first step of the investigation. Issuing $ ccgo -experiment-pin=1 main.c -o main.go, the result is now:
func main1(tls *libc.TLS, argc int32, argv uintptr) (r int32) {
	var pinner runtime.Pinner
	bpp := &[1]int64{}
	pinner.Pin(bpp)
	defer pinner.Unpin()
	bp := uintptr(unsafe.Pointer(&bpp[0]))
	var _ /* i at bp+0 */ int32
	*(*int32)(unsafe.Pointer(bp)) = int32(42)
	f(tls, bp)
	return r
}
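The raw cost of the extra Pin/Unpin pair can also be estimated in isolation. The microbenchmark below is a sketch for Go 1.21+: it compares allocating an 8-byte frame, as bpp := &[1]int64{} does, with allocating and pinning it. Unpin is called directly rather than deferred, the function names are hypothetical, and the printed numbers will depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// sink keeps each frame reachable so the allocation cannot be optimized away.
var sink *[1]int64

// allocFrame models the frame allocation alone: bpp := &[1]int64{}.
func allocFrame() {
	sink = &[1]int64{}
}

// allocPinnedFrame adds the Pin/Unpin pair used by -experiment-pin=1
// (Unpin is called directly here instead of being deferred).
func allocPinnedFrame() {
	var pinner runtime.Pinner
	bpp := &[1]int64{}
	pinner.Pin(bpp)
	pinner.Unpin()
	sink = bpp
}

// bench runs fn in a loop under testing.Benchmark, which works outside go test.
func bench(fn func()) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fn()
		}
	})
}

func main() {
	fmt.Println("alloc only: ", bench(allocFrame))
	fmt.Println("alloc + pin:", bench(allocPinnedFrame))
}
```

testing.Benchmark runs each function for about a second by default, so this needs no go test invocation.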
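The good news is that all previously passing ccgo/v4/lib tests still pass with the added -experiment-pin=1 option on the linux/amd64 target. Here are some comparative numbers for realistically complex code, SQLite's speedtest1, first with the default tls.Alloc/Free mechanism and then with the test's -pin=1 flag: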
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go version
go version go1.21rc2 linux/amd64
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ uname -a
Linux 3900x 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go test -v -run TestSQLite/speedtest1
=== RUN TestSQLite
=== RUN TestSQLite/speedtest1
execute /usr/local/go/bin/go ["mod" "init" "example.com/ccgo/v4/lib/sqlite"] in /tmp/ccgo-test-1916976164
go: creating new go.mod: module example.com/ccgo/v4/lib/sqlite
execute /usr/local/go/bin/go ["get" "modernc.org/libc"] in /tmp/ccgo-test-1916976164
go: added github.com/dustin/go-humanize v1.0.1
go: added github.com/google/uuid v1.3.0
go: added github.com/mattn/go-isatty v0.0.16
go: added github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec
go: added golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab
go: added modernc.org/libc v1.24.1
go: added modernc.org/mathutil v1.5.0
go: added modernc.org/memory v1.6.0
all_test.go:1312:
-- Speedtest1 for SQLite 3.42.0 2023-05-16 12:36:15 831d0fb2836b71c9bc51067c49fe
100 - 50000 INSERTs into table with no index...................... 0.083s
110 - 50000 ordered INSERTS with one index/PK..................... 0.112s
120 - 50000 unordered INSERTS with one index/PK................... 0.131s
130 - 25 SELECTS, numeric BETWEEN, unindexed...................... 0.084s
140 - 10 SELECTS, LIKE, unindexed................................. 0.202s
142 - 10 SELECTS w/ORDER BY, unindexed............................ 0.255s
145 - 10 SELECTS w/ORDER BY and LIMIT, unindexed.................. 0.177s
150 - CREATE INDEX five times..................................... 0.168s
160 - 10000 SELECTS, numeric BETWEEN, indexed..................... 0.079s
161 - 10000 SELECTS, numeric BETWEEN, PK.......................... 0.078s
170 - 10000 SELECTS, text BETWEEN, indexed........................ 0.175s
180 - 50000 INSERTS with three indexes............................ 0.194s
190 - DELETE and REFILL one table................................. 0.198s
200 - VACUUM...................................................... 0.170s
210 - ALTER TABLE ADD COLUMN, and query........................... 0.005s
230 - 10000 UPDATES, numeric BETWEEN, indexed..................... 0.091s
240 - 50000 UPDATES of individual rows............................ 0.138s
250 - One big UPDATE of the whole 50000-row table................. 0.023s
260 - Query added column after filling............................ 0.004s
270 - 10000 DELETEs, numeric BETWEEN, indexed..................... 0.259s
280 - 50000 DELETEs of individual rows............................ 0.175s
290 - Refill two 50000-row tables using REPLACE................... 0.428s
300 - Refill a 50000-row table using (b&1)==(a&1)................. 0.192s
310 - 10000 four-ways joins....................................... 0.388s
320 - subquery in result set...................................... 0.517s
400 - 70000 REPLACE ops on an IPK................................. 0.123s
410 - 70000 SELECTS on an IPK..................................... 0.075s
500 - 70000 REPLACE on TEXT PK.................................... 0.146s
510 - 70000 SELECTS on a TEXT PK.................................. 0.149s
520 - 70000 SELECT DISTINCT....................................... 0.081s
980 - PRAGMA integrity_check...................................... 0.413s
990 - ANALYZE..................................................... 0.044s
TOTAL....................................................... 5.357s
--- PASS: TestSQLite (25.25s)
--- PASS: TestSQLite/speedtest1 (25.25s)
PASS
ok modernc.org/ccgo/v4/lib 25.280s
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$ go test -v -run TestSQLite/speedtest1 -pin=1
=== RUN TestSQLite
=== RUN TestSQLite/speedtest1
execute /usr/local/go/bin/go ["mod" "init" "example.com/ccgo/v4/lib/sqlite"] in /tmp/ccgo-test-2951855829
go: creating new go.mod: module example.com/ccgo/v4/lib/sqlite
execute /usr/local/go/bin/go ["get" "modernc.org/libc"] in /tmp/ccgo-test-2951855829
go: added github.com/dustin/go-humanize v1.0.1
go: added github.com/google/uuid v1.3.0
go: added github.com/mattn/go-isatty v0.0.16
go: added github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec
go: added golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab
go: added modernc.org/libc v1.24.1
go: added modernc.org/mathutil v1.5.0
go: added modernc.org/memory v1.6.0
all_test.go:1312:
-- Speedtest1 for SQLite 3.42.0 2023-05-16 12:36:15 831d0fb2836b71c9bc51067c49fe
100 - 50000 INSERTs into table with no index...................... 0.561s
110 - 50000 ordered INSERTS with one index/PK..................... 0.799s
120 - 50000 unordered INSERTS with one index/PK................... 0.822s
130 - 25 SELECTS, numeric BETWEEN, unindexed...................... 0.199s
140 - 10 SELECTS, LIKE, unindexed................................. 0.898s
142 - 10 SELECTS w/ORDER BY, unindexed............................ 1.365s
145 - 10 SELECTS w/ORDER BY and LIMIT, unindexed.................. 1.072s
150 - CREATE INDEX five times..................................... 1.302s
160 - 10000 SELECTS, numeric BETWEEN, indexed..................... 0.542s
161 - 10000 SELECTS, numeric BETWEEN, PK.......................... 0.560s
170 - 10000 SELECTS, text BETWEEN, indexed........................ 0.920s
180 - 50000 INSERTS with three indexes............................ 0.807s
190 - DELETE and REFILL one table................................. 0.808s
200 - VACUUM...................................................... 1.352s
210 - ALTER TABLE ADD COLUMN, and query........................... 0.010s
230 - 10000 UPDATES, numeric BETWEEN, indexed..................... 0.593s
240 - 50000 UPDATES of individual rows............................ 0.543s
250 - One big UPDATE of the whole 50000-row table................. 0.046s
260 - Query added column after filling............................ 0.009s
270 - 10000 DELETEs, numeric BETWEEN, indexed..................... 0.878s
280 - 50000 DELETEs of individual rows............................ 0.634s
290 - Refill two 50000-row tables using REPLACE................... 2.016s
300 - Refill a 50000-row table using (b&1)==(a&1)................. 0.858s
310 - 10000 four-ways joins....................................... 1.932s
320 - subquery in result set...................................... 1.121s
400 - 70000 REPLACE ops on an IPK................................. 0.888s
410 - 70000 SELECTS on an IPK..................................... 0.516s
500 - 70000 REPLACE on TEXT PK.................................... 0.979s
510 - 70000 SELECTS on a TEXT PK.................................. 1.089s
520 - 70000 SELECT DISTINCT....................................... 0.745s
980 - PRAGMA integrity_check...................................... 2.645s
990 - ANALYZE..................................................... 0.066s
TOTAL....................................................... 27.575s
--- PASS: TestSQLite (47.63s)
--- PASS: TestSQLite/speedtest1 (47.63s)
PASS
ok modernc.org/ccgo/v4/lib 47.661s
jnml@3900x:~/src/modernc.org/ccgo/v4/lib$
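The slowdown can be read off by dividing the two timings: for the TOTAL lines, 27.575s / 5.357s is a little over 5x. A small, hypothetical helper that does the division for lines in speedtest1's report format might look like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSeconds (hypothetical helper) extracts the trailing timing, such as
// "0.083s", from a speedtest1 report line.
func parseSeconds(line string) (float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty line")
	}
	last := fields[len(fields)-1]
	if !strings.HasSuffix(last, "s") {
		return 0, fmt.Errorf("no timing in %q", line)
	}
	return strconv.ParseFloat(strings.TrimSuffix(last, "s"), 64)
}

func main() {
	// Each row: label, line from the default run, line from the -pin=1 run.
	rows := [][3]string{
		{"100", "100 - 50000 INSERTs into table with no index...................... 0.083s",
			"100 - 50000 INSERTs into table with no index...................... 0.561s"},
		{"TOTAL", "TOTAL....................................................... 5.357s",
			"TOTAL....................................................... 27.575s"},
	}
	for _, r := range rows {
		base, err := parseSeconds(r[1])
		if err != nil {
			panic(err)
		}
		pinned, err := parseSeconds(r[2])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-6s %.2fx slower with -pin=1\n", r[0], pinned/base)
	}
}
```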
----
If you want to reproduce the numbers on your machine, you need Go 1.21rc2 and one of the linux/{386,amd64,arm,ppc64le,s390x} targets. The ccgo/v4 SQLite tests at tip do not pass elsewhere yet.