Performance & Profiling
Profiling with pprof
- Query the pprof endpoints on the node host:
- CPU:
curl -X GET "localhost:6060/debug/pprof/profile?seconds=<number>" > <filename>
- Heap:
curl -X GET "localhost:6060/debug/pprof/heap?seconds=<number>" > <filename>
- You can also query from your local machine by replacing localhost with the node's IP, if the node's pprof port is reachable from your network. In that case, skip the SCP step.
- If you queried on the node host, copy the file to your machine with SCP:
scp <filename> <user>@<host>:<path>
- E.g.
scp <filename> roman@<host>:/home/roman/osmosis/pprof
- Ensure that your ISP or firewall is not blocking the file transfer.
- Open the profile in pprof's web UI (this starts a local web server and opens your browser):
go tool pprof -http=localhost:8080 <filename>
Graphviz must be installed for the graph views.
Memory
Causes
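The following commonly cause memory issues in Go:
- Creating substrings and subslices (the result keeps the whole underlying array alive).
- Wrong use of the defer statement (e.g. defer inside a long-running loop, which postpones every cleanup until the function returns).
- Unclosed HTTP response bodies (or unclosed resources in general).
- Orphaned, hanging goroutines.
- Global variables (anything they reference stays reachable).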
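The first cause can be sketched in a few lines: a subslice shares its parent's backing array, so holding on to a small subslice keeps the whole array reachable. The helper names below are hypothetical, and strings.Clone requires Go 1.20 or newer.

```go
package main

import (
	"fmt"
	"strings"
)

// headRetaining returns the first n bytes as a subslice. It shares big's
// backing array, so the whole array stays reachable while the result is alive.
func headRetaining(big []byte, n int) []byte {
	return big[:n]
}

// headCopy copies the first n bytes into a right-sized slice, so big's backing
// array can be garbage collected once nothing else references it.
func headCopy(big []byte, n int) []byte {
	out := make([]byte, n)
	copy(out, big[:n])
	return out
}

// prefixCopy is the string equivalent: s[:n] shares memory with s, while
// strings.Clone makes an independent copy.
func prefixCopy(s string, n int) string {
	return strings.Clone(s[:n])
}

func main() {
	big := make([]byte, 64<<20) // 64 MiB
	r := headRetaining(big, 8)
	c := headCopy(big, 8)
	// cap shows how much of the backing array each result keeps reachable.
	fmt.Println("subslice cap:", cap(r), "copy cap:", cap(c))
	fmt.Println(prefixCopy("osmosis/string", 7))
}
```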
Interpreting Output
- inuse_space: the amount of memory allocated and not yet released.
- inuse_objects: the number of objects allocated and not yet released.
- alloc_space: the total amount of memory allocated, regardless of whether it was released.
- alloc_objects: the total number of objects allocated, regardless of whether they were released.
- flat: the memory allocated by a function itself and still held by that function.
- cum: the memory allocated by a function or by any function it calls further down the stack.
Useful links
- Pprof Doc
- Graphviz Download
- Using SCP
- Advanced Go Profiling Talk (YouTube)
- Notes from the talk above
- Memory Leaking Scenarios
- Great blog post about heap profiling
Benchmarking
Best practices
- Run the benchmarks on an idle machine that is not running on battery power.
- Use -benchmem to also get stats on allocated objects and space (B/op, allocs/op).
- Use benchstat to compare performance across different git branches.
- Add -run='$^' or -run=- to each go test command to avoid running the tests too.
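As a sketch of what -benchmem measures, the program below runs two hypothetical string-concatenation functions through testing.Benchmark, which lets a benchmark run outside go test, and prints the B/op and allocs/op that -benchmem would add. In a real package you would instead write func BenchmarkXxx(b *testing.B) in a _test.go file and run go test -run='$^' -bench=. -benchmem.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the compiler cannot discard the benchmarked work.
var sink string

// concatNaive grows the string with +=, allocating a new string on most steps.
func concatNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder sizes a strings.Builder up front, so it allocates at most once.
func concatBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// bench runs f under testing.Benchmark, outside of `go test`.
func bench(f func([]string) string, parts []string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // per-benchmark equivalent of -benchmem
		for i := 0; i < b.N; i++ {
			sink = f(parts)
		}
	})
}

func main() {
	parts := strings.Split(strings.Repeat("ab,", 100), ",")
	cases := []struct {
		name string
		f    func([]string) string
	}{
		{"naive", concatNaive},
		{"builder", concatBuilder},
	}
	for _, c := range cases {
		r := bench(c.f, parts)
		fmt.Printf("%-8s %s\t%s\n", c.name, r.String(), r.MemString())
	}
}
```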
Benchstat sample output for illustration:
name      old time/op  new time/op  delta
Decode-4  2.20s ± 0%   1.54s ± 0%   ~     (p=1.000 n=1+1)
For benchstat specifically:
- Use higher -count values if the benchmark numbers aren't stable.
- If you don't, your sample size will be too small and the delta might not be reported (as in the example above) because the difference is not statistically significant.
- If you do, benchmarking takes longer, since each benchmark has to run multiple times.
- A commonly recommended sample size is 5 (-count 5).
Example
Let's assume that we are working on branch osmosis/string and added some performance improvements to tree.String().
As a result, we would like to run a benchmark of it in iavl, such as BenchmarkTreeString.
To get a nice bench summary we would follow these steps:
- Check out the master branch and save the benchmark output:
git checkout master
go test -run='^$' -bench='^BenchmarkTreeString$' -benchmem -count 5 github.com/cosmos/iavl > bench_string_old.txt
- Check out our osmosis/string branch and save the benchmark output:
git checkout osmosis/string
go test -run='^$' -bench='^BenchmarkTreeString$' -benchmem -count 5 github.com/cosmos/iavl > bench_string_new.txt
- Compare the two outputs with benchstat:
benchstat bench_string_old.txt bench_string_new.txt
- Evaluate the output and attach it to your PR if needed.