June 19, 2014

Perf and Stack Traces

I was wondering why perf record -g doesn’t show proper stack traces for my programs in the production environment. At first I thought the kernel was too old, but after a few experiments I found out that this wasn’t the case. The problem was that when you compile with optimizations (-O3), gcc automatically omits frame pointers, and it is not easy to unwind stack traces without them. But gcc can do it somehow. So I kept digging and stumbled upon an article where people bash one of the perf authors for not being able to unwind stacks without frame pointers.

If you read it thoroughly, you will find that gcc uses DWARF debugging information for stack unwinding, but that this approach is too slow for profilers.

So I wanted to know how much slower my programs would be if I included frame pointers. It seems that the performance loss is negligible:

I tested two MySQL builds, one built with ‘-O3 -g -fno-omit-frame-pointer’ and the other with -fomit-frame-pointer instead, and the performance difference was negligible. It was around 1% in sysbench tests, and slightly over 3% in a tight-loop select benchmark(100000000,(select asin(5+5)+sin(5+5))); on a 2-CPU Opteron box.

So I will try including frame pointers from now on. To include them, pass the -fno-omit-frame-pointer parameter to gcc when building the executable.

If you are interested in what frame pointers are and how they work, I would recommend reading the answer to this Stack Overflow question.