Introducing tracing into existing metrics and logging infrastructure

[Disclaimer: I work for Netsil, an app monitoring and mapping company.]
[Sorry for the long answer, but your question is very important and relevant for many folks.]
If you are interested in capturing code-level insights (which function, which thread, what function parameters, etc.), then the natural answer is to do something in the code, e.g., add tracing or even try APM byte-code instrumentation (e.g., New Relic, AppDynamics, or open source tools such as Pinpoint). And I agree with Charity: adding tracing to old code is difficult. Having done it once (on OpenStack!), I can say it is a very iterative process (you inadvertently miss some code that should have been traced).
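To make "doing something in the code" concrete, here is a minimal sketch of hand-rolled tracing in Python. It is not the API of any APM product or tracing library; the `traced` decorator and the in-memory `SPANS` list are hypothetical stand-ins for a real tracer. It records the function name, thread, and parameters mentioned above for each decorated call.

```python
import functools
import threading
import time
import uuid

# Hypothetical in-memory span store; a real tracer would export spans to a backend.
SPANS = []


def traced(func):
    """Record a code-level span: function name, thread, parameters, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, so failed calls still show up.
            SPANS.append({
                "span_id": uuid.uuid4().hex,
                "function": func.__qualname__,
                "thread": threading.current_thread().name,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "duration_s": time.monotonic() - start,
            })
    return wrapper


@traced
def lookup_user(user_id, include_profile=False):
    # Hypothetical application function being instrumented.
    return {"id": user_id, "profile": include_profile}


lookup_user(42, include_profile=True)
print(SPANS[0]["function"], SPANS[0]["args"], SPANS[0]["kwargs"])
```

The catch when retrofitting old code is that only decorated functions produce spans; every function you forget to wrap stays invisible, which is why the process tends to be iterative.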
There are less laborious options if you look toward operating-system-level tracers (e.g., eBPF, or kernel modules such as sysdig's) or toward network packet capture analysis such as Netsil (my employer). My knowledge of OS tracers is limited, but I believe they will at least give you the process ID, the socket, and the source/destination IPs and ports. So there you have the ingredients to create a "less informative" span. What Netsil does is take a sample of packets and analyze them to extract application-level information. So, without much code change, a combination of an OS tracer and network analysis can give you very informative traces, or as we call them, Application Maps. (We did a more elaborate comparison of these techniques here:
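As a rough illustration of the "less informative" span described above, the Python sketch below turns one connection record of the kind an OS-level tracer could report into a coarse span. The `SocketEvent` record, the `PROCESS_NAMES` and `SERVICE_BY_ENDPOINT` lookup tables, and `to_span` are all hypothetical; this is not the output format of eBPF, sysdig, or Netsil.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketEvent:
    """Hypothetical record an OS-level tracer might emit for one connection."""
    pid: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start_ns: int
    end_ns: int


# Hypothetical lookup tables; in practice these would come from process and service discovery.
PROCESS_NAMES = {4242: "web-frontend"}
SERVICE_BY_ENDPOINT = {("10.0.0.7", 5432): "postgres"}


def to_span(ev: SocketEvent) -> dict:
    """Build a coarse span: who talked to whom and for how long, with no function names or parameters."""
    return {
        "caller": PROCESS_NAMES.get(ev.pid, f"pid:{ev.pid}"),
        "callee": SERVICE_BY_ENDPOINT.get((ev.dst_ip, ev.dst_port), f"{ev.dst_ip}:{ev.dst_port}"),
        "src": f"{ev.src_ip}:{ev.src_port}",
        "duration_ms": (ev.end_ns - ev.start_ns) / 1e6,
    }


ev = SocketEvent(4242, "10.0.0.3", 51514, "10.0.0.7", 5432, 1_000_000_000, 1_012_500_000)
print(to_span(ev))
```

Everything in this span comes from process and socket metadata alone; application-level fields would have to come from something like the packet analysis described above.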
Hope this was useful. I'd love to hear your thoughts.

Hi there @ArvindSoni! I have moved your post to its own thread, since the AMA thread was intended for Charity to answer all the questions, but we feel that your contribution should have a thread of its own. Please come again to share more of your best practices and ideas with our community!
