Summary
I was recently working on a project that involved VueJS, Golang (Go) and Mongo. It was time to instrument the API layer, written in Go, with metrics, logs and traces. I was using Gin because it is easy to set up and handles JSON data well.
Parts of the instrumentation were easy. For example, traces worked out of the box with the otelgin middleware. Metrics had some examples floating around but needed some work, and logs were a pain.
The beauty of OpenTelemetry (OTEL) is that once you instrument your application with it, it does not matter where you send the telemetry on the back end: most of the big-name vendors support OTLP directly.
Go + Gin + Middleware
Go's web frameworks have the concept of middleware, which makes it really easy to monitor or adjust a request in flight. Gin is no exception. A router created with gin.Default() applies two middlewares: gin.Logger() and gin.Recovery(). Logger writes a simple request log to the console. Recovery recovers from any panics and returns a 500 error.
The otelgin middleware mentioned above takes the context of the HTTP request and, given a properly set up OpenTelemetry tracer and internal propagation of context, exports spans to any tracing tool that supports OpenTelemetry.
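To make the middleware idea concrete, here is a minimal sketch that uses only net/http from the standard library rather than Gin. The names timing and servedPaths are made up for illustration. Gin's own middleware has a different signature (it works on a *gin.Context), but the wrapping idea is the same.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// timing is a hypothetical middleware: it wraps a handler, measures how long
// the wrapped handler takes, and reports the result through record.
func timing(record func(path string, d time.Duration), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r) // hand the request to the next handler in the chain
		record(r.URL.Path, time.Since(start))
	})
}

// servedPaths sends one in-memory request per path through the middleware
// and returns the paths the middleware observed, in order.
func servedPaths(paths ...string) []string {
	var seen []string
	h := timing(
		func(p string, _ time.Duration) { seen = append(seen, p) },
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}),
	)
	for _, p := range paths {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, p, nil))
	}
	return seen
}

func main() {
	fmt.Println(servedPaths("/", "/users")) // prints [/ /users]
}
```

Because the middleware wraps the whole downstream chain, anything it measures includes the time spent in every handler registered after it.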
Initializing and Using OTEL Tracing
Initializing the tracer is pretty simple but rather lengthy.
I have a “func InitTracer() func(context.Context) error” function that handles this. For those not terribly familiar with Go, this is a function that returns another function, which in turn takes a context.Context and returns an error.
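import (
	"context"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

// InitTracer configures the global OTEL tracer provider and returns a
// cleanup function that shuts down the exporter.
func InitTracer() func(context.Context) error {
	//TODO: Only do cleanup if we're using OTLP
	if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
		// No endpoint configured: skip setup and return a no-op cleanup.
		return func(ctx context.Context) error {
			//log.Print("nil cleanup function - success if this is without OTEL!")
			return nil
		}
	}
	// The gRPC client reads its endpoint from the OTEL_EXPORTER_OTLP_* variables.
	exporter, err := otlptrace.New(
		context.Background(),
		otlptracegrpc.NewClient(),
	)
	if err != nil {
		panic(err)
	}
	resources, err := resource.New(
		context.Background(),
		resource.WithAttributes(
			attribute.String("library.language", "go"),
		),
	)
	if err != nil {
		// Treated as non-fatal here.
		//log.Print("Could not set resources: ", err)
	}
	otel.SetTracerProvider(
		tracesdk.NewTracerProvider(
			tracesdk.WithSampler(tracesdk.AlwaysSample()),
			tracesdk.WithBatcher(exporter),
			tracesdk.WithResource(resources),
		),
	)
	// Baggage may submit too much sensitive data for production
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
	return exporter.Shutdown
}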
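If the function-that-returns-a-function shape of InitTracer is unfamiliar, the following stripped-down sketch shows the same pattern with no OTEL dependencies. The fakeExporter type and the initExporter and closedAfterCleanup helpers are hypothetical stand-ins. The point is that a method value such as exp.Shutdown already has the type func(context.Context) error, so it can be returned directly.

```go
package main

import (
	"context"
	"fmt"
)

// fakeExporter is a hypothetical stand-in for a real OTLP exporter.
type fakeExporter struct{ closed bool }

// Shutdown marks the exporter as closed unless the context is already done.
func (e *fakeExporter) Shutdown(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	e.closed = true
	return nil
}

// initExporter returns a cleanup function. With no endpoint it returns a
// no-op; otherwise it returns the exporter's Shutdown method as a value.
func initExporter(endpoint string, exp *fakeExporter) func(context.Context) error {
	if endpoint == "" {
		return func(context.Context) error { return nil }
	}
	return exp.Shutdown
}

// closedAfterCleanup runs the returned cleanup function the way main would
// and reports whether the exporter ended up closed.
func closedAfterCleanup(endpoint string) bool {
	exp := &fakeExporter{}
	cleanup := initExporter(endpoint, exp)
	if err := cleanup(context.Background()); err != nil {
		return false
	}
	return exp.closed
}

func main() {
	fmt.Println(closedAfterCleanup(""))               // prints false: the no-op never touches the exporter
	fmt.Println(closedAfterCleanup("localhost:4317")) // prints true
}
```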
The actual usage of InitTracer in func main() might look something like this:
tracerCleanup := InitTracer()
//TODO: I don't think this defer ever runs
defer tracerCleanup(context.Background())
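The TODO above is a fair worry: a deferred call only runs when main returns, and a process stopped by Ctrl-C or SIGTERM with no signal handling never gets that far. One possible way around it, sketched here with only the standard library, is to cancel a context on those signals, shut the HTTP server down, and then call the cleanup function explicitly. The names serveUntilDone and demo are hypothetical, and the cleanup stub stands in for whatever InitTracer returns. A Gin engine can be used as the Handler because it implements http.Handler.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serveUntilDone runs srv until ctx is cancelled or the listener fails, then
// shuts the server down and calls cleanup so telemetry can be flushed.
func serveUntilDone(ctx context.Context, srv *http.Server, cleanup func(context.Context) error) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	var serveErr error
	select {
	case serveErr = <-errCh: // the listener failed or stopped on its own
	case <-ctx.Done(): // a signal (or timeout) asked us to stop
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if serveErr == nil {
		serveErr = srv.Shutdown(shutdownCtx)
	}
	// cleanup always runs, whichever path got us here.
	return errors.Join(serveErr, cleanup(shutdownCtx))
}

// demo wires everything together and reports whether cleanup ran.
func demo() (cleanupRan bool, err error) {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	// Demo only: also stop after 200ms so this sketch exits by itself.
	// A real service would not add this timeout.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	srv := &http.Server{Addr: "127.0.0.1:0", Handler: http.NotFoundHandler()}
	cleanup := func(context.Context) error { cleanupRan = true; return nil } // stand-in for tracerCleanup
	err = serveUntilDone(ctx, srv, cleanup)
	return cleanupRan, err
}

func main() {
	ran, err := demo()
	fmt.Println("cleanup ran:", ran, "error:", err)
}
```

The trade-off is a little more plumbing in main in exchange for the batch processor getting a chance to flush before the process exits.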
Because this registers the tracer provider globally, it persists for the whole process, so other packages in your application pick it up without any extra setup.
From there it's just a matter of using the middleware from the otelgin package:
router.Use(otelgin.Middleware(os.Getenv("OTEL_SERVICE_NAME")))
That is really it. It mostly works out of the box.
Initializing and Using OTEL Metrics
Metrics were a little more difficult. I couldn’t find a suitable example online so I ended up writing my own. It initializes the same way:
meterCleanup := otelmetricsgin.InitMeter()
defer meterCleanup(context.Background())
router.Use(otelmetricsgin.Middleware())
You want this registered early in the middleware chain because it starts a timer to capture latency.
Key Notes About My otelginmetrics
The first thing to note is that this is the quickest, dirtiest middleware I could possibly put together. There are much better and more elegant ways of doing it, but I needed something that worked.
It exports two metrics. The first, http_server_requests_total, is the total number of requests. The second, http_server_request_duration_seconds, is the duration of each request in seconds. It is a histogram with quite a few tags, so it can be split by HTTP method, HTTP status code, URI and hostname of the node serving the request.
Prometheus-style histograms are out of scope for this article, though perhaps a topic for another. In short, they are time series metrics that are slotted into buckets. In our case we’re slotting them into buckets of response time. Because the default OTEL buckets are poor for latency in seconds (which should almost always be less than 1), I opted to adjust the buckets on this metric to 0.005, 0.01, 0.05, 0.5, 1 and 5.
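To illustrate what slotting into buckets means, here is a small standard-library sketch that sorts a few made-up latencies into the boundaries listed above. It is not the middleware's code, and bucketIndex and counts are hypothetical helpers. It assumes each boundary is an inclusive upper bound, with one extra overflow bucket for anything slower than 5 seconds. The tallies here are per bucket, whereas Prometheus exposes cumulative counts.

```go
package main

import (
	"fmt"
	"sort"
)

// bounds are the upper bucket boundaries from the article, in seconds.
var bounds = []float64{0.005, 0.01, 0.05, 0.5, 1, 5}

// bucketIndex returns the index of the first boundary that is greater than
// or equal to the latency. Index len(bounds) is the overflow bucket.
func bucketIndex(seconds float64) int {
	return sort.SearchFloat64s(bounds, seconds)
}

// counts tallies how many latencies fall into each bucket.
func counts(latencies ...float64) []int {
	out := make([]int, len(bounds)+1)
	for _, l := range latencies {
		out[bucketIndex(l)]++
	}
	return out
}

func main() {
	// 3ms, two requests at 20ms, 700ms, and one very slow 12s request.
	fmt.Println(counts(0.003, 0.02, 0.02, 0.7, 12)) // prints [1 0 2 0 1 0 1]
}
```

With boundaries this fine-grained below one second, fast requests spread across several buckets instead of all landing in the first one.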
Initializing and Using OTEL Logs
Both the metrics and traces APIs for Go in OTEL are considered stable. Logs, however, are beta and it shows. Logging was a bit more complicated to get through, but it is possible!
The first hurdle is that the default log package in Go has no middleware that supports it. As of Go 1.21, slog (structured logging) is available in the standard library and can output rich logs in JSON format. OTEL doesn’t let you call the logging API directly. Instead, it provides what it calls bridges so other logging libraries can call it. For this I used the otelslog API bridge. It initializes similarly.
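import (
	"context"
	"os"

	"go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc"
	"go.opentelemetry.io/otel/log/global"
	"go.opentelemetry.io/otel/sdk/log"
)

// InitLog configures the global OTEL logger provider and returns a cleanup
// function that shuts it down.
func InitLog() func(context.Context) error {
	//TODO: Only do cleanup if we're using OTLP
	if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
		// No endpoint configured: skip setup and return a no-op cleanup.
		return func(ctx context.Context) error {
			//log.Print("nil cleanup function - success if this is without OTEL!")
			return nil
		}
	}
	ctx := context.Background()
	exporter, err := otlploggrpc.New(ctx)
	if err != nil {
		panic("failed to initialize exporter: " + err.Error())
	}
	// Create the logger provider
	lp := log.NewLoggerProvider(
		log.WithProcessor(
			log.NewBatchProcessor(exporter),
		),
	)
	global.SetLoggerProvider(lp)
	return lp.Shutdown
}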
And then the usage:
logger := otelslog.NewLogger(os.Getenv("OTEL_SERVICE_NAME"))
// Health checks will spam logs, we don't need this
filter := sloggin.IgnorePath("/")
config := sloggin.Config{
	WithRequestID: true,
	Filters:       []sloggin.Filter{filter},
}
router.Use(sloggin.NewWithConfig(logger, config))
From here, we could use the sloggin middleware for Gin to instrument logging on every request with request and response information. An example might look something like this.
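For a feel of the kind of structured record slog produces, the sketch below uses only the standard library JSON handler and writes to a buffer instead of sending anything over OTLP. The attribute names are made up for illustration and are not necessarily what sloggin emits. The timestamp is dropped purely so the output is the same on every run.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// dropTime removes the top-level timestamp so the output is stable.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// requestLine logs one hypothetical request and returns the JSON record.
func requestLine(method, path string, status int) string {
	var buf bytes.Buffer
	handler := slog.NewJSONHandler(&buf, &slog.HandlerOptions{ReplaceAttr: dropTime})
	logger := slog.New(handler)
	logger.Info("request handled",
		slog.String("method", method),
		slog.String("path", path),
		slog.Int("status", status),
	)
	return buf.String()
}

func main() {
	// prints {"level":"INFO","msg":"request handled","method":"GET","path":"/users","status":200}
	fmt.Print(requestLine("GET", "/users", 200))
}
```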
Datadog Log & Trace Correlation
In the above screenshot you can see an otel.trace_id and otel.span_id. Unfortunately, Datadog cannot use these directly: they need to be converted and emitted as dd.trace_id and dd.span_id. We needed to override the logger to inject them. That expertise was way beyond my skill set, but I did find someone who could do it and had documented it on their blog. The code did not compile as is and required some adjusting, along with adding Datadog's conversion.
To save people some trouble I published my updated version.
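For context, the conversion itself is small. OTEL trace IDs are 128-bit values written as 32 hex characters and span IDs are 64-bit values written as 16 hex characters, while the dd.* fields expect unsigned 64-bit decimal numbers. The sketch below follows the approach described in Datadog's documentation (keep the low 64 bits, print them in decimal). The function name toDatadogID is hypothetical, and this is not necessarily the exact code in the published package.

```go
package main

import (
	"fmt"
	"strconv"
)

// toDatadogID converts a hex-encoded OTEL trace or span ID into the decimal
// string Datadog expects. For 128-bit trace IDs only the low 64 bits
// (the last 16 hex characters) are kept.
func toDatadogID(hexID string) (string, error) {
	if len(hexID) > 16 {
		hexID = hexID[len(hexID)-16:]
	}
	n, err := strconv.ParseUint(hexID, 16, 64)
	if err != nil {
		return "", fmt.Errorf("invalid hex ID %q: %w", hexID, err)
	}
	return strconv.FormatUint(n, 10), nil
}

func main() {
	traceID, _ := toDatadogID("000000000000000000000000000000ff")
	spanID, _ := toDatadogID("0000000000000010")
	fmt.Println(traceID, spanID) // prints 255 16
}
```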
To use the published package, import it under a different alias to avoid a conflict:
import (
	newslogin "github.com/dchapman992000/otelslog"
)
func main() {
	....
	// This was our first slog logger
	logger := otelslog.NewLogger(os.Getenv("OTEL_SERVICE_NAME"))
	// Wrap the first logger's handler with the new one, which uses Go's
	// embedded structs and method promotion to inject the extra fields
	logger = newslogin.InitialiseLogging(logger.Handler())
}
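For readers curious about the embedded-struct trick, here is a minimal standard-library sketch of the idea. It is not the published package's code. The ddHandler type, the idsKey context key and the logWithIDs helper are all hypothetical, and the IDs are read from a plain context value here, whereas a real handler would take them from the OTEL span in the context. Embedding slog.Handler promotes Enabled from the inner handler, so only the methods that need different behavior are written out.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// idsKey and ids are hypothetical: they carry already-converted IDs in the context.
type idsKey struct{}
type ids struct{ traceID, spanID string }

// ddHandler embeds a slog.Handler, so Enabled is promoted from the inner handler.
type ddHandler struct{ slog.Handler }

// Handle adds dd.trace_id and dd.span_id when the context carries IDs.
func (h ddHandler) Handle(ctx context.Context, r slog.Record) error {
	if v, ok := ctx.Value(idsKey{}).(ids); ok {
		r = r.Clone() // avoid sharing state with the caller's copy
		r.AddAttrs(slog.String("dd.trace_id", v.traceID), slog.String("dd.span_id", v.spanID))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the result so derived loggers keep the injection.
func (h ddHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return ddHandler{h.Handler.WithAttrs(attrs)}
}

func (h ddHandler) WithGroup(name string) slog.Handler {
	return ddHandler{h.Handler.WithGroup(name)}
}

// dropTime removes the top-level timestamp so the output is stable.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// logWithIDs logs one message through the wrapper and returns the JSON record.
func logWithIDs(traceID, spanID string) string {
	var buf bytes.Buffer
	inner := slog.NewJSONHandler(&buf, &slog.HandlerOptions{ReplaceAttr: dropTime})
	logger := slog.New(ddHandler{inner})
	ctx := context.WithValue(context.Background(), idsKey{}, ids{traceID, spanID})
	logger.InfoContext(ctx, "hello")
	return buf.String()
}

func main() {
	// prints {"level":"INFO","msg":"hello","dd.trace_id":"255","dd.span_id":"16"}
	fmt.Print(logWithIDs("255", "16"))
}
```

Note that the injection only happens for calls that pass a context, such as InfoContext, which is why the request context has to be propagated down to wherever logging happens.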
As the screenshot shows, pulling up the logs now lets us see the related traces, and it all works!