OpenTelemetry In Golang

Summary

I was recently working on a project that involved VueJS, Golang(Go) and Mongo. For the API layer in Go, it was time to instrument it with metrics, logs and traces. I was using Gin due to its ease of setup and ability to handle json data.

Parts of the instrumentation were easy. For example traces worked out of the box with the otelgin middlware. Metrics had some examples going around but needed some work and logs were a pain.

The Beauty of OpenTelemetry(OTEL) is that you can instrument your application with it and it does not matter where you send the telemetry on the back end, most of the big name brands support OTLP directly.

Go + Gin + Middleware

Go has the concept of middleware in its web frameworks which make it really easy to monitor or adjust a request in flight. Gin is no exception. Gin by default has two middlewares it applies. They are gin.Logger() & gin.Recovery(). Logger implements a simple logger to the console. Recovery recovers from any panics and returns a 5xx error.

The otelgin middleware above simply takes the context of the http request and with a properly setup OpenTelemetry tracer and internal propagation of context, it will export to your tracing tool that supports OpenTelemetry.

Initializing and Using OTEL Tracing

Initializing the tracer is pretty simple but rather lengthy.

I have a “func InitTracer() func(context.Context) error” function that handles this. For those not terribly familiar with Go, this is a function that returns another function with context that returns an error.

func InitTracer() func(context.Context) error {
	//TODO: Only do cleanup if we're using OTLP
	if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
		return func(ctx context.Context) error {
			//log.Print("nil cleanup function - success if this is without OTEL!")
			return nil
		}
	}
	exporter, err := otlptrace.New(
		context.Background(),
		otlptracegrpc.NewClient(),
	)

	if err != nil {
		panic(err)
	}

	resources, err := resource.New(
		context.Background(),
		resource.WithAttributes(
			attribute.String("library.language", "go"),
		),
	)
	if err != nil {
		//log.Print("Could not set resources: ", err)
	}

	otel.SetTracerProvider(
		tracesdk.NewTracerProvider(
			tracesdk.WithSampler(tracesdk.AlwaysSample()),
			tracesdk.WithBatcher(exporter),
			tracesdk.WithResource(resources),
		),
	)
	// Baggage may submit too much sensitive data for production
		  otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))

	return exporter.Shutdown
}

The actual usage of this in func main() might look something like this

tracerCleanup := InitTracer()
	//TODO: I don't think this defer ever runs
	defer tracerCleanup(context.Background())

If you use multiple packages, the way this is initialized, it will persist as its configured as the global tracer for the instance.

From there its just a matter of using the middleware from the otelgin package

router.Use(otelgin.Middleware(os.Getenv("OTEL_SERVICE_NAME")))

That is really it. It mostly works out of the box.

Initializing and Using OTEL Metrics

Metrics was a little more difficult. I couldn’t find a suitable example online so I ended up writing my own. It initializes the same way calling

meterCleanup := otelmetricsgin.InitMeter()
defer meterCleanup(context.Background())

router.Use(otelmetricsgin.Middleware())

You want this to be higher up on the usage of middleware because we’re starting a timer to capture latency.

Key Notes About My otelginmetrics

The first thing to do is it is the quickest and dirtiest quick and dirty middleware I could possibly put together. There are much better and eloquent ways of doing it but I needed something to work.

It exports two metrics. One is http_server_requests_total. This is the total number of requests. The other is http_server_request_duration_seconds which is the duration in seconds of each request. The http_server_request_duration_seconds is a histogram with quite a few tags to be able to split by HTTP Method, HTTP Status Code, URI and hostname of the node serving the HTTP.

Prometheus style histograms are out of scope for this article but perhaps another. In short they are time series metrics that are slotted into buckets. In our case we’re slotting them into buckets of response time. Because the default OTEL buckets are poor for latency in seconds (which should almost always be less than 1, I opted to adjust the buckets on this metric to 0.005, 0.01, 0.05, 0.5, 1, 5.

Initializing and Using OTEL Logs

Both of the metrics and traces API for Go for OTEL are considered stable. Logs, however are beta and it shows. It was a bit more complicated to get through but it is possible!

The first one is the default log provider in Go does not have any middleware that supports. As of Go 1.21 slog or Structured Logging became available and uses json format to output rich logging. OTEL doesn’t let you call the logging API directly. It provides what it calls bridges so other providers can call it. For this I used the otelslog api bridge. It initializes similarly.

func InitLog() func(context.Context) error {

	//TODO: Only do cleanup if we're using OTLP
	if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
		return func(ctx context.Context) error {
			//log.Print("nil cleanup function - success if this is without OTEL!")
			return nil
		}
	}
	ctx := context.Background()

	exporter, err := otlploggrpc.New(ctx)
	if err != nil {
		panic("failed to initialize exporter")
	}

	// Create the logger provider
	lp := log.NewLoggerProvider(
		log.WithProcessor(
			log.NewBatchProcessor(exporter),
		),
	)

	global.SetLoggerProvider(lp)

	return lp.Shutdown
}

And then usage

logger := otelslog.NewLogger(os.Getenv("OTEL_SERVICE_NAME"))

router.Use(sloggin.NewWithConfig(logger, config))

// Health Checks will spam logs, we don't need this
filter := sloggin.IgnorePath("/")

config := sloggin.Config{
	WithRequestID: true,
	Filters:       []sloggin.Filter{filter},
}

router.Use(sloggin.NewWithConfig(logger, config))

From here, we could use the sloggin middleware for Gin to instrument logging on every request with request and response information. An example might look something like this.

Datalog Log & Trace Correlation

In the above screenshot you see an otel.trace_id and otel.span_id. Unfortunately, DataDog cannot use this directly so it needs a conversion and to use dd.trace_id and dd.span_id. We needed to override the logger to somehow inject this. That expertise was way beyond my skill set but I did find someone that could do it and had documented it on their blog. The code did not compile as is and required some adjusting along with DD’s conversion.

To save people some trouble I published my updated version.

To use it we would import it as a different namespace to avoid conflict

import (
   newslogin "github.com/dchapman992000/otelslog"
)

func main() {
    ....
    // This was our first slog logger
    logger := otelslog.NewLogger(os.Getenv("OTEL_SERVICE_NAME"))
    //This is the new one where we inject our new one into it using the embedded structs and promotions in Go
    logger = newslogin.InitialiseLogging(logger.Handler())
}

You can then see in the screenshot pulling up the logs, we have the ability to see the related traces and it all works!

Parsing CSV Files with Golang

Summary

I love coming across problems that require me to learn something new. I have written a few posts about Go (or Golang) such as Using Recursion on Golang to Solve a Problem and Unit Testing Golang on App Engine. Go is one of my favorite utility languages.

Earlier this year, my wife decided to start her own hair salon and do booth rental (Pretty Hare Salon). We set up square for appointment setting and credit cards. It has nice reporting features but there is literally no forecasting. Forecasting helps us determine pricing and when to increase or decrease marketing/advertisement.

Parameters of the Problem

To work around this lack of forecasting, I found an export feature that exports appointments in csv. Unfortunately it does not list the pricing associated with it. At the time, I implemented something in perl but decided to rewrite in Golang. The application inputs two CSV files (1 – appointments and 2 – pricing) and calculates weekly totals based on that.

Helper Function

Error checking and reporting requirements are fairly minimal for this so I use a helper function that is fairly basic.

func CheckErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

Loading CSVs

In order to load the CSV into a variable, I opted to use os.Open() which takes the file path. I pass these in through command-line arguments using os.Args. os.Open returns a pointer to a file.

args := os.Args

	appointmentsFile := args[1]
	pricesFiles := args[2]

	af, err := os.Open(appointmentsFile)
	CheckErr(err)

	pf, err := os.Open(pricesFiles)
	CheckErr(err)

I then open the files return by these with cvs.NewReader

appointments := csv.NewReader(af)
prices := csv.NewReader(pf)

NewReader() expects an io.Reader but *os.File implements a reader so we are good.

These readers return a 2 dimensional slice with the left most dimension being the line number and second dimension being the field.

For the price records I read it in entirety. It is not very long.

priceRecords, err = prices.ReadAll()
CheckErr(err)

Iterating the CSV Line by Line

The Appointment file is the full history and we do not need all the data in it. It can get very large over time so it is best not to completely load it into memory. We also will have logic to ignore anything older than 2 weeks as this is a forecast and we do not care about historicals.

for {
	appointment, err := appointments.Read()
	if err == io.EOF {
		break
	}
	// Business logic in here
}

The appointment array in this can then be read as a single dimension. Something similar to what is below exists in the for loop previously shown.

if appointment[2] == "accepted" {
	// More business logic here
}

Looking Up Data

The pricing information which we fully loaded into the two dimensional slice can be iterated as follows

for _, price := range priceRecords {
	if price[0] == service {
		return strconv.ParseFloat(price[8][1:], 64)
	}
}

return 0, errors.New("Could not find service - " + service)

The “service” variable is the service we are looking up. In the price slice, the first or 0 position is the actual service name we are looking up and position 8 is the actual price.

The price field (8) has a dollar sign so I just used string slicing to omit the first character which is the dollar sign so I could parse the float like this

price[8][1:]

The pricing file is highly controlled so I have very little error checking here.

Final Words

In this article we discussed opening and iterating through CSV files using two methods. One involves fully loading the file into memory and the other involves iterating line by line. At this point you should have a good idea of which method to use and when.

Using Recursion on Golang To Solve A Problem!

Summary

I was writing an SSL Certificate Decoder ( https://tools.woohoosvcs.com/certdecoder/ ) and came across an interesting problem. I wanted to validate a certificate chain but had some difficulty figuring it out. The problem was that I needed to find the most child certificate in a seemingly random list of certificates. What I am calling the most child certificate is the one at the bottom of the chain that is issued but does not issue one of its own. Users can provide the cert chain in any order so we cannot relay on the order they are submitted for this tool.

SSL Certificate Chains

To understand the problem, one needs to understand an SSL Certificate Chain. When traversing a chain, each certificate is an individual certificate. How they are linked is by their issuer. There is a subject section about the certificate itself and then an issuer section which indicates the certificate authority certificate that issued that. It goes all the way up until you hit a root certificate. The root certificates sign themselves.

Here you can see the cert chain for this site. The SAN or alternate names that allow tools.woohoosvcs.com be valid are further down below this screenshot.

SSL Certificate Chain
SSL Certificate Chain

Recursion

It had been a few decades since I sat in a Computer Science class but I do remember a few control structures. One of which was recursion. Recursion can be a difficult concept to grasp and an even more difficult concept to implement. Golang has an example of this here. At a high level a recursive function is one that calls itself. The call stack gets nested in each layer. In the case that you do not know how many certificates and iterations you need to go through to find the chain, recursive functions help with this. Alternatively you would have to use multiple layers of “for” loops to iterate through, matching the current cert against one that may be a child. In a past life I may have implemented a few layers of “for” loops to statically account for the most common scenarios.

Recursion can be tricky and if you do not use it right, it can involve stack overflows if the function indefinitely calls it self. The key to a good recursive function is a way out. It needs to be able to exit without continuing to call itself forever. There are also limits based on the platform as to how deep you can go in this recursion. Your platform will have a limit at which point an exception or error is thrown.

For a much better description of recursion you can read the Wikipedia article here – https://en.wikipedia.org/wiki/Recursion_(computer_science)

The Code

It is still a work in progress but after iterations of playing with linked lists, multiple for loops, this is what I landed on.

func findChildOfCert(certs []x509.Certificate, cert x509.Certificate) x509.Certificate {
	if len(certs) <= 1 {
		return cert
	}
	result := cert
	for _, item := range certs {
		if item.Issuer.CommonName == cert.Subject.CommonName {
			return findChildOfCert(certs, item)
		}
	}

	return result
}

It is kicked off in a parent function via

childCert := findChildOfCert(cs, cs[0])

Where cs is a slice (Golang speak for array) of certificates much like “certs” in the recursive function. We pass it the list of certificates and the very first one.

On the first call it checks to see if this certificate issued any others in the list. If so, it calls the function again with the issued certificate( more child than the current ) and does the process over again.

When it cannot find a certificate issued by the currently iterated certificate (most child record), the for loop exits and it simply passes the original cert that the function was called with. At that point, the stack completely unwinds, passing the “answer” certificate all the way down. That result is provided to the childCert variable.

Validating the Cert

Golang provides a few options for validating the cert. Once you have the most child certificate, you can do something like the below.

for i := range cs {
	if !cs[i].Equal(&childCert) {
		roots.AddCert(&cs[i])
	}
}

opts := x509.VerifyOptions{
	Intermediates: roots,
}

opts2 := x509.VerifyOptions{
	Roots: roots,
}

if _, err := childCert.Verify(opts); err != nil {
	status = append(status, "Not Trusted By Root - failed to verify certificate: "+err.Error())
} else {
	status = append(status, "Trusted Chain")
}

if _, err := childCert.Verify(opts2); err != nil {
	status = append(status, "Not Valid(contiguous) - failed to verify certificate: "+err.Error())
} else {
	status = append(status, "Valid Chain")
}

I load up a “roots” slice of the roots provided. I also exclude the child certificate from this. From there I perform two validations. One that the chain is trusted, meaning it rolls up to one that is trusted by the source used. The other validation is that the chain is valid. Is there continuity in the chain or is it broken. A chain can be valid but un trusted. Knowing the difference may help you in a rare case.

Stack Overflow

I actually found a stack overflow doing a regression test with a self signed certificate. The code above actually ended up comparing the certificate to itself over and over again and trying to keep going down the rabit hole. It ended up with the following

runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x1332651, 0xe)
	/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1046 +0x6e9
runtime.morestack()
	/usr/local/go/src/runtime/asm_amd64.s:449 +0x8f

goroutine 24 [running]:
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:180 +0x1d0 fp=0xc020114ef8 sp=0xc020114ef0 pc=0x128a210
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:184 +0x1a3 fp=0xc020116920 sp=0xc020114ef8 pc=0x128a1e3
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:184 +0x1a3 fp=0xc020118348 sp=0xc020116920 pc=0x128a1e3

Luckily my unit testing caught this and this would have never gone to production due to that unit testing. If you’re not sure what unit testing is check out my article Unit Testing Golang on App Engine. The fix was simple though in that to make sure my recursive function doesn’t compare a cert to itself using the .IsEquals() function

Final Words

In this we walked through a useful use case for recursion and the background of the technology that needed it. I even provided some of the errors that happen when you fail to use recursion properly as I accidentally did!