Parsing CSV Files with Golang

Summary

I love coming across problems that require me to learn something new. I have written a few posts about Go (or Golang) such as Using Recursion on Golang to Solve a Problem and Unit Testing Golang on App Engine. Go is one of my favorite utility languages.

Earlier this year, my wife decided to start her own hair salon and do booth rental (Pretty Hare Salon). We set up square for appointment setting and credit cards. It has nice reporting features but there is literally no forecasting. Forecasting helps us determine pricing and when to increase or decrease marketing/advertisement.

Parameters of the Problem

To work around this lack of forecasting, I found an export feature that exports appointments in csv. Unfortunately it does not list the pricing associated with it. At the time, I implemented something in perl but decided to rewrite in Golang. The application inputs two CSV files (1 – appointments and 2 – pricing) and calculates weekly totals based on that.

Helper Function

Error checking and reporting requirements are fairly minimal for this so I use a helper function that is fairly basic.

func CheckErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

Loading CSVs

In order to load the CSV into a variable, I opted to use os.Open() which takes the file path. I pass these in through command-line arguments using os.Args. os.Open returns a pointer to a file.

args := os.Args

	appointmentsFile := args[1]
	pricesFiles := args[2]

	af, err := os.Open(appointmentsFile)
	CheckErr(err)

	pf, err := os.Open(pricesFiles)
	CheckErr(err)

I then open the files return by these with cvs.NewReader

appointments := csv.NewReader(af)
prices := csv.NewReader(pf)

NewReader() expects an io.Reader but *os.File implements a reader so we are good.

These readers return a 2 dimensional slice with the left most dimension being the line number and second dimension being the field.

For the price records I read it in entirety. It is not very long.

priceRecords, err = prices.ReadAll()
CheckErr(err)

Iterating the CSV Line by Line

The Appointment file is the full history and we do not need all the data in it. It can get very large over time so it is best not to completely load it into memory. We also will have logic to ignore anything older than 2 weeks as this is a forecast and we do not care about historicals.

for {
	appointment, err := appointments.Read()
	if err == io.EOF {
		break
	}
	// Business logic in here
}

The appointment array in this can then be read as a single dimension. Something similar to what is below exists in the for loop previously shown.

if appointment[2] == "accepted" {
	// More business logic here
}

Looking Up Data

The pricing information which we fully loaded into the two dimensional slice can be iterated as follows

for _, price := range priceRecords {
	if price[0] == service {
		return strconv.ParseFloat(price[8][1:], 64)
	}
}

return 0, errors.New("Could not find service - " + service)

The “service” variable is the service we are looking up. In the price slice, the first or 0 position is the actual service name we are looking up and position 8 is the actual price.

The price field (8) has a dollar sign so I just used string slicing to omit the first character which is the dollar sign so I could parse the float like this

price[8][1:]

The pricing file is highly controlled so I have very little error checking here.

Final Words

In this article we discussed opening and iterating through CSV files using two methods. One involves fully loading the file into memory and the other involves iterating line by line. At this point you should have a good idea of which method to use and when.

Using Recursion on Golang To Solve A Problem!

Summary

I was writing an SSL Certificate Decoder ( https://tools.woohoosvcs.com/certdecoder/ ) and came across an interesting problem. I wanted to validate a certificate chain but had some difficulty figuring it out. The problem was that I needed to find the most child certificate in a seemingly random list of certificates. What I am calling the most child certificate is the one at the bottom of the chain that is issued but does not issue one of its own. Users can provide the cert chain in any order so we cannot relay on the order they are submitted for this tool.

SSL Certificate Chains

To understand the problem, one needs to understand an SSL Certificate Chain. When traversing a chain, each certificate is an individual certificate. How they are linked is by their issuer. There is a subject section about the certificate itself and then an issuer section which indicates the certificate authority certificate that issued that. It goes all the way up until you hit a root certificate. The root certificates sign themselves.

Here you can see the cert chain for this site. The SAN or alternate names that allow tools.woohoosvcs.com be valid are further down below this screenshot.

SSL Certificate Chain
SSL Certificate Chain

Recursion

It had been a few decades since I sat in a Computer Science class but I do remember a few control structures. One of which was recursion. Recursion can be a difficult concept to grasp and an even more difficult concept to implement. Golang has an example of this here. At a high level a recursive function is one that calls itself. The call stack gets nested in each layer. In the case that you do not know how many certificates and iterations you need to go through to find the chain, recursive functions help with this. Alternatively you would have to use multiple layers of “for” loops to iterate through, matching the current cert against one that may be a child. In a past life I may have implemented a few layers of “for” loops to statically account for the most common scenarios.

Recursion can be tricky and if you do not use it right, it can involve stack overflows if the function indefinitely calls it self. The key to a good recursive function is a way out. It needs to be able to exit without continuing to call itself forever. There are also limits based on the platform as to how deep you can go in this recursion. Your platform will have a limit at which point an exception or error is thrown.

For a much better description of recursion you can read the Wikipedia article here – https://en.wikipedia.org/wiki/Recursion_(computer_science)

The Code

It is still a work in progress but after iterations of playing with linked lists, multiple for loops, this is what I landed on.

func findChildOfCert(certs []x509.Certificate, cert x509.Certificate) x509.Certificate {
	if len(certs) <= 1 {
		return cert
	}
	result := cert
	for _, item := range certs {
		if item.Issuer.CommonName == cert.Subject.CommonName {
			return findChildOfCert(certs, item)
		}
	}

	return result
}

It is kicked off in a parent function via

childCert := findChildOfCert(cs, cs[0])

Where cs is a slice (Golang speak for array) of certificates much like “certs” in the recursive function. We pass it the list of certificates and the very first one.

On the first call it checks to see if this certificate issued any others in the list. If so, it calls the function again with the issued certificate( more child than the current ) and does the process over again.

When it cannot find a certificate issued by the currently iterated certificate (most child record), the for loop exits and it simply passes the original cert that the function was called with. At that point, the stack completely unwinds, passing the “answer” certificate all the way down. That result is provided to the childCert variable.

Validating the Cert

Golang provides a few options for validating the cert. Once you have the most child certificate, you can do something like the below.

for i := range cs {
	if !cs[i].Equal(&childCert) {
		roots.AddCert(&cs[i])
	}
}

opts := x509.VerifyOptions{
	Intermediates: roots,
}

opts2 := x509.VerifyOptions{
	Roots: roots,
}

if _, err := childCert.Verify(opts); err != nil {
	status = append(status, "Not Trusted By Root - failed to verify certificate: "+err.Error())
} else {
	status = append(status, "Trusted Chain")
}

if _, err := childCert.Verify(opts2); err != nil {
	status = append(status, "Not Valid(contiguous) - failed to verify certificate: "+err.Error())
} else {
	status = append(status, "Valid Chain")
}

I load up a “roots” slice of the roots provided. I also exclude the child certificate from this. From there I perform two validations. One that the chain is trusted, meaning it rolls up to one that is trusted by the source used. The other validation is that the chain is valid. Is there continuity in the chain or is it broken. A chain can be valid but un trusted. Knowing the difference may help you in a rare case.

Stack Overflow

I actually found a stack overflow doing a regression test with a self signed certificate. The code above actually ended up comparing the certificate to itself over and over again and trying to keep going down the rabit hole. It ended up with the following

runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x1332651, 0xe)
	/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1046 +0x6e9
runtime.morestack()
	/usr/local/go/src/runtime/asm_amd64.s:449 +0x8f

goroutine 24 [running]:
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:180 +0x1d0 fp=0xc020114ef8 sp=0xc020114ef0 pc=0x128a210
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:184 +0x1a3 fp=0xc020116920 sp=0xc020114ef8 pc=0x128a1e3
tools.woohoosvcs.com/certdecoder.findChildOfCert(0xc0000d6b00, 0x1, 0x1, 0xc0000b8800, 0x3da, 0x3ea, 0xc0000b8804, 0x2c2, 0x3e6, 0xc0000b89a0, ...)

tools.woohoosvcs.com/certdecoder/certdecoder.go:184 +0x1a3 fp=0xc020118348 sp=0xc020116920 pc=0x128a1e3

Luckily my unit testing caught this and this would have never gone to production due to that unit testing. If you’re not sure what unit testing is check out my article Unit Testing Golang on App Engine. The fix was simple though in that to make sure my recursive function doesn’t compare a cert to itself using the .IsEquals() function

Final Words

In this we walked through a useful use case for recursion and the background of the technology that needed it. I even provided some of the errors that happen when you fail to use recursion properly as I accidentally did!