Bridging the Communication Gap Within Your Agile Data Team: BDD Testing for PySpark

One of the biggest struggles for agile software teams is poor communication between engineers and domain experts. Behaviour-driven development (BDD) helps to bridge this gap by getting both sides to collaborate on test specifications that everyone can read and even write.

BDD is becoming increasingly popular among software teams for this very reason, and we've seen this at Plexure, with teams across the business embracing it and reaping the rewards. Testing APIs is significantly different from testing data pipelines, and there's not much out there on 'doing BDD' with the latter. Not wanting to be left out, however, we thought we'd give it a go in our Data Engineering team anyway.

In this lightning talk, I'll share how we got on, show some of our tests, and maybe even convince you to consider writing BDD tests for your own PySpark transformations.
