My principles for shipping a data pipeline

Inspired by this HBR article

In mid-2015 I began working on an ETL and Data Warehouse for Bloc after identifying several internal symptoms. I learned a ton about not just the technological challenges of such a project but also the organizational hurdles to a successful data pipeline.

This was an immensely satisfying project for me because it was an early, influential technical project that had a sizable impact on the company’s future trajectory. Keep in mind that Bloc probably had ~16 full-time employees (4 of whom were engineers) when we decided to work on this.

Here’s my (extremely distilled) version of what was required to make this a successful core competency (and dare I say competitive advantage) of the company. Much of this can be applied to any cross-functional technical project, of course.

Support from the top. The very nature of a data-pipeline project is that it must cross team boundaries. I worked with people from our sales, marketing, operations, and product teams. These folks were supportive, and I believe I communicated clearly throughout the process, but it was clear that my role as CTO seriously helped my ability to move fast and get the project shipped. And, because I was on the executive team, I was aware of each of the other teams’ goals and could authentically verify that I was working on The Right Things.

Clear objectives. I am a firm believer in OKRs. Without a set of measurable milestones, scope creep is exceptionally easy for this type of project. I was working within the Kimball framework, which let me tightly scope which deliverables were most important to different stakeholders.

Disciplined separation of concerns. I can’t tell you how frequently I relied upon the “Kitchen vs. Dining Room” analogy. It was essential. Our data warehouse was, simply put, an exercise in refactoring our reporting infrastructure to be loosely coupled with the rest of the technical aspects of the company instead of tightly coupled. And, because it served many people across the company (and because I was only one person), I couldn’t reasonably satisfy everyone’s demands in a timely fashion. When driven people can’t get what they want through you, they’ll try to find it some other way. No, I won’t have your report ready this week. Yes, you can have read-only credentials to the application database to run your own queries. No, you can’t interrupt an engineer to ask about our data model. Yes, your reports will break. This took me a while to learn, I’ll admit. You are allowed to observe the cooks in the kitchen, but you may only actively participate in the dining room (with a menu curated by the cooks). I am the server. It was a great communication tool that let me push back while still being a team player.
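To make the loose coupling concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under stated assumptions, not Bloc's actual pipeline: the tables (users, payments, revenue_by_student), the plan codes, and the function names are all hypothetical. The point is only that one ETL step knows the application schema (the kitchen), while reports read a curated table (the dining room).

```python
import sqlite3


def build_app_db():
    """Hypothetical application database (the 'kitchen')."""
    app = sqlite3.connect(":memory:")
    app.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan_code TEXT);
        CREATE TABLE payments (id INTEGER PRIMARY KEY, user_id INTEGER, cents INTEGER);
        INSERT INTO users VALUES (1, 'a@example.com', 'FT'), (2, 'b@example.com', 'PT');
        INSERT INTO payments VALUES (1, 1, 5000), (2, 1, 2500), (3, 2, 1000);
    """)
    return app


def run_etl(app, warehouse):
    """The only code that knows the application schema."""
    warehouse.executescript("""
        DROP TABLE IF EXISTS revenue_by_student;
        CREATE TABLE revenue_by_student (student_id INTEGER, plan TEXT, revenue_usd REAL);
    """)
    # Extract: read raw rows from the application tables.
    rows = app.execute("""
        SELECT u.id, u.plan_code, SUM(p.cents)
        FROM users u JOIN payments p ON p.user_id = u.id
        GROUP BY u.id, u.plan_code
    """).fetchall()
    # Transform: decode internal plan codes and convert cents to dollars.
    plan_names = {"FT": "full-time", "PT": "part-time"}
    cleaned = [(uid, plan_names.get(code, "unknown"), cents / 100)
               for uid, code, cents in rows]
    # Load: write the curated rows into the warehouse.
    warehouse.executemany("INSERT INTO revenue_by_student VALUES (?, ?, ?)", cleaned)
    warehouse.commit()


def revenue_by_plan(warehouse):
    """A report (the 'dining room'): reads the warehouse, never the app tables."""
    return dict(warehouse.execute(
        "SELECT plan, SUM(revenue_usd) FROM revenue_by_student GROUP BY plan"
    ).fetchall())


if __name__ == "__main__":
    app = build_app_db()
    warehouse = sqlite3.connect(":memory:")
    run_etl(app, warehouse)
    print(revenue_by_plan(warehouse))
```

If an engineer later renames a column in users, only run_etl needs to change; a report written against revenue_by_student keeps working. A report written directly against the application database with read-only credentials would have broken.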

Clear communication. Almost all of the technical aspects of the project were on the backend. This means most of my time was spent on code that nobody outside the engineering team could see or quickly grok. Proactively communicating status, being honest when things slipped and proud when they didn’t, was important because it led people to trust me. Over-communicating how each deliverable was related to the clear objectives left very little room for misunderstanding or disagreement.

There are many technical principles I also learned while working on this project, but I believe these organizational hurdles likely impede most projects (at companies of similar size) more than any technical challenge.

Got any others to add? I’d love to hear from you!