Stay out of trouble
A few key insights on staying out of trouble with your data team and data architecture.
This article is a work in progress.
It is currently being discussed in a thread on LinkedIn.
Read the accompanying article: Carlsson's non-patented framework for a failing data department
Start with operational data
There are generally two types of business value a data team can deliver: leadership information and operational data.
- Leadership information refers to traditional business intelligence and analytics that help C-level executives, stakeholders, and management understand how the business is performing.
- Operational data is used to run day-to-day business operations. Examples include lists of new leads, marketing cost tracking, and financial reports that Finance updates with each new booking.
It’s tempting to start with leadership information, especially since the people who need it are often the main advocates for building a data team. This is usually the worst possible approach.
In my experience, the only way to ensure high data quality is to have business owners regularly review data at the most granular level. If your data is directly used for handling incoming leads, marketing spend, or finance reports, you’ll receive immediate feedback whenever there’s even the slightest discrepancy.
It is common for reporting to be correct when you create it, but over time, it may become inaccurate due to changes in source data or unforeseen edge cases. These kinds of issues are not always possible to catch in tests, but they are often noticed by operational people who review the data daily.
This feedback is incredibly valuable, and it makes the transition from operational data to leadership information straightforward: it’s mostly a matter of adding some UI sugar and a few group bys.
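To make the "group by" step concrete, here is a minimal sketch in Python. The `leads` rows, their field names, and the `summarize_by` helper are all hypothetical and only for illustration; the point is that the leadership view is derived from the same rows the operational team already reviews.

```python
from collections import defaultdict

# Hypothetical operational rows: one record per lead, at the level of
# detail that the operational team reviews every day.
leads = [
    {"lead_id": 1, "source": "google_ads", "expected_value": 1200.0},
    {"lead_id": 2, "source": "google_ads", "expected_value": 800.0},
    {"lead_id": 3, "source": "referral", "expected_value": 2500.0},
]


def summarize_by(rows, key, value):
    """Group rows by `key` and return the row count and sum of `value` per group."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        group = summary[row[key]]
        group["count"] += 1
        group["total"] += row[value]
    return dict(summary)


# The leadership view is just an aggregation of the operational rows.
leadership_view = summarize_by(leads, "source", "expected_value")
```

Because every aggregated number traces back to rows someone has already checked, a disputed total can be explained by listing the rows behind it.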
However, going in the opposite direction—from leadership information to operational data—is extremely difficult. There are several pitfalls:
- You will create enemies within the organization. If your reports present numbers that operational teams disagree with and you can’t show how the data was calculated, they won’t trust your work and will see you as either incompetent or as someone who works against them.
- Your data will never be fully accurate. Without business stakeholders reviewing data line by line, errors will slip through. Fixing these errors later is costly—not just in credibility but also politically and legally, especially if incorrect numbers were shared with investors or regulators.
- You won’t know how much the data is off. Without a detailed operational foundation, you have no reliable way to assess data quality, making it impossible to detect or correct errors effectively.
Delete often
Over time, your development speed slows down, you spend increasingly more time maintaining rather than developing, and your stakeholders find your reporting increasingly confusing and difficult to navigate.
This stems from a buildup of unused legacy: reports, models, and pipelines that nobody relies on anymore.
One of the most important tasks, and perhaps the most neglected, is deleting things that are no longer used.
In some organizations, this can be difficult, especially if you have a product owner or stakeholders who control the backlog and focus only on new features.
However, this is where leadership comes in. If you don’t delete often, your data platform will become less valuable and less trusted, and new features will become much harder to implement.
A key part of a data team’s process should be ensuring that unused assets are regularly identified and removed.
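One way such a recurring check might look is sketched below. It assumes you can obtain a "last used" date per asset from somewhere (for example, usage metadata in your warehouse or BI tool). The asset names, the `last_used` mapping, and the 90-day threshold are hypothetical choices, not a recommendation from the article.

```python
from datetime import date, timedelta

# Hypothetical usage log: asset name -> date it was last queried or opened.
# None means no recorded usage at all.
last_used = {
    "report_weekly_leads": date(2024, 5, 30),
    "dash_campaign_2021": date(2022, 1, 14),
    "stg_old_crm_contacts": None,
}


def find_deletion_candidates(last_used, today, max_idle_days=90):
    """Return asset names with no recorded usage within `max_idle_days` of `today`."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, used in last_used.items()
        if used is None or used < cutoff
    )


candidates = find_deletion_candidates(last_used, today=date(2024, 6, 1))
```

The output is a list of candidates for a person to review, not something to delete automatically; a report that is only opened once a year would otherwise be flagged incorrectly.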
Most work should be temporary
This closely aligns with the previous section, "Delete often."
One way to keep unused legacy out of your system is to designate most new features as temporary.
It also ensures that you don’t overengineer features—if you know they are temporary, you won’t be tempted to put unnecessary effort into them.
Another benefit is that you can get data into the hands of stakeholders faster.
For example, say your marketing team wants to start using Bing Ads.
You could set it up "correctly" from the start, with automated ingestion, modeling, integration into existing reporting, adjusted conformed dimensions, and stakeholder training.
Or you could simply do a quick ad hoc CSV import into an Excel workbook.
Maybe marketing decides against Bing Ads, and you’ve saved a lot of unnecessary work and added complexity. Or, if they move forward, both you and the marketing team will have a much clearer understanding of what’s needed—making the final implementation significantly easier and better.
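If a script is handier than a workbook, the same throwaway approach could look like the sketch below. The sample data and the column names (`campaign`, `date`, `spend`) are made up for illustration and are not the actual format of a Bing Ads export; in practice you would open the manually downloaded file instead of the inline string.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a manually downloaded export file.
# The column names here are hypothetical.
SAMPLE_EXPORT = """campaign,date,spend
Brand,2024-06-01,120.50
Brand,2024-06-02,99.50
Generic,2024-06-01,310.00
"""


def spend_by_campaign(csv_file):
    """Sum the `spend` column per campaign from an open CSV file object."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["campaign"]] += float(row["spend"])
    return dict(totals)


# With a real export: open("export.csv", newline="") instead of StringIO.
totals = spend_by_campaign(io.StringIO(SAMPLE_EXPORT))
```

There is deliberately no scheduling, error handling, or modeling here. If marketing drops the channel, the script is deleted and nothing else needs to change.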
To ensure that a temporary solution doesn’t become a permanent one, it is crucial that the data team has one leader who can make decisions on both architecture and task prioritization.
Only add what is needed
Q: When is it easiest to add a new dimension, an extra metric, or all source columns to a staging model?
A: When you’re already working on it.
But that’s a trap!
As discussed above, the more unnecessary elements you introduce, the slower your development becomes, the more time you spend on maintenance, and the more confusing your reporting becomes for stakeholders.
Only add what you know the business will need.
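As a small illustration of resisting the "all source columns" temptation in a staging step, the sketch below keeps an explicit allowlist. The column names and the `to_staging` helper are hypothetical, and a real staging model would more likely be written in SQL; the idea is only that each column has to be added on purpose.

```python
# Hypothetical allowlist: only the columns a current report actually uses.
NEEDED_COLUMNS = ("lead_id", "created_at", "source")


def to_staging(raw_row):
    """Keep only the allowlisted columns; fail loudly if one is missing."""
    missing = [col for col in NEEDED_COLUMNS if col not in raw_row]
    if missing:
        raise KeyError(f"source row is missing columns: {missing}")
    return {col: raw_row[col] for col in NEEDED_COLUMNS}


raw = {
    "lead_id": 7,
    "created_at": "2024-06-01",
    "source": "referral",
    "utm_term": "x",       # available in the source, but nobody asked for it
    "browser": "firefox",  # same
}
staged = to_staging(raw)
```

Adding a column later is a one-line change to the allowlist, made when a stakeholder actually needs it.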