Before:
Sports Event Approaches with No Reliable Tech Data
The council’s director of enterprise technology was tasked with overhauling the current data stack and replacing it with a modern system. Having joined the council only six months prior, he faced the challenge of executing the project within the next six months, just in time for the world’s most anticipated sports competition.
To run the event and to process revenue and reimbursements, the council needed data for on-the-ground reporting on athlete information, including demographics, medal counts and expected medals, past and current event performance, event results, and the number of athletes and countries participating. Additionally, the council needed data to identify whether athletes were enrolled in its program, covered by its health insurance, and/or using its training centers and services.
The council had already implemented Snowflake, Azure Data Factory, and Tableau, but couldn’t execute data modeling, analytics, and dashboard building effectively; data quality problems were jeopardizing trust in the data. Moreover, the burden of identifying issues and requesting rework fell on business users, wasting staff time.
Solution:
Modernizing Data Infrastructure: Revamping a Data Stack Through Strategic Auditing and Implementation
The council reached out to Spaulding Ridge on the recommendation of a previous client, needing help to bridge the gap between its current technology and its required data output.
Instead of diving headfirst into solving these problems, we conducted a technology audit. This approach helped establish transparency and trust with the client and identified the areas that posed the greatest risk to delivering a functioning data stack that matched the business requirements. Armed with this knowledge, we set goals and allocated resources, positioning ourselves to track progress throughout the implementation and ultimately realize value faster.
Through stakeholder interviews, system audits, documentation reviews, and code inspections, we determined the necessary steps to complete their modern data stack.
The audit of the initial stack revealed that dashboard users were encountering inaccurate, incomplete, and duplicate data. They were continually identifying incorrect data and requesting rework, which undermined the goal of task automation and eroded trust. These problems stemmed from the absence of a foundational enterprise modeling layer and a lack of adherence to modeling best practices.
To address this, we structured their data warehouse into three environments (raw, QA/staging, and analytics) and implemented dbt best practices to set up a foundational enterprise modeling layer. To make sure we captured the key business processes and needs, we held whiteboarding sessions with the council to determine the models needed for reporting during the upcoming event.
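As a rough sketch of what that modeling layer looks like in practice (using hypothetical table and column names, not the council’s actual schema), a dbt staging model reads from the raw environment and applies consistent naming and typing before anything reaches the QA or analytics layers:

```sql
-- models/staging/stg_athletes.sql
-- Minimal dbt staging model: reads from the raw environment and applies
-- consistent naming and typing so the QA and analytics layers build on
-- clean inputs. Source and column names are illustrative only.
with source as (

    select * from {{ source('raw', 'athletes') }}

),

renamed as (

    select
        athlete_id,
        trim(full_name)             as athlete_name,
        upper(country_code)         as country_code,
        cast(date_of_birth as date) as date_of_birth,
        medal_count
    from source

)

select * from renamed
```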
From there, we established an interim data staging area between the data source and the data warehouse. This staging area allowed data to be cleaned before it was transferred to the warehouse, preventing duplication. To avoid disrupting current work processes and to test the new code, we first staged our updates in the staging/QA layer before promoting them to production. We also utilized raw tables to store information efficiently and avoid the need for downstream deduplication processes.
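As one illustration of the kind of cleaning that happens in that staging area (a sketch with hypothetical table and column names, not the council’s actual logic), Snowflake’s QUALIFY clause can keep only the most recent record per key so duplicates never reach the analytics layer:

```sql
-- models/staging/stg_event_results.sql
-- Sketch of duplicate removal in the staging/QA layer (hypothetical names).
-- Keeps only the most recently loaded record per athlete/event pair so the
-- analytics layer never needs its own deduplication pass.
select
    athlete_id,
    event_id,
    result_time,
    medal,
    _loaded_at
from {{ source('raw', 'event_results') }}
qualify row_number() over (
    partition by athlete_id, event_id
    order by _loaded_at desc
) = 1
```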
To maintain the speed of data flow, we implemented a process in which only the selected rows requested from the source were captured and sent to the destination. Additionally, we mandated that structured data from the source be transferred directly to the warehouse, while unstructured data was structured first before being stored in the data warehouse.
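A sketch of how those two ideas can combine in a single dbt model (all names here are hypothetical, not the council’s actual code): an incremental materialization captures only rows that arrived since the last run, while Snowflake’s semi-structured syntax parses a raw JSON payload into typed columns on the way into the warehouse:

```sql
-- models/staging/stg_athlete_event_feed.sql
-- Sketch of an incremental load plus structuring of semi-structured data
-- (hypothetical names). Only rows newer than the previous run are captured,
-- and the JSON payload is parsed into typed columns before landing.
{{ config(materialized='incremental', unique_key='event_record_id') }}

select
    raw_record:record_id::string       as event_record_id,
    raw_record:athlete.id::string      as athlete_id,
    raw_record:event.name::string      as event_name,
    raw_record:result.finish_time::float as result_seconds,
    loaded_at
from {{ source('raw', 'athlete_event_feed') }}

{% if is_incremental() %}
  -- Capture only rows loaded since the last successful run.
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```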