Data Validation in a Data Hub

Data Validation in a Data Hub

Author: Aaron Overfors
Your Anaplan Data Hub is the lynchpin for your entire application’s effectiveness, so it’s critical that the data is accurate, up-to-date, and appropriately structured. This way the data is in a readily-consumable format and gives your users confidence they’re working with the correct data.The first time you’re able to see the full set of data in one place is often when loading data into your data hub. Data may be coming from multiple source systems that don’t synchronize, or you may have been unable to gain a relevant glimpse into your data due to the format or sheer size of it.  With all this data now in one place and the creation of a new, full data set, it’s essential to confirm that the data is correct.There are two broad categories of data accuracy in this environment: internal consistency and external consistency. The below strategies are targeted to optimize both.
  1. Setup imports that show only successes (green checks) when run correctly. Structure data imports in a way that they will only return unqualified successes (green check marks) when run as intended (i.e. contain no material errors). This allows for ready attention to imports that return any errors or failures, which can be inspected manually, via Anaplan Connect or your ETL Tool. Typically, the majority of data discrepancies are revealed via this process.
  2. Check for broken hierarchies. One of the quickest ways to identify an issue is when you encounter a broken hierarchy, meaning there are either orphaned items (items without a parent) or incomplete hierarchies. A simple way to check for this is to create a line item in a module which applies to the list (one line item per list) that is Boolean-formatted and has the formula, ISBLANK(PARENT(ITEM(List Name))).
  3. Apply list formatting wherever possible to hone in on discrepancies. One of the most powerful features of Anaplan is its ability to quickly deliver information on massive sets of data by applying a list structure, and can be of significant assistance when validating text-formatted data. In addition to validating the hierarchy in the above step, this allows for checking internal consistency by highlighting text that may not be aligned with the standard structure/nomenclature of your data. There are a few ways to do this, but the most straightforward way is to create a list-formatted line item that assesses the text field (e.g. FINDITEM(List Name, Text Line Item)), then filters the line item to return blanks, indicating an issue.
  4. Develop an Admin Reporting Dashboard that shows relevant summary of data. You can use the module from #2 and setup a saved view of the Boolean line items filtered to ‘check’ to quickly see if there are any hierarchy issues, and do the same for #3 to verify internal consistency. Additionally, it can be helpful to view metadata or summary totals of amount fields (e.g. revenue, count of products, number of employees, etc.) on a single dashboard so that you can verify the data accuracy without combing through low-level data.
Checking every last data point you load into your data hub is impractical and may not be feasible, but with these strategies, you can greatly reduce the likelihood of errors in your data.What other approaches have you used to validate your data?
Social Media Auto Publish Powered By : XYZScripts.com