1. Data Validation: Function vs Process
Earlier, we discussed data validation as a spreadsheet feature (e.g., drop-down lists, restricting input).
However, data validation is also a broader analytical process.
Spreadsheet Function
- Controls what users can enter.
- Prevents incorrect input.
Data Validation Process
- Verifies data accuracy.
- Confirms logical consistency.
- Ensures completeness and security.
- Aligns data with business rules.
- Detects anomalies before analysis continues.
The function is a tool.
The process is a mindset.
2. Core Goals of Data Validation
Data validation ensures that data is:
- Complete – No missing critical fields.
- Accurate – Values reflect reality.
- Consistent – Formats and logic align.
- Secure – Protected from unintended changes.
- Aligned with business logic – Makes sense within context.
Validation reduces the risk of analytical errors.
3. Example 1: Logical Recalculation Check
Scenario
A furniture retailer dataset includes:
- Quantity sold
- Product price
- Purchase price
Business rule:
Purchase Price = Quantity × Product Price
Validation Step
Add a recalculation column:
=Quantity * Product_Price
Compare this with the stored Purchase Price column.
Finding a Discrepancy
One value does not match.
Through investigation, we discover:
- A 30% discount applies when customers buy 5 or more items.
Without validation:
- We might incorrectly assume data corruption.
- Or worse, base analysis on incorrect assumptions.
Validation uncovered embedded business logic.
4. Example 2: Common-Sense Business Validation
Scenario
Analyzing promotions for a business open only on weekdays.
Validation question:
Are there sales on weekends?
If weekend sales exist:
- Could indicate data entry errors.
- Or special events.
- Or exceptions to normal operations.
Validation does not always mean error correction.
It often means context clarification.
5. Example 3: SQL Validation
In the avocado dataset:
Validation query:
SELECT
Small_Bags + Large_Bags + XLarge_Bags AS calculated_total,
Total_Bags
FROM avocado_data.avocado_prices;Purpose:
- Confirm Total_Bags column is accurate.
Result:
- Values match.
- Data integrity confirmed.
6. Error Detection During Validation
While calculating percentage:
Small_Bags / Total_BagsError:
Division by zero
Meaning:
- Some rows have Total_Bags = 0.
Fix:
- Add WHERE condition
- Use SAFE_DIVIDE
Validation prevented:
- Broken dashboards
- Stakeholder-facing errors
- Misleading output
7. Types of Validation Checks
7.1 Arithmetic Validation
- Recalculate totals.
- Confirm sums match components.
7.2 Boundary Checks
- No negative quantities.
- No impossible values.
- No out-of-range dates.
7.3 Business Rule Validation
- Discounts applied correctly.
- Promotions active on correct dates.
- Region-specific pricing rules.
7.4 Logical Consistency Checks
- Weekend sales vs operating schedule.
- Revenue consistency across fields.
- Matching foreign keys in joins.
8. Validation as Continuous Process
Validation should occur:
- During data ingestion
- During cleaning
- During transformation
- During calculation
- Before final presentation
It is not a one-time step.
9. Why Data Validation Is Critical
Without validation:
- Incorrect calculations propagate.
- Small errors become large reporting problems.
- Stakeholder trust decreases.
- Decisions may be flawed.
With validation:
- Confidence increases.
- Business logic is understood.
- Errors are caught early.
- Presentations remain professional.
10. Analytical Mindset
Data validation requires:
- Curiosity
- Business understanding
- Questioning anomalies
- Logical reasoning
- Technical verification
Strong analysts ask:
- Does this number make sense?
- Does this align with known business rules?
- Could this value be correct?
- What assumption am I making?
11. Practical Checklist for Data Validation
Before finalizing analysis:
- Recalculate key totals.
- Check for divide-by-zero risks.
- Verify no unexpected nulls.
- Confirm logical constraints.
- Inspect extreme values.
- Review alignment with business objectives.
12. Summary
Data validation is:
- More than a spreadsheet tool.
- An ongoing quality-control process.
- A combination of technical checks and business reasoning.
- Essential for reliable analysis.
The best analysts do not assume data is correct.
They verify, question, and confirm before moving forward.
Data validation protects both your analysis and your professional credibility.
