1. Data Validation: Function vs Process

Earlier, we discussed data validation as a spreadsheet feature (e.g., drop-down lists, restricting input).

However, data validation is also a broader analytical process.

Spreadsheet Function

  • Controls what users can enter.
  • Prevents incorrect input.

Data Validation Process

  • Verifies data accuracy.
  • Confirms logical consistency.
  • Ensures completeness and security.
  • Aligns data with business rules.
  • Detects anomalies before analysis continues.

The function is a tool.
The process is a mindset.


2. Core Goals of Data Validation

Data validation ensures that data is:

  • Complete – No missing critical fields.
  • Accurate – Values reflect reality.
  • Consistent – Formats and logic align.
  • Secure – Protected from unintended changes.
  • Aligned with business logic – Makes sense within context.

Validation reduces the risk of analytical errors.


3. Example 1: Logical Recalculation Check

Scenario

A furniture retailer dataset includes:

  • Quantity sold
  • Product price
  • Purchase price

Business rule:

Purchase Price = Quantity × Product Price

Validation Step

Add a recalculation column:

=Quantity * Product_Price

Compare this with the stored Purchase Price column.


Finding a Discrepancy

One value does not match.

Through investigation, we discover:

  • A 30% discount applies when customers buy 5 or more items.

Without validation:

  • We might incorrectly assume data corruption.
  • Or worse, base analysis on incorrect assumptions.

Validation uncovered embedded business logic.


4. Example 2: Common-Sense Business Validation

Scenario

Analyzing promotions for a business open only on weekdays.

Validation question:

Are there sales on weekends?

If weekend sales exist:

  • Could indicate data entry errors.
  • Or special events.
  • Or exceptions to normal operations.

Validation does not always mean error correction.
It often means context clarification.


5. Example 3: SQL Validation

In the avocado dataset:

Validation query:

SELECT 
  Small_Bags + Large_Bags + XLarge_Bags AS calculated_total,
  Total_Bags
FROM avocado_data.avocado_prices;

Purpose:

  • Confirm Total_Bags column is accurate.

Result:

  • Values match.
  • Data integrity confirmed.

6. Error Detection During Validation

While calculating percentage:

Small_Bags / Total_Bags

Error:

Division by zero

Meaning:

  • Some rows have Total_Bags = 0.

Fix:

  • Add WHERE condition
  • Use SAFE_DIVIDE

Validation prevented:

  • Broken dashboards
  • Stakeholder-facing errors
  • Misleading output

7. Types of Validation Checks

7.1 Arithmetic Validation

  • Recalculate totals.
  • Confirm sums match components.

7.2 Boundary Checks

  • No negative quantities.
  • No impossible values.
  • No out-of-range dates.

7.3 Business Rule Validation

  • Discounts applied correctly.
  • Promotions active on correct dates.
  • Region-specific pricing rules.

7.4 Logical Consistency Checks

  • Weekend sales vs operating schedule.
  • Revenue consistency across fields.
  • Matching foreign keys in joins.

8. Validation as Continuous Process

Validation should occur:

  • During data ingestion
  • During cleaning
  • During transformation
  • During calculation
  • Before final presentation

It is not a one-time step.


9. Why Data Validation Is Critical

Without validation:

  • Incorrect calculations propagate.
  • Small errors become large reporting problems.
  • Stakeholder trust decreases.
  • Decisions may be flawed.

With validation:

  • Confidence increases.
  • Business logic is understood.
  • Errors are caught early.
  • Presentations remain professional.

10. Analytical Mindset

Data validation requires:

  • Curiosity
  • Business understanding
  • Questioning anomalies
  • Logical reasoning
  • Technical verification

Strong analysts ask:

  • Does this number make sense?
  • Does this align with known business rules?
  • Could this value be correct?
  • What assumption am I making?

11. Practical Checklist for Data Validation

Before finalizing analysis:

  1. Recalculate key totals.
  2. Check for divide-by-zero risks.
  3. Verify no unexpected nulls.
  4. Confirm logical constraints.
  5. Inspect extreme values.
  6. Review alignment with business objectives.

12. Summary

Data validation is:

  • More than a spreadsheet tool.
  • An ongoing quality-control process.
  • A combination of technical checks and business reasoning.
  • Essential for reliable analysis.

The best analysts do not assume data is correct.
They verify, question, and confirm before moving forward.

Data validation protects both your analysis and your professional credibility.