1) Manual vs. Tool-Based Data Cleaning

Data can be cleaned manually (for example, fixing misspellings or deleting duplicates by hand), but this approach is slow and error-prone for large datasets.
Spreadsheet applications provide built-in tools that make data cleaning faster, more consistent, and more reliable.

Common efficiency tools include:

  • Conditional formatting
  • Removing duplicates
  • Standardizing formats (dates, numbers)
  • Fixing text strings and substrings
  • Splitting text into columns

2) Conditional Formatting

Conditional formatting changes the appearance of cells when they meet specific rules.

Why it is useful:

  • Highlights problems visually
  • Makes issues easy to spot in large datasets
  • Helps identify data that violates expected conditions

Common cleaning use cases:

  • Highlighting blank cells to find missing data
  • Flagging values outside an expected range
  • Identifying unusual or inconsistent entries

Conditional formatting does not change the data itself; it only changes how cells look so that analysts can spot issues quickly.
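
The same idea can be sketched outside a spreadsheet. The Python example below is only an illustration, not a spreadsheet feature: the helper name `flag_cells`, the sample `ages` column, and the 0 to 120 range are all assumptions made for the example. It labels each cell instead of coloring it, and it leaves the original values untouched.

```python
def flag_cells(values, low, high):
    """Return one label per cell: 'blank', 'out_of_range', or 'ok'."""
    flags = []
    for value in values:
        # Treat None and empty/whitespace-only strings as missing data.
        if value is None or str(value).strip() == "":
            flags.append("blank")
        # Flag numbers that fall outside the expected range.
        elif not (low <= value <= high):
            flags.append("out_of_range")
        else:
            flags.append("ok")
    return flags


# Hypothetical column of ages with two blanks and one implausible value.
ages = [34, None, 29, 210, "", 41]
flags = flag_cells(ages, low=0, high=120)

for value, flag in zip(ages, flags):
    print(repr(value), flag)
```

As with conditional formatting, the flags only point to the cells that need attention; deciding how to fix them is a separate step.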


3) Removing Duplicates

Duplicate records can distort totals, counts, and summaries.

Best practice:

  • Always make a copy of the dataset before removing duplicates.

The Remove Duplicates tool:

  • Automatically identifies identical rows
  • Deletes repeated entries, keeping the first occurrence
  • Can compare the entire row or only selected columns (rows that match in the chosen columns count as duplicates)
  • Requires confirming whether the dataset includes a header row

This tool is essential for preventing inflated values and incorrect conclusions.
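
To see why duplicates inflate totals, here is a minimal Python sketch of the same logic. It is not how any spreadsheet implements the tool; the function `remove_duplicates`, the `key_columns` parameter, and the sample `orders` data are hypothetical names chosen for illustration. It works on a copy-friendly basis: a backup is made first, and the original list is never modified.

```python
import copy


def remove_duplicates(rows, key_columns=None):
    """Keep the first occurrence of each row.

    If key_columns is given, only those columns are compared;
    otherwise every column in the row is compared.
    """
    seen = set()
    unique = []
    for row in rows:
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical order data in which the first order was entered twice.
orders = [
    {"id": 1, "customer": "Ana", "total": 50},
    {"id": 2, "customer": "Ben", "total": 20},
    {"id": 1, "customer": "Ana", "total": 50},
]

backup = copy.deepcopy(orders)       # best practice: copy before cleaning
cleaned = remove_duplicates(orders)  # compares whole rows

total_before = sum(r["total"] for r in orders)
total_after = sum(r["total"] for r in cleaned)
print(total_before, total_after)
```

In this sample the duplicated order raises the total from 70 to 120, which is the kind of inflated value the tool is meant to prevent.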


4) Standardizing Formats

Inconsistent formatting can make analysis confusing or incorrect.

Common formatting issues:

  • Dates stored in multiple formats
  • Numbers stored or displayed inconsistently (for example, as percentages in some cells and as text in others)
  • Mixed currency or numeric styles

Standardizing formats:

  • Ensures consistent interpretation
  • Makes sorting, filtering, and calculations reliable
  • Improves readability and usability of the dataset
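
As a rough illustration of standardizing dates, the Python sketch below converts several input styles to one ISO format (YYYY-MM-DD). The list `KNOWN_FORMATS`, the helper `to_iso_date`, and the sample values are assumptions made for this example. In particular, it assumes that slash-separated dates are month-first, which must be confirmed against the real data source.

```python
from datetime import datetime

# Assumed input styles; slash dates are taken to be month/day/year.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def to_iso_date(text):
    """Return the date as 'YYYY-MM-DD', or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


raw_dates = ["2024-03-05", "03/05/2024", "05.03.2024", "sometime in March"]
standardized = [to_iso_date(d) for d in raw_dates]
print(standardized)
```

Values that match none of the formats come back as None rather than being guessed, so they can be reviewed by hand.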

5) Text Strings and Substrings

  • A text string is a sequence of characters stored in a cell.
  • The length of a text string is the number of characters it contains, including spaces and punctuation.
  • A substring is a smaller portion of a text string.

Understanding text strings is important for cleaning and restructuring textual data.
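
These three ideas map directly onto basic string operations. The short Python sketch below uses a made-up code, "NY-10001-A", purely for illustration. Spreadsheet functions such as LEN, LEFT, MID, and RIGHT do roughly the same jobs.

```python
code = "NY-10001-A"      # hypothetical text string

length = len(code)       # number of characters, including the dashes
prefix = code[:2]        # substring: the first two characters
zip_part = code[3:8]     # substring: characters 4 through 8
suffix = code[-1]        # substring: the last character

print(length, prefix, zip_part, suffix)
```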


6) Split Text to Columns

Split text to columns divides a text string into multiple cells based on a delimiter.

Common use cases:

  • Separating first and last names
  • Breaking addresses into city, state, and ZIP code
  • Splitting lists (e.g., certifications separated by commas)

Key concepts:

  • Delimiter: the character that separates values (comma, space, dash, etc.)
  • The delimiter can be detected automatically or specified manually

This tool helps convert multi-value cells into structured, column-based data.
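
A minimal Python sketch of splitting on a delimiter follows. The helper `split_to_columns` and the sample values are hypothetical. Unlike the spreadsheet tool, it does not detect the delimiter automatically, so the delimiter is always specified.

```python
def split_to_columns(cell, delimiter=","):
    """Split one cell's text on the delimiter and trim stray spaces."""
    return [part.strip() for part in cell.split(delimiter)]


# Separating first and last names on a space.
name_parts = split_to_columns("Grace Hopper", delimiter=" ")

# Splitting a comma-separated list of certifications.
certifications = split_to_columns("PMP, CSM, CAPM")

print(name_parts, certifications)
```

Trimming each part matters in practice: without it, "PMP, CSM" would produce " CSM" with a leading space, which then fails to match "CSM" in later comparisons.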


7) Fixing Numbers Stored as Text

Sometimes numeric values are incorrectly stored as text due to:

  • Copying and pasting from other sources
  • Incorrect formatting
  • Import errors

Why this is a problem:

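  • Calculations fail or silently skip the values (for example, SUM ignores cells that contain text)
  • Formulas that expect numbers can return errors
  • Numeric operations such as sorting by value give wrong or missing results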

Using Split text to columns can force the spreadsheet to reinterpret text values as numbers, resolving calculation errors.
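
The reinterpretation step can be sketched in Python as an explicit conversion. The helper `to_number` and the sample column are assumptions for illustration. The sketch also assumes that commas are thousands separators, which is not true in every locale.

```python
def to_number(cell):
    """Convert text such as ' 7.5' or '1,200' to a float; return None on failure."""
    cleaned = str(cell).strip().replace(",", "")  # assumes ',' groups thousands
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical column where every value arrived as text.
raw_values = ["12", " 7.5", "1,200", "n/a"]
numbers = [to_number(v) for v in raw_values]

# Sum only the cells that converted successfully.
total = sum(n for n in numbers if n is not None)
print(numbers, total)
```

Cells that cannot be converted are returned as None so that they can be inspected instead of being silently dropped.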


8) Joining Text with CONCATENATE

CONCATENATE (or an equivalent, such as CONCAT or the & operator) does the opposite of splitting:

  • Joins multiple text strings into one
  • Useful for combining first and last names, IDs, or labels

This function helps restructure data when consolidation is required.
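
For comparison, joining text can be sketched in Python as shown below. The helper `concatenate` and its `separator` parameter are hypothetical conveniences for this example. The spreadsheet function has no separator argument, so a space or dash is normally passed as one of the strings to join.

```python
def concatenate(*parts, separator=""):
    """Join any number of values into one text string."""
    return separator.join(str(p) for p in parts)


full_name = concatenate("Grace", "Hopper", separator=" ")
employee_id = concatenate("EMP", 42, separator="-")

print(full_name, employee_id)
```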


9) Importance of Spreadsheet Tools in Data Analytics

Spreadsheet tools:

  • Save time and effort
  • Reduce manual errors
  • Improve data consistency
  • Make cleaning scalable and repeatable

They are a core part of a data analyst’s toolkit and are used daily to maintain data quality.


10) Key Takeaways

  • Spreadsheet tools significantly improve data-cleaning efficiency.
  • Conditional formatting helps identify problems visually.
  • Removing duplicates prevents distorted analysis.
  • Standardized formats ensure consistent interpretation.
  • Splitting and joining text helps restructure messy data.
  • Fixing numbers stored as text is critical for accurate calculations.
  • Effective use of spreadsheet tools leads to cleaner, more reliable data.