1. Pandas Core Classes

Pandas has two main data structures:

1. DataFrame

  • 2D structure (rows + columns)
  • Like a spreadsheet or SQL table

2. Series

  • 1D structure
  • Represents a single column or row

Key Point:
DataFrame = table, Series = single column/row


2. What is a DataFrame?

A DataFrame:

  • Has labeled rows and columns
  • Can store multiple data types
  • Used for:
    • Data manipulation
    • Data analysis

Key Point:
Main structure for working with data


3. Creating a DataFrame

From Dictionary

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30]
})
  • Keys → column names
  • Values → column data

From NumPy Array

pd.DataFrame(array, columns=["A", "B"], index=[0, 1])
  • array → a 2D NumPy array with one column per name in columns
  • index → row labels (one per row)

Key Point:
Flexible creation from different data sources
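
The `array` argument in the NumPy example is not defined in these notes. A minimal sketch, assuming a small 2×2 array as a stand-in (the values and the name `df_from_array` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 array standing in for `array`.
array = np.array([[1, 2], [3, 4]])

# Each inner list becomes a row; `columns` and `index` supply the labels.
df_from_array = pd.DataFrame(array, columns=["A", "B"], index=[0, 1])
print(df_from_array)
```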


4. Loading Data (CSV)

Use:

pd.read_csv("file.csv")
  • Reads CSV into DataFrame
  • Can load from:
    • URL
    • Local file

Key Point:
Most common way to import data
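
To try read_csv without a file on disk, one option is to wrap CSV text in io.StringIO, which read_csv accepts like an open file. A minimal sketch; the CSV content and the name `df_csv` are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV text; the first line is used as the header row.
csv_text = "Name,Age\nAlice,25\nBob,30\n"

# io.StringIO makes the string behave like a file object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)  # (2, 2)
```

With a real dataset, the same call takes a local path or a URL string instead of the StringIO object.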


5. What is a Series?

A Series:

  • 1D labeled array
  • Represents:
    • A column
    • A row

Example:

  • df["Age"] → Series

Key Point:
Series = building block of DataFrame
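
Both a single column and a single row come back as a Series. A small sketch, assuming a two-row example DataFrame (the data and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

age_column = df["Age"]  # one column -> Series, labeled by the row index
first_row = df.iloc[0]  # one row -> Series, labeled by the column names

print(type(age_column).__name__)  # Series
print(list(first_row.index))      # ['Name', 'Age']
```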


6. Attributes vs Methods

Attributes (no parentheses)

  • .columns → column names
  • .shape → (rows, columns)

Methods (use parentheses)

  • .info() → prints a summary (data types, non-null counts, memory usage)

Key Point:
Attribute = property, Method = action


7. Key DataFrame Attributes

Columns

df.columns
  • Returns column names

Shape

df.shape
  • Returns:
    • (number of rows, number of columns)

Info (a method, so it needs parentheses)

df.info()
  • Shows:
    • Data types
    • Non-null count per column (reveals missing values)
    • Memory usage

Key Point:
Use these for quick inspection
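
The three inspection tools side by side, as a rough sketch on a made-up three-row DataFrame with one missing age:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Age": [25, 30, None]})

print(df.columns)  # attribute: no parentheses
print(df.shape)    # attribute: (3, 2)
df.info()          # method: prints the summary itself and returns None
```

In the info() output, the Age column reports 2 non-null values out of 3 entries, which is how the missing value shows up.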


8. Null Values (NaN)

  • Missing data → NaN (Not a Number)

Important for:

  • Data cleaning
  • Analysis

Key Point:
NaN represents missing values
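
One common way to see how much data is missing is to combine isna() with sum(). A minimal sketch on illustrative data (the DataFrame and the name `missing_per_column` are assumptions, not part of the notes):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Age": [25, np.nan, 41]})

# isna() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```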


9. Selecting Columns

Bracket notation (recommended):

df["Age"]

Dot notation:

df.Age

Limitation:

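  • Dot notation fails if the column name contains spaces
  • It also fails if the name matches a DataFrame attribute or method (e.g. "shape", "count")
  • It cannot be used to create a new column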

Key Point:
Prefer bracket notation
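
A short sketch of where the two notations agree and where dot notation breaks down. The column names "Ticket Price" and "count" are made up to trigger the problem cases:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30],
    "Ticket Price": [7.25, 71.28],  # name contains a space
    "count": [1, 2],                # name collides with DataFrame.count()
})

same = df["Age"].equals(df.Age)  # both notations return the same Series here
price = df["Ticket Price"]       # only brackets work for this name
count_col = df["count"]          # the column
count_attr = df.count            # the built-in method, not the column
```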


10. Selecting Multiple Columns

df[["Age", "Fare"]]

Returns:

  • New DataFrame

Key Point:
Use list inside brackets
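
The inner brackets are a Python list of column names, and the result is a DataFrame even when the list holds one name. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [22, 38],
    "Fare": [7.25, 71.28],
})

subset = df[["Age", "Fare"]]  # list of names -> DataFrame with two columns
single = df[["Age"]]          # one-item list -> still a DataFrame
series = df["Age"]            # plain string -> Series
```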


11. Selecting Rows with iloc

iloc = integer-position-based selection (the row labels are ignored)

Single row:

df.iloc[0]

Multiple rows:

df.iloc[0:3]

Key Point:
Uses integer positions


12. Selecting Rows & Columns with iloc

df.iloc[0:3, 2:4]
  • Rows 0–2 (the end position, 3, is excluded)
  • Columns 2–3 (the end position, 4, is excluded)

Key Point:
Select subset of data
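
The iloc calls above, sketched on a made-up DataFrame whose row labels are letters, to show that iloc looks only at positions:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4],
        "B": [5, 6, 7, 8],
        "C": [9, 10, 11, 12],
        "D": [13, 14, 15, 16],
        "E": [17, 18, 19, 20],
    },
    index=["w", "x", "y", "z"],  # labels are ignored by iloc
)

first_row = df.iloc[0]       # Series for the row at position 0 (label "w")
first_three = df.iloc[0:3]   # rows at positions 0, 1, 2
block = df.iloc[0:3, 2:4]    # same rows, columns at positions 2 and 3
```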


13. Accessing Single Value

df.iloc[0, 3]

Returns:

  • Single value

Key Point:
Use two indices
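
With one row position and one column position, the result is a plain scalar rather than a Series. A sketch, assuming an illustrative DataFrame where position 3 is the Fare column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Sex": ["F", "M"],
    "Age": [22, 38],
    "Fare": [7.25, 71.28],
})

value = df.iloc[0, 3]  # row position 0, column position 3 ("Fare")
print(value)           # 7.25
```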


14. Selecting with loc

loc = label-based selection

df.loc[1:3, "Name"]
  • Uses row/column names
  • Label slices include both endpoints (1:3 returns rows labeled 1, 2, and 3)

Key Point:
Select using labels instead of integer positions
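
The difference in slice endpoints between loc and iloc is easy to miss when the row labels happen to be 0, 1, 2, and so on. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan", "Eve"],
    "Age": [25, 30, 41, 19, 33],
})

by_label = df.loc[1:3, "Name"]  # labels 1, 2, 3 -> three rows (end included)
by_position = df.iloc[1:3, 0]   # positions 1, 2 -> two rows (end excluded)

print(list(by_label))     # ['Bob', 'Cara', 'Dan']
print(list(by_position))  # ['Bob', 'Cara']
```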


15. Adding a New Column

df["NewColumn"] = values

Adds column to DataFrame

Key Point:
Easy data expansion
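
The `values` on the right-hand side can take several forms. A sketch of three common ones; the column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

df["AgeInMonths"] = df["Age"] * 12  # computed from an existing column
df["Country"] = "Unknown"           # a single value is repeated for every row
df["Score"] = [88, 92]              # a list must have one item per row

print(df.shape)  # (2, 5)
```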


16. Data Type Note

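  • Mixed-type columns → type = "object"
  • String columns → type = "object" by default in older pandas versions (newer versions may report a dedicated string type instead)

Because:

  • Pandas is built on NumPy, whose arrays use fixed-size element types
  • Values that do not fit one are stored as references to Python objects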

Key Point:
Object = generic data type
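
Checking df.dtypes shows which columns fell back to the generic type. A sketch with one made-up column that mixes an integer and a string:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30],        # integers -> an integer dtype
    "Fare": [7.25, 71.28],  # floats -> a float dtype
    "Mixed": [1, "two"],    # mixed Python types -> object
})

print(df.dtypes)
```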


17. Practical Workflow

Common steps:

  1. Load data
  2. Inspect structure
  3. Select data
  4. Analyze
  5. Modify

Key Point:
Pandas simplifies full workflow
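
The five steps, sketched end to end on a tiny made-up dataset. The CSV text and the column name "AboveMean" are assumptions for illustration:

```python
import io
import pandas as pd

csv_text = "Name,Age,Fare\nAlice,25,7.25\nBob,30,71.28\nCara,41,8.05\n"

df = pd.read_csv(io.StringIO(csv_text))   # 1. Load data
print(df.shape)                           # 2. Inspect structure
fares = df["Fare"]                        # 3. Select data
mean_fare = fares.mean()                  # 4. Analyze
df["AboveMean"] = df["Fare"] > mean_fare  # 5. Modify (add a column)

print(df)
```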


18. Why Pandas is Powerful

  • Easy to read
  • Handles large in-memory datasets efficiently (built on NumPy)
  • Combines many operations

Key Point:
High productivity tool


19. Debugging Tip

When errors occur, check:

  • .shape
  • .columns
  • .info()

Key Point:
Understand data before fixing errors
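
One situation where this tip helps is a KeyError on a column that looks correct. A hypothetical sketch in which the column name carries a stray leading space:

```python
import pandas as pd

# Hypothetical data: note the leading space in " Age".
df = pd.DataFrame({" Age": [25, 30], "Name": ["Alice", "Bob"]})

try:
    df["Age"]
except KeyError:
    # Printing the names as a list makes the stray space visible.
    print(list(df.columns))  # [' Age', 'Name']

# One possible fix: strip whitespace from every column name.
df.columns = df.columns.str.strip()
print(df["Age"].sum())  # 55
```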


20. Learning Strategy

  • Practice selecting data
  • Experiment with slicing
  • Read documentation

Key Point:
Hands-on practice is essential


Final Summary

Pandas provides two core data structures: DataFrame and Series. A DataFrame is a two-dimensional table used for storing and analyzing data, while a Series represents a single column or row. Using the read_csv() function, the .iloc and .loc indexers, the .info() method, and attributes such as .shape and .columns, data professionals can efficiently load, inspect, manipulate, and analyze datasets. Pandas simplifies complex data workflows and is one of the most widely used tools in data science.


Key Takeaways

  • DataFrame = table, Series = column/row
  • Use pd.read_csv() to load data
  • .columns, .shape, .info() for inspection
  • Use [] for column selection
  • Use iloc (integer position) and loc (label)
  • Add columns with assignment
  • NaN = missing data
  • Pandas built on NumPy
  • Core tool for data analysis