1. Pandas Core Classes
Pandas has two main data structures:
1. DataFrame
- 2D structure (rows + columns)
- Like a spreadsheet or SQL table
2. Series
- 1D structure
- Represents a single column or row
Key Point:
DataFrame = table, Series = single column/row
2. What is a DataFrame?
A DataFrame:
- Has labeled rows and columns
- Can store multiple data types
- Used for:
- Data manipulation
- Data analysis
Key Point:
Main structure for working with data
3. Creating a DataFrame
From Dictionary
import pandas as pd
pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30]
})
- Keys → column names
- Values → column data
From NumPy Array
pd.DataFrame(array, columns=["A", "B"], index=[0, 1])
Key Point:
Flexible creation from different data sources
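The NumPy example above relies on an `array` variable that these notes never define. A minimal sketch, assuming a small 2x2 integer array as sample data (the values and the name `grid` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 2 rows, 2 columns.
array = np.array([[1, 2], [3, 4]])

# Column labels and row labels are supplied explicitly.
grid = pd.DataFrame(array, columns=["A", "B"], index=[0, 1])

print(grid)
print(grid.shape)  # (2, 2)
```

The lengths of `columns` and `index` have to match the array's dimensions; otherwise pandas raises an error.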
4. Loading Data (CSV)
Use:
pd.read_csv("file.csv")
- Reads CSV into DataFrame
- Can load from:
- URL
- Local file
Key Point:
Most common way to import data
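To try read_csv() without a file on disk, the sketch below feeds it CSV text through io.StringIO. The CSV content is made up for illustration; in practice you would usually pass a file path or URL instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file.
csv_text = "Name,Age\nAlice,25\nBob,30\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (2, 2)
```

The first line of the CSV is used as the column names by default.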
5. What is a Series?
A Series:
- 1D labeled array
- Represents:
- A column
- A row
Example:
df["Age"] → Series
Key Point:
Series = building block of DataFrame
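A short sketch of both cases, using a made-up two-row DataFrame: pulling out one column and pulling out one row each give a Series, and the labels of that Series differ accordingly.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

ages = df["Age"]        # a column as a Series
first_row = df.iloc[0]  # a row as a Series

print(type(ages).__name__)    # Series
print(ages.name)              # Age
print(list(ages.index))       # [0, 1] (the row labels)
print(list(first_row.index))  # ['Name', 'Age'] (the column labels)
```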
6. Attributes vs Methods
Attributes (no parentheses)
.columns → column names
.shape → (rows, columns)
Methods (use parentheses)
.info() → dataset info
Key Point:
Attribute = property, Method = action
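One way to see the difference in practice, sketched on a small made-up DataFrame: an attribute is read directly, while a method is a callable that does its work only when called with parentheses.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

print(df.shape)            # attribute: read directly, no parentheses
print(callable(df.shape))  # False
print(callable(df.info))   # True: a method is a callable object

df.info()                  # calling the method prints the summary
```

Writing df.info without parentheses only refers to the method; it does not run it.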
7. Key DataFrame Attributes
Columns
df.columns
- Returns column names
Shape
df.shape
- Returns:
- (number of rows, number of columns)
Info (a method, so it needs parentheses)
df.info()
- Shows:
- Data types
- Non-null counts (which reveal missing values)
- Memory usage
Key Point:
Use these for quick inspection
8. Null Values (NaN)
- Missing data → NaN (Not a Number)
Important for:
- Data cleaning
- Analysis
Key Point:
NaN represents missing values
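A minimal sketch of spotting missing values, assuming a made-up DataFrame with one missing age. It uses the isna() method, which these notes do not otherwise cover.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in "Age".
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "Age": [25, np.nan, 41],
})

print(df["Age"].isna())  # True where the value is missing
print(df.isna().sum())   # number of missing values per column
```

Because NaN is a floating-point value, the "Age" column is stored as floats even though the other entries are whole numbers.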
9. Selecting Columns
Bracket notation (recommended):
df["Age"]
Dot notation:
df.Age
Limitation:
- Dot notation fails if the column name contains spaces or matches an existing DataFrame attribute or method (such as count)
Key Point:
Prefer bracket notation
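The sketch below illustrates why bracket notation is the safer habit. The column names are hypothetical; "count" is chosen deliberately because DataFrame already has a method with that name.

```python
import pandas as pd

# Illustrative data; "count" collides with the DataFrame.count method.
df = pd.DataFrame({
    "Age": [25, 30],
    "Ticket Fare": [7.25, 71.28],
    "count": [1, 2],
})

print(df["Age"].equals(df.Age))  # True: both notations work here

print(df["Ticket Fare"])         # brackets handle spaces
# df.Ticket Fare                 # would be a SyntaxError

print(type(df["count"]).__name__)  # Series: the column
print(callable(df.count))          # True: dot notation finds the method instead
```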
10. Selecting Multiple Columns
df[["Age", "Fare"]]
Returns:
- New DataFrame
Key Point:
Use list inside brackets
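A quick sketch of the difference between single and double brackets, using made-up data: a list of names gives a DataFrame (even a list with one name), while a bare name gives a Series.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Fare": [7.25, 71.28],
})

subset = df[["Age", "Fare"]]

print(type(subset).__name__)       # DataFrame
print(list(subset.columns))        # ['Age', 'Fare']
print(type(df[["Age"]]).__name__)  # DataFrame (one-column table)
print(type(df["Age"]).__name__)    # Series
```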
11. Selecting Rows with iloc
iloc = integer position-based selection
Single row:
df.iloc[0]
Multiple rows:
df.iloc[0:3]
Key Point:
Uses integer positions
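To show that iloc counts positions rather than reading labels, the sketch below gives the rows letter labels. The data and labels are made up for illustration.

```python
import pandas as pd

# Illustrative data with non-numeric row labels.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Cara", "Dan"], "Age": [25, 30, 41, 19]},
    index=["a", "b", "c", "d"],
)

first = df.iloc[0]   # first row, returned as a Series
top3 = df.iloc[0:3]  # positions 0, 1, 2 (the end position is excluded)

print(first["Name"])     # Alice
print(list(top3.index))  # ['a', 'b', 'c']
```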
12. Selecting Rows & Columns with iloc
df.iloc[0:3, 2:4]
- Rows 0–2
- Columns 2–3
Key Point:
Select subset of data
13. Accessing Single Value
df.iloc[0, 3]
Returns:
- Single value
Key Point:
Use two indices
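The two-index forms from the last two sections can be sketched together. The four-column DataFrame here is made up so that column positions 2 and 3 exist.

```python
import pandas as pd

# Illustrative data: columns sit at positions 0, 1, 2, 3.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [25, 30, 41, 19],
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Class": [3, 1, 2, 3],
})

block = df.iloc[0:3, 2:4]  # rows 0-2, columns 2-3
value = df.iloc[0, 3]      # row 0, column 3

print(block.shape)          # (3, 2)
print(list(block.columns))  # ['Fare', 'Class']
print(value)                # 3
```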
14. Selecting with loc
loc = label-based selection
df.loc[1:3, "Name"]
- Uses row/column names
- Label slices include both endpoints, so 1:3 selects the rows labeled 1, 2, and 3
Key Point:
Select using labels instead of index
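A common source of confusion is that loc slices include the end label while iloc slices exclude the end position. A small sketch on made-up data with the default 0, 1, 2, ... row labels:

```python
import pandas as pd

# Illustrative data; row labels default to 0..4.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan", "Eve"],
    "Age": [25, 30, 41, 19, 36],
})

by_label = df.loc[1:3, "Name"]  # labels 1, 2, 3 (end included)
by_position = df.iloc[1:3, 0]   # positions 1, 2 (end excluded)

print(by_label.tolist())     # ['Bob', 'Cara', 'Dan']
print(by_position.tolist())  # ['Bob', 'Cara']
```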
15. Adding a New Column
df["NewColumn"] = values
- Adds column to DataFrame
Key Point:
Easy data expansion
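The assigned values can take several forms. The sketch below, on made-up data with hypothetical column names, shows a list, a calculation on an existing column, and a single value repeated for every row.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

df["City"] = ["Paris", "Oslo"]     # list: one value per row
df["AgeNextYear"] = df["Age"] + 1  # derived from an existing column
df["Active"] = True                # single value, repeated for all rows

print(df)
```

A list must have exactly as many items as the DataFrame has rows; otherwise pandas raises a ValueError.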
16. Data Type Note
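- Mixed/string columns → type = "object" (newer pandas versions may give all-string columns a dedicated string type instead)
Because:
- Pandas is built on NumPy, which stores arbitrary Python objects under its generic object type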
Key Point:
Object = generic data type
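A minimal sketch of checking column types, using a made-up DataFrame with one numeric column and one column that mixes an integer with a string:

```python
import pandas as pd

# Illustrative data: "Mixed" holds both an int and a str.
df = pd.DataFrame({"Age": [25, 30], "Mixed": [1, "two"]})

print(df.dtypes)
print(df["Mixed"].dtype == object)  # True
```

The numeric column keeps an integer type, while the mixed column falls back to object because no single specific type fits all of its values.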
17. Practical Workflow
Common steps:
- Load data
- Inspect structure
- Select data
- Analyze
- Modify
Key Point:
Pandas simplifies full workflow
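The five steps above can be sketched end to end. Everything here is illustrative: the CSV text, the column names, and the "AboveMean" column are assumptions, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content in place of a real file.
csv_text = "Name,Age,Fare\nAlice,25,7.25\nBob,30,71.28\nCara,41,8.05\n"

df = pd.read_csv(io.StringIO(csv_text))  # 1. Load data
print(df.shape)                          # 2. Inspect structure: (3, 3)
print(df.columns.tolist())

ages = df["Age"]                         # 3. Select data
mean_age = ages.mean()                   # 4. Analyze
print(mean_age)                          # 32.0

df["AboveMean"] = df["Age"] > mean_age   # 5. Modify
print(df)
```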
18. Why Pandas is Powerful
- Easy to read
- Handles large datasets that fit in memory
- Combines many operations
Key Point:
High productivity tool
19. Debugging Tip
When errors occur, check:
.shape
.columns
.info()
Key Point:
Understand data before fixing errors
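As one hypothetical example of this tip, the sketch below triggers a KeyError caused by a stray trailing space in a column name, then uses .columns to find the cause. The data and the whitespace fix are illustrative, not the only way to handle it.

```python
import pandas as pd

# Illustrative data; note the trailing space in "Age ".
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age ": [25, 30]})

try:
    df["Age"]
except KeyError:
    # Inspecting the column names reveals the problem.
    print(df.columns.tolist())  # ['Name', 'Age ']

# One possible fix: strip whitespace from every column name.
df.columns = df.columns.str.strip()
print(df["Age"].tolist())       # [25, 30]
```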
20. Learning Strategy
- Practice selecting data
- Experiment with slicing
- Read documentation
Key Point:
Hands-on practice is essential
Final Summary
Pandas provides two core data structures: DataFrame and Series. A DataFrame is a two-dimensional table used for storing and analyzing data, while a Series represents a single column or row. Using the read_csv() function, the .iloc and .loc indexers, the .info() method, and attributes such as .shape and .columns, data professionals can efficiently load, inspect, manipulate, and analyze datasets. Pandas simplifies complex data workflows and is one of the most essential tools in data science.
Key Takeaways
- DataFrame = table, Series = column/row
- Use pd.read_csv() to load data
- Use .columns, .shape, .info() for inspection
- Use [] for column selection
- Use iloc (integer position) and loc (label)
- Add columns with assignment
- NaN = missing data
- Pandas built on NumPy
- Core tool for data analysis
