Learn pandas with a CSV

What you will practice

This tutorial is for the first week of pandas: not model training, not dashboards, just enough data work to answer a real question.

Load a CSV with pd.read_csv.
Check rows, columns, and data types before guessing.
Use groupby to summarize a category.
Sort the result and make a simple bar chart.
Explain what the chart does and does not prove.

Example notebooks

Open one of these editable notebooks first if you want to see the full beginner loop before trying your own CSV.

Tips dataset

Which day has the highest average tip rate? Practice creating a calculated column, grouping by day, and checking group size.

Open notebook
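If you want to try the tip-rate idea before opening the notebook, here is a minimal sketch. The rows are made up for illustration, and the column names day, total_bill, and tip are assumptions about how the tips CSV is labeled; check the real file with df.columns first.

```python
import pandas as pd

# Made-up rows for illustration only; the real tips dataset is larger.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "total_bill": [20.0, 10.0, 25.0, 40.0, 30.0, 50.0],
    "tip": [3.0, 2.0, 5.0, 4.0, 6.0, 12.0],
})

# Calculated column: tip as a fraction of the bill.
tips["tip_rate"] = tips["tip"] / tips["total_bill"]

# Average tip rate per day, plus how many bills each average is based on.
by_day = (
    tips
    .groupby("day")["tip_rate"]
    .agg(avg_tip_rate="mean", n_bills="count")
    .sort_values("avg_tip_rate", ascending=False)
)

print(by_day)
```

The n_bills column is the group-size check: a day with only one or two bills can top the ranking by chance.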

Penguins dataset

How does body mass differ by species? Practice missing-value cleanup, grouping, rounding, and comparing categories.

Open notebook
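The penguins question can be sketched the same way. The rows below are invented for illustration, and the column names species and body_mass_g are assumptions; your copy of the dataset may label them differently.

```python
import pandas as pd

# Invented rows for illustration; one body mass is deliberately missing.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700.0, float("nan"), 3800.0, 5000.0, 5100.0, 3700.0],
})

# Count the missing values before dropping anything.
n_missing = penguins["body_mass_g"].isna().sum()

# Keep only rows that have a measured body mass.
measured = penguins.dropna(subset=["body_mass_g"])

# Average body mass per species, rounded to one decimal, heaviest first.
mass_by_species = (
    measured
    .groupby("species")["body_mass_g"]
    .mean()
    .round(1)
    .sort_values(ascending=False)
)

print(f"Rows with missing body mass: {n_missing}")
print(mass_by_species)
```

Counting the missing values first tells you how much data the cleanup step removes, which is worth mentioning next to the result.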

Titanic dataset

What was the survival rate by passenger class? Practice turning a 0/1 column into a percentage with groupby.

Open notebook
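The 0/1 trick works because the mean of a column of zeros and ones is the share of ones. Here is a small sketch with invented rows; the column names pclass and survived are assumptions, and some copies of the Titanic CSV capitalize them.

```python
import pandas as pd

# Invented rows for illustration: 1 = survived, 0 = did not survive.
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the fraction of 1s; multiply by 100 for a percentage.
survival_pct = (
    titanic
    .groupby("pclass")["survived"]
    .mean()
    .mul(100)
    .round(1)
)

print(survival_pct)
```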

1. Start with a question

A good beginner data question is narrow enough that you can answer it with one table or one chart.

Example: In a movie ratings CSV, which genre has the highest average rating?

That question gives you a clear path: find the genre column, find the rating column, group by genre, calculate the average, then sort.

2. Inspect before analyzing

Many beginner pandas errors come from assuming column names or data types instead of checking them. Inspect the file first.

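import pandas as pd

df = pd.read_csv("movies.csv")

print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # exact column names
print(df.dtypes)   # data type of each column
df.head()          # first five rows; in a plain script, use print(df.head())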

Look for the exact spelling of the columns you need. Column names are case-sensitive, so Rating, rating, and imdb_rating are three different names to pandas.
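One way to catch a naming mismatch early is to compare the columns you expect with the columns you actually got. This is a sketch, not a required step: the inline CSV text stands in for movies.csv, and lowercasing the headers is an optional convention rather than something pandas does for you.

```python
import io

import pandas as pd

# Stand-in for movies.csv, with headers that differ in case and spacing.
csv_text = "Genre , Rating\nDrama,7.5\nComedy,6.8\n"
df = pd.read_csv(io.StringIO(csv_text))

required = ["genre", "rating"]

# Which expected columns are not present under that exact name?
missing = [col for col in required if col not in df.columns]
print("Missing before cleanup:", missing)

# Optional cleanup: trim stray spaces and lowercase every column name.
df.columns = df.columns.str.strip().str.lower()

missing_after = [col for col in required if col not in df.columns]
print("Missing after cleanup:", missing_after)
```

If you would rather keep the original headers, skip the cleanup and use the names exactly as df.columns prints them.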

3. Keep only the rows you can trust

If the rating column has missing values or text mixed in with the numbers, clean it before grouping.

df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
clean = df.dropna(subset=["genre", "rating"])

errors="coerce" turns values pandas cannot parse as numbers into missing values (NaN). dropna then removes every row that is missing a genre or a rating, because those rows cannot answer the question.
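To see what the cleanup does, run it on a few rows where you already know the answer. The rows below are invented for illustration; the dropped counter is an extra added here so you can report how much data the cleanup removed.

```python
import pandas as pd

# Invented rows: one rating is text, one genre is missing.
df = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Drama"],
    "rating": ["7.5", "N/A", "6.0", "8.1"],
})

# "N/A" cannot be parsed as a number, so it becomes NaN.
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

# Drop rows that are missing either value needed for the question.
clean = df.dropna(subset=["genre", "rating"])

dropped = len(df) - len(clean)
print(f"Kept {len(clean)} rows, dropped {dropped}")
print(clean)
```

If most of the file gets dropped, stop and look at the raw values before trusting any average.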

4. Group, sort, and plot

genre_rating = (
    clean
    .groupby("genre", as_index=False)["rating"]
    .mean()
    .sort_values("rating", ascending=False)
)

genre_rating.head(10).plot(
    kind="bar",
    x="genre",
    y="rating",
    title="Average movie rating by genre"
)

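Read the chain from top to bottom: choose a grouping column, calculate the average rating for each group, then sort the rows from highest to lowest. The second statement plots the top 10 genres as a bar chart; plotting from pandas requires matplotlib to be installed.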
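An average says nothing about how many movies are behind it, so it can help to compute the group size in the same step. The sketch below uses invented rows, and the min_movies threshold is a hypothetical choice for illustration, not a rule.

```python
import pandas as pd

# Invented rows standing in for the cleaned movie table.
clean = pd.DataFrame({
    "genre": ["Documentary", "Drama", "Drama", "Drama", "Comedy", "Comedy"],
    "rating": [9.0, 7.0, 8.0, 7.5, 6.0, 7.0],
})

# Average rating and number of movies per genre, highest average first.
summary = (
    clean
    .groupby("genre", as_index=False)
    .agg(avg_rating=("rating", "mean"), n_movies=("rating", "count"))
    .sort_values("avg_rating", ascending=False)
)

# Hypothetical threshold: ignore genres with fewer than 2 movies.
min_movies = 2
reliable = summary[summary["n_movies"] >= min_movies]

print(summary)
print(reliable)
```

In this toy table the top genre rests on a single movie, which is exactly the kind of detail a bar chart alone would hide.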

5. Write one careful takeaway

A chart is not finished until you can say what it means and what it does not mean.

Example takeaway: In this dataset, documentaries have the highest average rating. This does not prove documentaries are always better; it may reflect which movies were included in the CSV.

Try it in the lab

Open the browser lab, load a small CSV, and ask a question like: "Which category has the highest average value?" Then inspect the generated pandas code and edit one line yourself.

Open the data picker