What you will practice
This tutorial is for the first week of pandas: not model training, not dashboards, just enough data work to answer a real question.
- Load a CSV with pd.read_csv.
- Check rows, columns, and data types before guessing.
- Use groupby to summarize a category.
- Sort the result and make a simple bar chart.
- Explain what the chart does and does not prove.
Example notebooks
Open one of these editable notebooks first if you want to see the full beginner loop before trying your own CSV.
1. Start with a question
A good beginner data question is narrow enough that you can answer it with one table or one chart.
Example: In a movie ratings CSV, which genre has the highest average rating?
That question gives you a clear path: find the genre column, find the rating column, group by genre, calculate the average, then sort.
2. Inspect before analyzing
Many beginner pandas errors come from assuming column names or data types instead of checking them. Inspect the file first.
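import pandas as pd

df = pd.read_csv("movies.csv")
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # exact column names
print(df.dtypes)   # data type of each column
df.head()          # first five rows (shown automatically in a notebook)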
Look for the exact spelling of the columns you need. Rating, rating, and imdb_rating are different names to pandas.
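One way to catch a misspelled column early is to compare the names you expect against df.columns before analyzing anything. The sketch below uses a small inline table in place of movies.csv; the column names and values are invented for illustration, and normalizing the names is an optional convenience, not a required step.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv; names and values are made up.
df = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre ": ["Drama", "Comedy", "Drama"],  # note the trailing space
    "Rating": [7.1, 6.4, 8.0],
})

# Optional: strip stray whitespace and lowercase every column name.
df.columns = df.columns.str.strip().str.lower()

# List any column the question needs that the table does not have.
needed = ["genre", "rating"]
missing = [col for col in needed if col not in df.columns]
print("Missing columns:", missing)
```

If the list is not empty, fix the names in your code (or in the CSV) before moving on.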
3. Keep only the rows you can trust
If the rating column has missing values or text mixed into it, clean that before grouping.
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
clean = df.dropna(subset=["genre", "rating"])
errors="coerce" turns values pandas cannot parse into missing values. Then dropna removes rows that cannot answer the question.
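To see what the cleaning step actually removes, it can help to run it on a few rows you can read by eye. This is a minimal sketch with invented values: one rating is text that cannot be parsed and one genre is missing.

```python
import pandas as pd

# Hypothetical rows; the values are invented for illustration.
df = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Drama"],
    "rating": ["7.5", "N/A", "6.0", "8"],
})

# "N/A" cannot be parsed as a number, so it becomes a missing value (NaN).
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

# Drop rows that lack either a genre or a usable rating.
clean = df.dropna(subset=["genre", "rating"])

print("Unparseable ratings:", df["rating"].isna().sum())
print("Rows dropped:", len(df) - len(clean))
```

Counting the dropped rows is worth doing on your real file too: if most rows disappear, the column probably needs a closer look before you trust any average.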
4. Group, sort, and plot
genre_rating = (
    clean
    .groupby("genre", as_index=False)["rating"]
    .mean()
    .sort_values("rating", ascending=False)
)
genre_rating.head(10).plot(
    kind="bar",
    x="genre",
    y="rating",
    title="Average movie rating by genre"
)
Read the chain from top to bottom: choose a grouping column, calculate the average rating for each group, sort the rows, then plot the top results.
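The same chain can be checked by hand on a table small enough to average in your head. The sketch below assumes a tiny, invented clean table and leaves out the plot so that only the grouping and sorting are visible.

```python
import pandas as pd

# Hypothetical cleaned data; the values are invented for illustration.
clean = pd.DataFrame({
    "genre": ["Drama", "Drama", "Comedy", "Comedy", "Documentary"],
    "rating": [7.0, 8.0, 6.0, 6.5, 8.5],
})

genre_rating = (
    clean
    .groupby("genre", as_index=False)["rating"]  # one group per genre
    .mean()                                      # average rating per group
    .sort_values("rating", ascending=False)      # highest average first
)
print(genre_rating)
```

Here Drama averages (7.0 + 8.0) / 2 = 7.5 and Comedy averages 6.25, so the sorted order is Documentary, Drama, Comedy.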
5. Write one careful takeaway
A chart is not finished until you can say what it means and what it does not mean.
Example takeaway: In this dataset, documentaries have the highest average rating. This does not prove documentaries are always better; it may reflect which movies were included in the CSV.
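One concrete check behind a careful takeaway is how many rows each average is based on. The sketch below, again using invented values, puts the count next to the mean; a top-ranked genre backed by a single movie deserves a weaker claim.

```python
import pandas as pd

# Hypothetical cleaned data; the values are invented for illustration.
clean = pd.DataFrame({
    "genre": ["Drama", "Drama", "Comedy", "Comedy", "Documentary"],
    "rating": [7.0, 8.0, 6.0, 6.5, 8.5],
})

# Compute the average and the number of movies for each genre.
summary = (
    clean
    .groupby("genre")["rating"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

In this toy table the Documentary average comes from one movie, so the honest takeaway would mention that.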
Try it in the lab
Open the browser lab, load a small CSV, and ask a question like: "Which category has the highest average value?" Then inspect the generated pandas code and edit one line yourself.