Beginner Python tutorial

Learn pandas by asking one good question of a CSV.

This walkthrough shows the beginner loop: load a dataset, inspect it, group it, plot it, and write one plain-English takeaway. You can run the same steps in the browser lab without installing Python first.


What you will practice

This tutorial is for the first week of pandas: not model training, not dashboards, just enough data work to answer a real question.

1. Start with a question

A good beginner data question is narrow enough that you can answer it with one table or one chart.

Example: In a movie ratings CSV, which genre has the highest average rating?

That question gives you a clear path: find the genre column, find the rating column, group by genre, calculate the average, then sort.

2. Inspect before analyzing

Many beginner pandas errors come from assuming column names or data types that the file does not actually have. Inspect the file first.

import pandas as pd

df = pd.read_csv("movies.csv")

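print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # exact column names
print(df.dtypes)   # data type of each column
print(df.head())   # first five rows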

Look for the exact spelling of the columns you need. Rating, rating, and imdb_rating are different names to pandas.
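If the headers in your file are inconsistent, one option is to normalize them before you look anything up. The sketch below is a minimal example, not part of the original walkthrough: the small DataFrame stands in for movies.csv, and its messy headers (" Genre", "Rating ") are an assumption made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for movies.csv; the messy headers are assumed.
df = pd.DataFrame({" Genre": ["Drama", "Comedy"], "Rating ": [7.5, 6.8]})

# Normalize headers: strip surrounding whitespace and lowercase every name.
df.columns = df.columns.str.strip().str.lower()

# Confirm the columns the question needs are present before going further.
required = ["genre", "rating"]
missing = [col for col in required if col not in df.columns]

print(list(df.columns))
print("Missing columns:", missing)
```

If the missing list is not empty, go back to the printed column names and find the spelling your file actually uses.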

3. Keep only the rows you can trust

If the rating column has missing values or text mixed into it, clean that before grouping.

df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
clean = df.dropna(subset=["genre", "rating"])

errors="coerce" turns values pandas cannot parse as numbers into missing values (NaN). dropna then removes every row where genre or rating is missing, because those rows cannot answer the question.
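To see what the cleaning step does, it can help to run it on a few rows you can check by eye. This is a small sketch with made-up data (the four rows and the "n/a" value are assumptions, not taken from any real CSV); it also counts how many rows were dropped, which is worth knowing before you trust an average.

```python
import pandas as pd

# Made-up rows: one unparseable rating and one missing genre.
df = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Drama"],
    "rating": ["7.5", "n/a", "6.0", "8.1"],
})

# "n/a" cannot be parsed as a number, so it becomes NaN.
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

# Drop rows where either column needed for the question is missing.
clean = df.dropna(subset=["genre", "rating"])

dropped = len(df) - len(clean)
print(f"Kept {len(clean)} of {len(df)} rows, dropped {dropped}")
```

If a large share of rows disappears, look at the raw values before continuing; the column may use a format that needs different handling.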

4. Group, sort, and plot

genre_rating = (
    clean
    .groupby("genre", as_index=False)["rating"]
    .mean()
    .sort_values("rating", ascending=False)
)

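import matplotlib.pyplot as plt  # pandas plotting is built on matplotlib

# Plot the ten highest-rated genres as a bar chart.
genre_rating.head(10).plot(
    kind="bar",
    x="genre",
    y="rating",
    title="Average movie rating by genre",
)
plt.show()  # needed when running as a script; notebooks usually render the chart automatically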

Read the chain from top to bottom: choose a grouping column, calculate the average rating for each group, then sort the rows from highest to lowest. The second statement plots the top ten results.
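If the chained version is hard to follow, the same steps can be written with one variable per step so you can print each intermediate result. This is a sketch on made-up data; the four rows are an assumption chosen so the averages are easy to verify by hand.

```python
import pandas as pd

# Made-up cleaned data: two genres, two ratings each.
clean = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Drama", "Comedy"],
    "rating": [8.0, 6.0, 7.0, 7.0],
})

# Step 1: choose the grouping column and the column to summarize.
grouped = clean.groupby("genre", as_index=False)["rating"]

# Step 2: calculate the average rating for each genre.
averages = grouped.mean()

# Step 3: sort so the highest average comes first.
genre_rating = averages.sort_values("rating", ascending=False)

print(genre_rating)
```

Once each step makes sense on its own, the chained form is just these three lines joined together.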

5. Write one careful takeaway

A chart is not finished until you can say what it means and what it does not mean.

Example takeaway: In this dataset, documentaries have the highest average rating. This does not prove documentaries are always better; it may reflect which movies were included in the CSV.
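One way to check how far a takeaway like this can be trusted is to count how many movies sit behind each average. The sketch below uses made-up data and hypothetical column names (mean_rating, n_movies) chosen for illustration; it is one possible check, not the only one.

```python
import pandas as pd

# Made-up cleaned data: the top genre is backed by a single movie.
clean = pd.DataFrame({
    "genre": ["Documentary", "Drama", "Drama", "Drama", "Comedy", "Comedy"],
    "rating": [9.0, 7.0, 8.0, 7.5, 6.0, 7.0],
})

# Compute the average and the number of movies per genre in one pass.
summary = (
    clean
    .groupby("genre", as_index=False)
    .agg(mean_rating=("rating", "mean"), n_movies=("rating", "count"))
    .sort_values("mean_rating", ascending=False)
)

print(summary)
```

Here the highest average rests on one movie, so the careful takeaway would say so instead of ranking the genres as if each average were equally reliable.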

Try it in the lab

Open the browser lab, load a small CSV, and ask a question like: "Which category has the highest average value?" Then inspect the generated pandas code and edit one line yourself.