Okay, so, yesterday I was messing around, trying to get some data on the Real Madrid vs. Atletico Madrid game. It’s a big rivalry, right? Figured it would be a fun little project.

First thing I did was hit up a few sports data sites. I was hoping to find some historical data, like, going back a few years. Found some okay stuff, but it was scattered all over the place. Had to do some serious digging, you know?
Data Scraping, Sort Of
Alright, so I ended up doing a bit of “scraping” – I put that in quotes because it wasn’t a fancy script or anything. Just copying and pasting from a few websites into a spreadsheet. Real old-school, I know! I grabbed stuff like:
- The date of each match
- The final score
- Who scored the goals
- Yellow and red cards (because those games get heated!)
Cleaning Up the Mess
Man, that raw data was a disaster. Dates were in different formats, team names were inconsistent (“Real Madrid CF” vs. “Real Madrid” – come on!), and the goalscorers were just a jumbled mess of names. I spent a good chunk of time cleaning it all up in Google Sheets, leaning heavily on `=CLEAN()`, `=TRIM()`, and `=SUBSTITUTE()`. My eyes were starting to hurt.
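For what it’s worth, the same cleanup could probably be done in Python instead of Sheets formulas. This is just a rough sketch, not what I actually ran: the alias table, the list of date formats, and the function names `clean_team` and `clean_date` are all made up for illustration, and you’d have to extend them to cover whatever variants show up in your own data.

```python
import re
from datetime import datetime

# Hypothetical alias table: maps lowercased variants to one canonical name.
# These entries are examples only, not the full list of variants in my sheet.
TEAM_ALIASES = {
    "real madrid cf": "Real Madrid",
    "real madrid": "Real Madrid",
    "atletico madrid": "Atletico Madrid",
    "atlético madrid": "Atletico Madrid",
    "atletico de madrid": "Atletico Madrid",
}

# Assumed date formats; add more as you run into them.
# Month names (%B, %b) are parsed using the current locale (English by default).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%b %d, %Y")


def clean_team(name):
    """Collapse extra whitespace, then map known variants to one name."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    # Unknown names are returned with whitespace tidied but otherwise unchanged.
    return TEAM_ALIASES.get(collapsed.lower(), collapsed)


def clean_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

So `clean_team("  Real   Madrid CF ")` gives `"Real Madrid"`, and `clean_date("01/02/2000")` gives `"2000-02-01"` (day first, because of the assumed `%d/%m/%Y` format). Anything that comes back as `None` still needs a manual look.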
Time for Some Numbers
Once the data was somewhat clean, I started playing around with it. I calculated things like:
- Total goals scored in the rivalry (all time)
- Win percentage for each team
- Average goals per game
- The most common scoreline
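I did all of this with Sheets formulas, but here’s roughly how the four numbers above could be computed in Python. Treat it as a sketch: the `matches` rows are made-up sample data (not real results), the `summarize` function and the row layout are my own assumptions, and I’m counting a scoreline as “winner’s goals, loser’s goals” so that 2-1 and 1-2 land in the same bucket.

```python
from collections import Counter

# Made-up sample rows, NOT real results:
# (home team, home goals, away team, away goals)
matches = [
    ("Real Madrid", 2, "Atletico Madrid", 1),
    ("Atletico Madrid", 1, "Real Madrid", 1),
    ("Real Madrid", 0, "Atletico Madrid", 1),
    ("Atletico Madrid", 1, "Real Madrid", 2),
]


def summarize(matches):
    """Compute total goals, average goals, win percentages, top scoreline."""
    if not matches:
        raise ValueError("need at least one match")

    total_goals = 0
    outcomes = Counter()    # wins per team, plus a "Draw" bucket
    scorelines = Counter()  # keyed as (higher score, lower score)

    for home, home_goals, away, away_goals in matches:
        total_goals += home_goals + away_goals
        if home_goals > away_goals:
            outcomes[home] += 1
        elif away_goals > home_goals:
            outcomes[away] += 1
        else:
            outcomes["Draw"] += 1
        scorelines[(max(home_goals, away_goals), min(home_goals, away_goals))] += 1

    n = len(matches)
    return {
        "total_goals": total_goals,
        "avg_goals": total_goals / n,
        # Percentage of all matches, so the values (draws included) sum to 100.
        "win_pct": {name: 100 * count / n for name, count in outcomes.items()},
        "top_scoreline": scorelines.most_common(1)[0][0],
    }
```

On the sample rows, `summarize(matches)` reports 9 total goals, 2.25 goals per game, and `(2, 1)` as the most common scoreline. If two scorelines tie, `most_common` just returns whichever it saw first, so you may want to handle ties yourself.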
A Few Cool Findings
Here’s some of what I found:
- Real Madrid has historically won more games. No surprise there.
- The most frequent score is 2-1, with Real Madrid usually winning.
- The number of red cards is higher than I thought. Those games are seriously intense!
Visualization (Simple Stuff)
I made a couple of basic charts in Google Sheets to visualize the win percentages and goal averages. Nothing fancy, just some bar graphs to make the data a bit easier to understand. Wish I could use Tableau.
Lessons Learned
This was a fun little project, and it was cool to see the data behind the rivalry. A couple of things I learned:
- Data cleaning is ALWAYS the most time-consuming part. Seriously, like 80% of the effort.
- Even with simple data, you can find some interesting insights.
- Next time, I’m going to try to automate the data scraping process with Python. Copying and pasting is for the birds!
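I haven’t built the scraper yet, but the parsing half might look something like this. It’s a minimal sketch using only Python’s standard library, and it assumes the results sit in a plain HTML table. The `SAMPLE_HTML` snippet, its row, and the names `TableRows` and `parse_rows` are all made up; a real site will have its own markup, and you should check its terms of use before fetching anything.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up page fragment for illustration; not copied from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Home</th><th>Score</th><th>Away</th></tr>
  <tr><td>2000-01-01</td><td>Real Madrid CF</td><td>2-1</td><td>Atletico Madrid</td></tr>
</table>
"""


def parse_rows(html):
    """Return the table rows in the HTML as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `parse_rows(SAMPLE_HTML)` returns two lists: the header row and the one sample match. From there the rows could go straight into a CSV instead of being pasted by hand.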
Anyway, that’s how I spent yesterday. Not exactly groundbreaking stuff, but it was a good way to kill some time and learn a bit about data analysis. Maybe I’ll try a more complex project next time. Thinking about diving into some player stats. We’ll see!