Okay, here’s how it went when I tried pulling player stats from the Yankees vs. Arizona Diamondbacks game.

Alright, so, I was messing around trying to grab some data from the Yankees vs. Diamondbacks game. I wanted to see if I could pull player stats, just for fun, really.
First thing I did was try to find a decent API. I spent like an hour just googling “sports API,” “MLB API,” “baseball stats API,” the whole shebang. Landed on a few options, some free, some paid. I ended up going with a free one that seemed okay for what I needed – didn’t wanna drop any cash just yet.
API Key Time!
I signed up, got my API key, and then started messing with the API endpoints. This was the tricky part. The documentation was… well, let’s just say it wasn’t the clearest. I spent a good chunk of time trying to figure out how to actually ask for the data I wanted, which was player stats for that one game.
I started writing some Python code, since that’s my go-to language for this kind of stuff. Used the requests library to hit the API. Here’s a simplified version of what I did:
- Imported the requests and json libraries.
- Defined the API endpoint URL (after figuring it out, of course!).
- Added my API key to the headers.
- Made the GET request.
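For anyone who wants to see those steps as code, here’s a rough sketch. Big caveat: I’m not naming the API I used, so the base URL, the path, and the X-API-Key header name below are all made-up placeholders. Your API will almost certainly want something different, so check its docs.

```python
import requests

# Hypothetical values: placeholders only, not a real service or header name.
BASE_URL = "https://api.example.com/v1"
API_KEY = "your-api-key-here"


def build_boxscore_request(game_id, api_key=API_KEY):
    """Prepare (but do not send) a GET request for one game's player stats."""
    request = requests.Request(
        "GET",
        f"{BASE_URL}/games/{game_id}/boxscore",  # assumed path layout
        headers={"X-API-Key": api_key, "Accept": "application/json"},
    )
    return request.prepare()


def fetch_boxscore(game_id, api_key=API_KEY, timeout=10):
    """Send the request and return the decoded JSON body."""
    prepared = build_boxscore_request(game_id, api_key)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.json()
```

Splitting "build the request" from "send the request" is just a habit of mine: it lets you print the prepared URL and headers and eyeball them before anything goes over the network.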
The first few tries? Total garbage. Kept getting errors. Turns out I was formatting the URL wrong. Had to dig around more in the documentation (and some Stack Overflow threads, let’s be honest) to figure out the correct way to specify the game ID or date or whatever the API needed to pinpoint the specific Yankees vs. Diamondbacks match.
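One thing that would have saved me some pain: letting the standard library build the query string instead of gluing it together by hand. A minimal sketch, assuming the API filters games with "date" and "team" query parameters (those names, the base URL, and the date are invented for illustration):

```python
from datetime import date
from urllib.parse import urlencode


def build_games_url(base_url, game_date, team):
    """Build a games-lookup URL with a properly encoded query string."""
    # "date" and "team" are hypothetical parameter names.
    query = urlencode({"date": game_date.isoformat(), "team": team})
    return f"{base_url}/games?{query}"


# Placeholder date; spaces in the team name get encoded for you.
url = build_games_url("https://api.example.com/v1", date(2024, 1, 1), "New York Yankees")
print(url)
```

The encoding of spaces and special characters is exactly the kind of thing that is easy to get wrong when typing a URL out manually.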
Once I finally got a 200 OK response, I was stoked! But then… JSON soup. The data came back in this massive, nested JSON object. It was a mess to navigate. I spent a while using the json library to turn the response into a Python dictionary, and then started picking through it. Looping through lists, accessing nested dictionaries, all that jazz.
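If you’re staring at the same kind of JSON soup, a small helper that prints only the shape of the data (keys and types, no values) makes it much easier to find your way around. This is a sketch, and the sample payload is invented; it is not what my API actually returned.

```python
import json


def outline(node, depth=0, max_depth=4):
    """Return indented lines describing the keys and list sizes of a JSON value."""
    lines = []
    if depth >= max_depth:
        return lines
    pad = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(outline(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only the first element is inspected, assuming the rest look alike.
        lines.append(f"{pad}[{len(node)} items, showing first]")
        lines.extend(outline(node[0], depth + 1, max_depth))
    return lines


# Made-up payload purely for illustration.
raw = '{"game": {"teams": [{"name": "Team A", "players": [{"nm": "Player A", "hr": 1}]}]}}'
data = json.loads(raw)
print("\n".join(outline(data)))
```

Bump max_depth up as you drill further in; keeping it low at first stops the output from becoming its own wall of text.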
Data Extraction: The Grind
This was the most tedious part. I wanted specific stats: batting average, RBIs, home runs, etc. Finding where those were buried in the JSON was a pain. The API had some weird naming conventions, like abbreviating everything in ways that weren’t obvious. Plus, some stats were missing for certain players, which threw off my code and caused errors. Had to add a bunch of try...except blocks to handle those cases.
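Here’s roughly what that defensive code can look like. The abbreviated keys ("avg", "rbi", "hr") are my guesses at what a terse API might use, not the real field names from the one I worked with.

```python
# Hypothetical mapping from terse API keys to readable names.
STAT_KEYS = {"avg": "batting_average", "rbi": "rbis", "hr": "home_runs"}


def read_stat(raw_stats, key):
    """Return the stat as a float, or None if it is missing or malformed."""
    try:
        return float(raw_stats[key])
    except (KeyError, TypeError, ValueError):
        # KeyError: stat absent; TypeError: stats block or value is None;
        # ValueError: a non-numeric placeholder such as "-".
        return None


def clean_stats(raw_stats):
    """Translate one player's raw stats into readable, numeric fields."""
    return {name: read_stat(raw_stats, key) for key, name in STAT_KEYS.items()}


print(clean_stats({"avg": ".287", "hr": 2}))
```

Catching only the specific exceptions you expect (rather than a bare except) means a genuine bug still blows up loudly instead of silently turning into None.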
Eventually, I managed to extract the stats I wanted and put them into a more manageable format – a list of dictionaries. Each dictionary represented a player, and contained their name and relevant stats.
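To make the "list of dictionaries" idea concrete, here’s a sketch that flattens a nested payload into one dictionary per player. The payload layout (teams, then players, then a stats block) and every key name are assumptions for the example, and the players are fake.

```python
def flatten_players(payload):
    """Flatten a nested game payload into one dictionary per player."""
    players = []
    for team in payload.get("teams", []):
        for entry in team.get("players", []):
            stats = entry.get("stats") or {}  # tolerate a missing or null block
            players.append({
                "name": entry.get("name", "unknown"),
                "team": team.get("name", "unknown"),
                "batting_average": stats.get("avg"),
                "rbis": stats.get("rbi"),
                "home_runs": stats.get("hr"),
            })
    return players


# Invented sample data, not real game results.
sample = {
    "teams": [
        {"name": "Team A", "players": [{"name": "Player A", "stats": {"avg": 0.3, "rbi": 2, "hr": 1}}]},
        {"name": "Team B", "players": [{"name": "Player B", "stats": None}]},
    ]
}
for player in flatten_players(sample):
    print(player)
```

A flat list like this is also easy to hand to csv.DictWriter or a DataFrame later on.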

Finally, I printed the data to the console just to see if it looked right. It did! Mostly. Some numbers were off – probably due to errors in my parsing or the API itself. I double-checked the raw JSON and tweaked my code until things looked reasonable.
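One cheap sanity check for numbers that look off: batting average is just hits divided by at-bats, so if the API gives you all three you can recompute it and compare. A sketch, assuming each player dictionary carries "hits", "at_bats", and "batting_average" keys (my naming, not the API’s):

```python
def check_batting_average(player, tolerance=0.0005):
    """Compare the reported batting average against hits / at-bats."""
    hits = player.get("hits")
    at_bats = player.get("at_bats")
    reported = player.get("batting_average")
    if hits is None or at_bats is None or reported is None:
        return "missing data"
    if at_bats == 0:
        return "no at-bats"
    expected = hits / at_bats
    if abs(expected - reported) <= tolerance:
        return "ok"
    return f"mismatch: expected {expected:.3f}, got {reported:.3f}"


# Invented numbers for illustration.
print(check_batting_average({"hits": 1, "at_bats": 4, "batting_average": 0.3}))
```

The tolerance is there because averages are usually reported rounded to three decimal places, so an exact equality check would flag perfectly good values.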
What I Learned
This little project reminded me how much of data science is just cleaning and wrangling data. The API documentation was a mess, the data was inconsistently formatted, and it took way longer than I expected. But hey, I got some player stats in the end! Now I’m thinking about how to automatically update a spreadsheet with game stats or something. That’ll be a project for another day.
Next Steps?
I’m thinking of exploring other sports APIs, maybe ones that offer more granular data. Also, I want to visualize this data. Maybe create some charts or graphs to compare player performance. But for now, I’m just happy I got the basic stats pulled successfully.
Wrapping Up
So yeah, that was my little adventure grabbing player stats from the Yankees vs. Diamondbacks game. It wasn’t pretty, but it worked (eventually!). Hope this gives you an idea of what it’s like to work with APIs and messy data in the real world.