Project 1 - Ice Ice Baby - Exploring the NHL Franchise API Using jsonlite and ggplot2

For the first project of the course, we were tasked with querying the National Hockey League’s “Franchise” API. To do this, I wrote a series of functions that read in and parse JSON data with the jsonlite package, one for each of five different API calls. Three of these calls (season, goalie, and skater records) can return data for a specific team: a user supplies a franchise ID and receives the records for that particular franchise. With a little more knowledge, I would have liked to delve more deeply into loops or vectorization, so that I could query the API repeatedly for every franchise ID and then combine the results into a single large dataset. That would allow for more thorough data exploration, since one could readily compare values across teams.
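The loop described above (query once per franchise ID, then stack the results) might look roughly like the following. This is a minimal sketch in Python rather than R; in R the same idea is commonly written with `lapply()` over the IDs and `dplyr::bind_rows()` on the results. The `fetch_records` function and the `{"data": [...]}` payload shape are assumptions for illustration, not a documented description of the NHL API, and a stub fetcher with made-up placeholder values stands in for real HTTP requests.

```python
import json
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def collect_all_franchises(
    franchise_ids: Iterable[int],
    fetch_records: Callable[[int], str],
) -> List[Row]:
    """Query once per franchise ID and combine every returned row into one list.

    `fetch_records` is a hypothetical caller-supplied function that returns the
    raw JSON text for one franchise; the payload is assumed to be {"data": [...]}.
    """
    combined: List[Row] = []
    for franchise_id in franchise_ids:
        payload = json.loads(fetch_records(franchise_id))
        # A franchise may contribute zero, one, or many rows (e.g. one per goalie).
        combined.extend(payload.get("data", []))
    return combined


def fake_fetch(franchise_id: int) -> str:
    """Hypothetical stub standing in for an HTTP request; values are placeholders."""
    fake = {
        1: {"data": [{"franchiseId": 1, "wins": 10}]},
        2: {"data": [{"franchiseId": 2, "wins": 7}, {"franchiseId": 2, "wins": 3}]},
    }
    return json.dumps(fake.get(franchise_id, {"data": []}))


rows = collect_all_franchises([1, 2, 3], fake_fetch)
print(len(rows))  # 3
```

Passing the fetcher in as a parameter keeps the combining logic separate from the network call, so the same loop works whether the data comes from a live endpoint or a saved file.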

Initially, when looking at the resulting datasets, I was admittedly somewhat concerned. Many columns held cumulative totals spanning years of each franchise’s history: how could I compare these values meaningfully when they varied so vastly from team to team? To address this, I began examining relationships within the data and looking for ways to create “standardized” measures. For example, a team that has been playing for 50 years, no matter how poorly it performs, will almost inevitably have more wins than a team that has been playing for only three years, but that doesn’t mean the older team is the better one. To avoid comparing apples and oranges, many of my plots and calculations centered on new variables, crafted specifically to standardize the measures through ratios and to answer questions that arose as I thought more about the data. These questions included whether teams are more likely to win at home or on the road, whether more penalty-prone teams win more, and how weak goalies affect a team’s performance. This was the most difficult part of the project for me: figuring out which data could be useful, and deciding which new variables would be meaningful for analysis.
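A standardized measure of the kind described above can be as simple as dividing a raw count by the number of games played. The sketch below is in Python and uses hypothetical field names (`home_wins`, `road_games`, `penalty_minutes`, and so on) and invented placeholder teams; the actual column names in the API's records may differ, and these are not the project's original variables.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    # Field names are illustrative assumptions, not guaranteed API column names.
    name: str
    games_played: int
    wins: int
    home_wins: int
    home_games: int
    road_wins: int
    road_games: int
    penalty_minutes: int


def safe_ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def win_rate(t: TeamTotals) -> float:
    """Share of all games won; comparable across franchises of different ages."""
    return safe_ratio(t.wins, t.games_played)


def home_vs_road(t: TeamTotals) -> float:
    """Home win rate minus road win rate; positive means the team does better at home."""
    return safe_ratio(t.home_wins, t.home_games) - safe_ratio(t.road_wins, t.road_games)


def penalty_minutes_per_game(t: TeamTotals) -> float:
    """Penalty minutes normalized by games played."""
    return safe_ratio(t.penalty_minutes, t.games_played)


# Placeholder teams: the older club has more raw wins but a lower win rate.
old = TeamTotals("Old Club", 4000, 1800, 1000, 2000, 800, 2000, 60000)
new = TeamTotals("New Club", 240, 130, 70, 120, 60, 120, 2400)

for team in (old, new):
    print(f"{team.name}: {team.wins} wins, win rate {win_rate(team):.3f}")
# Old Club: 1800 wins, win rate 0.450
# New Club: 130 wins, win rate 0.542
```

Once every team is expressed per game, the home-versus-road and penalty questions above become comparisons between ratios rather than between raw totals that mostly reflect a franchise's age.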

To track my changes, I connected my R project to my GitHub account, which gave me the benefits of version control. I committed changes periodically, but truthfully, I wish I had committed and pushed them at more regular, deliberate intervals so that they corresponded to each step of the process, rather than the sporadic approach I inadvertently took. Being more intentional here would have made my commit history more legible and meaningful for outside collaborators.

I walked away from this project feeling inspired to discover more. I’ve had the pleasure of attending a couple of Canes games and a handful of minor-league matches back in Virginia, but I’m afraid my knowledge of hockey is somewhat limited; this project allowed me to learn new terminology while examining what makes a team successful. On a broader scale, I began to realize the nearly endless possibilities offered by the wide range of publicly accessible APIs waiting to be explored, both the profoundly impactful and the simply amusing. (For the latter: Breweries! AccuWeather! Chuck Norris facts!) Surely many answers lie within this data. What creative solutions can we devise for the increasingly complex problems our society faces? What new mysteries will be unraveled, while perhaps having some fun along the way? The tools R offers, along with the dedicated data scientists who maintain these incredible repositories, make it seem that the sky truly is the limit.

To view my project, check out the GitHub Pages site at https://cmheubus.github.io/Project-1/. I look forward to hearing your thoughts!

Written on June 12, 2020