Concluding Thoughts - What a Journey!

Throughout this course, I’ve had the wonderful opportunity to explore the world of data science through the lens of R programming. Reflecting on what I wrote 10 weeks ago, I can see how my understanding and knowledge of the subject matter have grown through experiential learning and through assignments that let me practice its principles. Though I will not become a data scientist myself, my appreciation for their work has deepened greatly, and I hope to have the opportunity to communicate and collaborate closely with data science colleagues on impactful projects in the workplace.

Project 2 - Trial & Error While Examining Mashable Data

This project is not my proudest moment; admittedly, I struggled with multiple elements. The most difficult component for me was the final automation of the report. I watched the Module 8 video multiple times, followed activity on the forums, and did about five hours of research on Stack Exchange and other sources, but I’m afraid this component entirely evaded me. My understanding is that there needs to be a separate script (perhaps a loop) in the README.md file that uses the apply() function to scan through each of the columns that start with “weekday_is_”, with each one outputting to a separate GitHub document that the README could then link to. I am also under the impression that several changes must be made to the YAML header in order to set the parameters. Putting these steps together proved to be a major sticking point for me, and I’m afraid I had to generate the reports manually, filtering by columns, which was not the ideal solution I was hoping for. In retrospect, I should have requested help earlier on this issue. I felt foolish for not understanding the concept, but it was pure hubris on my part to think I would just “figure it out”.
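
Looking back, one common way to put these pieces together is a short driver script that builds a list of parameter values and calls rmarkdown::render() once per value. This is a minimal sketch under stated assumptions, not the course’s official solution. The template name analysis.Rmd, the data frame name news, and the column list are hypothetical. The sketch uses Map(), a relative of apply(), to walk the file names and parameter lists together.

```r
# Sketch of parameterized report automation. It assumes a template
# "analysis.Rmd" (hypothetical name) whose YAML header declares a `day` param:
#
#   ---
#   title: "Mashable Analysis"
#   output: github_document
#   params:
#     day: "monday"
#   ---
#
# Inside the template, a data frame (hypothetically named `news`) could then
# be filtered with:
#   news[news[[paste0("weekday_is_", params$day)]] == 1, ]

# Assumed column names following the data set's weekday_is_* pattern
news_cols <- c("n_tokens_title", "weekday_is_monday", "weekday_is_tuesday",
               "weekday_is_wednesday", "weekday_is_thursday",
               "weekday_is_friday", "weekday_is_saturday",
               "weekday_is_sunday", "shares")

# Keep only the indicator columns and strip the prefix to get day names
day_cols <- grep("^weekday_is_", news_cols, value = TRUE)
days <- sub("^weekday_is_", "", day_cols)

# One output file name and one params list per day
output_files <- paste0(days, "Analysis.md")
params_list <- lapply(days, function(d) list(day = d))

# Render one report per day; Map() pairs each file name with its params
render_reports <- function(template = "analysis.Rmd") {
  invisible(Map(function(file, p) {
    rmarkdown::render(input = template, output_file = file, params = p)
  }, output_files, params_list))
}

# Call render_reports() once analysis.Rmd exists and rmarkdown is installed.
```

Because render() accepts a params list that overrides the defaults in the YAML header, each call produces a separate GitHub document for one day, which the README could then link to.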

Reflection Post - The Semester So Far

This summer session has been an absolute whirlwind! I have found the content stimulating and challenging, and it has been a wonderful way to explore a coding language I had never used before. The pace of the homework, new content, projects, and exams is daunting, though I find that this immersive approach helps me learn more effectively. I am incredibly grateful for Professor Post’s and Subhankar’s office hours, which have been crucial as I work through snafus and roadblocks big and little; being able to talk through an issue or have a concept explained again is invaluable in the learning process. The concepts are becoming decidedly harder in some respects, and I am admittedly a little nervous, but I hope to finish strong! Thank you for everything.

Project 1 - Ice Ice Baby - Exploring the NHL Franchise API Using jsonlite and ggplot2

For the first project of the course, we were tasked with querying the National Hockey League’s “Franchise” API. For this purpose, I created a series of functions that read in and parse JSON data using the jsonlite package, covering five different API calls. Three of these calls (season, goalie, and skater records) can return specific rows of data: a user can supply a franchise ID and receive the data for that particular franchise. With a little further knowledge, I would have liked to delve more deeply into loops or vectorization, so that I could repeatedly query the API for every franchise ID and then combine the rows into a single large dataset. I believe this would allow for further important data exploration, as one could readily compare values across teams.
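
The looping approach described above could look roughly like this minimal sketch. lapply() calls a per-franchise fetch function once for each ID, and do.call(rbind, ...) stacks the resulting data frames. The function names and the stub's columns are hypothetical. The stub stands in for a real wrapper around jsonlite::fromJSON() so that the sketch runs offline.

```r
# Combine per-franchise query results into one data frame.
# `fetch_one` is any function that takes a single franchise ID and returns
# a data frame with the same columns for every ID.
fetch_all_franchises <- function(ids, fetch_one) {
  rows <- lapply(ids, fetch_one)  # one data frame per franchise ID
  do.call(rbind, rows)            # stack them into a single data frame
}

# Hypothetical stand-in for a real API wrapper (e.g. one built on
# jsonlite::fromJSON()), so the sketch runs without a network connection.
fake_fetch <- function(id) {
  data.frame(franchiseId = id, placeholder = id * 10)
}

all_rows <- fetch_all_franchises(1:3, fake_fetch)
print(all_rows)
```

Passing the fetch function in as an argument keeps the combining logic separate from the API details. dplyr::bind_rows() is a common alternative to do.call(rbind, ...) that also tolerates mismatched columns by filling gaps with NA.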

Post 2 - Initial Impressions of R

Two weeks into this course, my favorite component of R has by far been the versatile and wide-ranging array of packages that have been concocted to ease use of the language. These packages’ slick and creative approaches to bundling and calling upon data sets and functions are genuinely enjoyable to discover, and I’m impressed by how many user-created packages there are. I also appreciate how easy it is to create and combine data types and to work with different data structures, though I’m sure we’ll be unearthing a plethora of idiosyncrasies over the coming weeks.

Post 1 - What Does a Data Scientist Do?

Data science is an exciting and pivotal field. Its practitioners are increasingly in demand across a wide variety of sectors, from business to healthcare to the public sphere. A data scientist’s role is to help collect, process, elucidate, and draw conclusions from data, helping organizations and individuals solve problems and make the best possible decisions; therefore, they must be savvy across diverse disciplines. This includes being fluent in several programming languages (such as R, Python, and SAS) and having a comprehensive understanding of statistics and analytics, while knowing how to adeptly model and visualize data, in order to arrive at the best solution in the context of their field. Solutions are often derived by writing algorithms, designing A/B tests, and devising ways to ensure data accuracy. Effective communication skills are crucial, as data scientists must often persuade others, who may be less familiar with data science principles, of their ideas. The role is frequently collaborative: data scientists often work in tandem with product development or other departments, and may later hand their prototyped models off to a machine learning engineer, who is responsible for adapting and scaling the models by writing production-level code.