Project 2 - Trial & Error while Examining Mashable Data
July 3, 2020
Links to Reports & Conclusions
Monday
I calculated the mean of the predictions for both the linear and ensemble models. The rfNewsDataPred random forest (ensemble) model estimated that the mean number of shares for an article published on a Monday would be approximately 4,082; the fit16NewPred multiple linear regression model estimated that the mean would be approximately 3,514. The RMSE for the linear model (23147.86) is lower than that of the random forest model (23204.43).
The analysis for Monday is available here.
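The two quantities reported for each day, the mean of a model's predictions and its RMSE, can be sketched as follows. This is only an illustrative Python sketch, not the code behind the linked reports: the function names and the toy numbers are hypothetical, and the real analysis would use the held-out test set for that day.

```python
import math


def mean_prediction(preds):
    """Average predicted number of shares across the test articles."""
    return sum(preds) / len(preds)


def rmse(actual, preds):
    """Root mean square error: sqrt(mean((actual - predicted)^2))."""
    if len(actual) != len(preds):
        raise ValueError("actual and preds must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up toy values for illustration only (not Mashable data).
actual_shares = [1200, 900, 15000, 2300]
linear_preds = [3000, 2800, 4100, 3100]   # stand-in for fit16NewPred
forest_preds = [3400, 3100, 5200, 3600]   # stand-in for rfNewsDataPred

for name, preds in [("linear", linear_preds), ("random forest", forest_preds)]:
    print(name, round(mean_prediction(preds)), round(rmse(actual_shares, preds), 2))
```

Because the errors are squared before averaging, a handful of articles with very high share counts can dominate the RMSE, which may be why the reported RMSE values are several times larger than the mean predictions.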
Tuesday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Tuesday would be 3,528; the fit16NewPred model predicted that the mean number of shares would be approximately 3,120. The RMSE for the linear model (13094.88) is lower than that of the random forest model (13324.2).
The analysis for Tuesday is available here.
Wednesday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Wednesday would be approximately 3,843; the fit16NewPred model predicted that the mean number of shares would be 3,336. The RMSE for the linear model (8215.138) is lower than that of the random forest model (8567.245).
The analysis for Wednesday is available here.
Thursday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Thursday would be approximately 3,462; the fit16NewPred model predicted that the mean number of shares would be 3,126. The RMSE for the linear model (10220.05) is lower than that of the random forest model (10236.14).
The analysis for Thursday is available here.
Friday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Friday would be approximately 3,462; the fit16NewPred model predicted that the mean number of shares would be 3,126. The RMSE for the linear model (10220.05) is lower than that of the random forest model (10236.14).
The analysis for Friday is available here.
Saturday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Saturday would be 3,747; the fit16NewPred model predicted that the mean number of shares would be approximately 3,268. The RMSE for the linear model (8874.317) is lower than that of the random forest model (8949.476).
The analysis for Saturday is available here.
Sunday
The rfNewsDataPred model predicted that the mean number of shares for an article published on a Sunday would be 4,020; the fit16NewPred model predicted that the mean number of shares would be approximately 3,767. The RMSE for the linear model (6056.581) is lower than that of the random forest model (6119.516).
The analysis for Sunday is available here.
Which Model is More Effective?
Looking at the root mean square error (RMSE) for each day, we can see that the linear model (fit16NewPred) has a lower RMSE than the random forest model (rfNewsDataPred) on all seven days, although the margin is small, ranging from about 16 (Thursday and Friday) to about 352 (Wednesday). Therefore, I would recommend using the linear model over the ensemble model.
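The day-by-day comparison can be tabulated with a short script. The sketch below is a hypothetical helper written for illustration in Python; the RMSE values are copied as reported in the daily summaries above, and the names reported_rmse and compare_models are assumptions rather than anything from the linked reports.

```python
# RMSE values as reported above: (linear model, random forest model).
reported_rmse = {
    "Monday": (23147.86, 23204.43),
    "Tuesday": (13094.88, 13324.2),
    "Wednesday": (8215.138, 8567.245),
    "Thursday": (10220.05, 10236.14),
    "Friday": (10220.05, 10236.14),
    "Saturday": (8874.317, 8949.476),
    "Sunday": (6056.581, 6119.516),
}


def compare_models(rmse_by_day):
    """Return (day, winner, absolute gap, gap as % of forest RMSE) per day."""
    rows = []
    for day, (linear, forest) in rmse_by_day.items():
        winner = "linear" if linear < forest else "random forest"
        gap = abs(forest - linear)
        rows.append((day, winner, gap, 100 * gap / forest))
    return rows


comparison = compare_models(reported_rmse)
for day, winner, gap, pct in comparison:
    print(f"{day:<10} lower RMSE: {winner:<14} gap: {gap:8.2f} ({pct:.2f}%)")
```

Showing the gap as a percentage alongside the winner makes it easier to judge whether the difference between the two models is practically meaningful on a given day.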