Project-2

Project 2 - Trial & Error while Examining Mashable Data

July 3, 2020

Links to Reports & Conclusions

Monday

I calculated the mean of the predictions for both the linear and ensemble models. The rfNewsDataPred random forest (ensemble) model estimated that the mean number of shares for an article published on a Monday would be approximately 4,082; the fit16NewPred multiple linear regression model estimated that the mean would be approximately 3,514. The RMSE for the linear model (23147.86) is lower than that of the random forest model (23204.43).
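As a rough illustration of the two summary figures quoted for each day (the mean of a model's predictions and its RMSE against the observed share counts), here is a minimal Python sketch. It is not the code from the reports; the helper names `rmse` and `mean` and the toy numbers are assumptions made purely for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Hypothetical toy values, NOT taken from the Mashable data.
actual_shares = [1000, 2000, 3000, 4000]
predicted_shares = [1500, 1500, 3500, 3500]

print(mean(predicted_shares))                 # 2500.0
print(rmse(actual_shares, predicted_shares))  # 500.0
```

Because RMSE squares each error before averaging, a handful of articles with very large share counts can dominate it, which may be why the RMSE values here are several times larger than the mean predictions.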

The analysis for Monday is available here.

Tuesday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Tuesday would be 3,528; the fit16NewPred model predicted that the mean number of shares would be approximately 3,120. The RMSE for the linear model (13094.88) is lower than that of the random forest model (13324.2).

The analysis for Tuesday is available here.

Wednesday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Wednesday would be approximately 3,843; the fit16NewPred model predicted that the mean number of shares would be 3,336. The RMSE for the linear model (8215.138) is lower than that of the random forest model (8567.245).

The analysis for Wednesday is available here.

Thursday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Thursday would be approximately 3,462; the fit16NewPred model predicted that the mean number of shares would be 3,126. The RMSE for the linear model (10220.05) is lower than that of the random forest model (10236.14).

The analysis for Thursday is available here.

Friday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Friday would be approximately 3,462; the fit16NewPred model predicted that the mean number of shares would be 3,126. The RMSE for the linear model (10220.05) is lower than that of the random forest model (10236.14).

The analysis for Friday is available here.

Saturday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Saturday would be 3,747; the fit16NewPred model predicted that the mean number of shares would be approximately 3,268. The RMSE for the linear model (8874.317) is lower than that of the random forest model (8949.476).

The analysis for Saturday is available here.

Sunday

The rfNewsDataPred model predicted that the mean number of shares for an article published on a Sunday would be 4,020; the fit16NewPred model predicted that the mean number of shares would be approximately 3,767. The RMSE for the linear model (6056.581) is lower than that of the random forest model (6119.516).

The analysis for Sunday is available here.

Which Model is More Effective?

Looking at the root mean square errors across the days, we can see that the linear model (fit16NewPred) has a lower RMSE than the random forest model (rfNewsDataPred) on every day reported above. Therefore, I would recommend using the linear model over the ensemble model.
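The day-by-day comparison can be tabulated directly from the RMSE values quoted in the summaries above. The sketch below is one possible way to do that in Python. The dictionary `reported_rmse` and the helper `relative_gap` are names introduced here for illustration, and the Friday entry is copied as listed above, where it matches Thursday's figures.

```python
# RMSE values as quoted in the daily summaries: (linear model, random forest).
reported_rmse = {
    "Monday": (23147.86, 23204.43),
    "Tuesday": (13094.88, 13324.2),
    "Wednesday": (8215.138, 8567.245),
    "Thursday": (10220.05, 10236.14),
    "Friday": (10220.05, 10236.14),  # as listed above; identical to Thursday
    "Saturday": (8874.317, 8949.476),
    "Sunday": (6056.581, 6119.516),
}


def relative_gap(linear, forest):
    """Percentage by which the linear RMSE falls below the random forest RMSE."""
    return (forest - linear) / forest * 100


# Days on which the linear model has the lower RMSE.
linear_wins = [day for day, (lin, rf) in reported_rmse.items() if lin < rf]

for day, (lin, rf) in reported_rmse.items():
    gap = relative_gap(lin, rf)
    print(f"{day:<10} linear={lin:>10.2f}  rf={rf:>10.2f}  gap={gap:5.2f}%")

print(f"Linear model has the lower RMSE on {len(linear_wins)} of {len(reported_rmse)} days")
```

Run on the quoted figures, this shows the linear model ahead on all seven days, with the widest margin on Wednesday. The margins are small relative to the RMSE values themselves, so the recommendation rests on consistency across days rather than on a large difference on any single day.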