Modeling Report and Plan for Testing

You should experiment with different combinations of parameters to develop the best machine learning model possible, combining, for example, different algorithms, hyperparameters, or feature sets.
Two types of models should be presented:
  1. At least one simple, interpretable model, for instance a shallow decision tree, whose output can be understood directly. The goal of this model is to verify that the pipeline works as expected and that the modeling results make sense. 
  2. More complex models, such as ensemble models, which should perform better than the simple model (a minimal sketch of both model types follows this list). 
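As a starting point, the comparison could look like the sketch below. It assumes a tabular binary-classification problem; the example dataset, hyperparameter grid, and metric are placeholders to be replaced by your own pipeline.

```python
# Minimal sketch: a shallow, interpretable baseline vs. a tuned ensemble.
# Dataset, grid values and metric are placeholders for your own project.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# 1. Simple model: a shallow decision tree whose rules can be printed and inspected.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))
print("Tree AUC:", roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))

# 2. More complex model: an ensemble tuned over a small hyperparameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Forest AUC:", roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1]))
```

Printing the tree rules with the baseline is what lets you sanity-check that the pipeline and the data make sense before moving to the less transparent ensemble.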
The best model should be selected and presented in more detail, including an evaluation of model bias. Is the model more accurate for specific population groups (e.g., women) than for others? A critical analysis of this subject, from the point of view of the problem at hand, should be delivered after going through the following workshop: DSKC workshop on Bias and Fairness on DS (https://web.microsoftstream.com/video/d8aaf74a-42fd-4ac3-af6f-70d79a05c92a). If such an analysis is not possible for your model, the critical analysis should include a contextualization of the conditions that justify this conclusion (type of data, analysis and/or model outputs, for example). Where it applies, the model should be audited to check whether it could disproportionately affect certain groups of the population. You may use a package to do so, such as Aequitas (http://www.datasciencepublicpolicy.org/projects/aequitas/), or develop a procedure tailored to your project; a minimal sketch of such a tailored procedure follows below. 
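A tailored audit can be as simple as computing the same performance metrics separately for each group defined by a sensitive attribute. The sketch below assumes binary predictions and a sensitive attribute column named "sex"; the column name and the variables y_test, y_pred are placeholders for your own data.

```python
# Minimal sketch of a tailored bias audit: compare the best model's performance
# across groups defined by a sensitive attribute. The column name "sex" and the
# variables y_test / y_pred are placeholders for your own project data.
import pandas as pd
from sklearn.metrics import recall_score, precision_score

def audit_by_group(y_true, y_pred, groups):
    """Return recall, precision and positive-prediction rate per group."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    rows = []
    for name, part in df.groupby("group"):
        rows.append({
            "group": name,
            "n": len(part),
            "recall": recall_score(part["y_true"], part["y_pred"]),
            "precision": precision_score(part["y_true"], part["y_pred"]),
            "positive_rate": part["y_pred"].mean(),
        })
    return pd.DataFrame(rows)

# Example usage (y_test, y_pred and the "sex" column come from your pipeline):
# report = audit_by_group(y_test, y_pred, X_test["sex"])
# print(report)  # large gaps between groups flag potential disparate impact
```

Large gaps in recall, precision, or positive-prediction rate between groups are the kind of disparity that the critical analysis should discuss and, if relevant, mitigate.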

The presentation of the modeling results should include:
  • Description of the different algorithms, hyperparameters and features you experimented with 
  • Model evaluation visualizations comparing the different models (see the sketch after this list) 
  • Selection of the best model, justifying your choice 
  • Definition of the groups that may be differently affected by your model, and a comparison of the best model performance for the different groups 
  • Future work (e.g., what you would have liked to develop if you had more time).
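One possible model-comparison visualization is a set of ROC curves on the same held-out test set. The sketch below reuses the fitted tree and grid objects from the earlier sketch; adapt the estimators, metric, and output file to your own project.

```python
# Minimal sketch of a model-comparison visualization: ROC curves of the simple
# tree and the tuned ensemble on the same held-out test set (variables reused
# from the earlier sketch; adapt to your own models and data).
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

fig, ax = plt.subplots(figsize=(6, 5))
RocCurveDisplay.from_estimator(tree, X_test, y_test, name="Shallow tree", ax=ax)
RocCurveDisplay.from_estimator(grid, X_test, y_test, name="Tuned random forest", ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")
ax.set_title("ROC comparison of candidate models")
ax.legend()
plt.savefig("model_comparison_roc.png", dpi=150)
```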
While you may develop a thoroughly checked and validated model, it may fail if it is improperly used (or not used at all) by the people for whom it was designed. You should design a plan that would allow you to test the performance of your model when deployed in the "real world". Will the model be used as intended? Are the outputs of the model comprehensible to the end-users? Are users following the model's suggestions? Before deploying the model, you can seek answers to these questions by designing a pilot study to test model performance outside of the machine learning workbench.

Your report should include a description of the plan for testing, namely:
  • Who are the end-users of your model? 
  • How would you design the interface for deploying your model? 
  • Which study design would you use to evaluate the adoption of your system by its end-users? 
  • Which measures would you use to evaluate the adoption of your system by its end-users? 
  • Who would participate in your pilot study and why? 
  • How would you prepare your pilot implementation (e.g., is training needed?) 
  • How many data points, or for how long, should the pilot run? (A sketch of one way to estimate this follows below.)
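One way to answer the last question is a power calculation. The sketch below assumes the adoption measure is a proportion (e.g., the share of cases in which users follow the model's suggestion) compared between a control and an intervention group; the baseline and target rates are placeholders to be replaced with estimates from your own domain.

```python
# Minimal sketch of a sample-size estimate for the pilot, assuming the adoption
# measure is a proportion compared between two groups. The baseline and target
# rates below are placeholders to be replaced with domain estimates.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.50   # assumed adherence without the model (placeholder)
target_rate = 0.65     # adherence considered a meaningful improvement (placeholder)

effect = proportion_effectsize(target_rate, baseline_rate)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)
print(f"Approximately {n_per_group:.0f} cases per group are needed")
# Divide by the expected number of cases per week to estimate the pilot duration.
```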