Modelling Roadmap (Including Baseline Model)

The modelling roadmap serves as an agreement on how the model will be developed. It should detail and justify every technical decision related to modelling, including:

  • The goal of the model
  • The target variable (for supervised machine learning models)
  • How the modelling dataset will be built (i.e., what each row of the dataset represents)
  • Which groups of variables will be used to train the model
  • What the modelling baseline will be
  • Which algorithms will be applied and, for each algorithm, the minimum set of hyperparameters to be tested
  • Which evaluation metric will be used to select the model
  • How the model will be validated (e.g., train-test split, cross-validation)
  • Which additional metrics, if any, will be used to evaluate the models
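To keep the roadmap reviewable, its items can be recorded in a structured form and checked for gaps before modelling starts. The Python sketch below shows one possible way to do this; the `ModellingRoadmap` class, its field names, and the churn-prediction values in the usage example are hypothetical illustrations, not a prescribed template. The target variable and the additional metrics are treated as optional, since the former applies only to supervised models and the latter is marked "if any". The sketch records the decisions only; their justifications still belong in the roadmap text.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar


@dataclass
class ModellingRoadmap:
    """Hypothetical checklist of the decisions a modelling roadmap should record."""
    goal: str = ""
    target_variable: str = ""   # only meaningful for supervised models
    dataset_row: str = ""       # what one row of the modelling dataset represents
    variable_groups: list[str] = field(default_factory=list)
    baseline: str = ""
    # algorithm name -> minimum hyperparameter grid to be tested
    algorithms: dict[str, dict[str, list]] = field(default_factory=dict)
    selection_metric: str = ""
    validation: str = ""        # e.g. "train-test split", "5-fold cross-validation"
    additional_metrics: list[str] = field(default_factory=list)

    # Items that may legitimately stay empty (ClassVar: not a dataclass field).
    OPTIONAL: ClassVar[tuple[str, ...]] = ("target_variable", "additional_metrics")

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still empty."""
        return [f.name for f in fields(self)
                if f.name not in self.OPTIONAL and not getattr(self, f.name)]


# Hypothetical example: validation strategy not yet agreed.
roadmap = ModellingRoadmap(
    goal="Predict which customers will churn in the next 30 days",
    target_variable="churned_30d",
    dataset_row="one customer at one monthly snapshot date",
    variable_groups=["demographics", "billing history", "support contacts"],
    baseline="current retention-team rule: flag customers with 2+ complaints",
    algorithms={
        "logistic_regression": {"C": [0.01, 0.1, 1.0, 10.0]},
        "gradient_boosting": {"n_estimators": [100, 300], "max_depth": [3, 5]},
    },
    selection_metric="F1 score on the churn class",
)
print(roadmap.missing_items())  # ['validation']
```

Filling in the remaining item (for instance `roadmap.validation = "5-fold cross-validation"`) makes `missing_items()` return an empty list, which can serve as a simple gate before model training begins.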

To judge whether a machine learning model is good or bad, we need to compare it with existing practice. The baseline model represents the "business-as-usual" way of performing the task: if no machine learning model were implemented, how would the task be performed, and with what performance? Establishing a baseline makes it possible to quantify how much the model actually improves the performance of the task.
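One way to express how much the model improves on the baseline is as the difference between their scores on the same metric and the same evaluation data. The minimal Python sketch below uses a hypothetical helper, `improvement_over_baseline`, that also flips the sign for error metrics such as RMSE or MAE, where lower is better, so that a positive value always means the model beats the baseline. The relative improvement is only defined when the baseline score is non-zero; the sketch returns NaN otherwise.

```python
def improvement_over_baseline(model_score: float, baseline_score: float,
                              higher_is_better: bool = True) -> tuple[float, float]:
    """Return (absolute, relative) improvement of a model over its baseline.

    A positive result always means the model outperforms the baseline,
    whether the metric is a score (higher is better) or an error (lower is better).
    """
    if higher_is_better:
        absolute = model_score - baseline_score
    else:
        absolute = baseline_score - model_score
    # Relative improvement is undefined for a zero baseline score.
    relative = absolute / abs(baseline_score) if baseline_score != 0 else float("nan")
    return absolute, relative


# Hypothetical regression case: baseline RMSE 10.0, model RMSE 8.0.
print(improvement_over_baseline(8.0, 10.0, higher_is_better=False))  # (2.0, 0.2)
```

Here the model reduces the error by 2.0 units, a 20% improvement relative to the baseline.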

A baseline model adapted to the problem at hand should be defined, justified, and implemented, and then evaluated with the same metric selected for model performance, so that its score is directly comparable to the model's.
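As an illustration of implementing and evaluating a baseline, the minimal Python sketch below assumes a hypothetical binary churn-prediction task in which F1 on the positive class is the selected metric. It compares two kinds of baseline: a business-as-usual rule (here, a made-up rule that flags customers with two or more complaints) and a naive majority-class predictor. The rule, the data, and all function names are hypothetical; F1 is computed from scratch to keep the example dependency-free.

```python
from collections import Counter
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def business_rule_baseline(rows: Sequence[dict]) -> list[int]:
    """Hypothetical business-as-usual rule: flag customers with 2+ complaints."""
    return [1 if row["complaints"] >= 2 else 0 for row in rows]


def majority_class_baseline(y_train: Sequence[int], n: int) -> list[int]:
    """Naive baseline: always predict the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n


# Hypothetical data: training labels and a small labelled test set.
y_train = [0, 0, 1, 0, 0, 1, 0]
test_rows = [{"complaints": c} for c in (3, 0, 2, 1, 4, 0)]
y_test = [1, 0, 0, 1, 1, 0]

rule_f1 = f1_score(y_test, business_rule_baseline(test_rows))
majority_f1 = f1_score(y_test, majority_class_baseline(y_train, len(y_test)))
print(f"business rule F1:  {rule_f1:.3f}")      # business rule F1:  0.667
print(f"majority class F1: {majority_f1:.3f}")  # majority class F1: 0.000
```

On this toy data the majority-class baseline scores an F1 of 0 because it never predicts the positive class, so on its own it would set a trivially low bar; the business-rule baseline gives a more meaningful reference point that a candidate model must beat.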