Pipeline
Deliver a code repository that is well organized, clean (e.g., all branches merged), and well documented (i.e., it has a README with instructions to run the code, and its functions are described…). The repository must have no data or passwords committed to it.
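One common safeguard for the "no data or passwords" requirement is a `.gitignore` file that excludes data directories and secret files from version control. A minimal sketch (the paths and patterns below are illustrative, not required by the assignment):

```gitignore
# Raw and processed data stay local, never in git
data/
*.csv
*.parquet

# Credentials and local configuration
.env
secrets.*

# Environment and cache clutter
__pycache__/
.venv/
```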
The pipeline should allow the user to automatically train one or more models, starting from the raw data, and then to visualize the results of the trained model(s). The more automated and flexible the pipeline is, the better: for example, it might let the user select and run multiple algorithms with different hyperparameters and then compare their results, or pick different train and test sets, or different sets of features.
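One way to realize this flexibility is a registry of algorithms plus a loop that trains each requested (algorithm, hyperparameters) combination and reports a comparable score. The sketch below uses only the standard library; the model classes, registry names, and `compare` function are illustrative assumptions, not part of the assignment:

```python
class MeanModel:
    """Baseline: predict the training-set mean regardless of input."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean] * len(X)

class KNNModel:
    """Tiny k-nearest-neighbours regressor for a single numeric feature."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.points = list(zip(X, y))
        return self
    def predict(self, X):
        out = []
        for x in X:
            nearest = sorted(self.points, key=lambda p: abs(p[0] - x))[: self.k]
            out.append(sum(y for _, y in nearest) / len(nearest))
        return out

# The user-facing "algorithm selection" is just a lookup in this registry.
REGISTRY = {"mean": MeanModel, "knn": KNNModel}

def mse(y_true, y_pred):
    """Mean squared error, used as the common comparison metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def compare(specs, X_train, y_train, X_test, y_test):
    """specs: iterable of (name, hyperparameter-dict) pairs chosen by the user.
    Returns a dict mapping each configuration to its test-set MSE."""
    return {
        (name, tuple(sorted(params.items()))): mse(
            y_test, REGISTRY[name](**params).fit(X_train, y_train).predict(X_test)
        )
        for name, params in specs
    }
```

In a full project the registry entries would typically be scikit-learn estimators and the specs would come from command-line arguments or a config file, but the comparison loop stays the same.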
Your project’s pipeline should include scripts that perform the following tasks:
- Data collection
- Data cleaning
- Data transformation for modeling (including ways to deal with invalid and null values, if any)
- Data splits for model validation
- Model training
- Performance evaluation of models
- Visualization of model results
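The steps above can be sketched as one driver script with a function per stage. The bodies below are placeholders on synthetic data (all names and the simple least-squares model are illustrative assumptions); in the real project each stage would live in its own script and read/write real files:

```python
import random

def collect_data(n=100, seed=0):
    """Data collection: here, synthesize (x, y) pairs, including some null targets."""
    rng = random.Random(seed)
    rows = [(rng.uniform(0, 10), None) for _ in range(5)]  # deliberately invalid rows
    rows += [(x, 2 * x + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(n))]
    return rows

def clean(rows):
    """Data cleaning: drop rows with invalid or null targets."""
    return [(x, y) for x, y in rows if y is not None]

def transform(rows):
    """Data transformation for modeling: separate features and target."""
    return [x for x, _ in rows], [y for _, y in rows]

def split(X, y, test_frac=0.2):
    """Data split for model validation: simple hold-out split."""
    cut = int(len(X) * (1 - test_frac))
    return X[:cut], y[:cut], X[cut:], y[cut:]

def train(X, y):
    """Model training: one-feature least-squares fit, returns (slope, intercept)."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    slope = sum((x - mx) * (t - my) for x, t in zip(X, y)) / sum((x - mx) ** 2 for x in X)
    return slope, my - slope * mx

def evaluate(model, X, y):
    """Performance evaluation: mean squared error on held-out data."""
    slope, intercept = model
    return sum((t - (slope * x + intercept)) ** 2 for x, t in zip(X, y)) / len(X)

def visualize(model, mse_value):
    """Visualization placeholder: a real pipeline might plot predictions with matplotlib."""
    print(f"y = {model[0]:.2f}x + {model[1]:.2f}, test MSE = {mse_value:.3f}")

def main():
    rows = clean(collect_data())
    X, y = transform(rows)
    X_tr, y_tr, X_te, y_te = split(X, y)
    model = train(X_tr, y_tr)
    visualize(model, evaluate(model, X_te, y_te))

if __name__ == "__main__":
    main()
```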
Output:
GitHub repository