Pipeline

You should deliver a code repository that is well organized, clean (e.g., all branches merged), and well documented (i.e., it has a readme with instructions to run the code, and its functions are described). The repository must contain no committed data or passwords. The pipeline should let the user automatically train one or more models starting from the raw data, and then visualize the results of the trained model(s). The more automated and flexible the pipeline is, the better: for example, it could allow the user to select and run multiple algorithms with different hyperparameters and compare their results, or to pick different train and test sets or different sets of features.
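One common way to expose that kind of flexibility is a command-line entry point. The sketch below is purely illustrative, assuming a hypothetical `train.py` script; the flag names, algorithm choices, and defaults are not prescribed by the assignment.

```python
# Hypothetical CLI for a training pipeline; flags and choices are
# illustrative examples of the flexibility described above.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Train and evaluate models starting from the raw data.")
    parser.add_argument("--model", choices=["linear", "tree", "knn"],
                        default="linear", help="which algorithm to train")
    parser.add_argument("--test-frac", type=float, default=0.2,
                        help="fraction of the data held out for testing")
    parser.add_argument("--features", nargs="+", default=["all"],
                        help="subset of feature columns to use")
    parser.add_argument("--params", nargs="*", default=[], metavar="KEY=VALUE",
                        help="hyperparameters, e.g. max_depth=5")
    return parser

# Example invocation: python train.py --model tree --params max_depth=5
args = build_parser().parse_args(["--model", "tree", "--params", "max_depth=5"])
print(args.model, args.params)
```

Running several such invocations with different `--model` and `--params` values, and logging each run's scores, gives the user a simple way to compare algorithms and hyperparameters.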

Your project’s pipeline should include scripts that perform the following tasks: 

  • Data collection
  • Data cleaning
  • Data transformation for modeling (including ways to deal with invalid and null values, if any)
  • Data splits for model validation
  • Model training
  • Performance evaluation of models
  • Visualization of model results 
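The steps above (minus visualization) can be sketched end to end as below. This is a minimal, standard-library-only illustration on a noiseless toy dataset; the function names, the least-squares model, and the mean-absolute-error metric are assumptions for the example, not requirements of the assignment.

```python
# Toy pipeline: collect -> clean -> split -> train -> evaluate.
# All names and the toy dataset are hypothetical, for illustration only.
import random
from statistics import mean

def collect():
    # Data collection: stand-in for reading/downloading raw data.
    # Includes two invalid rows to give the cleaning step something to do.
    return [(x, 2 * x + 1) for x in range(50)] + [(None, 0), (3, None)]

def clean(rows):
    # Data cleaning: drop rows with null values.
    return [(x, y) for x, y in rows if x is not None and y is not None]

def split(rows, test_frac=0.2, seed=0):
    # Data split for model validation: shuffled train/test partition.
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def train(rows):
    # Model training: ordinary least squares for y = a*x + b.
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in rows)
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def evaluate(model, rows):
    # Performance evaluation: mean absolute error on held-out data.
    a, b = model
    return mean(abs(y - (a * x + b)) for x, y in rows)

train_rows, test_rows = split(clean(collect()))
model = train(train_rows)
print(evaluate(model, test_rows))  # near zero: the toy data is noiseless
```

In a real project each step would live in its own script or module, with raw and processed data kept out of version control, but the same collect/clean/split/train/evaluate structure applies.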

Output: 

GitHub repository