{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Machine Learning 2695 Coding Exam" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This part of the exam carries 50% of the exam grade, and the theoretical part carries the other 50%. You have 2 hours in total to complete both parts. For the theoretical part, complete the Moodle quiz and do not forget to submit it once you are done (multiple submissions are possible; the last one counts).\n", "\n", "You should submit this notebook through the Coding Exam submission on Moodle.\n", "\n", "The notebook should contain the code, output, and explanations where asked. The notebook should be rerunnable. The exam is open book, and while AI assistants can be used, no points will be given in case of blind copy-pasting from an AI assistant, if the copied code is not suitable for the task or is unnecessarily complex, or if the solution is obtained using methods not covered in the course.\n", "\n", "\n", "Total points: **27**\n", "\n", "- Question 1: 10 points\n", "- Question 2: 7 points\n", "- Question 3: 10 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1\n", "\n", "**10 points**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are hired by the Financial Research department to develop a model for stock market price prediction. They have provided you with a dataset, stored in the file `Question1.csv`, with the following features:\n", "\n", "- `past_earnings_per_share`\n", "- `past_pe_ratio`\n", "- `eff_index`\n", "- `dividend_yield`\n", "- `past_pb_ratio`\n", "- `debt_equity_ratio`\n", "- `neg_return_equity`\n", "- `return_assets`\n", "- `past_revenue_growth`\n", "- `past_operating_margin`\n", "- `price_eur`\n", "- `price_usd`\n", "\n", "The goal is to have a model that gives the lowest mean absolute error for price prediction. 
You are asked to:\n", "- Train (tuning one parameter) and test one interpretable model. State the two most important features of the tuned model and how these features impact the predicted price, using only the interpretable model.\n", "- Train (tuning two parameters) and test one ensemble model. State the two most important features of the tuned model and how these features impact the predicted price.\n", "- Choose one of the two models based on their performance and state the selected model's performance.\n", "\n", "Perform only the necessary preprocessing steps, and do not drop rows with missing values. You need to justify your decisions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2\n", "\n", "**7 points**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are hired by the Customer Service department, which wants your help in identifying common themes in customer feedback tickets. You are given a dataset `Question2.csv` with the following information:\n", "\n", "- `Rating`: Customer rating for the product.\n", "- `Word_Count`: Number of words in the customer feedback.\n", "- `Complexity_Feedback`: The level of complexity in the feedback.\n", "- `Helpful_Votes`: Number of votes indicating the feedback was helpful.\n", "- `Product_Type`: The category of the product.\n", "- `Response_Time`: Time taken by customer service to respond to the feedback, measured in hours.\n", "- `Sentiment_Score`: Sentiment analysis score of the feedback.\n", "- `Feedback_Source`: The source from which the feedback was received.\n", "- `Satisfaction_Level`: The customer's level of satisfaction.\n", "- `Urgency_Issue`: The urgency level of the issue described in the feedback.\n", "\n", "\n", "Your task is to identify common groups of customer support tickets and explain which two key variables distinguish between these groups. You need to justify your decisions." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3\n", "\n", "**10 points**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are hired by the Marketing department to help with Sales Lead Prioritization. You are given a dataset `Question3.csv` with the following information:\n", "\n", "- `Lead_Id`: Unique identification of the lead.\n", "- `Annual_Revenue`: The annual revenue of the lead's company in dollars.\n", "- `Industry`: Industry specification of the lead.\n", "- `Interaction_Frequency`: The number of interactions (emails, calls, meetings) with the lead.\n", "- `Lead_Age`: The number of days since the lead was created.\n", "- `Purchase_Intent`: Estimated purchase intent.\n", "- `Website_Visits`: The number of times the lead has visited the company's website.\n", "- `Lead_Quality`: Estimated quality of the lead.\n", "- `Lead_Source`: Source of the lead.\n", "- `Number_of_Employees`: The total number of employees in the lead's company.\n", "- `Lead_Success`: Whether the lead was successfully converted or not; this is what we wish to predict.\n", "\n", "You are asked to develop, tune, and evaluate an ensemble model for identifying sales leads that result in conversion.\n", "Perform the necessary preprocessing steps, and take the class distribution into consideration.\n", "The department also wants to know the percentage of successful leads that your model fails to identify, and how your model compares to randomly selecting 40% of the leads.\n", "Additionally, the department wants to know what percentage of leads is expected to be successful if they contact all leads that your model scores with a conversion probability of at least 0.1." 
] } ], "metadata": { "kernelspec": { "display_name": "ml2025", "language": "python", "name": "ml2025" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" }, "toc-showcode": false, "toc-showmarkdowntxt": false }, "nbformat": 4, "nbformat_minor": 4 }