Extra Exercises
This 45-minute collaborative activity is designed for students to work in pairs to solve the following problem.
- Jupyter Notebook Development (45 minutes): Students develop a Jupyter Notebook to solve a more complex problem. The task involves writing and executing code, analyzing the results, and documenting the approach taken. The completed notebook must be submitted at the end of the session. This activity encourages teamwork, critical thinking, and hands-on coding experience.
The file households.xlsx contains data from a survey of 500 randomly selected households. The list of variables is the following:
Variable name | Description |
Family | How many members are there in the household? |
Location | Location of residence within the given city: 1=SW sector, 2=NW sector, 3=NE sector, and 4=SE sector. |
Ownership | 0 = rents home (house or apartment), 1 = owns home. |
First Income | Annual income of first household wage earner. |
Second Income | Annual income of second household wage earner (if applicable). |
Monthly Payment | Monthly home mortgage payment or rent payment. |
Utilities | Average monthly expenditure on utilities. |
Debt | Total indebtedness (excluding home mortgage) in dollars. |
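As a starting point, the survey can be read into a pandas DataFrame. The sketch below assumes that the column headers in households.xlsx match the variable names in the table, which the exercise does not confirm. `load_households` is a hypothetical helper, and the three-row frame is made-up data used only to show the expected layout.

```python
import pandas as pd


def load_households(path="households.xlsx"):
    """Hypothetical helper: read the survey file into a DataFrame.

    pandas.read_excel needs an Excel engine such as openpyxl installed.
    """
    return pd.read_excel(path)


# Made-up rows, only to illustrate the expected layout (not survey data).
sample = pd.DataFrame({
    "Family": [2, 4, 1],
    "Location": [1, 3, 4],
    "Ownership": [0, 1, 1],
    "First Income": [42000, 58000, 39000],
    "Second Income": [float("nan"), 31000.0, float("nan")],
    "Monthly Payment": [900, 1400, 750],
    "Utilities": [180, 260, 150],
    "Debt": [3500, 12000, 800],
})

# Location and Ownership are stored as numbers but are category codes,
# so mark them as categorical before summarizing or plotting.
sample["Location"] = sample["Location"].astype("category")
sample["Ownership"] = sample["Ownership"].astype("category")

print(sample.dtypes)
```

Casting the coded columns early keeps later summaries from treating sector codes as quantities, for example by averaging them.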
- Indicate the type of data for each of the variables included in the survey.
- For each of the categorical variables in the survey, indicate whether the variable is nominal or ordinal, and why.
- Create a bar plot for the Location variable, with the frequency on the y-axis.
- Create a pie chart with the relative frequencies of the Ownership variable.
- Create a histogram for each of the numeric variables in this data set. Indicate whether each distribution is approximately symmetric or skewed (to the right or to the left).
- Find the maximum and minimum debt levels for the households in this sample.
- Find the indebtedness levels at each of the 25th, 50th, and 75th percentiles.
- Find and interpret the interquartile range for the indebtedness levels of these households.
- Replace each nominal variable in the dataset with dummy variables.
- The variable “Second Income” has several missing values because some households have no second income. These values should therefore be 0 rather than NA. Replace all missing values in that column with 0.
- Using MinMaxScaler, normalize the numerical variables.
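
Several of the steps above can be sketched with pandas and scikit-learn. This is a minimal illustration on made-up rows, not a solution computed from households.xlsx. It assumes the column names match the variable table, and it treats Location as the nominal variable to encode while leaving Ownership as the 0/1 indicator it already is.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up rows standing in for households.xlsx (not survey data).
df = pd.DataFrame({
    "Location": [1, 2, 3, 4, 1],
    "Ownership": [0, 1, 1, 0, 1],
    "Second Income": [float("nan"), 30000.0, float("nan"), 25000.0, 40000.0],
    "Debt": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
})

# Frequencies for a bar plot and relative frequencies for a pie chart.
location_counts = df["Location"].value_counts().sort_index()
ownership_share = df["Ownership"].value_counts(normalize=True)

# Range, quartiles, and interquartile range of Debt.
debt_min, debt_max = df["Debt"].min(), df["Debt"].max()
q1, median, q3 = df["Debt"].quantile([0.25, 0.50, 0.75])
iqr = q3 - q1

# A missing second income means there is no second earner, so use 0.
df["Second Income"] = df["Second Income"].fillna(0)

# One 0/1 column per Location code; Ownership is already a 0/1 indicator.
df = pd.get_dummies(df, columns=["Location"], prefix="Location", dtype=int)

# Rescale each numeric column to the [0, 1] interval.
numeric_cols = ["Second Income", "Debt"]
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

print(location_counts, ownership_share, sep="\n")
print(debt_min, q1, median, q3, debt_max, iqr)
print(df)
```

With matplotlib installed, `location_counts.plot(kind="bar")` and `ownership_share.plot(kind="pie")` would draw the two charts, and calling `plot(kind="hist")` on a numeric column would draw its histogram. Filling the missing values before scaling matters here, because the scaler would otherwise carry the NaN entries through unchanged.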
Attempts allowed: 1
This quiz closed on Friday, 21 March 2025, 11:00 PM