{
"cells": [
{
"cell_type": "markdown",
"id": "42ffe538-0da7-4ab8-9bb2-efce0add67a9",
"metadata": {},
"source": [
"WEST ROXBURY EXAMPLE"
]
},
{
"cell_type": "markdown",
"id": "ea268c67-637e-42e1-adee-cf6a86d5db0f",
"metadata": {},
"source": [
"The data in Westroxbury.csv includes information on single-family, owner-occupied homes in West Roxbury, a neighborhood in southwest Boston, MA, in 2014."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cc9e94f2-60e2-4a36-a7f8-1ec000501586",
"metadata": {},
"outputs": [],
"source": [
"# Learning objectives:\n",
"# 1) Learn how to load datasets into a Pandas DataFrame and inspect the structure.\n",
"# 2) Apply column renaming techniques and clean variable names to improve data manipulation using dot notation.\n",
"# 3) Understand how to convert columns to appropriate data types, such as continuous, discrete, and categorical variables.\n",
"# 4) Practice selecting subsets of data using loc[], iloc[], and pd.concat() to combine columns from different positions in a DataFrame.\n",
"# 5) Explore random sampling techniques in Pandas, including weighted sampling to oversample specific cases.\n"
]
},
{
"cell_type": "markdown",
"id": "fb30738e-82af-4198-bb15-1de9dba29005",
"metadata": {},
"source": [
"Pandas is a powerful data manipulation and analysis library for Python, widely used for working with structured data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "219f1e81-8f78-407a-ada9-ca35e0586a8a",
"metadata": {},
"outputs": [],
"source": [
"# Import the required package\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "dbf98e2b-f485-4356-a7fb-5835a3db172d",
"metadata": {},
"outputs": [],
"source": [
"# Load Data\n"
]
},
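A minimal sketch of the import and load steps. The notebook reads `Westroxbury.csv`; to keep the example self-contained, the header and the first two rows shown in this notebook are embedded as an in-memory CSV instead. The name `housing_df` is an assumption and does not come from the notebook.

```python
import io

import pandas as pd

# Stand-in for Westroxbury.csv: the header plus the first two rows shown in this notebook.
# Some header names end in a space, mirroring the raw column names listed further down.
csv_text = (
    "TOTAL VALUE ,TAX,LOT SQFT ,YR BUILT,GROSS AREA ,LIVING AREA,FLOORS ,"
    "ROOMS,BEDROOMS ,FULL BATH,HALF BATH,KITCHEN,FIREPLACE,REMODEL\n"
    "344.2,4330,9965,1880,2436,1352,2.0,6,3,1,1,1,0,\n"
    "412.6,5190,6590,1945,3108,1976,2.0,10,4,2,1,1,0,Recent\n"
)

# With the real file on disk this would be: pd.read_csv("Westroxbury.csv")
housing_df = pd.read_csv(io.StringIO(csv_text))
print(housing_df)
```

An empty field is read as `NaN`, which is how the missing `REMODEL` values appear in the outputs of this notebook.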
{
"cell_type": "code",
"execution_count": 4,
"id": "effbcb7c-f38a-4cda-9d2f-98d8567aef03",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5802, 14)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the dimension of the data frame\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e87fa007-9e30-4601-adb7-ff1acf2afae1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" TOTAL VALUE TAX LOT SQFT YR BUILT GROSS AREA LIVING AREA FLOORS \\\n",
"0 344.2 4330 9965 1880 2436 1352 2.0 \n",
"1 412.6 5190 6590 1945 3108 1976 2.0 \n",
"2 330.1 4152 7500 1890 2294 1371 2.0 \n",
"3 498.6 6272 13773 1957 5032 2608 1.0 \n",
"4 331.5 4170 5000 1910 2370 1438 2.0 \n",
"\n",
" ROOMS BEDROOMS FULL BATH HALF BATH KITCHEN FIREPLACE REMODEL \n",
"0 6 3 1 1 1 0 NaN \n",
"1 10 4 2 1 1 0 Recent \n",
"2 8 4 1 1 1 0 NaN \n",
"3 9 5 1 1 1 1 NaN \n",
"4 7 3 2 0 1 0 NaN "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the first 5 rows\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6ad2b13c-57bd-4a1a-9b7a-8dfd244ad660",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['TOTAL VALUE ', 'TAX', 'LOT SQFT ', 'YR BUILT', 'GROSS AREA ',\n",
" 'LIVING AREA', 'FLOORS ', 'ROOMS', 'BEDROOMS ', 'FULL BATH',\n",
" 'HALF BATH', 'KITCHEN', 'FIREPLACE', 'REMODEL'], dtype=object)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Identify the names of the columns\n"
]
},
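The three inspection steps above (dimensions, first rows, column names) can be sketched as follows. The frame `df` is a small stand-in built from values shown in this notebook; the variable name is an assumption.

```python
import pandas as pd

# Small stand-in for the housing data (values taken from the first rows of the notebook output).
df = pd.DataFrame({
    "TOTAL VALUE ": [344.2, 412.6, 330.1, 498.6, 331.5],
    "TAX": [4330, 5190, 4152, 6272, 4170],
    "ROOMS": [6, 10, 8, 9, 7],
})

print(df.shape)           # tuple of (number of rows, number of columns)
print(df.head(3))         # first 3 rows; head() without an argument returns 5
print(df.columns.values)  # column names as a NumPy array, trailing spaces included
```

Printing the names as an array makes stray whitespace visible, which matters for the renaming steps that follow.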
{
"cell_type": "code",
"execution_count": 7,
"id": "16641259-e3f6-427a-8739-6dddf5aae052",
"metadata": {},
"outputs": [],
"source": [
"# Rename one column\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8b61c60f-1e9a-4bb9-9d72-77f6022d92cc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX LOT SQFT YR BUILT GROSS AREA LIVING AREA FLOORS \\\n",
"0 344.2 4330 9965 1880 2436 1352 2.0 \n",
"1 412.6 5190 6590 1945 3108 1976 2.0 \n",
"2 330.1 4152 7500 1890 2294 1371 2.0 \n",
"\n",
" ROOMS BEDROOMS FULL BATH HALF BATH KITCHEN FIREPLACE REMODEL \n",
"0 6 3 1 1 1 0 NaN \n",
"1 10 4 2 1 1 0 Recent \n",
"2 8 4 1 1 1 0 NaN "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
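One possible way to rename a single column, sketched on a small stand-in frame. The output above shows `T_VALUE` as the new name; the exact call used in the notebook is not preserved, so `rename` is an assumption here.

```python
import pandas as pd

df = pd.DataFrame({
    "TOTAL VALUE ": [344.2, 412.6, 330.1],
    "TAX": [4330, 5190, 4152],
})

# rename() returns a new DataFrame. The dictionary key must match the old name
# exactly, including the trailing space in "TOTAL VALUE ".
df = df.rename(columns={"TOTAL VALUE ": "T_VALUE"})
print(df.head(3))
```

A key that does not match any column is silently ignored by default, so it is worth checking `df.columns` after renaming.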
{
"cell_type": "code",
"execution_count": 9,
"id": "70296ae9-e643-4875-a459-d7298e82b2c9",
"metadata": {},
"outputs": [],
"source": [
"# Rename several columns: replace 'spaces' with '_' to allow dot notation\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "920a75b3-024d-47eb-a66d-0e736026c302",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX LOT_SQFT YR_BUILT GROSS_AREA LIVING_AREA FLOORS ROOMS \\\n",
"0 344.2 4330 9965 1880 2436 1352 2.0 6 \n",
"1 412.6 5190 6590 1945 3108 1976 2.0 10 \n",
"2 330.1 4152 7500 1890 2294 1371 2.0 8 \n",
"\n",
" BEDROOMS FULL_BATH HALF_BATH KITCHEN FIREPLACE REMODEL \n",
"0 3 1 1 1 0 NaN \n",
"1 4 2 1 1 0 Recent \n",
"2 4 1 1 1 0 NaN "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
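A sketch of renaming several columns at once so that every name is a valid Python attribute and dot notation works. The list comprehension is one common idiom, not necessarily the one used to produce the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "T_VALUE": [344.2],
    "LOT SQFT ": [9965],
    "YR BUILT": [1880],
    "GROSS AREA ": [2436],
    "FULL BATH": [1],
})

# Strip surrounding whitespace first so a trailing space does not turn into
# a trailing underscore, then replace the remaining inner spaces.
df.columns = [name.strip().replace(" ", "_") for name in df.columns]

print(df.columns.tolist())
print(df.LOT_SQFT)  # dot notation now works for the cleaned name
```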
{
"cell_type": "code",
"execution_count": 11,
"id": "39ffd6e1-c8ab-4e9e-8512-c1cbf346bd34",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 4330\n",
"1 5190\n",
"2 4152\n",
"3 6272\n",
"Name: TAX, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Practice showing some part of the data frame:\n",
"# the first 4 rows of variable 'TAX'\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "36ed0b5b-bb2d-48b4-bf75-b92a6a4dc85e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" TAX LOT_SQFT YR_BUILT\n",
"0 4330 9965 1880\n",
"1 5190 6590 1945\n",
"2 4152 7500 1890\n",
"3 6272 13773 1957"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the first 4 rows of the variables 2, 3 and 4\n"
]
},
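The two selections above can be sketched with `loc` (label based) and `iloc` (position based). The frame is a stand-in built from the rows shown in this notebook, and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "T_VALUE": [344.2, 412.6, 330.1, 498.6, 331.5],
    "TAX": [4330, 5190, 4152, 6272, 4170],
    "LOT_SQFT": [9965, 6590, 7500, 13773, 5000],
    "YR_BUILT": [1880, 1945, 1890, 1957, 1910],
})

# loc slices by label and includes the end label, so 0:3 yields 4 rows.
tax_by_label = df.loc[0:3, "TAX"]

# iloc slices by position and excludes the end position, so 0:4 yields the same 4 rows.
tax_by_position = df.iloc[0:4, 1]

# Variables 2, 3 and 4 sit at positions 1, 2 and 3.
cols_2_to_4 = df.iloc[0:4, 1:4]

print(tax_by_label)
print(cols_2_to_4)
```

The off-by-one difference between the two slicing rules is the usual source of confusion when switching between `loc` and `iloc`.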
{
"cell_type": "code",
"execution_count": 13,
"id": "bd0fc4e5-f5e9-4aeb-83fe-706f98ae94c0",
"metadata": {},
"outputs": [],
"source": [
"# Use pd.concat to combine non-consecutive columns into a new data frame\n",
"# The axis argument specifies the dimension along which the concatenation happens: 0 - rows; 1 - columns "
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b485cf6f-707f-404c-b1c8-6c58df18bd8f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX GROSS_AREA LIVING_AREA\n",
"4 331.5 4170 2370 1438\n",
"5 337.4 4244 2124 1060"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
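A sketch of combining non-consecutive columns with `pd.concat`. The output above shows rows 4 and 5; because this stand-in frame reproduces only rows 0 to 4 of the data, the sketch selects rows 3 and 4 instead.

```python
import pandas as pd

df = pd.DataFrame({
    "T_VALUE": [344.2, 412.6, 330.1, 498.6, 331.5],
    "TAX": [4330, 5190, 4152, 6272, 4170],
    "LOT_SQFT": [9965, 6590, 7500, 13773, 5000],
    "YR_BUILT": [1880, 1945, 1890, 1957, 1910],
    "GROSS_AREA": [2436, 3108, 2294, 5032, 2370],
    "LIVING_AREA": [1352, 1976, 1371, 2608, 1438],
})

left = df.iloc[3:5, 0:2]   # T_VALUE and TAX
right = df.iloc[3:5, 4:6]  # GROSS_AREA and LIVING_AREA

# axis=1 places the pieces side by side, aligning them on the row index.
combined = pd.concat([left, right], axis=1)
print(combined)
```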
{
"cell_type": "code",
"execution_count": 15,
"id": "e21fd59a-bed4-40d4-b47c-963714a408d4",
"metadata": {},
"outputs": [],
"source": [
"# To specify a data frame that is a subset of the initial data frame, use:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9aadeb1a-c7b0-4f86-b43e-7ef4a33aea1a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX GROSS_AREA LIVING_AREA\n",
"0 344.2 4330 2436 1352\n",
"1 412.6 5190 3108 1976\n",
"2 330.1 4152 2294 1371\n",
"3 498.6 6272 5032 2608\n",
"4 331.5 4170 2370 1438"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
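A subset like the one shown above can also be taken directly, without `pd.concat`. This sketch shows two equivalent selections, by column name and by column position; which form the notebook used is not preserved, so both are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "T_VALUE": [344.2, 412.6, 330.1, 498.6, 331.5],
    "TAX": [4330, 5190, 4152, 6272, 4170],
    "LOT_SQFT": [9965, 6590, 7500, 13773, 5000],
    "YR_BUILT": [1880, 1945, 1890, 1957, 1910],
    "GROSS_AREA": [2436, 3108, 2294, 5032, 2370],
    "LIVING_AREA": [1352, 1976, 1371, 2608, 1438],
})

# Select by a list of column names ...
by_name = df[["T_VALUE", "TAX", "GROSS_AREA", "LIVING_AREA"]]

# ... or by a list of column positions.
by_position = df.iloc[:, [0, 1, 4, 5]]

print(by_name.head())
```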
{
"cell_type": "code",
"execution_count": 17,
"id": "26ca6c71-1b2f-462c-bfdd-2545ae5fe8b1",
"metadata": {},
"outputs": [],
"source": [
"# Sampling from a data frame\n",
"# Obtaining a random sample of 5 observations from the data set"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "c110910f-15b7-48a0-a2db-c0b86735aabf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX LOT_SQFT YR_BUILT GROSS_AREA LIVING_AREA FLOORS \\\n",
"888 284.9 3584 4736 1966 2032 1360 2.0 \n",
"5380 434.1 5460 5000 1930 2597 1521 2.0 \n",
"1088 322.3 4054 3402 1935 2222 1400 2.0 \n",
"168 242.9 3055 4151 1920 2068 1248 2.0 \n",
"959 269.8 3394 3438 1930 2516 1238 1.5 \n",
"\n",
" ROOMS BEDROOMS FULL_BATH HALF_BATH KITCHEN FIREPLACE REMODEL \n",
"888 6 3 1 1 1 1 NaN \n",
"5380 7 3 1 1 1 1 NaN \n",
"1088 6 3 1 1 1 2 Old \n",
"168 7 3 1 0 1 0 NaN \n",
"959 6 3 1 1 1 0 Old "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
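A sketch of drawing a simple random sample with `DataFrame.sample`. The data here is synthetic, and `random_state` is added only to make the draw repeatable; the sample shown above was presumably drawn without a fixed seed, so its rows would differ between runs.

```python
import pandas as pd

# Synthetic stand-in: 100 rows with a single numeric column.
df = pd.DataFrame({"value": range(100)})

# Draw 5 rows without replacement; each row is equally likely to be chosen.
sample = df.sample(5, random_state=1)
print(sample)
```

The sampled rows keep their original index labels, which is why the output above shows labels such as 888 and 5380 rather than 0 to 4.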
{
"cell_type": "code",
"execution_count": 19,
"id": "008e4294-a992-4f06-b7df-0f08e404c236",
"metadata": {},
"outputs": [],
"source": [
"# Oversample houses with over 10 rooms "
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c5fcdb08-82c1-4548-9ef5-ee171939f715",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" T_VALUE TAX LOT_SQFT YR_BUILT GROSS_AREA LIVING_AREA FLOORS \\\n",
"3594 600.1 7549 9637 1938 4631 2333 2.0 \n",
"2228 442.8 5570 4421 1932 2648 1728 2.0 \n",
"3195 642.7 8085 8415 1910 6442 2843 2.0 \n",
"2453 564.7 7103 7676 1907 3596 2360 2.0 \n",
"3338 540.0 6793 8750 1940 3567 2342 2.0 \n",
"\n",
" ROOMS BEDROOMS FULL_BATH HALF_BATH KITCHEN FIREPLACE REMODEL \n",
"3594 11 5 1 1 1 1 NaN \n",
"2228 6 3 3 1 1 1 Recent \n",
"3195 11 6 2 1 1 1 Recent \n",
"2453 11 5 2 1 1 1 Recent \n",
"3338 11 3 2 1 1 1 NaN "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
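Oversampling can be sketched by passing per-row weights to `sample`. The data is synthetic, and the weights 0.9 and 0.01 are hypothetical values chosen for illustration; the notebook does not state which weights produced the output above.

```python
import pandas as pd

# Synthetic stand-in: 95 homes with 6 rooms and 5 homes with 11 rooms.
df = pd.DataFrame({"ROOMS": [6] * 95 + [11] * 5})

# Hypothetical weights: a home with more than 10 rooms is 90 times as likely
# to be drawn as any other home. pandas normalizes the weights to sum to 1.
weights = [0.9 if rooms > 10 else 0.01 for rooms in df.ROOMS]

oversampled = df.sample(5, weights=weights, random_state=1)
print(oversampled)
```

Weighting raises the chance of drawing large homes but does not exclude the others, which is consistent with the 6-room home that appears among the 11-room homes in the output above.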
{
"cell_type": "code",
"execution_count": 21,
"id": "540f4ca7-109a-4693-a2eb-11678842d27b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"T_VALUE float64\n",
"TAX int64\n",
"LOT_SQFT int64\n",
"YR_BUILT int64\n",
"GROSS_AREA int64\n",
"LIVING_AREA int64\n",
"FLOORS float64\n",
"ROOMS int64\n",
"BEDROOMS int64\n",
"FULL_BATH int64\n",
"HALF_BATH int64\n",
"KITCHEN int64\n",
"FIREPLACE int64\n",
"REMODEL object\n",
"dtype: object"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": 22,
"id": "f31a8946-8f55-43ce-880e-1ff0f2461d40",
"metadata": {},
"outputs": [],
"source": [
"# TAX needs to be converted to a numerical continuous variable\n",
"# LOT_SQFT needs to be converted to a numerical continuous variable\n",
"# GROSS_AREA needs to be converted to a numerical continuous variable\n",
"# LIVING_AREA needs to be converted to a numerical continuous variable\n",
"# FLOORS needs to be converted to a numerical discrete variable\n",
"# REMODEL needs to be converted to a categorical variable"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "b3ec834b-ec3b-48a9-9e06-b9047de89942",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 49,
"id": "ffdb0e00-f9df-498b-8307-1ea1024da704",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"T_VALUE float64\n",
"TAX float64\n",
"LOT_SQFT float64\n",
"YR_BUILT int64\n",
"GROSS_AREA float64\n",
"LIVING_AREA float64\n",
"FLOORS int64\n",
"ROOMS int64\n",
"BEDROOMS int64\n",
"FULL_BATH int64\n",
"HALF_BATH int64\n",
"KITCHEN int64\n",
"FIREPLACE int64\n",
"REMODEL category\n",
"dtype: object"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
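The conversions listed in the comments above can be sketched with `astype`. The values below are illustrative, and passing a dictionary is one of several ways to convert multiple columns; the notebook's own code is not preserved.

```python
import pandas as pd

df = pd.DataFrame({
    "TAX": [4330, 5190, 4152],
    "FLOORS": [2.0, 2.0, 1.5],
    "REMODEL": [None, "Recent", "Old"],
})

# Map each column to its target dtype and convert in one call.
# Caution: casting float to int64 truncates, so a FLOORS value of 1.5 becomes 1.
df = df.astype({"TAX": "float64", "FLOORS": "int64", "REMODEL": "category"})

print(df.dtypes)
```

Since the data contains half-floor values such as 1.5, it may be worth checking whether the truncation to whole floors is acceptable before converting `FLOORS`.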
{
"cell_type": "code",
"execution_count": 24,
"id": "b42a5548-70d4-4749-8b4d-5e9722489fab",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Old', 'Recent'], dtype='object')"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
}
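A short sketch of inspecting the levels of a categorical column through the `.cat` accessor, using illustrative values. Missing entries are not treated as a category, which matches the two levels shown in the output above.

```python
import pandas as pd

remodel = pd.Series([None, "Recent", None, "Old"], dtype="category")

print(remodel.cat.categories)  # the distinct levels, sorted
print(remodel.isna().sum())    # number of missing entries
```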
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:base] *",
"language": "python",
"name": "conda-base-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}