Problem
Property assessments influence tax bills. A model that misses unevenly can shift tax burden toward lower-priced homes even when its average error looks acceptable.
This project is a Cook County housing model that treats prediction as a public-impact problem: accurate estimates matter, but so does who receives the largest errors.
I modeled log sale price with linear regression, engineered stable property features, and evaluated both accuracy and distributional error patterns.
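A minimal sketch of that modeling setup follows. It is not the notebook's code: the data is synthetic, and the column names (building_value, building_sqft, bathrooms, sale_price) and the helpers build_features and rmse are hypothetical stand-ins chosen for illustration. It shows the general shape of fitting a linear regression to log sale price and comparing train, holdout, and cross-validated RMSE.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for parcel data. Column names are hypothetical,
# and the numbers are generated, not taken from Cook County records.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "building_value": rng.lognormal(11.5, 0.5, n),
    "building_sqft": rng.lognormal(7.2, 0.3, n),
    "bathrooms": rng.integers(1, 4, n).astype(float),
})
log_price = (
    3.0
    + 0.6 * np.log(df["building_value"])
    + 0.3 * np.log(df["building_sqft"])
    + 0.05 * df["bathrooms"]
    + rng.normal(0, 0.2, n)  # noise on the log scale
)
df["sale_price"] = np.exp(log_price)


def build_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Log-transform the skewed inputs; keep bathroom count as-is."""
    return pd.DataFrame({
        "log_building_value": np.log(frame["building_value"]),
        "log_building_sqft": np.log(frame["building_sqft"]),
        "bathrooms": frame["bathrooms"],
    })


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two array-likes."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


X = build_features(df)
y = np.log(df["sale_price"])  # the target is log sale price

X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

train_rmse = rmse(y_train, model.predict(X_train))
holdout_rmse = rmse(y_hold, model.predict(X_hold))

# 5-fold CV on the training split; scikit-learn reports negated RMSE.
cv_scores = cross_val_score(
    LinearRegression(), X_train, y_train,
    cv=5, scoring="neg_root_mean_squared_error",
)
cv_rmse = float(-cv_scores.mean())

print(f"train={train_rmse:.3f} holdout={holdout_rmse:.3f} cv={cv_rmse:.3f}")
```

When the three RMSE values sit close together, that suggests the model is not overfitting, but it says nothing about how the errors are distributed across homes.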
The model validated consistently, but diagnostics revealed a regressive pattern: cheaper homes were more likely to be overvalued than expensive homes.
Share of homes overestimated by the model in the original notebook run. For assessment, higher overestimation means a higher likely tax burden relative to actual sale value.
Remove invalid sale prices and handle missing values without dropping test parcels.
Use log building value, log building square feet, and bathroom count features.
Check train, holdout, and cross-validation RMSE on log sale price.
Compare residual behavior across price tiers using RMSE, MAPE, and overestimation rate.
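The price-tier comparison in the last step could be sketched as follows. This is an illustration under assumptions, not the notebook's implementation: tier_diagnostics is a hypothetical helper, tiers are assumed to be quantiles of actual sale price, and the toy prices are made up to show a regressive pattern rather than drawn from any result.

```python
import numpy as np
import pandas as pd


def tier_diagnostics(sale_price, predicted_price, n_tiers=4) -> pd.DataFrame:
    """Per-tier RMSE (log scale), MAPE, and overestimation rate.

    Tiers are quantile bins of the actual sale price (an assumption);
    tier 0 holds the cheapest homes.
    """
    frame = pd.DataFrame({
        "actual": np.asarray(sale_price, dtype=float),
        "pred": np.asarray(predicted_price, dtype=float),
    })
    frame["tier"] = pd.qcut(frame["actual"], q=n_tiers, labels=False)
    frame["log_err"] = np.log(frame["pred"]) - np.log(frame["actual"])
    frame["ape"] = (frame["pred"] - frame["actual"]).abs() / frame["actual"]
    frame["over"] = frame["pred"] > frame["actual"]

    grouped = frame.groupby("tier")
    return pd.DataFrame({
        "n": grouped.size(),
        "rmse_log": grouped["log_err"].apply(
            lambda e: float(np.sqrt(np.mean(e ** 2)))
        ),
        "mape_pct": grouped["ape"].mean() * 100,
        "over_rate_pct": grouped["over"].mean() * 100,
    })


# Toy data: predictions shrink halfway toward 450,000, so cheap homes
# are overestimated and expensive homes are underestimated.
actual = np.arange(100_000, 900_000, 100_000, dtype=float)
predicted = 450_000 + 0.5 * (actual - 450_000)

report = tier_diagnostics(actual, predicted, n_tiers=2)
print(report)
```

Reading the rows top to bottom shows whether the overestimation rate falls as price rises, which is the regressive signature described above. A single pooled RMSE would hide it.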
The notebook workflow is preserved as a narrative process note covering EDA, feature engineering, validation, and fairness analysis.
Add neighborhood encodings, spatial validation, model cards, and monitoring dashboards for price-tier and community-level residual parity.
Python, pandas, scikit-learn, feature engineering, model validation, fairness analysis, communication of statistical tradeoffs.
Use python tests/smoke_test.py for a local check, then
python scripts/run_model.py --train data/cook_county_train.csv
when the source data is available.