Property Assessment Fairness

A Cook County housing model that treats prediction as a public-impact problem: accurate estimates matter, but so does who receives the largest errors.

0.594 holdout RMSE on log sale price
0.600 4-fold cross-validation RMSE
57.8% of lower-priced homes overestimated

Problem

Property assessments influence tax bills. A model that misses unevenly can shift tax burden toward lower-priced homes even when its average error looks acceptable.

Approach

I modeled log sale price with linear regression, engineered stable property features, and evaluated both overall accuracy and how errors were distributed across price tiers.
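A minimal sketch of this approach, assuming synthetic stand-in data rather than the Cook County files. The column names (`building_value`, `building_sqft`, `bathrooms`, `sale_price`), the 80/20 split, and the helpers `build_features` and `rmse` are hypothetical and chosen only for illustration; the notebook may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names are assumptions, not the notebook's.
rng = np.random.default_rng(0)
n = 500
homes = pd.DataFrame({
    "building_value": rng.lognormal(mean=11.5, sigma=0.5, size=n),
    "building_sqft": rng.lognormal(mean=7.2, sigma=0.3, size=n),
    "bathrooms": rng.integers(1, 4, size=n).astype(float),
})
homes["sale_price"] = (
    homes["building_value"] ** 0.8
    * homes["building_sqft"] ** 0.3
    * np.exp(0.05 * homes["bathrooms"] + rng.normal(0, 0.4, size=n))
)

def build_features(df):
    """Log-transform the skewed value and size columns; keep bathrooms as-is."""
    return pd.DataFrame({
        "log_building_value": np.log(df["building_value"]),
        "log_building_sqft": np.log(df["building_sqft"]),
        "bathrooms": df["bathrooms"],
    })

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

X = build_features(homes)
y = np.log(homes["sale_price"])  # the target is log sale price

# Hold out 20% of rows (an assumed split) to check generalization.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

train_rmse = rmse(y_train, model.predict(X_train))
holdout_rmse = rmse(y_hold, model.predict(X_hold))
print(f"train RMSE (log): {train_rmse:.3f}")
print(f"holdout RMSE (log): {holdout_rmse:.3f}")
```

Because the target is on a log scale, an RMSE is easiest to read as a multiplicative factor: 0.594 in log units corresponds to roughly exp(0.594), or about 1.8 times, in price terms.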

Outcome

The model validated consistently, with holdout and cross-validation RMSE within about 0.01 of each other, but diagnostics revealed a regressive pattern: cheaper homes were more likely to be overvalued than expensive homes.

Fairness Diagnostic

Lower-price tier: 57.78%
Higher-price tier: 27.69%

Share of homes in each tier that the model overestimated in the original notebook run. In an assessment setting, an overestimated home is likely to carry a higher tax burden than its actual sale value warrants.
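The diagnostic above can be sketched as a simple share calculation. This is an illustration under stated assumptions: the functions `overestimation_rate` and `rate_by_tier` are hypothetical, the tiers are split at a fixed price cutoff (the notebook's tier definition is not stated here), and the prices are toy numbers, not notebook output.

```python
import numpy as np

def overestimation_rate(actual_price, predicted_price):
    """Percent of homes whose predicted price exceeds the actual sale price."""
    actual = np.asarray(actual_price, dtype=float)
    predicted = np.asarray(predicted_price, dtype=float)
    return float(np.mean(predicted > actual) * 100)

def rate_by_tier(actual_price, predicted_price, cutoff):
    """Split homes at a price cutoff (an assumed tier rule) and compare rates."""
    actual = np.asarray(actual_price, dtype=float)
    predicted = np.asarray(predicted_price, dtype=float)
    lower = actual < cutoff
    return {
        "lower": overestimation_rate(actual[lower], predicted[lower]),
        "higher": overestimation_rate(actual[~lower], predicted[~lower]),
    }

# Toy prices for illustration only.
actual = [100_000, 120_000, 150_000, 400_000, 500_000, 650_000]
predicted = [130_000, 110_000, 170_000, 380_000, 520_000, 600_000]

rates = rate_by_tier(actual, predicted, cutoff=200_000)
print(rates)
```

In this toy data, two of the three lower-tier homes are overestimated against one of the three higher-tier homes, which is the same kind of gap the diagnostic reports.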

Modeling Workflow

1. Clean

Remove invalid sale prices and handle missing values without dropping test parcels.

2. Engineer

Use log building value, log building square feet, and bathroom count features.

3. Validate

Check train, holdout, and cross-validation RMSE on log sale price.

4. Audit

Compare residual behavior across price tiers using RMSE, MAPE, and overestimation rate.
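The four steps above can be sketched end to end on synthetic data. Everything named here is an assumption made for illustration: the feature column names, the helpers `clean_train`, `clean_test`, and `tier_metrics`, median imputation, treating non-positive or missing prices as invalid, a median price split for tiers, and MAPE computed on the price scale. The notebook may make different choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

FEATURES = ["log_building_value", "log_building_sqft", "bathrooms"]  # assumed names

def clean_train(df):
    """Drop rows without a usable sale price; return rows and feature medians."""
    kept = df[df["sale_price"].notna() & (df["sale_price"] > 0)].copy()
    medians = kept[FEATURES].median()
    kept[FEATURES] = kept[FEATURES].fillna(medians)
    return kept, medians

def clean_test(df, medians):
    """Impute with training medians so every test parcel keeps its row."""
    out = df.copy()
    out[FEATURES] = out[FEATURES].fillna(medians)
    return out

# Synthetic stand-in data with a few invalid prices and missing values.
rng = np.random.default_rng(1)
n = 400
raw = pd.DataFrame(rng.normal(size=(n, 3)), columns=FEATURES)
signal = raw[FEATURES].to_numpy() @ np.array([0.5, 0.3, 0.1])
raw["sale_price"] = np.exp(12.0 + signal + rng.normal(0, 0.5, size=n))
raw.loc[:4, "sale_price"] = 0.0      # rows 0-4: invalid prices
raw.loc[5:9, "bathrooms"] = np.nan   # rows 5-9: missing feature values

# 1-2. Clean (features are already on their engineered scale here).
train, medians = clean_train(raw)
X = train[FEATURES].to_numpy()
log_price = np.log(train["sale_price"].to_numpy())

# 3. Validate: 4-fold cross-validation RMSE on log sale price.
cv = KFold(n_splits=4, shuffle=True, random_state=0)
fold_rmse = -cross_val_score(
    LinearRegression(), X, log_price, cv=cv,
    scoring="neg_root_mean_squared_error",
)
cv_rmse = float(fold_rmse.mean())

# 4. Audit: out-of-fold predictions compared across price tiers.
pred_log = cross_val_predict(LinearRegression(), X, log_price, cv=cv)
price, pred_price = np.exp(log_price), np.exp(pred_log)

def tier_metrics(mask):
    """RMSE (log scale), MAPE (price scale), and overestimation share for a tier."""
    log_err = pred_log[mask] - log_price[mask]
    pct_err = np.abs(pred_price[mask] - price[mask]) / price[mask]
    return {
        "rmse_log": float(np.sqrt(np.mean(log_err ** 2))),
        "mape_pct": float(np.mean(pct_err) * 100),
        "overestimated_pct": float(np.mean(log_err > 0) * 100),
    }

lower = price < np.median(price)
audit = pd.DataFrame(
    {"lower": tier_metrics(lower), "higher": tier_metrics(~lower)}
).T

print(f"4-fold CV RMSE (log): {cv_rmse:.3f}")
print(audit.round(3))
```

Imputing test parcels with training medians, rather than dropping rows, keeps one prediction per parcel. Using out-of-fold predictions for the audit means each home is scored by a model that did not see it during fitting.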

Original Process

The notebook workflow is preserved as a narrative process note covering EDA, feature engineering, validation, and fairness analysis.


What I Would Improve Next

Add neighborhood encodings, spatial validation, model cards, and monitoring dashboards for price-tier and community-level residual parity.

Skills Shown

Python, pandas, scikit-learn, feature engineering, model validation, fairness analysis, communication of statistical tradeoffs.

Run It

Run python tests/smoke_test.py for a quick local check. Once the source data is available, run python scripts/run_model.py --train data/cook_county_train.csv to fit the model.