Property Assessment Fairness

Original Coding Process and Analysis

This page expands on the original notebook workflow: human context, exploratory data analysis, cleaning decisions, feature engineering, model validation, visual diagnostics, and fairness-focused metric reflection.

2 original notebooks preserved
228 total notebook cells reviewed
10 embedded notebook visualizations extracted
0.594 holdout RMSE from the model notebook

How to Read This

The live results page gives the short recruiter version. This process page is the deeper evidence: it shows how the project moved from messy assessment data to a model audit. I kept the original notebooks in the repo, then translated the useful coding and analysis decisions into a readable web narrative.

1. Human Context and Fairness Framing

The project begins by asking who is affected by a property value prediction. Homeowners, local governments, buyers, lenders, and assessors all have a stake, but the harm is not symmetric. An overassessment can increase a homeowner's tax bill, while systematic underassessment of expensive homes can shift the public tax burden elsewhere.

My original analysis identified the key fairness risk as regressivity: lower-priced homes being overvalued while higher-priced homes are undervalued. That pattern matters because it can increase effective tax rates for working-class and historically disadvantaged neighborhoods even when the model's overall accuracy looks acceptable.

Question I Used to Guide the Work

Can a property assessment model be evaluated not only by aggregate error, but by who receives overestimates and underestimates?

Fairness Definition

A fair model should have residuals centered near zero across price tiers and communities, with similar error dispersion and no consistent overassessment of lower-priced homes.
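
That definition can be checked directly from residuals. A minimal sketch, assuming a pandas dataframe of scored parcels with actual and predicted log sale prices plus a grouping column such as price tier or community (the column names here are assumptions, not the notebook's):

def residual_parity_summary(df, group_col):
    # Residuals on the log scale: positive means the model overassessed the parcel.
    # The column names 'Log Sale Price' and 'Predicted Log Sale Price' are assumptions.
    work = df.copy()
    work["residual"] = work["Predicted Log Sale Price"] - work["Log Sale Price"]
    work["overassessed"] = work["residual"] > 0
    return work.groupby(group_col).agg(
        mean_residual=("residual", "mean"),          # should sit near zero for every group
        residual_std=("residual", "std"),            # dispersion should be similar across groups
        overassessed_rate=("overassessed", "mean"),  # no group should be consistently overassessed
    )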

2. Exploratory Data Analysis

The first notebook focused on understanding the dataset before fitting a model. I inspected sale price distributions, identified placeholder-like low values, and used transformations to make highly skewed variables easier to model.

Granularity

Each row represents a property record with sale, structure, location, and assessment fields. This made parcel-level prediction possible.

Data Quality

Sale prices below $500 looked unlike ordinary market transactions and were excluded from the training target.

Skew

Sale price and assessment estimate variables were right-skewed, so log transforms were central to the modeling process.

import numpy as np

# Filter placeholder-like sales, then work on a copy so the raw frame is untouched
training_data = initial_data[initial_data["Sale Price"] >= 500].copy()
training_data["Log Sale Price"] = np.log(training_data["Sale Price"])
training_data["Log Building Square Feet"] = np.log(training_data["Building Square Feet"])

3. Original Notebook Visualizations

These figures are extracted directly from the original notebook outputs. They show the major visual checkpoints in the analysis: distribution shape, outlier handling, feature relationships, model residuals, and fairness diagnostics.

Raw sale price distribution with heavy right skew
Raw sale price is highly right-skewed. This motivated both outlier review and modeling sale price on the log scale.
Sale price distribution below one million dollars
Zooming into sales under $1,000,000 exposed the spike near very low sale prices, which looked like non-market or placeholder values.
Log sale price distribution
After filtering tiny sale prices and applying a log transform, the target distribution became more suitable for linear modeling.
Estimate land distribution before outlier removal
Land estimate values also showed strong skew, which made outlier handling and robust interpretation important.
Estimate land distribution after IQR outlier filtering
Applying an IQR-based filter made the land estimate distribution more legible for EDA (a sketch of the filter follows this figure list).
Bathrooms versus log sale price boxplot
Bathroom count showed a positive relationship with log sale price and became one of the interpretable features in the final model.
Neighborhood code distribution and sale price
Neighborhood codes showed location signal, but also uneven sample sizes across groups. This is useful but needs careful validation to avoid overfitting location artifacts.
Residuals versus log sale price
Residuals versus log sale price showed structure rather than pure noise, which raised the question of regressive or progressive assessment patterns.
RMSE and overestimation percentage by log sale price interval
The overestimation chart was more directly useful for fairness than dollar RMSE: lower-priced bins were overestimated more often.
MAPE across log sale price intervals
MAPE emphasized proportional error, surfacing relative harm that dollar-denominated RMSE can hide.
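
The IQR filter used for the land estimate figure is the standard quartile-fence rule. A minimal sketch, assuming a numeric pandas column; the `Estimate (Land)` column name is illustrative:

def iqr_filter(series, k=1.5):
    # Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the conventional fence
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series[series.between(q1 - k * iqr, q3 + k * iqr)]

# Illustrative usage; the 'Estimate (Land)' column name is an assumption
land_estimates = iqr_filter(training_data["Estimate (Land)"])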

4. Feature Engineering Decisions

The notebooks explored multiple feature families, then narrowed the portfolio model to a small, explainable feature set. This keeps the public demo readable while preserving the original reasoning in the notebooks.

Feature area, original analysis, and reasoning:

  • Log sale price: converted `Sale Price` into `Log Sale Price` after filtering very low values. Reasoning: reduced right skew and made the linear regression target more stable.
  • Building size: used `Log Building Square Feet` as a size feature. Reasoning: size has a clear positive relationship with price, and the log scale improves linearity.
  • Bathrooms: extracted bathroom counts from structured fields or the `Description` text (a sketch follows the default feature list below). Reasoning: bathroom count is interpretable and captures a meaningful home quality signal.
  • Assessment estimates: used `Log Estimate (Building)` in the final model. Reasoning: the prior building estimate carries strong valuation signal while remaining available at prediction time.
  • Neighborhood: explored top neighborhoods and expensive-neighborhood indicators. Reasoning: location matters, but uneven group sizes create overfitting and fairness risks.
  • Categorical attributes: explored substitution and one-hot encoding for fields like wall material. Reasoning: categorical property details can help, but the final demo kept the model compact.

# Compact, explainable feature set used by the portfolio model
DEFAULT_FEATURES = [
    "Log Estimate (Building)",
    "Bathrooms",
    "Log Building Square Feet",
]
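
The bathroom extraction mentioned in the feature table is the least mechanical step. A minimal sketch, assuming the `Description` text phrases counts as "N of which are bathrooms"; the real field format may differ:

import re

def extract_bathrooms(description):
    # Assumes phrasing like '2 of which are bathrooms'; returns None when no
    # match is found so the pipeline's median fill can take over downstream.
    match = re.search(r"(\d+(?:\.\d+)?) of which are bathrooms", str(description))
    return float(match.group(1)) if match else None

training_data["Bathrooms"] = training_data["Description"].apply(extract_bathrooms)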

5. Modeling and Validation

The second notebook fits ordinary least squares linear regression using `sklearn.linear_model.LinearRegression(fit_intercept=True)`. The refactored repo keeps that model path in `scripts/run_model.py` and adds a NumPy least-squares fallback so the demo still runs when scikit-learn is not installed.
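
A minimal sketch of that dual path, assuming a numeric feature matrix `X` and a log-scale target `y`; this mirrors the idea behind `scripts/run_model.py` rather than reproducing it:

import numpy as np

def fit_linear_model(X, y):
    # Prefer scikit-learn's OLS when it is installed.
    try:
        from sklearn.linear_model import LinearRegression
        model = LinearRegression(fit_intercept=True).fit(X, y)
        return np.concatenate(([model.intercept_], model.coef_))
    except ImportError:
        # NumPy fallback: prepend a column of ones so the first coefficient
        # plays the intercept role, then solve ordinary least squares.
        X_design = np.column_stack([np.ones(len(X)), X])
        theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        return theta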

Pipeline Rules

  • Never mutate the input dataframe in place.
  • Create target variables only when training.
  • Apply outlier filtering only to training data.
  • Never drop test parcels simply because they are unusual.
  • Fill missing feature values with medians.
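
A minimal sketch of those rules as a single preparation step, assuming the default feature list above; the helper name and training flag are illustrative:

import numpy as np

def prepare(data, features, is_training=False):
    # Work on a copy so the caller's dataframe is never mutated in place.
    df = data.copy()
    if is_training:
        # Outlier filtering and target creation happen only on training data.
        df = df[df["Sale Price"] >= 500]
        df["Log Sale Price"] = np.log(df["Sale Price"])
    # Test parcels are never dropped; missing feature values are median-filled instead.
    for col in features:
        df[col] = df[col].fillna(df[col].median())
    return df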

Notebook Validation Results

Train RMSE:     0.6002
Holdout RMSE:   0.5937
4-fold CV RMSE: 0.6002 +/- 0.0049

The train, holdout, and cross-validation scores are close, which suggests the simple model is not merely memorizing the training set. Because the target is log sale price, the RMSE should be interpreted as a multiplicative error rather than a fixed-dollar miss.
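
As a quick illustration of that multiplicative reading (the notebook's target uses the natural log), the holdout RMSE converts to a factor:

import numpy as np

holdout_rmse = 0.5937           # RMSE in natural-log units
factor = np.exp(holdout_rmse)   # roughly 1.81
# A one-RMSE miss therefore means over- or under-predicting the sale price
# by about a factor of 1.8, not by a fixed dollar amount.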

6. Fairness Findings

After fitting the model, I evaluated whether the residuals implied regressive taxation. The key move was to compare overestimation rates by price tier rather than relying only on aggregate RMSE.

  • Lower-priced homes: RMSE on sale price about $76,899; overestimated 57.78% of the time. More often predicted above actual value, increasing assessment burden risk.
  • Higher-priced homes: RMSE on sale price about $245,632; overestimated 27.69% of the time. Less often overestimated, meaning expensive homes were more likely to be underpredicted.
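
A minimal sketch of how those tier numbers can be computed, assuming columns for actual and predicted sale prices; the column names and the tier threshold are illustrative:

import numpy as np

def tier_report(df, threshold):
    # Dollar-scale RMSE and overestimation rate below and above a price threshold.
    # The 'Sale Price' and 'Predicted Sale Price' column names are assumptions.
    report = {}
    tiers = {
        "Lower-priced homes": df["Sale Price"] < threshold,
        "Higher-priced homes": df["Sale Price"] >= threshold,
    }
    for name, mask in tiers.items():
        actual = df.loc[mask, "Sale Price"]
        predicted = df.loc[mask, "Predicted Sale Price"]
        report[name] = {
            "rmse_dollars": float(np.sqrt(np.mean((predicted - actual) ** 2))),
            "overestimated_pct": float(100 * np.mean(predicted > actual)),
        }
    return report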

Why This Matters

Dollar RMSE is larger for expensive homes simply because dollar errors scale with price, so the RMSE gap alone says little about fairness. The overestimation rate connects more directly to tax fairness. The model's pattern aligns with a regressive assessment concern: cheaper homes are more likely to be overvalued, while expensive homes are more likely to be undervalued.

7. Metric Reflection

RMSE is useful for prediction quality, but it can hide who is harmed proportionally. In the notebook, I explored MAPE to evaluate relative error rather than raw dollar error.

import numpy as np

def mape(theta, X, y):
    # Mean absolute percentage error of a linear model with parameters theta
    y_pred = X @ theta
    percentage_error = np.abs((y - y_pred) / y)
    return np.mean(percentage_error)

My conclusion was that no single metric is enough. A responsible assessment workflow should report aggregate accuracy, proportional error, residual parity, and over/under assessment rates across meaningful groups.

8. What the Refactor Added

The original project lived primarily as course notebooks. The portfolio refactor adds a cleaner engineering surface around that work:

  • A standalone model script, `scripts/run_model.py`, that preserves the notebook's modeling path.
  • A NumPy least-squares fallback so the demo runs even when scikit-learn is not installed.
  • The extracted notebook visualizations and this process narrative, so the analysis is readable outside the notebooks.

Bottom Line

The project is not just “I trained a house price model.” The stronger story is: I built a valuation model, checked that it generalized, then audited whether its errors could create unfair tax outcomes. That is the version worth showing recruiters.