This project aims to (1) better understand where new units are being permitted in the City of San Francisco, and (2) evaluate whether those new permits and units are related to negative demographic, economic, or cultural changes.

This report will be split into six sections:

  1. Introduction – Context, Literature Review, Methods

  2. Data Collection – Permits API Pull (see the Python notebook for more data explanations)

  3. Exploratory Data Analysis – Distributions, Correlation, Variable Selection

  4. Spatial Analysis – Spatial Autocorrelation, Local Moran’s I, Cluster Analysis

  5. Linear Model – Ordinary Least Squares, Predicting 2015-19 Permits & Units

  6. Conclusions – Gentrification?

This project can explore many narratives involving permits, housing, and gentrification. But for this version, I will focus on understanding permits and housing units from 2015 to 2019. This is an interesting period to study because the city had already seen waves of incoming investment and high-income workers from the booming technology industry. Sections 3-5 will therefore focus on understanding:

  • Where is housing being permitted the most?

  • Is there a changing demographic in those areas of higher development?


1 Introduction

San Francisco and the larger Bay Area region are becoming unlivable for most of their long-term lower- and middle-class residents. Although many American cities have unaffordable housing markets, the Bay Area’s unaffordability is primarily due to the failure to build enough housing. It isn’t that cities don’t have enough land; the issue is that the cities’ zoning regulations, politicians, and citizens have restricted the ability to receive building permits. The result is the haunting, ambiguous effects of ‘gentrification’.

This project seeks to locate where housing units are actually being permitted and whether those buildings are relieving the housing shortage, the cost of living, and other gentrification indicators. To understand whether building permits are pushing the Bay Area into a deeper stage of gentrification, the project will follow these steps:

Step 1. Gather Permits – Analyze 2000-2019 building permits for the City of San Francisco – focusing on their neighborhood location, number of housing units, number of affordable units, and legal status (see Section 1 or my notebook file that compiles the data)

Step 2. Analyze Changes – Determine which neighborhoods receive the most permits and if those same neighborhoods also have:

  • Housing Market Increases – via Home Prices, Rental Costs, Evictions
  • Changing Demographics – via Race, Income, Age, Employment, Education
  • Historically Adverse Policies – Exclusionary Zoning, Racial Discrimination

(see Sections 2 & 3)

Step 3. Conclude on Housing Effects – Qualitatively & quantitatively compare permitting, housing, and demographics between neighborhoods

It will be difficult to directly link building permits to housing costs, but the report’s conclusion will estimate the observed permit metrics’ effects on the crisis as a whole.


1.1 Literature Review

The San Francisco Bay Area is in a housing crisis, and has been for a while. The core of the crisis is relatively simple: the Bay Area’s housing supply has not kept up with housing demand – and it hasn’t since the 1960s.

The result of this imbalance is that the Bay Area has become the most expensive area of the United States to live in. Since 1990, Bay Area counties have ranked at the top of the United States’ county-level Housing Price Index, with at least 7 counties in the top 10 every year (Bogin, Doerner, and Larson 2019). San Francisco, which has sat at the top of the Housing Price Index, saw its housing prices increase by 39% from 2010 to 2020 while its housing units increased by only 7% (US Census Bureau 2020).

Once we start to ask why the Bay Area can’t build more housing, or what the larger economic, social, and political effects are, this housing crisis becomes an even deeper, more complicated state of gentrification, inequality, and exclusion. To understand the causes and effects of this crisis, we have to lay down the theory and studies explaining: 1. the housing economy of the Bay Area, 2. the theoretical stages of gentrification, 3. measuring stages of gentrification, and 4. local governance and the permitting process.

1.1.1 Strict Regulations & the Housing Economy

Housing demand doesn’t necessarily obey borders. If a newly hired tech worker wants to move to the San Francisco Bay Area, they will use Craigslist to find an apartment within 10 miles of their job. If a developer wants to build a multi-family house, they are at the mercy of the city government’s layers of anti-growth land use policies and public hearing bodies.

One of the larger models at play is the ‘filter model’. The housing market has new and/or high-quality homes that sit at the top of the market for high-income people. As homes decline in quality, more middle-income people can afford the now middle-quality homes. At the bottom is the lowest-quality housing, which the lowest incomes can afford. Sometimes homes are repaired, or demolished and re-constructed, to raise their quality or the number of units in the market (O’Sullivan 2012).

With a limited availability of rental housing, homelessness has been observed to rise (Hanratty 2017). Glaeser & Gyourko, in The Economic Implications of Housing Supply, observe that stricter land use regulations lead to less construction, higher home prices, and less population growth – calling out the Bay Area specifically in their study (Glaeser and Gyourko 2002).

1.1.2 Theoretical Stages of Gentrification

For this project, I will start with Ruth Glass’ definition of gentrification: “the transformation of a poor neighborhood in cities by the process of middle- and upper-income groups buying properties in such neighborhoods and upgrading them” (Lees et al. 2008). It mostly reads as general displacement.

To add temporal levels to that definition, let’s look at Clay’s 1979 Stage Model (Clay 1979). This model provides a narrative by matching stages with the actors at fault:

  1. Pioneers: a small group of risk-oblivious pioneers – artists, designers, LGBTQ residents – move in.
  2. Expanding Gentrification: risk-takers and remodelers move in and start renovating buildings.
  3. Displacement: middle-class people start moving into the neighborhood; major changes start happening.
  4. Mature Gentrification: the neighborhood becomes desirable, often with new resources and businesses; often the original residents and early gentrifiers are displaced.

Clay’s model will be more difficult to measure but adds more detail to the definition. The model claims that gentrification is more of a gradient, from lower-income native residents up to higher-income transplants.

1.1.3 Measuring Stages of Gentrification

The housing crisis and the larger ‘gentrification’ have largely remained ongoing and unaddressed because the full problem is too immense and complicated for one local government to take responsibility for, measure, and resolve. Japonica Brown-Saracino, in “Explicating Divided Approaches to Gentrification & Growing Income Inequality”, emphasizes that gentrification is typically qualitatively viewed as a:

nearly unstoppable force bearing down on cities that exacerbates economic and racial inequalities and plays a prominent role in the spatial reorganization of urban life. (Brown-Saracino 2017, 516)

But how can you quantitatively define or measure ‘gentrification’? Brown-Saracino and other gentrification researchers highlight that it is far more difficult to quantifiably measure all of the root causes and direct/indirect effects of gentrification (Desmond and Perkins 2016; Humphries et al. 2019) – let alone form a policy solution to address the quantified measurements.

Mahasin Mujahid et al. (2019) compared measurements of “Gentrification and Displacement in the San Francisco Bay Area”. Mujahid et al. specifically looked at three methods of measuring gentrification: the Freeman method, the Landis method, and the Urban Displacement Project’s method. I believe it is important to go through each of these methods to frame the project’s focus:

Freeman’s Method for Defining a Gentrifying Area (Freeman 2005):

  1. The area was at or below the median income for its respective metropolitan area.
  2. The percentage of housing stock within the area built in the prior twenty years was at or below the median for the metropolitan area.
  3. At least 50% of the area was defined as urban.

Landis’ Method for Defining a Gentrifying Area (Landis 2016):

  1. The 1990 median household income was in the lowest four deciles, and the 2013 median household income had improved by two or more deciles.

Urban Displacement Project’s Method of Defining Stages of Gentrification (Thomas et al. 2020):

  1. Rental and housing costs saw a rapid increase from 1990 to 2000 or from 2000 to 2018.
  2. There was a rapid increase in income.

(It is a more nuanced method that is difficult to reduce to a few bullet points.)

All three methods mostly revolve around dramatic increases in income and housing costs over a long time period. The Urban Displacement Project used more classifications, as well as many more metrics for comparison. Keeping these three methodologies in mind provides a better lens for analyzing the housing, permit, and demographic data.
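To make Freeman-style criteria concrete, here is a minimal R sketch of such a flag. The column names (med_income, pct_new_stock, pct_urban) and the metro-median arguments are hypothetical stand-ins for illustration, not fields from this project’s dataset.

# Sketch: flag block groups meeting Freeman-style 'gentrifiable' criteria
# (hypothetical column names; thresholds passed in as metro-wide medians)
library(dplyr)

flag_freeman = function(df, metro_med_income, metro_med_new_stock){
  df %>%
    mutate(
      gentrifiable =
        med_income    <= metro_med_income &     # (1) at or below the metro median income
        pct_new_stock <= metro_med_new_stock &  # (2) share of recently built stock at or below the metro median
        pct_urban     >= 0.5                    # (3) at least 50% of the area is urban
    )
}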

1.1.4 Local Governance & Permitting Process

The issue isn’t the lack of development interest or developable area. The under-supplied housing is the result of exclusionary local regulations and the difficulty of finding & forming policy that will prevent negative effects. In the Bay Area, and many other U.S. cities, the permitting process is long and expensive with many opportunities for public hearing bodies to pressure public representatives to deny projects. Besides affecting housing supply and costs, the local regulations have indirectly or directly resulted in racial discrimination, rising homelessness, and increased economic inequality.

Building construction permits can be notoriously difficult to receive, with many cities having systematic methods of slowing down the process. The public hearing bodies that are supposed to provide transparency to the process are now points where NIMBY (Not in My Backyard) residents can publicly pressure officials to legally or illegally halt projects. The San Francisco Bay Area is known for both its housing crisis and its difficult permitting processes.

There is existing research analyzing the overall American building permitting process and how it contributes to the housing economy. Overall, the local permitting process is purposely difficult to navigate as a result of public hearing bodies – specifically in Berkeley, CA (Dougherty 2017) and San Francisco (McNee & Pojani 2021; Egan, n.d.) – and/or land use regulations (Glaeser & Gyourko 2002). The literature evaluates how public hearing bodies favor older, wealthier, and whiter communities (Schaffner, Rhodes, & Raja 2020; Trounstine 2018). Some quantitative studies reinforce this claim with user surveys (Einstein 2021) and meeting minutes (Einstein, Palmer, & Glick 2019).

1.1.5 Conclusion

The literature pushes the following data-focused sections to evaluate how income, housing costs, and demographics have changed over time. I will attempt to relate these changes to changes in new permits, though it may be difficult to come to a strong conclusion.


2 Data Collection

In this section, I will import the dataset containing permits, housing units, and predictor data. This data was aggregated into Block Groups in a separate Python Jupyter Notebook (which can be found here). The Notebook provides more in-depth metadata and references to the data sources. Nonetheless, I will reference the sources of the data below.

Overall, the Block Group dataset can be viewed in 4 larger categories:

  1. Building Permits (2000-19)
  2. Added Housing (2000-19)
  3. Demographic Metrics (1990-2018)
  4. Evictions (2000-19)

I want to bring in a lot of metrics for the initial analysis. But by the linear model, I will pare down to the most impactful & relevant metrics.

2.1 Permits by Block Group

I initially scraped the city’s Accela Permit Database API for permits. Although this method gave me access to more information on permits from 2014-2019, I decided that it would be more valuable to focus on quantity (years of coverage) rather than quality (detail per permit).

As a result, I found a pre-cleaned permit dataset from the SF Open Data Portal (covering 2000-2013 and 2013-2019). Although there is less information on each permit’s processing (e.g. permitting length), it provides me with an accurate history of development for 20 years. The full list of permit fields can be found in this spreadsheet.
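For reference, pulling the raw permits straight from the SF Open Data Portal (a Socrata API) might look like the sketch below; the resource endpoint shown is an assumption and should be confirmed on data.sfgov.org.

# Hedged sketch: pull building permits from the SF Open Data Portal (Socrata API)
# The resource ID below is assumed -- confirm the current one on data.sfgov.org
library(RSocrata)

permits_raw = read.socrata(
  "https://data.sfgov.org/resource/i98e-djp9.json",  # assumed Building Permits endpoint
  app_token = Sys.getenv("SOCRATA_APP_TOKEN")        # optional token to avoid throttling
)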


# data pre-cleaned in python: read the block-group dataset assembled in the
# Jupyter Notebook and project it to the analysis CRS
data_path =
   paste(DIR, "/data/clean/sf_dataset_20220506.geojson", sep="")
sf.data =
  st_read(data_path) %>%
  st_transform(project_crs)

To start, let’s look at this map of San Francisco building permits by time period. With these maps, we can get a general idea of the location and frequency of building permits that add housing units. Throughout the years, permits are primarily located in the Eastern half of the city; many blocks in the Western half receive zero permits over a five-year period. This is concerning, as development is focused in the Northeastern part of the city: Downtown, the Tenderloin, and the Mission.

(I apologize for the lack of in-plot labels; my Python label function would not work.)

pic_path = 'sf_permits_year.png'
pic_path = paste0(DIR, "/visuals/", pic_path)
knitr::include_graphics(pic_path)
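Since the maps themselves were produced in Python, a rough R equivalent for a single period might look like the sketch below, assuming sf.data carries the block-group geometries and a permits.tot.15_19 count column.

# Sketch: choropleth of permit counts for one period (column name assumed)
library(ggplot2)

ggplot(sf.data) +
  geom_sf(aes(fill = permits.tot.15_19), color = NA) +
  scale_fill_viridis_c(name = "Permits 2015-19") +
  labs(title = "Building Permits by Block Group, 2015-19") +
  theme_void()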

2.1.1 Housing Units (2000-19)

The main focus of this project’s analysis, approved housing units, has a sharper geographic contrast. Looking at the figure below, every year shows a distinct cluster of units around Northeastern San Francisco, i.e. Downtown, Tenderloin, SOMA.

Between 2000 and 2019, the added units seem to become less dispersed around the city. In the 2000-04 period, a majority of block groups have at least 1 unit. In 2015-19, most of the development is concentrated in the Eastern area, but the core of the cluster (Downtown) appears to be expanding into the neighborhoods immediately outside it (the Mission, the Tenderloin).

The figure below depicts Housing Units, Permits, Average Units per Permit, and Units per Sq. Mile for 2015-19. By putting these plots together, development appears to stay primarily in the Northeastern corner.

We will evaluate this clustering more with Spatial Autocorrelation in Section 4. But for now, it appears to be concerning that new housing is not spread throughout the city. If new units have only been added to one area, that area will witness constant physical and residential change. Even though San Francisco does not have a lot of land area, it appears that many areas of the city are not open to development.

2.1.2 Demographic Metrics (2000-19)

To better understand the areas with more development, let’s pull in metrics that describe the residents of those areas.

2.1.2.1 Renters

The next figure locates where a higher share of renters live in the city. Renters are primarily located in the Northeastern portion of the city: the Marina & North Beach to the north; the Tenderloin, Downtown, SOMA, & the Mission in the central, Northeastern area. This highlights that the higher amounts of housing development happen in areas with higher shares of renters.

This can be seen as good and bad. Renters have more units to rent from, but the new developments have to be placed somewhere – ideally where the older apartments, which typically have lower rents, are not replaced by expensive market-rate homes.

The next figure shows the change in renters from 2010 to 2018. In just 8 years, many blocks saw greater than a 50% change in their share of renters. Blue areas saw a decrease in renters and an increase in owner-occupied units – likely a ‘condo-conversion’ situation, where tenants are evicted so the apartments can be sold as individual properties.

What is concerning is that the areas with a 50% decrease in the share of renters are the same areas with massive developments. So there are new developments, but they are condos. With fewer rental units on the market, there is more competition over the remaining units, driving up prices.

2.1.2.2 Median Household Income (1990-2018)

The following plot digs more into the income make-up of the city. The plots depict Median Household Income, adjusted to 2018 USD. The higher-income areas have typically been outside the Downtown area. Income increases modestly from 1990 to 2000 – maybe due to the 1990s tech bubble – but the drastic increase is from 2000 to 2018. The last map in the figure highlights the changes: almost all of the city saw an increase in median household income.

A striking statistic is that the city’s adjusted median household income was $57,956 in 1990 . . . but $110,060 in 2018. Household income nearly DOUBLED, even after adjusting for inflation. The Tenderloin – the lower-income heart of downtown – appears to increase by only a small margin, but the area is surrounded by housing growth and rising incomes.
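A quick arithmetic check of that near-doubling (both figures are already in 2018 dollars, so no further inflation adjustment is needed):

# ratio of the 2018 to the 1990 inflation-adjusted median household income
110060 / 57956   # ~1.90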

2.1.2.3 Change of Low-Income People of Color

The next figure shows the change in low-income people of color. This is a more niche metric, but it highlights the population that suffers the most from displacement. The map shows that the downtown area is a mixed bag of losses and gains of people of color.

Just outside the downtown area, there is more loss of people of color. This could be because the downtown area has already seen a large amount of displacement, so the next area for displacement is immediately outside of it.

2.1.2.4 Change of White People

The next figure contrasts with the previous plot. The downtown areas of higher development are also the areas where more White people are moving in.

2.1.3 Evictions (2000-19)

One immediate way to understand residential change is to look at the city’s evictions. The plot below depicts all of the city’s evictions by year. Evictions are primarily located in the center of the Northeast area – where the higher shares of development, renters, and low-income residents are. This makes sense: more renters means more potential for eviction. But recall that there are other areas with higher shares of renters beyond the main block groups of the Tenderloin and the northern Mission district.

Over time, it appears that there are fewer evictions. There was a spike of evictions in the 2000-04 period, which could suggest a surge of development that warranted more evictions.

Let’s break down the evictions by type to get a better idea of the pattern. The types break down into:

  • Renters Breaching the Lease – e.g. illegal roommate
  • Owner Changing the Housing Use – e.g. Condo Conversion
  • Major Renovations
  • Failure of Renter Payment

The maps highlight where each type is located between 2000 and 2019. The most common type of eviction appears to be a ‘breach of lease’. What’s interesting is that breach-of-lease evictions are evenly spread across the city, while evictions due to a change of use, a major renovation, or a failure to pay have higher concentrations in the Northeast or central area.

2.2 Data Conclusion

There are other metrics that were not visualized, but the previous maps highlight the larger, narrative metrics. The following sections will work through some of the other metrics before placing them in the model.

To conclude our current findings: the downtown area sees the highest share of permitting and housing – even when weighting those counts (e.g. per permit or per square mile). At the same time, the downtown area has the highest share of renters and long-term low-income residents, and the greatest increase in White people. It is concerning that housing is mainly being added to this area and, to a large extent, not elsewhere in the city.


3 Exploratory Data Analysis

In this section, we will further evaluate the metrics to understand their distributions and relationship to housing units.

3.1 Scatter Plots

Let’s better understand some of the main variables’ relationships to housing units. Specifically: (1) Permits, (2) Evictions, (3) Change in Percent of Units that are Owner-Occupied, and (4) Percent White.


# split variables on the literal ".pct." tag (fixed = TRUE so "." is not treated as a regex wildcard)
vars.ivs.disc = var.ivs.all[grepl(".pct.", var.ivs.all, fixed=TRUE)]
vars.ivs.cont = var.ivs.all[!grepl(".pct.", var.ivs.all, fixed=TRUE)]

fig_num = 2

var.dvs.plot = c(
  'units.tot.15_19',
  'permits.tot.15_19'
  )

dependent_variable = 'units.tot.15_19'
dv_lab = "Units"

plot_colors= c(
  "blue", "green", "red", "orange"
)

# n_max returns the largest value left after dropping the top n values
# (i.e. the (n+1)-th largest); used to trim extreme outliers from the axis limits
n_max = function(vector, n){
  new_vector = sort(vector, decreasing=TRUE)
  return(max(tail(new_vector, -1*(n))))
}
#n_max(c(5,4,69,100,3,2,1),2)

second_max = function(vector){
  return(n_max(vector,1))
}
second_max(c(5,4,69,100,3,2,1))

ylim_num = max(sf.predict[[dependent_variable]])
ylim = max(sf.predict[[dependent_variable]])# 
ylim = n_max(sf.predict[[dependent_variable]], 2)

scat_vars = c('permits.tot.15_19', 'evictions.tot.10_19', 'occ_own_ch.pct.90_18','wht.pct.10_15')
scat_labs = c("Permits 2015-19", "Evictions 2010-19", "Change in Pct Owned 1990-2018", 'Pct White 2010-15')

plot_scats = function(variable_number){
  plot_color = plot_colors[variable_number]
  variable_name = scat_vars[variable_number]
  variable_label = scat_labs[variable_number]
  fm_equation = paste(dependent_variable, "~", variable_name, sep="")
  
  fm = as.formula(fm_equation)
  units.tot.15_19_variable = lm(fm, data = sf.predict)
  coefficient = 
    round(
      units.tot.15_19_variable$coefficients[variable_name][1], 2)
  
  xlim = n_max(sf.predict[[variable_name]], 2)
  
  scat_plot = 
    ggplot(
      data = sf.predict,
      aes(
        x = sf.predict[[variable_name]],
        y = sf.predict[[dependent_variable]])) +
    geom_point(size=2, shape=20) +
    labs(title = 
           glue("Figure {fig_num}.{variable_number}: {variable_label}"),
         subtitle = glue("{fm_equation}     Coefficient = {coefficient %>% count_format()}")
         ) +
    geom_smooth(method = "lm", se=F,
                color = plot_color) +
    xlab(variable_name) +
    ylab(dv_lab) +
    xlim(min(sf.predict[[variable_name]]), xlim) + 
    ylim(min(sf.predict[[dependent_variable]]), ylim) + 
    plotTheme()
  print(scat_plot)}

The first plot shows housing units as they relate to permits. The coefficient suggests a positive relationship: more permits, more units, which makes sense. Another interesting pattern is that some of the block groups with the most units don’t necessarily have the highest number of permits, suggesting that a few larger projects generate many of the units.

plot_scats(1)

The next plot shows housing units against evictions. The coefficient suggests a positive relationship, but it is weak. Many block groups that added a lot of units didn’t have a lot of evictions. This could suggest that those with a lot of new units had more evictions previously. But based on the maps in the last section, evictions are spread widely throughout the city.

plot_scats(2)

Figure 2.3 shows the relationship between the change in percent owner-occupied from 1990 to 2018 and new housing units. The coefficient is positive: the more units added, the larger the share of owner-occupied units. This could suggest that the new units added are likely condos, which is concerning if the share of rental units is decreasing due to these new condos.

plot_scats(3)

Figure 2.4 shows the change in percent White from 2010 to 2015. In the related map in the previous section, it was visually apparent that the block groups with growth in White residents also saw more housing units added. This scatterplot suggests a similar conclusion: although many block groups saw a loss of White residents, the ones with added units saw larger increases.

plot_scats(4)

3.2 Correlation

Correlation Matrices, Independence Test

Before performing any regression, it will be important to understand the linear relationships between (1) the predictors and each other, and (2) the predictors and housing units. To understand this, we will use a Pearson correlation matrix to weed out weak predictors and any predictors that are too related to one another.

The correlation plots are split in two so that all of the metrics can be viewed. I also already removed a lot of metrics that were too similar; we want to prevent multicollinearity and weak predictors.
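One programmatic way to screen for such pairs, as a sketch (the 0.8 cutoff is an arbitrary assumption), is caret::findCorrelation on the Pearson matrix of the continuous predictors:

# Sketch: flag one variable from each highly correlated pair (|r| > 0.8)
library(caret)

cor_mat = cor(
  sf.predict %>% st_drop_geometry() %>% select(all_of(vars.ivs.cont)),
  use = "pairwise.complete.obs"
)
too_correlated = findCorrelation(cor_mat, cutoff = 0.8, names = TRUE)
too_correlated   # candidates for the remove_col list below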



remove_col = c(
  'hinc.tot.18',
  #'units.tot.00_14',
  'units.tot.00_04',
  'units.tot.05_09',
  'units.tot.10_14',
  'permits.tot.00_04',
  'permits.tot.05_09',
  'permits.tot.10_14',
  'hh.tot.18',
  "permits.tot.00_14",
  #"breach_lease.tot.10_19",
  #"change_use.tot.10_19",
  "reno.tot.10_19",
  "renter_payment.tot.10_19",
  "evictions.tot.10_19",
  #"evictions.tot.00_19"
  "pop.tot.18",
  'white.tot.00',
  'hh.tot.90',
  'mov.pct.12',
  'inc.med.12',
  'hh_costs.med.18',
  'hh_inc.med.90',
  'mhval_ch.tot.12_18',
  'occ_own.pct.90',
  'occ_own.pct.18',
  'occ_rent.pct.18',
  'FAR.avg.2019',
  'pop.tot.00',
  'hh_inc.med.00',
  'occ_rent_ch.pct.90_18',
  'occ_own_ch.pct.90_18',
  'occ_rent.tot.18',
  'occ_rent_ch.pct.00_18',
  'lowinc.pct.00',
  'area_res.pct.2019',
  'units.tot.00',
  'income.med.18',
  'rent.med.90',
  'hval.med.00',
  'rent.med.00',
  'hh.tot.00',
  'rent.med.90',
  'mhval.tot.18',
  'white.tot.90',
  'hval.med.90',
  'pop.tot.90',
  'lotarea.avg.2019',
  'occ_own_price.med.12',
  'occ_rent_price.med.12',
  'mrent.tot.18',
  'dist_BART.tot.2020',
  'res_area',
  'units_p_parcel.avg.2019',
  'units_p_parcel_r.avg.2019',
  'lotarea_res.tot.2019',
  'lotarea.avg.2019',
  'bldgsqft.avg.2019',
  'white.tot.90',
  'rent.med.00',
  'mrent.tot.18',
  'mhval.tot.18',
  'occ_own.pct.00',
  #'college_ch.pct.00_18',
  'lowinc.pct.90'
  # 'breach_lease.tot.10_19',
  # 'white.tot.18',
  # 'mhval_ch.pct.00_18',
  # 'poc_linc.pct.10_15',
  # 'hh_inc.med.18',
  # 'mhval_ch.pct.00_18',
  # 'pop25.tot.18',
  # 'units.avg.2019',
  # 'area'
  )

vars.ivs.cont_fin = vars.ivs.cont[!(vars.ivs.cont %in% remove_col)]


vars.ivs.disc_fin = vars.ivs.disc[!(vars.ivs.disc %in% remove_col)]


len = length(vars.ivs.cont_fin)
len_half = round(len/2)

var.dvs.plot = c(
  'units.tot.15_19',
  'permits.tot.15_19'
  )

sf.cont = 
  sf.predict %>%
  select(all_of(c(var.dvs.plot, vars.ivs.cont_fin))) %>% 
  st_drop_geometry(.)

plot = ggcorr_full(sf.cont,
            title = glue("Figure {fig_num}: Pearson Correlation"),
            subtitle = "Continous Variables"
)
print(plot)

From this first correlation plot, we can see some interesting relationships. Permit and unit counts aren’t the most strongly correlated pair – especially considering the scatter plots. Instead, the change in White population has the highest correlation (wht.pct.10_15), followed by the percent of people who have recently moved (mov.pct.18). An interesting negative correlation is the percent of building square footage that is residential (bsqft_res.pct.2019). An interesting null correlation is the percent of people who have a college degree; I would imagine this is because there are college-educated people who own houses in the suburban outskirts of SF as well as in the downtown.

sf.cont = 
  sf.predict %>%
  select(all_of(c(var.dvs.plot, vars.ivs.disc_fin[!(vars.ivs.disc_fin %in% vars.ivs.cont_fin)]))) %>% 
  st_drop_geometry(.)


plot = ggcorr_full(sf.cont,
            title = glue("Figure {fig_num}: Pearson Correlation"),
            subtitle = "Percent Variables"
)
print(plot)

The second correlation plot shows mostly weaker correlations; none of these predictors stand out.

The next section will look more into the clustering effect of housing and permits.


4 Spatial Analysis

In this section, I want to find out where there are clusters of new permits and housing units.

First, I will take advantage of Local Indicators of Spatial Autocorrelation to determine if there are statistically significant clusters of new permits and units. Second, I will compare clusters between different time periods. Finally, I will look at whether predictor variables differ inside and outside of the clusters.

4.1 Spatial Autocorrelation

The best way to understand how geographic the permit dataset is, is to compute a few measures of spatial autocorrelation. Simply put, these metrics test whether a Block Group’s values are correlated with its location or whether they are randomly located.


# process borrowed from MUSA 508's (Fichman & Harris) Lab 8, Public Policy Analytics (Steif et al) Book Chapter 4, and David O'Sullivan's Berkeley & VUW courses
## https://github.com/mafichman/MUSA_508_Lab
## https://urbanspatial.github.io/PublicPolicyAnalytics/intro-to-geospatial-machine-learning-part-2.html
## https://github.com/DOSull/GISC-422




st_weights = function(focus_sf, queen=TRUE){
  focus_sf %>% 
    as_Spatial() %>%
    # get neighbors (queen contiguity)
    poly2nb(queen=queen) %>%
    # row-standardized neighbor weights
    nb2listw(style="W", zero.policy=TRUE)
}

sf.predict.weights = 
  st_weights(sf.predict)

# sf.predict = 
#   sf.predict %>%
#   mutate(
#     units1519.lag = 
#       lag.listw(sf.predict.weights, units1519.predict),
#     units1519.error.lag = 
#       lag.listw(sf.predict.weights, units1519.error)
#   )

4.1.1 Moran’s I

This is a good metric to start with, as Moran’s I gives us an overall measurement of the spatial association between value and location. To obtain a reference distribution, we run 999 permutations that randomly shuffle the observed values among the Block Groups. The metric ultimately measures how different the observed neighbor relationships are from those of the permuted values.
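For reference, the global Moran’s I statistic being permuted here is

$$ I \;=\; \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} $$

where $x_i$ is a Block Group’s permit or unit count and $w_{ij}$ are the row-standardized queen-contiguity weights built by st_weights() above.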

Figure 2.1 displays the observed and simulated Moran’s I for housing permits and units from 2015 to 2019. The grey histogram is the distribution of Moran’s I across 999 randomized permutations of the Block Group values; their Moran’s I is close to zero because, with randomly assigned values, there is little spatial relationship between neighbors’ values. The red line is the observed Moran’s I – if it falls to the right of the simulated distribution, it suggests a real spatial relationship between neighbors’ values.


fig_num = 2.1
fig_title = glue("Figure {fig_num}: Observed and Simulated Moran's I")

plot_moransI = function(
  focus_col, focus_sf=sf.predict, focus_weights=sf.predict.weights,
  moransIcolor = "red", samplecolor="lightgrey", title = 'Housing', xlab="Moran's I"
  ){
  focus_sf.MoransI = 
    moran.mc(
      focus_sf[[focus_col]],
      focus_weights, 
      nsim = 999
      )
  
  morans_plot = 
    ggplot(
      focus_sf.MoransI$res[c(1:999)] %>% as.data.frame(), 
      aes(focus_sf.MoransI$res[c(1:999)])
    ) +
    geom_histogram(binwidth = 0.01, color=samplecolor,fill='lightgrey') +
    geom_vline(
      aes(xintercept = focus_sf.MoransI$statistic), 
      colour = moransIcolor,size=1) +
    #scale_x_continuous(limits = c(-1, 1)) +
    labs(
      title=title,
      #subtitle= "Observed Moran's I in orange",
      x=xlab,
      y="Count") +
    plotTheme()
  return(morans_plot)
  }


do.call(
  grid.arrange,
  c(
    list(
      plot_moransI(
        'permits.tot.15_19', 
        title='Housing Permits 2015-19', 
        samplecolor='royalblue', xlab=""),
      plot_moransI(
        'units.tot.15_19', 
        title='Housing Units 2015-19', 
        samplecolor='salmon')
    ), 
    ncol = 1, 
    top = fig_title
  ))

What we can take from this plot is that the spatial relationship is stronger for the number of housing units than for new permits in Block Groups. This suggests that new permits are more evenly distributed across the city, while new housing units are more clustered.

4.2 Local Moran’s I

Let’s try to determine where this clustering is happening by getting the Local Moran’s I of each Block Group.


get_local_morans = function(focus_sf, focus_col, focus_weights){
  localmoran(
    focus_sf[[focus_col]], 
    focus_weights, zero.policy=TRUE) %>% 
  as.data.frame()
}

st_local_morans = function(focus_sf, focus_col, focus_weights, p_value=0.05){
  # get local morans I for each area
  focus_localmorans = 
    get_local_morans(focus_sf, focus_col, focus_weights)
  # join local Moran's I results to fishnet
  focus_localmorans_sf = 
    cbind(
      focus_localmorans, 
      focus_sf %>% 
        as.data.frame()) %>% 
    st_sf() %>%
    dplyr::select(
      focus_col = all_of(focus_col),
      local_morans_i = Ii,
      pvalue = `Pr(z != E(Ii))`) %>%
    mutate(
      hotspots = ifelse(
        pvalue <= p_value, 1, 
        #ifelse(pvalue >= (1-p_value), -1,0)
        0
        ))
  return(focus_localmorans_sf)
}

cols_addprefix = function(focus_df, prefix, sep='.'){
  cols = colnames(focus_df)
  cols[cols!='geometry'] = 
    sapply(cols[cols!='geometry'], function(c) paste(prefix, c, sep=sep))
  return(cols)
}
rename_addprefix = function(focus_df, prefix, sep='.'){
  cols = cols_addprefix(focus_df, prefix, sep=sep)
  colnames(focus_df) = cols
  return(focus_df)
}


var.dvs.plot = c(
  'units.tot.15_19',
  'permits.tot.15_19'
  )

pval = .05

focus_sf.permits.localmorans = 
  st_local_morans(
    sf.predict, 'permits.tot.15_19', 
    sf.predict.weights, p_value=pval) %>% 
  rename_addprefix(., 'permits1519')
  
focus_sf.units.localmorans = 
  st_local_morans(
    sf.predict, 'units.tot.15_19', 
    sf.predict.weights, p_value=pval) %>% 
  rename_addprefix(., 'units1519')


plot_localmorans_all = function(
  focus_localmorans, ptitle="Local Morans I",
  mainlab='Observed Values',
  labs = c('Local Morans I', 'P-Values', 'Significant Clusters (p<.05)'),
  mainpal='YlOrRd',
  pals = c('Purples', 'PiYG', 'Reds')
  ){
  pals = c(mainpal,pals)
  labs = c(mainlab, labs)
  focus_localmorans_plot = 
    focus_localmorans %>% 
    gather(Variable, Value, -geometry)
  ## This is just for plotting
  vars = unique(focus_localmorans_plot$Variable)
  varList = list()
  
  for(i in seq_along(vars)){
    var = vars[i]
    pal = pals[i]
    lab = labs[i]
    plot_df = focus_localmorans_plot %>%
            filter(Variable == var)
    val_len = length(unique(plot_df$Value))
    pal = 
      brewer.pal(
        ifelse(
          val_len>9, 9, val_len),
                 name=pal)
    if (val_len<9){pal[1]='#FFFFFF'}
    if (i==2){
      pal[1]='#FFFFFF'
      if(mainpal!='YlOrRd'){
      pal[2]='#FFFFFF'
      pal[3]='#FFFFFF'
      pal[4]='#FFFFFF'
      pal[5]='#FFFFFF'}
      }
    varList[[i]] = 
      ggplot() +
      geom_bound(fill='lightgrey', color=NA) + 
      geom_sf(
          data = plot_df,
          aes(fill = Value), 
          colour='lightgrey', 
          size=.01
          ) +
      
      geom_quad(size=4) + 
      #geom_bound(fill=NA, color='black',lw=.4) + 
      scale_fill_gradientn(colors = pal) + 
      #dist.labels.w + 
      labs(title=lab) +
      mapTheme() #+
      #theme(legend.position="bottom")
    }
  
  do.call(grid.arrange,c(varList, ncol = 2, top = ptitle))
  }

Figures 2.2 & 2.3 show the 4 step process of finding Local Indicators of Spatial Autocorrelation – specifically the clusters. The steps & maps show how: 1. we find the observed values (e.g. Permit Counts), 2. get the Moran’s I of each Block Group, 3. get the significance (p-value) of that Moran’s I compared to the simulations, and 4. find clusters of block groups that have a high significance (p>.05) – meaning that they are in the top 5%.


sf.permits.hotspots =
  cbind(
    st_local_morans(
      sf.predict, 'permits.tot.00_04', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'permits0004') %>%
    select(permits0004.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'permits.tot.05_09', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'permits0509') %>%
    select(permits0509.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'permits.tot.10_14', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'permits1014') %>%
    select(permits1014.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'permits.tot.15_19', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'permits1519') %>%
    select(permits1519.hotspots)
  ) %>%
  st_sf()
  
  
sf.units.hotspots =
  cbind(
    st_local_morans(
      sf.predict, 'units.tot.00_04', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'units0004') %>%
    select(units0004.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'units.tot.05_09', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'units0509') %>%
    select(units0509.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'units.tot.10_14', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'units1014') %>%
    select(units1014.hotspots) %>% 
      st_drop_geometry(), 
    
    st_local_morans(
      sf.predict, 'units.tot.15_19', 
      sf.predict.weights, p_value=pval) %>% 
    rename_addprefix(., 'units1519') %>%
    select(units1519.hotspots)
  ) %>%
  st_sf()

plot_localmorans_hotspot = function(
  focus_localmorans, ptitle="Local Morans I",
  labs = c('2000-04', '2005-09', '2010-14', '2015-19'),
  pals = c('RdPu', 'PuRd',  'YlOrBr', 'Reds'),
  limits=TRUE, buffer=200, 
  ranks = c(1, 2, 3)
  ){
    
  focus_localmorans_plot = 
    focus_localmorans %>% 
    gather(Variable, Value, -geometry)
  
  focus_limits_sf = focus_localmorans_plot
  if (limits == TRUE){
    focus_limits_sf = 
      focus_limits_sf %>%
        filter(Value==1)}
  focus_limits_sf = 
    focus_limits_sf %>%
      st_union() %>%
      st_sf()
  
  ## This is just for plotting
  vars = unique(focus_localmorans_plot$Variable)
  varList = list()
  
  for(i in seq_along(vars)){
    var = vars[i]
    pal = pals[i]
    lab = labs[i]
    plot_df = focus_localmorans_plot %>%
            filter(Variable == var)
    val_len = length(unique(plot_df$Value))
    pal = 
      brewer.pal(3, name=pal)
    pal = c('#FFFFFF', pal[3])
    varList[[i]] = 
      ggplot() +
      geom_bound(fill='lightgrey', color=NA) + 
      geom_sf(
          data = plot_df,
          fill='#FFFFFF',
          colour=NA
          ) +
      geom_sf(
          data = plot_df,
          aes(fill = Value), 
          colour='lightgrey', 
          alpha=.8,
          size=.2
          ) + 
      #geom_quad(size=4) + 
      geom_nhood(ranks=ranks) +
      geom_bound(fill=NA, color='black',size=.5, alpha=.5) + 
      scale_fill_gradientn(colors = pal, guide="none") + 
      #dist.labels.w + 
      plot_limits(data=focus_limits_sf, buffer=buffer) + 
      labs(title=lab) +
      mapTheme()
    }
  
  do.call(grid.arrange,c(varList, ncol = 2, top = ptitle))
}

4.2.1 Permit Count Clusters

Figure 2.2 shows the process of finding clusters of high permit counts for 2015 to 2019. The first plot shows simple counts of permits, with Downtown/SOMA (NE), Bayview (SE), and the general eastern portion of San Francisco seeing the highest permit counts. The Local Moran’s I map highlights that those block groups’ values are spatially correlated. The p-value map provides a measure of that significance; notice the high permit count clusters are purple while the low counts are green. The final map isolates the significant high clusters in the Downtown and Portola/Bayview areas.


fig_num = 2.2
fig_title = glue("Figure {fig_num}: Local Indicators of Spatial Autocorrelation, Permits 15-19")

plot_localmorans_all(
  focus_sf.permits.localmorans, mainpal='YlGnBu',
  mainlab = "Observed Permit Counts",
  ptitle=fig_title
  )

Figure 2.3 shows the history of the permit clusters from 2000 to 2019. The high-permit clusters aren’t as consistent as the housing clusters, though there is a trend of Downtown (NE) and Portola (SE) seeing clusters of high permit counts. By 2015-19, the clusters are larger and distinctly in the Downtown & Portola areas – suggesting either that there is more permitting in those areas or that the rest of San Francisco has less permitting.


fig_num = 2.3
fig_title = glue("Figure {fig_num}: Significant Clusters of Permits Counts")

plot_localmorans_hotspot(sf.permits.hotspots, ptitle=fig_title, ranks=c(1,2))

4.2.2 Housing Unit Count Clusters

Figure 2.4 provides more understanding of the count of housing units. What we find is that the clustering is more focused on the Downtown, SOMA, & Tenderloin areas (NE). The p-value map shows that the higher rate of housing is significantly clustered in the downtown.

What’s odd is that the Bayview area (SE), which had high permit clusters in Figure 2.2, does not show the same significance for housing units – suggesting that the Southeastern areas receive a lot of new permits but do not add as many housing units.

Another interesting trend is that the areas immediately outside of the clusters have low p-values but few housing units added. So Downtown, the Tenderloin, and the Mission can add new housing – but elsewhere there is effectively a cap.

fig_num = 2.4
fig_title = glue("Figure {fig_num}: Local Indicators of Spatial Autocorrelation, Permits 15-19")

plot_localmorans_all(
  focus_sf.units.localmorans, 
  ptitle=fig_title
  )

Figure 2.5 provides a time series of housing clusters, mostly suggesting that new housing is significantly clustered in Downtown, the Tenderloin, and SOMA. One very interesting change is that the cluster is spreading less into North Beach and more into the Mission and Hayes Valley.

The Mission has qualitatively been deemed the ‘battleground of gentrification’ over the last decade, as it is a powder keg of displacement: low-income, immigrant residents, a large supply of rental housing, nearby BART, and a location just outside the major employment centers of SOMA, Mission Bay, & Downtown.

Figure 2.5 highlights how the cluster of new housing units has changed over time. If we remember the original housing unit maps in Section 2, new housing units were spread throughout the city. These cluster maps show that the core cluster is getting larger, and that there is a growing inequality between the areas of the city that add units. Outside of the downtown area, less housing is added.

What is concerning is that even more units are being added to the Tenderloin and Mission districts – the two highest concentrations of low-income renters in San Francisco. Sections 2 & 3 showed a decrease in both renters and low-income people in the downtown areas. So the new housing units might be displacing the remaining residents in favor of owner-occupied condos, rather than relieving the shortage of rental units.


fig_num = 2.5
fig_title = glue("Figure {fig_num}: Significant Clusters of Housing Unit Counts")

plot_localmorans_hotspot(
  sf.units.hotspots, ptitle=fig_title,
  buffer = 1000
  )

4.3 Comparing Clusters to Predictors

To better understand this ‘powder keg’, let’s compare the demographics inside and outside of these clusters.

4.3.1 T-Test & Mann-Whitney-Wilcoxon

Table 2 displays two tests measuring how well each predictor variable separates the two binary groups (inside versus outside the cluster).

The first test is the two independent samples t-test, which compares the means, standard deviations, & numbers of observations of the two groups. The t-statistic is based on the difference in means – higher suggests the means are separate, lower suggests they are similar. A small p-value suggests the difference in means is statistically significant.

The second test is the Wilcoxon rank-sum / Mann-Whitney-Wilcoxon (MWW) test, which measures whether the two groups come from distinct populations. It is important to include the MWW test because it does not assume the variables are normally distributed – helpful here, as many of the variables have skewed distributions. For the MWW statistic, significantly separated groups have low p-values (<0.05).
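As a minimal sketch of the two tests for a single predictor (assuming the permits1519.hotspots indicator has already been joined onto sf.predict, as in the table code below):

# Sketch: compare one predictor inside vs. outside the 2015-19 permit cluster
inside  = sf.predict %>% filter(permits1519.hotspots == 1) %>% pull(college.pct.18)
outside = sf.predict %>% filter(permits1519.hotspots == 0) %>% pull(college.pct.18)

t.test(outside, inside, alternative = "two.sided")   # two-sample t-test on the means
wilcox.test(outside, inside)                         # Mann-Whitney-Wilcoxon rank-sum test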

4.3.1.1 Differences Inside & Outside of the High Permits Cluster

There’s a lot to pull from Table 2, which compares variables inside and outside of the high-permits cluster. The t-test & MWW statistics show that there are large demographic differences in areas with higher amounts of permits. Inside areas with more permits there are:

  • Fewer college-educated people
  • Fewer White people (in 1990 & 2018), by percent and total
  • More White people moving in between 2010 and 2015

This suggests there is a demographic change in areas with more permits. But, as noted with the global Moran’s I, permits are more evenly dispersed, so there isn’t as much contrast inside versus outside of the cluster.


var.dv = 'permits1519.hotspots'

var.ivs = c(vars.ivs.disc_fin, vars.ivs.cont_fin)
cont_vars = var.ivs
focus_df = sf.predict %>%
    select(c(
      var.dv,
      cont_vars
      ))
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var.dv)` instead of `var.dv` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(cont_vars)` instead of `cont_vars` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var.dv)` instead of `var.dv` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ttest_df = data.frame()

# focus_df %>% gather() %>%
#   group_by(key, value) %>%
#   summarize(counts=n())
# 
# focus_df %>% select(cont_vars, var.dv) %>%
#     filter(.data[[var.dv]]==0)

for (cont_var in cont_vars){

  bin_df = focus_df %>% 
    select(cont_var, var.dv) %>%
    filter(!is.na(cont_var))
  
  bin_FALSE = bin_df %>%
    filter(.data[[var.dv]]==0) %>%
    pull(cont_var)
  bin_TRUE = bin_df %>%
    filter(.data[[var.dv]]==1) %>%
    pull(cont_var)
  
  bin_test =
    t.test(bin_FALSE, bin_TRUE, alternative = "two.sided")

  tstat = bin_test$statistic %>% as.numeric() %>% round(2)
  doff = bin_test$parameter %>% as.numeric() %>% round()
  pval = bin_test$p.value %>%
    round_thresh(., thresh = .001,  digits = 3)
  mean_FALSE =
    mean(bin_FALSE, na.rm=TRUE) %>% round(1)
  mean_TRUE =
    mean(bin_TRUE, na.rm=TRUE) %>% round(1)

  ttest_df = rbind(
    ttest_df,
    c(cont_var, mean_FALSE,mean_TRUE,tstat,doff,pval)
  )}

colnames(ttest_df) =
  c('iv', "FALSE", "TRUE",
    "Statistic", "DF", 'P-Value')


tab_num = 2
title = glue("Table {tab_num}: T-Test & MWW of Binary Variables")

housing.cont.MWW = 
  df_corr_data(
    dv = 'permits1519.hotspots',
    ivs = cont_vars,
    test = 'MWW'
  )

new_cols = c('iv', "Outside Cluster", "Inside Cluster",
    "Statistic", 'DF', 'P-Value', 'Statistic', 'P-Value')

housing.cont.ttest = 
  ttest_df
  
 merge(
    housing.cont.ttest,
    housing.cont.MWW,
    on='iv',
    sort=FALSE
    ) %>%
  mutate(
    MWW = MWW %>% as.numeric() %>% count_format()
  ) %>%
  kable(., caption = title, label = NA,
        col.names = new_cols) %>%
  kable_styling() %>%
  add_header_above(
    header=
      c(' '=1,'Mean'=2,'T-Test'=3, 'MWW'=2)
  ) %>%
  kableExtra::footnote(
    alphabet =
      c(
        "Two independent samples t-test",
        "Wilcoxon rank sum test",
        "significance level: 0.05"
               ))
Table 2: T-Test & MWW of Binary Variables
iv  Mean (Outside Cluster)  Mean (Inside Cluster)  T-Test Statistic  T-Test DF  T-Test P-Value  MWW Statistic  MWW P-Value
poc_linc.pct.10_15 36.7 172.7 -1.7 18 .105 1,233 .166
wht.pct.10_15 -63.8 164.9 -1.92 18 .071 961 .009
lowinc.pct.18 0.4 0.4 -1.16 20 .259 1,247 .187
college.pct.18 0.6 0.4 2.81 20 .011 2,158 .005
mov.pct.18 546.4 532.6 0.08 18 .935 2,014 .032
mhval_ch.pct.00_18 -515.9 1.2 -1.08 170 .281 1,249 .19
minc_ch.pct.00_18 0.5 0.5 -0.19 22 .85 149 .826
rent_ch.pct.12_18 0.8 0.9 -0.74 19 .471 1,384 .485
occ_own_ch.pct.00_18 0.2 0.2 0.11 23 .916 1,657 .595
bsqft_res.pct.2019 68.7 53.8 2.02 19 .058 1,972 .05
permits.tot.15_19 3.6 7.1 -1.74 20 .097 104 .021
units.tot.00_14 192.8 1083.6 -1.7 17 .106 7 < .001
breach_lease.tot.10_19 48.4 37.2 0.91 20 .373 2,128 .008
change_use.tot.10_19 9.4 29.2 -0.82 17 .424 184 .172
evictions.tot.00_19 172.1 150.9 0.42 18 .678 211 .01
white.tot.18 1919.8 1095.4 3.14 21 .005 2,276 < .001
pop25.tot.18 3628.6 3429.4 0.39 19 .701 193 .077
occ_own.tot.18 720.6 648.6 0.62 21 .54 1,709 .443
hh_inc.med.18 108094 95077.2 1.07 20 .299 185 .16
units.tot.2019 1910.7 1802.4 0.25 18 .805 2,154 .005
units.avg.2019 4.9 4.2 0.52 22 .607 1,857 .15
bldgsqft.tot.2019 2827897 3032041.9 -0.28 22 .779 1,725 .401
area 0.2 0.3 -1.77 19 .094 1,093 .044
pop_dens.tot.2018 55055.4 20580.1 1.57 176 .118 221 .002
pop_dens_res.tot.2018 218693902880.7 698.2 1 170 .319 1,462 .731
hh_inc_ch.med.90_18 0.5 0.5 -0.42 20 .681 1,433 .633
a Two independent samples t-test
b Wilcoxon rank sum test
c significance level: 0.05

4.3.1.2 Differences Inside & Outside of the High Added Housing Cluster

Table 3 adds to Table 2 by showing the differences inside and outside of the cluster of high added housing. Inside that cluster, there are sharper differences in building form and demographics.

Since the cluster is largely downtown, it makes sense that inside the cluster there are:

  • higher amounts of units being added (97.7 new units vs. 1,643.9 new units)
  • block groups closer to Bay Area Rapid Transit (BART) stations
  • fewer people who own a house

Inside the cluster, there is also a different and changing demographic:

  • a higher share of people who have recently moved
  • fewer college-educated people
  • higher numbers of White people outside the cluster in 1990 & 2018
  • but a larger increase in percent White inside the cluster
  • in 1990, there were more low-income people and people of color inside the cluster than outside, but by 2018 the shares of both had decreased inside the cluster

var.dv = 'units1519.hotspots'

var.ivs = c(vars.ivs.disc_fin, vars.ivs.cont_fin)
cont_vars = var.ivs
focus_df = sf.predict %>%
    select(c(
      var.dv,
      cont_vars
      ))
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(var.dv)` instead of `var.dv` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ttest_df = data.frame()

# focus_df %>% gather() %>%
#   group_by(key, value) %>%
#   summarize(counts=n())
# 
# focus_df %>% select(cont_vars, var.dv) %>%
#     filter(.data[[var.dv]]==0)

for (cont_var in cont_vars){

  bin_df = focus_df %>% 
    select(cont_var, var.dv) %>%
    filter(!is.na(cont_var))
  
  bin_FALSE = bin_df %>%
    filter(.data[[var.dv]]==0) %>%
    pull(cont_var)
  bin_TRUE = bin_df %>%
    filter(.data[[var.dv]]==1) %>%
    pull(cont_var)
  
  bin_test =
    t.test(bin_FALSE, bin_TRUE, alternative = "two.sided")

  tstat = bin_test$statistic %>% as.numeric() %>% round(2)
  doff = bin_test$parameter %>% as.numeric() %>% round()
  pval = bin_test$p.value %>%
    round_thresh(., thresh = .001,  digits = 3)
  mean_FALSE =
    mean(bin_FALSE, na.rm=TRUE) %>% round(1)
  mean_TRUE =
    mean(bin_TRUE, na.rm=TRUE) %>% round(1)

  ttest_df = rbind(
    ttest_df,
    c(cont_var, mean_FALSE,mean_TRUE,tstat,doff,pval)
  )}

colnames(ttest_df) =
  c('iv', "FALSE", "TRUE",
    "Statistic", "DF", 'P-Value')


tab_num = 3
title = glue("Table {tab_num}: Housing Units Inside & Outside of the Cluster")

housing.cont.MWW = 
  df_corr_data(
    dv = 'units1519.hotspots',  # compare inside vs. outside the units cluster
    ivs = cont_vars,
    test = 'MWW'
  )

new_cols = c('iv', "Outside Cluster", "Inside Cluster",
    "Statistic", 'DF', 'P-Value', 'Statistic', 'P-Value')

housing.cont.ttest = 
  ttest_df
  
 merge(
    housing.cont.ttest,
    housing.cont.MWW,
    on='iv',
    sort=FALSE
    ) %>%
  mutate(
    MWW = MWW %>% as.numeric() %>% count_format()
  ) %>%
  kable(., caption = title, label = NA,
        col.names = new_cols) %>%
  kable_styling() %>%
  add_header_above(
    header=
      c(' '=1,'Mean'=2,'T-Test'=3, 'MWW'=2)
  ) %>%
  kableExtra::footnote(
    alphabet =
      c(
        "Two independent samples t-test",
        "Wilcoxon rank sum test",
        "significance level: 0.05"
               ))
Table 3: Housing Units Inside & Outside of the Cluster
iv  Mean (Outside Cluster)  Mean (Inside Cluster)  T-Test Statistic  T-Test DF  T-Test P-Value  MWW Statistic  MWW P-Value
poc_linc.pct.10_15 28.7 208.3 -2.56 23 .018 1,233 .166
wht.pct.10_15 -76.8 221.4 -2.67 22 .014 961 .009
lowinc.pct.18 0.4 0.4 -1.12 24 .273 1,247 .187
college.pct.18 0.6 0.6 -0.2 31 .841 2,158 .005
mov.pct.18 498.7 897.1 -2.26 22 .034 2,014 .032
mhval_ch.pct.00_18 -522.1 -46.2 -0.97 169 .334 1,249 .19
minc_ch.pct.00_18 0.4 1 -2.35 23 .028 149 .826
rent_ch.pct.12_18 0.8 0.7 5.97 33 < .001 1,384 .485
occ_own_ch.pct.00_18 0.1 0.8 -2.03 21 .055 1,657 .595
bsqft_res.pct.2019 72 31.8 7.99 26 < .001 1,972 .05
permits.tot.15_19 3.1 10 -3.52 25 .002 104 .021
units.tot.00_14 97.7 1643.9 -2.99 21 .007 7 < .001
breach_lease.tot.10_19 44.5 68.6 -1.71 23 .1 2,128 .008
change_use.tot.10_19 9.3 27 -0.9 21 .378 184 .172
evictions.tot.00_19 163.3 222.2 -1.25 22 .224 211 .01
white.tot.18 1832.5 1908 -0.25 24 .803 2,276 < .001
pop25.tot.18 3544.7 4102.4 -0.93 22 .361 193 .077
occ_own.tot.18 735.6 547.7 1.25 23 .223 1,709 .443
hh_inc.med.18 107613.1 101094 0.52 24 .611 185 .16
units.tot.2019 1805.1 2623.8 -1.79 22 .087 2,154 .005
units.avg.2019 4.3 9 -2.88 26 .008 1,857 .15
bldgsqft.tot.2019 2390313.2 6316583.7 -2.1 21 .048 1,725 .401
area 0.2 0.2 -0.64 25 .528 1,093 .044
pop_dens.tot.2018 53976.6 35036.9 0.81 186 .417 221 .002
pop_dens_res.tot.2018 223932080072.6 1502 1 166 .319 1,462 .731
hh_inc_ch.med.90_18 0.5 0.6 -1.19 24 .248 1,433 .633
a Two independent samples t-test
b Wilcoxon rank sum test
c significance level: 0.05

The takeaway of the cluster exercise is that the Downtown, Mission, and Tenderloin areas see significantly higher amounts of housing being added and have a changing demographic.


5 Linear Model Results

In this section, I create a basic Ordinary Least Squares linear regression to understand the relationship between New Housing Units (2015-19) and various predictor variables.

I will be performing a 75/25 train/test split.
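For reproducibility, a seed can be set before partitioning (a minimal sketch; the seed value is arbitrary):

set.seed(20220506)   # arbitrary seed so createDataPartition() returns the same split each knit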


var.dv = "units.tot.15_19"
var.ivs = c(vars.ivs.disc_fin, vars.ivs.cont_fin)
var.ivs = var.ivs[!(var.ivs %in% remove_col)]

select_v = function(sf, variable_names=c(var.dv, var.ivs)){
  return(sf %>% st_drop_geometry(.) %>%
             select(all_of(variable_names)))}

select_iv = function(sf, variable_names=var.ivs){
  return(sf %>% st_drop_geometry(.) %>%
             select(all_of(variable_names)))}

inTrain = createDataPartition(
              y = sf.predict %>% pull(units.tot.15_19), 
              p = .75, list = FALSE)

sf.train = 
  sf.predict[inTrain,] %>% 
  select(all_of(c(vars.admin, var.dv, var.ivs)))

sf.test = 
  sf.predict[-inTrain,] %>% 
  select(all_of(c(vars.admin, var.dv, var.ivs)))

With a training set, the study can now build its initial training linear model [lm.train] from the predictor variables constructed in Section 2.


# build the model formula: units.tot.15_19 ~ iv1 + iv2 + ...
var.ivs.str = paste(var.ivs, collapse = " + ")

fm_equation = as.formula(paste(
  var.dv, var.ivs.str, sep = " ~ "))

print(fm_equation)


lm.train = 
  lm(
    fm_equation, 
    data = sf.train %>% select_v(.)
    )

The results of the model on the training set are presented below. They highlight that only some of the predictors are significant.

tab_num = 4

format_nums = function(num_input, digits = 2) ifelse(abs(num_input)>999,
                      count_format(num_input),
                      round_thresh(num_input, digits = digits,int_check = TRUE))

lm.train.summ = 
  lm.train %>%
  tidy() %>%
  transmute(
    Variable = term, 
    Estimate = format_nums(estimate, digits = 3),
    std.error = format_nums(std.error),
    t.value = format_nums(statistic),
    p.value = p.value %>% round_thresh()
  )

lm.train.summ %>%
  kable(
    label = NA,
    caption = glue('Table {tab_num}: Training Model Summary'),
    align = 'lrrrr') %>%
  kable_styling()
Table 4: Training Model Summary

| Variable | Estimate | std.error | t.value | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | -443 | 313 | -1 | .159 |
| poc_linc.pct.10_15 | -0.004 | .1 | -0.04 | .972 |
| wht.pct.10_15 | .06 | .11 | .55 | .581 |
| lowinc.pct.18 | 661 | 339 | 2 | .053 |
| college.pct.18 | 116 | 248 | .47 | .64 |
| mov.pct.18 | .126 | .08 | 2 | .103 |
| mhval_ch.pct.00_18 | -0.095 | .11 | -0.89 | .376 |
| minc_ch.pct.00_18 | 56 | 26 | 2 | .034 |
| rent_ch.pct.12_18 | -193 | 163 | -1 | .238 |
| occ_own_ch.pct.00_18 | 199 | 34 | 6 | < .001 |
| bsqft_res.pct.2019 | -0.266 | 1 | -0.2 | .842 |
| permits.tot.15_19 | 7 | 2 | 3 | .001 |
| units.tot.00_14 | .272 | .05 | 5 | < .001 |
| breach_lease.tot.10_19 | 1 | 1 | 1 | .246 |
| change_use.tot.10_19 | -2 | 3 | -0.86 | .394 |
| evictions.tot.00_19 | .367 | .49 | .75 | .453 |
| white.tot.18 | -0.043 | .04 | -0.97 | .332 |
| pop25.tot.18 | .015 | .04 | .42 | .674 |
| occ_own.tot.18 | .073 | .08 | .87 | .388 |
| hh_inc.med.18 | < .001 |  | .72 | .471 |
| units.tot.2019 | -0.066 | .07 | -1 | .314 |
| units.avg.2019 | -2 | 5 | -0.3 | .768 |
| bldgsqft.tot.2019 | < .001 | < | 2 | .015 |
| area | -15 | 164 | -0.09 | .928 |
| pop_dens.tot.2018 | -0.002 |  | -1 | .148 |
| pop_dens_res.tot.2018 | .001 | .06 | .03 | .98 |
| hh_inc_ch.med.90_18 | 167 | 8 | 2 | .04 |

Table 4 shows that the most significant variables are the change in the owner-occupied share (occ_own_ch.pct.00_18), the count of permits (permits.tot.15_19), the units added from 2000 to 2014 (units.tot.00_14), the change in median income from 2000 to 2018 (minc_ch.pct.00_18), total building square footage (bldgsqft.tot.2019), and the change in median household income from 1990 to 2018 (hh_inc_ch.med.90_18). These are interesting findings, as they suggest that block groups with more added housing also see larger changes in income and in the balance of owners versus renters, along with more permits and more units added in the prior period.
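
As a quick cross-check of the table above, the tidy coefficient output can be filtered down to the terms that clear an assumed 0.05 cutoff. A minimal sketch, assuming lm.train as fit above and the broom/dplyr functions already loaded earlier in the notebook:


# Sketch: list the training-model terms with p-values under 0.05
# (assumes lm.train exists as fit above; the 0.05 cutoff is an assumption)
lm.train %>%
  tidy() %>%
  filter(p.value < 0.05) %>%
  arrange(p.value) %>%
  select(term, estimate, std.error, statistic, p.value)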


tab_num = 5

title = glue('Table {tab_num}: Training Model Fit & Significance Terms')

lm.train %>% glance() %>%
  transmute(
    `RSE` = format_nums(sigma),
    `df ` = format_nums(df.residual),   # residual df (trailing space keeps the name distinct from the model df column)
    `Multiple` = format_nums(r.squared, digits = 4),
    `Adjusted` = format_nums(adj.r.squared, digits = 4),
    `stat` = format_nums(statistic, digits = 1),
    `df` = format_nums(df),
    p.value = p.value %>% round_thresh()
  ) %>%
  kable(
    label = NA,
    caption = title,
    align = 'lrrrr') %>%
  kable_styling() %>%
  add_header_above(
    header = c(
      'Residual Standard Error'=2,
      'R-squared' = 2,
      'F-statistic' = 2,
      ' ' = 1
    ))
Table 5: Training Model Fit & Significance Terms

| RSE | Residual DF | Multiple R-squared | Adjusted R-squared | F-statistic | Model DF | P-Value |
|---:|---:|---:|---:|---:|---:|---:|
| 159 | 117 | .8918 | .8678 | 37 | 26 | < .001 |

Table 5 also highlights that the R-squared is fairly high (multiple R-squared of .89, adjusted .87), suggesting that the model has fairly solid predictive power on the training data.
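
Since that R-squared is computed on the training set only, a natural follow-up (not shown in the original output) is to score the held-out test split. A minimal sketch, assuming the sf.test and lm.train objects built above; the RMSE/MAE names are introduced here:


# Sketch: out-of-sample error on the 25% test split
# (assumes lm.train and sf.test exist as constructed above)
test_obs  = sf.test %>% pull(units.tot.15_19)
test_pred = predict(lm.train, newdata = sf.test %>% st_drop_geometry())

c(
  RMSE = sqrt(mean((test_obs - test_pred)^2, na.rm = TRUE)),
  MAE  = mean(abs(test_obs - test_pred), na.rm = TRUE)
) %>% round(1)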

5.0.1 Errors

Figure 5 shows the different errors of the model. The plots suggest that a small set of block groups have extremely high errors while the rest are fairly typical. The graphs suggest that block groups with very few added units have the highest errors. This could be because some block groups have a smaller base of housing units and people. It could also be that those block groups have similar demographics and characteristics to the block groups with many added units . . . but don't have any units being added.
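
The error columns referenced below (units1519.predict, units1519.error, units1519.abserror, units1519.pe, units1519.ape) are computed earlier in the notebook; the sketch below is one plausible reconstruction of how they could be derived, and the notebook's own derivation and sign convention may differ:


# Sketch: one possible way to build the error columns on sf.predict
# (assumed reconstruction; not necessarily the notebook's exact definitions)
sf.predict$units1519.predict = predict(lm.train, newdata = st_drop_geometry(sf.predict))

sf.predict = sf.predict %>%
  mutate(
    units1519.error    = units.tot.15_19 - units1519.predict,  # observed - predicted
    units1519.abserror = abs(units1519.error),
    units1519.pe       = units1519.error / units.tot.15_19,    # Inf/NaN where no units were added
    units1519.ape      = abs(units1519.pe)
  )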


fig_num = 5

title = glue('Figure {fig_num}: Observed Added Units & Test Error')

m = 1000000

# get lm of Absolute Error & Price
geom_vline_lim = function(xintercept=1, ylim=c(1,Inf), ...) geom_line(data = data.frame(x = c(xintercept, xintercept), y = ylim), aes(x = x , y = y), ...)

# mean error (only referenced in the commented-out guide lines below)
ME_1 = sf.predict %>%
  pull(units1519.error) %>%
  mean(., na.rm = T)
ME_2 = ME_1

# fit of observed units on absolute error (only used in the commented-out geom_line below)
MAE_lm_fit <- lm(units.tot.15_19 ~ units1519.abserror, data = sf.predict)
summary(MAE_lm_fit)

ape_breaks = c(0,1,2.5,5)
pe_breaks = c(-5,-2.5,-1,0,1)

xmoney_breaks = c(0,1,2,4,6) * m
negxmoney_breaks = c(-2,-1,0,1,2,4) * m
ymoney_breaks = c(0,1,2,4,6) * m

percent_formatter = function(str) (str * 100) %>% round() %>% format(., big.mark = ",") %>% paste0(., "%")


grid.arrange(ncol=2,
  ggplot(data = sf.predict) +
    geom_point(aes(y = units.tot.15_19, x = units1519.error)) +
    # scale_x_continuous(breaks = negxmoney_breaks,
    #                    labels = money_format,
    #                    name = 'Price - Predicted Price') +
    # scale_y_continuous(breaks = ymoney_breaks, labels = money_format) +
    # geom_vline_lim(xintercept=ME_1, ylim=c(0,2*m), color='#90ee90', size=1.25, linetype = "solid") + 
    #     geom_vline_lim(xintercept=ME_2, 
    #                    ylim=c(2*m,8*m), 
    #                    color='#BF40BF', size=1.25, linetype = "solid") + 
    # #geom_vline(xintercept = test.ME, colour="#90ee90", size=1, linetype = "longdash") + 
    labs(title=title, subtitle = 
           glue("Error (ME = {sf.predict %>% pull(units1519.error) %>% mean(.) %>% round()})")) +
    plotTheme(),

  ggplot(data = sf.predict) +
    geom_point(aes(y = units.tot.15_19, x = units1519.pe)) +
    # scale_x_continuous(breaks = pe_breaks, name = 'Error / Price',
    #             labels = percent_formatter(pe_breaks)) + 
    # scale_y_continuous(breaks = ymoney_breaks, labels = money_format) + 
    # geom_vline(xintercept = test.MPE, colour="#FF9B00", size=1, linetype = "longdash") + 
    # 
    # geom_vline(xintercept = test.MPE, colour="#FF9B00", size=1, linetype = "longdash") + 
    # 
    labs(title='', subtitle = 
           glue("Percent Error (MPE = {percent_formatter(sf.predict %>% pull(units1519.pe) %>% mean(.) )})")) +
    plotTheme(),
  
  ggplot(data = sf.predict) +
    geom_point(aes(y = units.tot.15_19, x = units1519.abserror)) +
    # scale_x_continuous(breaks = xmoney_breaks,
    #                    labels = money_format,
    #                    name = 'Absolute Error') + 
    # scale_y_continuous(breaks = ymoney_breaks, labels = money_format) + 
    # geom_line(color='red',data = 
    #         data.frame(MAE_fit = predict(MAE_lm_fit, sf.test), 
    #                    hp=sf.test$units1519.abserror), aes(x=hp, y=MAE_fit)) + 
    # geom_vline(xintercept = test.MAE, colour="#50c878", size=1, linetype = "longdash") + 
    labs(subtitle = 
           glue("Absolute Error (MAE = {sf.predict %>% pull(units1519.abserror) %>% mean(.)  %>% round()})")) +
    plotTheme(),
  
  ggplot(data = sf.predict ) +
    geom_point(aes(y = units.tot.15_19, x = units1519.ape)) +
    # scale_x_continuous(breaks = ape_breaks, name = 'Absolute Percentage Error',
    #            labels = percent_formatter(ape_breaks)) + 
    # scale_y_continuous(breaks = ymoney_breaks, labels = money_format) + 
    # geom_vline(xintercept = test.MAPE, colour="#ff8c00", size=1, linetype = "longdash") + 
    labs(subtitle = 
           glue("Absolute Percent Error (MAPE = {sf.predict %>% pull(units1519.ape) %>% mean(.)  %>% round()})")) + 
    plotTheme()
)

5.0.2 Scatterplot

Figure 6 plots predicted added units as a function of observed added units.

Figure 6 highlights the trends seen in the test error scatterplots (Figure 5). There are predictions of negative added housing units, which is odd: it suggests the model expects some block groups to lose housing units. It also appears that the model over-predicted for some of the same block groups that had no units added.
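
One way to rule out negative predictions, which is not part of the analysis in this report, would be to treat added units as a count and fit a log-link model instead of OLS. A minimal sketch of a quasi-Poisson GLM on the same formula; glm.train and units1519.predict_glm are names introduced here:


# Sketch: quasi-Poisson alternative with a log link
# (assumed alternative model, not the OLS model used in this report;
#  assumes units.tot.15_19 is a non-negative count)
glm.train = glm(
  fm_equation,
  family = quasipoisson(link = "log"),
  data   = sf.train %>% select_v(.)
)

# response-scale predictions from a log-link model are strictly positive
units1519.predict_glm = predict(
  glm.train,
  newdata = st_drop_geometry(sf.predict),
  type = "response")

summary(units1519.predict_glm)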


fig_num = 6
title = glue('Figure {fig_num}: Predicted & Observed Added Units (2015-19)')

line_min = min(max(sf.predict$units.tot.15_19), max(sf.predict$units1519.predict))

ggplot() +
    geom_point(data = sf.predict, aes(x = units.tot.15_19, y = units1519.predict)) +
    geom_smooth(data = sf.predict, aes(x = units.tot.15_19, y = units1519.predict), method = lm, color='green') + 
    # geom_line(data=data.frame(
    #       x=c(0,line_min+1*m),
    #       y=c(0,line_min+1*m)),
    #       aes(x=x,y =y), linetype='dashed',
    #               size=1.25, color='orange') +
    # scale_x_continuous(breaks = c(0,1,1.5,2.5,5)*m,
    #                    #labels = money_format,
    #                    name = 'Observed Price') +
    # scale_y_continuous(breaks = c(0,1,1.5,2.5)*m, name = 'Predicted Price',) +
    labs(title=title,
      subtitle = "Final Predicted Housing Prices") +
    #xlim(min(sf.predict$price,0), max(sf.predict$price)) + 
    #ylim(min(sf.predict$units1519.predict,0), max(sf.predict$units1519.predict)) + 
    plotTheme()
## `geom_smooth()` using formula 'y ~ x'

5.0.3 Errors Map

The errors map helps locate where the predictions are incorrect. Based on the absolute percentage error, the area just above the Tenderloin & Downtown, in the city's northeast, has the highest error. This area of Chinatown & North Beach likely defies the argument that areas with high shares of renters and/or lower-income residents are where development is happening. Chinatown has been resistant to development and has minimized the amount of displacement typically seen among the low-income, long-term residents of the Tenderloin & Mission. North Beach has a higher share of higher-income, younger renters, but not much development, likely because of its very steep hills.
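
To back up the map reading, the block groups with the largest absolute percentage error can also be listed directly. A minimal sketch, assuming vars.admin holds the identifier columns used in the selections above:


# Sketch: the ten block groups with the highest absolute percentage error
# (assumes vars.admin contains the identifier columns, as used earlier)
sf.predict %>%
  st_drop_geometry() %>%
  select(all_of(vars.admin), units.tot.15_19, units1519.predict, units1519.ape) %>%
  slice_max(units1519.ape, n = 10)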


map_num = 4

Var1_map = 
  ggplot() +
    geom_sf(data=sf.predict, aes(fill=units1519.error))+ 
  scale_fill_distiller(palette = "RdYlGn", direction = 1) +
  ggtitle("Error") + 
  mapTheme()

Var2_map = 
  ggplot() +
    geom_sf(data=sf.predict, aes(fill=units1519.ape))+ 
  scale_fill_distiller(palette = "Reds", direction = 1) +
  ggtitle("Abosolute Percentage Error") + 
  mapTheme()

title = glue('Map {map_num}: Added Housing Unit Prediction Errors\nFinal Model on Full Dataset')
grid.arrange(
  Var1_map,
  Var2_map,
  #Var3_map,
  #Var4_map,
  ncol=2,
  top = grid::textGrob(title,gp=grid::gpar(fontsize=15))
)

5.1 Conclusion

The linear model was more of a success than I could have predicted (no pun intended). The model’s significant predictors show signs that new housing is added in areas with large demographic and income changes.


6 Conclusion

6.1 Overall

Through the visual analysis of Sections 2 & 3, the spatial analysis of Section 4, and the linear model results of Section 5, the project shows evidence that new housing has mainly been added in block groups with:

  1. increases in median income,
  2. decreases in the percent of renters,
  3. increases in the white share of the population, and
  4. higher amounts of area dedicated to residential land.

In the context of the housing shortage and affordability crisis, this is concerning as it suggests that:

  1. more long-term, low-income residents are being displaced, and
  2. housing is being added to cater to high-income, white renters and prospective homebuyers.

So new homes aren’t exactly relieving the shortage; they are mostly reshuffling residents. There is a case that new units, regardless of who they cater to, will open up homes elsewhere in the city, but this project did not focus enough on that ‘filtering effect’ or on tracking where people are moving from and being displaced to.

6.2 Revisiting the Gentrification Stages

Reconsidering the three different methods of measuring gentrification, I could argue that the Tenderloin and Mission areas would fulfill Freeman’s and Landis’s criteria for gentrification, largely because their models focus primarily on large jumps in median income. I don’t believe my project or model was organized in a way to answer all of the detailed criteria of the urban displacement model.

6.3 Improvements to the Project

The difficulty of this project is that it could have been taken in many directions. I wanted to understand if housing units added in the last seven years are fueling displacement. I could have focused just on displacement, or focused specifically on permits. So it was difficult narrowing the work down to a few questions.

Within the scope I set for this project, I feel I could have improved the linear model through better feature engineering and selection. But even a more accurate linear model wouldn’t have made me fully comfortable concluding that the new housing units are adding to displacement rather than helping the shortage.

This subject matter is fairly complex, and it will be very difficult to pinpoint the core problem of the housing crisis, let alone suggest ways to improve it.


6.4 Bibliography

  1. Bogin, Alexander, William Doerner, and William Larson. 2019. “Local House Price Dynamics: New Indices and Stylized Facts.” Real Estate Economics 47 (2): 365–98. https://doi.org/10.1111/1540-6229.12233.
  2. Brown-Saracino, Japonica. 2017. “Explicating Divided Approaches to Gentrification and Growing Income Inequality.” Annual Review of Sociology 43 (1): 515–39. https://doi.org/10.1146/annurev-soc-060116-053427.
  3. Clay, Phillip L. 1979. Neighborhood Renewal: Middle-Class Resettlement and Incumbent Upgrading in American Neighborhoods. Lexington Books.
  4. Desmond, Matthew, and Kristin L. Perkins. 2016. “Housing and Household Instability.” Urban Affairs Review 52 (3): 421–36. https://doi.org/10.1177/1078087415589192.
  5. Dougherty, Conor. 2017. “The Great American Single-Family Home Problem.” The New York Times, December 1, 2017, sec. Business. https://www.nytimes.com/2017/12/01/business/economy/single-family-home.html.
  6. Egan, Ted. n.d. “The Economics of San Francisco Housing,” 22.
  7. Einstein, Katherine Levine. 2021. “The Privileged Few: How Exclusionary Zoning Amplifies the Advantaged and Blocks New Housing—and What We Can Do About It.” Urban Affairs Review 57 (1): 252–68. https://doi.org/10.1177/1078087419884644.
  8. Einstein, Katherine Levine, David M. Glick, and Maxwell Palmer. 2019. Neighborhood Defenders: Participatory Politics and America’s Housing Crisis. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108769495.
  9. Einstein, Katherine Levine, Maxwell Palmer, and David M. Glick. 2019. “Who Participates in Local Government? Evidence from Meeting Minutes.” Perspectives on Politics 17 (1): 28–46. https://doi.org/10.1017/S153759271800213X.
  10. Freeman, Lance. 2005. “Displacement or Succession?: Residential Mobility in Gentrifying Neighborhoods.” Urban Affairs Review 40 (4): 463–91. https://doi.org/10.1177/1078087404273341.
  11. Glaeser, Edward L., and Joseph Gyourko. 2002. “The Impact of Zoning on Housing Affordability.” Working Paper 8835. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w8835.
  12. Hanratty, Maria. 2017. “Do Local Economic Conditions Affect Homelessness? Impact of Area Housing Market Factors, Unemployment, and Poverty on Community Homeless Rates.” Housing Policy Debate 27 (4): 640–55. https://doi.org/10.1080/10511482.2017.1282885.
  13. Humphries, John Eric, Nicholas S. Mader, Daniel I. Tannenbaum, and Winnie L. van Dijk. 2019. “Does Eviction Cause Poverty? Quasi-Experimental Evidence from Cook County, IL.” Working Paper 26139. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w26139.
  14. Landis, John D. 2016. “Tracking and Explaining Neighborhood Socioeconomic Change in U.S. Metropolitan Areas Between 1990 and 2010.” Housing Policy Debate 26 (1): 2–52. https://doi.org/10.1080/10511482.2014.993677.
  15. McNee, Georgina, and Dorina Pojani. 2021. “NIMBYism as a Barrier to Housing and Social Mix in San Francisco.” Journal of Housing and the Built Environment, May. https://doi.org/10.1007/s10901-021-09857-6.
  16. O’Sullivan, Arthur. 2012. “Chapter 14: Why Is Housing Different?” In Urban Economics.
  17. Schaffner, Brian F., Jesse H. Rhodes, and Raymond J. La Raja. 2020. Hometown Inequality: Race, Class, and Representation in American Local Politics. Cambridge University Press.
  18. SF Rent Arbitration Board. 2022. “Eviction Notices Data.” DataSF. 2022. https://data.sfgov.org/Housing-and-Buildings/Eviction-Notices/5cei-gny5.
  19. Thomas, Tim, Karen Chapple, and Julia Greenberg. 2021. “SF Bay Area – Rent and Demographic Change Map.” 2021. https://www.urbandisplacement.org/maps/sf-bay-area-rent-and-demographic-change/.
  20. Thomas, Tim, Karen Chapple, and Julia Greenberg. 2021. “SF Bay Area – Gentrification and Displacement Map.” Accessed May 6, 2022. https://www.urbandisplacement.org/maps/sf-bay-area-gentrification-and-displacement/.
  21. Thomas, Tim, Anna Driscoll, Gabriela Picado Aguilar, Carson Hartman, Julia Greenberg, Alex Ramiller, Anna Cash, Miriam Zuk, and Karen Chapple. 2020. Urban-Displacement/Displacement-Typologies: Release 1.1 (version v1.1). Zenodo. https://doi.org/10.5281/ZENODO.4356684.
  22. Trounstine, Jessica. 2018. Segregation by Design: Local Politics and Inequality in American Cities. Cambridge University Press.
  23. US Census Bureau. 2020. “National, State, and County Housing Unit Totals: 2010-2019.” US Census Bureau. https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-housing-units.html.