
Predicting the Impact of a New Stadium Using a k-means Clustering Unsupervised Algorithm
** Scheduled for presentation at the 2022 ASMA Conference in Indianapolis, Indiana
** Currently preparing manuscript for submission to the Journal of Sport Management
This paper presents an applied predictive model that uses data from neighborhoods previously affected by stadium construction to examine the potential impact on neighborhoods with similar attributes that face imminent stadium construction, in this case Arlington Heights. The model is built in two steps. First, I use a k-means clustering unsupervised machine learning algorithm and the ‘tidycensus’ package in RStudio to produce a geodemographic classification of census tracts. This step matches census tracts whose demographic characteristics resemble those of the tracts in Arlington Heights. Second, the model uses a standard difference-in-differences approach to suggest what may occur in Arlington Heights based on the recorded impact of professional stadiums on similar neighborhoods. Using data from Zillow’s proprietary ZTRAX database, along with Census data retrieved through the ‘tidycensus’ package, the model predicts the rate of growth of both home prices and average rents in the areas surrounding the new stadium, as well as the impact on racial demographics, which is measured by constructing a multi-group segregation index and a diversity gradient. The implications of the proposed model for practitioners in the sport industry are significant: the ability to forecast the impact of a professional stadium on neighborhoods and communities will greatly assist in remedying, or easing, the gentrification that often occurs in such scenarios.
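The two steps described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R workflow the paper uses, and it is not the paper's implementation. The tract names, the standardized feature values, and the price figures are all invented for illustration. The difference-in-differences step is the simplest 2x2 comparison of group means rather than a regression specification.

```python
from statistics import mean


def kmeans(points, k, max_iters=50):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def matched_peers(tracts, target, k):
    """Return the tracts that fall in the same cluster as the target tract."""
    names = list(tracts)
    labels = kmeans([tracts[n] for n in names], k)
    target_label = labels[names.index(target)]
    return [n for n, lab in zip(names, labels) if lab == target_label and n != target]


def did_estimate(treated_before, treated_after, control_before, control_after):
    """2x2 difference-in-differences: change in treated minus change in control."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical standardized features per tract: (median income, renter share).
tracts = {
    "target": (0.9, 0.2),
    "C": (-1.0, 1.2),
    "A": (1.0, 0.1),
    "B": (0.8, 0.3),
    "D": (-0.9, 1.0),
}
peers = matched_peers(tracts, "target", k=2)

# Hypothetical home price index values before and after a stadium opens.
did = did_estimate(
    treated_before=[100, 110], treated_after=[130, 140],
    control_before=[100, 104], control_after=[110, 114],
)
print(peers, did)
```

In this toy setup, the clustering step picks out the tracts most similar to the target, and the difference-in-differences step nets out the price change that comparison tracts experienced anyway. A real analysis would need standardized Census variables, a justified choice of k, and a check of the parallel-trends assumption.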