Linear Regression Prediction: Exploring 2012 Crime Rates in NYC

In a previous article on linear regression, we walked through an example of using it to predict an outcome from a set of input variables. In this article, we’re going to use linear regression prediction to do the same thing with a more complicated dataset, one detailing the crime…

NFL Play by Play Data: Full Access in 3 Easy Steps

Getting NFL play by play data is a key part of doing sports analytics. Thankfully, getting access to NFL play by play data is now easier than ever. Please note, I have a complete guide titled A Beginner’s Guide to NFL Analytics: Getting Started with nflfastR and RStudio. It will help you figure out how…

How To Do Multiple Regression In R: A Step-by-Step Tutorial

How to do multiple regression in R is a pretty commonly asked question (especially by those students attempting to do their first statistical analysis!). First, let’s discuss what a multiple linear regression is. Multiple linear regression is a straightforward generalization of simple linear regression. It allows you to model your continuous response variable in terms of more than…
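
As a rough illustration of the idea in the teaser above, a multiple regression in base R can be fit with `lm()` by listing more than one predictor on the right-hand side of the formula. This is only a minimal sketch: the `mtcars` dataset and the choice of `wt` and `hp` as predictors are assumptions made for the example, not the data or model used in the tutorial itself.

```r
# Illustrative sketch only: mtcars ships with base R and is NOT the
# tutorial's dataset. mpg is the continuous response; wt and hp are
# two predictors chosen purely for demonstration.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One intercept plus one slope per predictor
print(coef(fit))

# Standard errors, t-tests, and R-squared for the fitted model
print(summary(fit))
```

Each slope is read as the expected change in the response for a one-unit change in that predictor, holding the other predictor fixed.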

Web Scraping with rvest: Exploring Sport Industry Jobs

Web scraping with rvest is easy and, surprisingly, comes in handy in situations that you may not have thought of. For example, one of the unique things about academics is the constant need to stay “ahead of the curve,” meaning being nimble enough as a program to shift curriculum around to provide students training and…

An R Tutorial for Beginners: First Steps

This first chapter in my R Tutorial for Beginners takes a first look at the R programming language and the associated RStudio software. To quickly summarize for absolute beginners, R is an incredibly flexible and easy-to-learn programming environment that is now widely used in both professional and academic settings. In fact, R was the first…

Difference in Difference in R: A Complete & Easy Tutorial

Conducting a difference in difference in R allows researchers to gain insight into the impact of a policy, or other outside factor, by taking two things into consideration. First, how a group mean changes before and after the policy or other outside factor was implemented; this group is considered the treatment group. Second, compare this change…
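
The comparison described in the teaser above can be sketched in base R. This is a hedged example under stated assumptions: the data are simulated, and the column names `treated`, `post`, and `y` are hypothetical rather than taken from the tutorial. With one treatment indicator and one before/after indicator, the coefficient on their interaction in `lm()` is the difference-in-difference estimate.

```r
# Simulated data for illustration only; all names here are hypothetical.
set.seed(42)
n <- 200
df <- data.frame(
  treated = rep(c(0, 1), each = n / 2),  # 0 = comparison group, 1 = treatment group
  post    = rep(c(0, 1), times = n / 2)  # 0 = before the policy, 1 = after
)
# Outcome with a built-in treatment effect of 3, plus noise
df$y <- 10 + 2 * df$treated + 1 * df$post + 3 * df$treated * df$post + rnorm(n)

# The interaction term carries the difference-in-difference estimate
fit <- lm(y ~ treated * post, data = df)
did_lm <- unname(coef(fit)["treated:post"])

# The same quantity computed by hand from the four group means
m <- tapply(df$y, list(df$treated, df$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(regression = did_lm, by_hand = did_means))
```

The regression form is usually preferred in practice because it also yields a standard error for the estimate and makes it easy to add control variables.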

How To Clear The Environment in R: Keep RStudio from Running Slow

Knowing how to clear the environment in R is one of the easiest ways to fix an RStudio installation that is running far too slowly. RStudio can slow to a crawl for a multitude of reasons, ranging from a large amount of information being stored in memory, settings being set in…
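
One common way to clear the environment from the console is sketched below. It uses only base R functions (`ls()`, `rm()`, `gc()`); the object names are made up for the example, and this is not necessarily the exact approach the full article recommends.

```r
# Create a couple of throwaway objects (hypothetical names) so there is
# something in the global environment to remove.
big_vector   <- rnorm(1e6)
lookup_table <- data.frame(id = 1:10, value = letters[1:10])
print(ls())

# Remove every object that ls() reports. Objects whose names start with
# a dot are skipped unless ls(all.names = TRUE) is used instead.
rm(list = ls())

# Ask R to run garbage collection so the freed memory can be reclaimed
invisible(gc())

remaining <- ls()
print(remaining)
```

Note that this removes objects only; loaded packages and RStudio settings are left untouched.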

A Beginner’s Guide to NFL Analytics: Getting Started with nflfastR and RStudio

Thanks to the work of a handful of people (@mrcaseb, @benbbaldwin, @_TanHo, @LeeSharpeNFL, and @thomas_mock … to name a few), getting started with advanced analytics using NFL data is now easier than ever. Without getting too far into the weeds of the history behind all this, the above-mentioned people are responsible for the creation of…

Computing Player Performance Percentiles Using Scraped Data

There is no doubt about it: we are currently in the golden age of big data when it comes to the NFL, MLB, and many other leagues. In this case, the nflfastR project (which is the “child” of Ron Yurko’s nflscrapR) allows for fast and easy access to deeply detailed and rich statistics dating back…

A Better Way To Work With Zillow ZTRAX Data: A Guide To Wrangling the Data in R

For researchers and academics who have any interest in working with housing data, Zillow’s ZTRAX database is a must. The ZTRAX database, short for Zillow Transaction and Assessment Dataset, is unquestionably the largest real estate database that has ever been made available, free of charge, to qualified academic, nonprofit, and governmental researchers. Previously…