I will be honest: this example is a little “backwards” because I am graphing every pitch faced by a batter over the course of a segmented season. However, the general coding remains the same: you would simply replace the batter’s name with the name of the pitcher you want to graph when you get your data. With that in mind, this post is going to provide a step-by-step overview of how to create your own pitch charts based on individual players.
**Note: I am assuming that you already have a basic knowledge of RStudio and Baseball Savant. If not, check out my beginner’s guide to RStudio here (coming soon!).
1. Gathering Data from Baseball Savant
I know, I know – this step could be performed by using the fantastic baseballr package. However, I tend to move faster in this step by using Baseball Savant, downloading the data, and then importing it into RStudio. It might be a bit old school, I guess, but it works for me.
So, in this specific example, I am taking a look at the Pittsburgh Pirates’ Josh Bell. I am a big Pirates fan, unfortunately. And Bell is likely the next rising superstar who will be traded away for a no-name minor leaguer and a rotten grapefruit.
If you can’t tell, I’m one of those Pirates fans that harbor quite a bit of anger towards the Pirates’ ownership.
Anyways: in looking at Josh Bell’s information, I wanted to look at just the first 60 games of last season to mimic what this upcoming 2020 season is going to look like. A bit of a detour in thought, but the 2020 baseball season is going to be wildly unpredictable. With well under half the normal number of games, there is little time – or, perhaps, none at all – for the expected regression to take place. Which is why I think it is interesting to do some research on last season through just the first sixty games.
So, using Baseball Savant, I grab every single pitch that Josh Bell faced between March 28 and April 6 of last season (the 60-game mark for the Pirates). After downloading the spreadsheet, I import it into RStudio as my dataset.
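If you would rather do the import in code than through the RStudio menus, here is a rough sketch of how it could look. The helper name `load_savant_csv` and the file name in the example call are my own placeholders, and I am assuming your download has a `game_date` column formatted like "2019-03-28".

```r
# Hypothetical helper (not part of any package): read a Baseball Savant
# CSV export and keep only the pitches inside a date window.
load_savant_csv <- function(path, start_date, end_date) {
  pitches <- read.csv(path, stringsAsFactors = FALSE)

  # Assumption: game_date arrives as text such as "2019-03-28"
  pitches$game_date <- as.Date(pitches$game_date)

  in_window <- pitches$game_date >= as.Date(start_date) &
    pitches$game_date <= as.Date(end_date)

  # which() drops rows whose date could not be parsed (NA)
  pitches[which(in_window), ]
}

# Example call; "savant_data.csv" is a placeholder for your own download:
# joshbell_hitting <- load_savant_csv("savant_data.csv", "2019-03-28", "2019-04-06")
```

Filtering by date in R also means you can download a whole season once and carve out whatever window you like afterwards.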
2. Initial Coding to Create the Strike Zone and Name Pitches
    ## Drawing the strike zone
    ## Corner coordinates in feet; the first corner is repeated to close the box
    x <- c(-.95, .95, .95, -.95, -.95)
    z <- c(1.6, 1.6, 3.5, 3.5, 1.6)

    ## Store in a data frame
    sz <- data.frame(x, z)

    ## Changing pitch names
    ## as.character() guards against pitch_type having been read in as a factor
    pitch_desc <- as.character(joshbell_hitting$pitch_type)

    pitch_desc[which(pitch_desc == 'CH')] <- "Changeup"
    pitch_desc[which(pitch_desc == 'CU')] <- "Curveball"
    pitch_desc[which(pitch_desc == 'FC')] <- "Cutter"
    pitch_desc[which(pitch_desc == 'FF')] <- "Four-Seam"
    pitch_desc[which(pitch_desc == 'FS')] <- "Split-Finger"
    pitch_desc[which(pitch_desc == 'FT')] <- "Two-Seam"
    pitch_desc[which(pitch_desc == 'KC')] <- "Knuckle-Curve"
    pitch_desc[which(pitch_desc == 'SI')] <- "Sinker"
    pitch_desc[which(pitch_desc == 'SL')] <- "Slider"
Let’s quickly talk about what is happening here.
First, you are creating an ‘x’ variable that holds the horizontal coordinates of the strike zone’s corners and a ‘z’ variable that holds their heights (both in feet, with the first corner repeated at the end so the outline closes). Afterwards, you are simply combining both into one data frame. It may not make sense now, but you will understand once the plot is created.
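If you want to play around with the size of the zone, the same idea can be wrapped in a small function. This is only a sketch: `make_strike_zone` and its argument names are my own invention, and the defaults are simply the numbers used above.

```r
# Hypothetical helper: corner points of a rectangular strike zone, in feet.
# The first corner is repeated as the last row so geom_path() draws a closed box.
make_strike_zone <- function(half_width = 0.95, bottom = 1.6, top = 3.5) {
  data.frame(
    x = c(-half_width, half_width, half_width, -half_width, -half_width),
    z = c(bottom, bottom, top, top, bottom)
  )
}

# Same zone as the hand-typed version
sz <- make_strike_zone()
```

Calling something like `make_strike_zone(bottom = 1.5, top = 3.4)` would then give you a slightly different box without retyping all ten numbers.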
Next, we copy the ‘pitch_type’ variable that was included in the Baseball Savant data into a new variable, ‘pitch_desc.’ After, as you can see in the above code, you are changing the shorthand description of the pitch as provided by Baseball Savant into the long-hand version. Doing so makes the graph look a bit more professional.
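As a side note, the renaming can also be done with a single lookup table instead of one line per pitch. Here is a minimal sketch of that approach; the names `pitch_names` and `label_pitches` are mine, and any abbreviation not in the table is simply left alone.

```r
# Lookup table: Baseball Savant abbreviation -> long-hand pitch name
pitch_names <- c(
  CH = "Changeup",
  CU = "Curveball",
  FC = "Cutter",
  FF = "Four-Seam",
  FS = "Split-Finger",
  FT = "Two-Seam",
  KC = "Knuckle-Curve",
  SI = "Sinker",
  SL = "Slider"
)

# Hypothetical helper: translate a vector of abbreviations in one step
label_pitches <- function(codes) {
  codes <- as.character(codes)
  labels <- unname(pitch_names[codes])
  # Abbreviations missing from the table keep their original value
  missing_label <- is.na(labels)
  labels[missing_label] <- codes[missing_label]
  labels
}

# Example use with the data from this post:
# pitch_desc <- label_pitches(joshbell_hitting$pitch_type)
```

The upside is that adding a new pitch type means adding one entry to the table rather than another line of code.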
3. Plotting the Data Using ggplot2
    library(ggplot2)
    library(viridis)  # provides scale_color_viridis()

    ggplot() +
      ## First plotting the strike zone that we created
      geom_path(data = sz, aes(x = x, y = z)) +
      coord_equal() +
      ## Now plotting the actual pitches
      geom_point(data = joshbell_hitting,
                 aes(x = plate_x, y = plate_z, size = release_speed, color = pitch_desc)) +
      scale_size(range = c(-1.0, 2.5)) +
      ## Using the color package 'viridis' here
      scale_color_viridis(discrete = TRUE, option = "C") +
      labs(size = "Speed",
           color = "Pitch Type",
           title = "Josh Bell - Pitch Chart",
           subtitle = "March 28 - April 6, 2019") +
      ylab("Feet Above Homeplate") +
      xlab("Feet From Homeplate") +
      theme(plot.title = element_text(face = "bold", hjust = -.015, vjust = 0, colour = "#3C3C3C", size = 20),
            plot.subtitle = element_text(face = "plain", hjust = -.015, vjust = .09, colour = "#3C3C3C", size = 12)) +
      theme(axis.text.x = element_text(vjust = .5, size = 11, colour = "#535353", face = "bold")) +
      theme(axis.text.y = element_text(size = 11, colour = "#535353", face = "bold")) +
      theme(axis.title.y = element_text(size = 11, colour = "#535353", face = "bold", vjust = 1.5)) +
      theme(axis.title.x = element_text(size = 11, colour = "#535353", face = "bold", vjust = 0)) +
      theme(panel.grid.major.y = element_line(color = "#bad2d4", size = .5)) +
      theme(panel.grid.major.x = element_line(color = "#bdd2d4", size = .5)) +
      theme(panel.background = element_rect(fill = "white"))
From a coding standpoint, this is pretty straightforward stuff.
Once you ‘clean’ the data just a bit for presentation purposes, everything you need is already there. No need for complicated data wrangling or anything of that sort.
As you can see in the above ggplot coding, we are simply plotting the ‘plate_x’ and ‘plate_z’ data provided by Baseball Savant and then mapping size to release_speed and color to pitch_desc.
The end result should look like this:
As you can see, we now have a graph that depicts every single pitch that Josh Bell faced in the first 60 games of last season. To make it even better, we could probably place the pitch speed into ranges (75-80, 81-85, etc.) simply to add a little more depth to that graph.
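For the speed ranges, base R’s `cut()` would probably do the job. The sketch below is just one way to go about it: `bin_speed` is a made-up helper name, the 5 mph width is my assumption, and the buckets come out as 75 up to (but not including) 80, 80 up to 85, and so on, rather than the exact 75-80, 81-85 split mentioned above.

```r
# Hypothetical helper: place pitch speeds into fixed-width ranges.
bin_speed <- function(speed, width = 5) {
  # Round the observed range outward to multiples of `width`
  lower <- floor(min(speed, na.rm = TRUE) / width) * width
  upper <- ceiling(max(speed, na.rm = TRUE) / width) * width
  if (upper == lower) upper <- lower + width  # all speeds identical

  # right = FALSE makes each range include its lower edge;
  # include.lowest = TRUE lets the last range include the top edge
  cut(speed,
      breaks = seq(lower, upper, by = width),
      right = FALSE,
      include.lowest = TRUE)
}

# Example use with the data from this post:
# joshbell_hitting$speed_range <- bin_speed(joshbell_hitting$release_speed)
```

The result is a factor, so it could be used for faceting or for a legend with a handful of entries instead of a continuous scale.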
That said: the next step in this graph would be to change it to a spray chart from the batter’s perspective to see which of these pitches he hit and where exactly they went.
And, as I mentioned, if you wanted to do this from the pitcher’s perspective, simply download a pitcher’s data from Baseball Savant and use the exact same coding as above. For example, here is Anthony DeSclafani’s pitching chart from the same period as above:
Obviously you can make more astute observations from this simply because it is a pitcher, as opposed to the above Josh Bell one. For example, DeSclafani clearly had significant control issues with his four-seam fastball in the early parts of last season. If one wanted, a month-by-month plot could be created to see if that issue was ever corrected (just off the top of my head).
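Just to sketch what that month-by-month idea might look like: one option is to add a month column and let ggplot2 split the chart into one panel per month. Everything here is an assumption on my part, including the helper names `add_month` and `plot_by_month`, a `game_date` column formatted like "2019-04-06", and a full season of data for the pitcher.

```r
# Hypothetical helper: add a month column based on game_date.
add_month <- function(pitches) {
  month_number <- as.integer(format(as.Date(pitches$game_date), "%m"))
  # An ordered set of levels keeps the panels in calendar order
  pitches$month <- factor(month.name[month_number], levels = month.name)
  pitches
}

# Hypothetical helper: one pitch chart per month (requires ggplot2).
# `zone` is a strike zone data frame with x and z columns, like sz above.
plot_by_month <- function(pitches, zone) {
  ggplot2::ggplot() +
    # The zone has no month column, so it is drawn in every panel
    ggplot2::geom_path(data = zone, ggplot2::aes(x = x, y = z)) +
    ggplot2::geom_point(data = add_month(pitches),
                        ggplot2::aes(x = plate_x, y = plate_z, color = pitch_type)) +
    ggplot2::coord_equal() +
    ggplot2::facet_wrap(~ month)
}

# Example use, assuming a data frame of the pitcher's full season:
# plot_by_month(desclafani_pitching, sz)
```

The data frame name `desclafani_pitching` is only a placeholder for whatever you called your own download.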