R tutorial: Subsets and histograms

Author: DataCamp

Uploaded: 2016-11-10

Views: 9333

Description: Learn more about exploratory data analysis with baseball data: https://www.datacamp.com/courses/expl...

Now that you’ve prepared the dates appropriately, it’s time to start exploring your data.

You’ll begin by exploring the “start_speed” variable.

This variable indicates the velocity of each pitch thrown as it leaves the pitcher’s hand.

It’s important to note that the velocity measurements are in miles per hour, and the variable is entered as a numeric scale variable in R.

You’ll begin by using a histogram to visually explore the velocity of Greinke’s pitches.

In later exercises, you’ll describe the data numerically.

A histogram is a basic visualization tool for exploring the characteristics of your data.

Using all of the “start_speed” data, it’s easy to plot a histogram in R with the code here and get a very basic looking plot.
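The code shown in the video is not part of this transcript, so here is a minimal sketch of what such a call might look like. The real greinke data set is not available here, so the data frame below is a simulated stand-in; only the column name start_speed comes from the narration.

```r
# Hypothetical stand-in for the course's greinke data frame (simulated values,
# not real pitch data): a faster group and a slower group of pitches.
set.seed(1)
greinke <- data.frame(
  start_speed = c(rnorm(60, mean = 92, sd = 1.5),
                  rnorm(40, mean = 84, sd = 2))
)

# Basic histogram of pitch velocity using R's default settings.
# hist() also returns the bin information, stored here in h.
h <- hist(greinke$start_speed)
```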

You’ll improve on the look of this plot in the exercises.

You can also indicate where the overall average start speed is on your histogram using the abline() function.

In this case, you’ll want to tell R to draw a vertical line using the “v =” argument.

We want to set v equal to the mean start speed in the greinke data set.

Let’s also color the line red so it’s easy to see on our histogram.
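A sketch of the steps just described, again using a simulated stand-in for the greinke data (an assumption, since the real data are not included here). The name avg_speed is introduced only for illustration.

```r
# Hypothetical stand-in for the greinke data frame (simulated values).
set.seed(1)
greinke <- data.frame(
  start_speed = c(rnorm(60, mean = 92, sd = 1.5),
                  rnorm(40, mean = 84, sd = 2))
)

# Draw the histogram first; abline() adds to the existing plot.
hist(greinke$start_speed)

# Overall average start speed
avg_speed <- mean(greinke$start_speed)

# v = draws a vertical line at that x position; col = sets its color.
abline(v = avg_speed, col = "red")
```

If the real data contain missing values, mean() returns NA unless you add na.rm = TRUE.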

Something else to notice about this figure is that it can be useful in identifying multi-modal distributions.

This could indicate some separation in velocity related to the type of pitch thrown.

This is easy to see here, where it looks like Greinke has a higher velocity distribution for fastballs, and a separate, lower velocity distribution for off-speed pitches.

You can identify pitch type in the data with the “pitch_type” variable, and make a separate histogram of each pitch type.

Here, let’s just create a histogram for sliders, represented by the “SL” code in the pitch type variable.

First, we’ll use the ifelse() function to make a new variable called “slider.”

The ifelse() function simply tells R that if the pitch type variable is equal to “SL”, then we want our new variable to be equal to one.

Otherwise, we make the variable equal to zero.

Notice that the ones in the new variable line up perfectly with the “SL” code in the pitch type variable.
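As a rough illustration of the ifelse() step, the sketch below uses a tiny made-up data frame. The pitch codes other than “SL” are assumptions chosen for the example.

```r
# Tiny hypothetical data frame; only the "SL" code is taken from the narration.
greinke <- data.frame(
  pitch_type = c("FF", "SL", "CH", "SL", "CU", "FF"),
  stringsAsFactors = FALSE
)

# 1 where the pitch is a slider, 0 otherwise
greinke$slider <- ifelse(greinke$pitch_type == "SL", 1, 0)

# Print both columns side by side to check that the ones line up with "SL"
print(greinke)
```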

You could also make a variable called “not slider.”

In this case, you would tell R that we want this variable equal to one if pitch type DOES NOT equal slider, and zero otherwise.

You can see the desired results here.

Any pitch type that is not a slider is equal to one in the “not slider” variable.

And any pitch type that is a slider is equal to zero.
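The “not slider” variable could be sketched the same way, flipping the comparison to !=. The column name not_slider and the sample pitch codes are assumptions for this example.

```r
# Tiny hypothetical data frame, as in the slider example.
greinke <- data.frame(
  pitch_type = c("FF", "SL", "CH", "SL", "CU", "FF"),
  stringsAsFactors = FALSE
)

# 1 where the pitch is NOT a slider, 0 where it is
greinke$not_slider <- ifelse(greinke$pitch_type != "SL", 1, 0)

print(greinke)
```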

Now that we’ve made a new variable to indicate a pitch was a slider, we can use this to easily subset our data.

The subset() function is an easy way to do this.

Naming the new data set “greinke_sl”, we tell R to keep any data where the “slider” variable is equal to one.

Notice here that our new data includes only sliders.

Further, note that subset() takes the data set as its first argument, so when you give R the condition for the subsetting, you can refer to the variable by name alone, without the data name and the dollar sign.
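A minimal sketch of this subsetting step, on made-up rows (the speeds and the non-slider pitch codes are invented for illustration):

```r
# Hypothetical rows standing in for the greinke data.
greinke <- data.frame(
  pitch_type  = c("FF", "SL", "CH", "SL", "CU", "FF"),
  start_speed = c(92.1, 86.4, 87.0, 85.9, 74.3, 93.0),
  stringsAsFactors = FALSE
)
greinke$slider <- ifelse(greinke$pitch_type == "SL", 1, 0)

# The first argument names the data, so the condition can use the bare
# column name: slider, not greinke$slider.
greinke_sl <- subset(greinke, slider == 1)

print(greinke_sl)
```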

Granted, the original ifelse() was not necessary, as we could have also subset by the “pitch_type” variable in the first place, and ended up with the same result.
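To check that the two routes agree, one could compare them on a small made-up data frame. The names via_flag, via_type, and same are hypothetical, introduced only for this sketch.

```r
# Hypothetical rows standing in for the greinke data.
greinke <- data.frame(
  pitch_type  = c("FF", "SL", "CH", "SL", "CU", "FF"),
  start_speed = c(92.1, 86.4, 87.0, 85.9, 74.3, 93.0),
  stringsAsFactors = FALSE
)
greinke$slider <- ifelse(greinke$pitch_type == "SL", 1, 0)

# Route 1: subset on the 0/1 indicator built with ifelse()
via_flag <- subset(greinke, slider == 1)

# Route 2: subset on pitch_type directly, skipping the indicator
via_type <- subset(greinke, pitch_type == "SL")

# Both routes keep exactly the same rows
same <- identical(via_flag, via_type)
print(same)
```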

This makes subset() pretty convenient when we want to work with specific portions of our original data.

Finally, when making a histogram of just sliders, we can see that the distribution of a single pitch type is much closer to a normal distribution than what we saw with all pitch types.
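The slider-only histogram might be produced along these lines. The data are simulated, so the shape of this plot illustrates the mechanics only and says nothing about Greinke's actual pitches.

```r
# Simulated stand-in: 60 fastballs ("FF") and 40 sliders ("SL").
set.seed(42)
greinke <- data.frame(
  pitch_type  = rep(c("FF", "SL"), times = c(60, 40)),
  start_speed = c(rnorm(60, mean = 92, sd = 1.5),
                  rnorm(40, mean = 86, sd = 1.5)),
  stringsAsFactors = FALSE
)

# Keep only the sliders, then plot their velocities
greinke_sl <- subset(greinke, pitch_type == "SL")
h_sl <- hist(greinke_sl$start_speed,
             main = "Slider velocity (simulated)",
             xlab = "start_speed (mph)")
```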

Throughout the next few exercises, you’ll be performing similar operations to examine Greinke’s fastball velocity, and compare July to other months of the year.

Now start exploring your data.
