R Tutorial: Data Visualization with lattice in R
Author: DataCamp
Uploaded: 2020-02-27
Views: 1867
Description:
Want to learn more? Take the full course at https://learn.datacamp.com/courses/da... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
Welcome to the course!
Data visualization is an essential tool for any data analyst.
It's important for exploratory data analysis, where you try to find and understand patterns in your data as quickly as possible.
It's also important when presenting or reporting your results, but in that case, clarity and ease of customization are more important than speed.
There are many R packages for data visualization, but most of them are based on one of three graphics frameworks.
The first one is base R graphics, which has been available in R from the beginning and provides a rich collection of tools. However, it's not very flexible.
Lattice graphics is in some ways a successor to base graphics and tries to address many of its shortcomings.
The third and newest framework is based on the ggplot2 package.
All these frameworks are complete by themselves, but they are also important because of the large eco-system of packages built around them.
In this course, you'll learn to use lattice graphics, for both exploration and presentation.
In the first couple of chapters, you'll use a dataset that records death-rates due to cancer, at the US county level, separately for males and females.
The main variable of interest is the annual rate of death due to cancer. The state is also available as a covariate.
Your goal will be to explore how the distribution of death-rate varies with gender and location.
Here is a familiar plot: the histogram.
Histograms show the distribution of a continuous variable, which in this case is the death-rate among males.
This histogram is not particularly exciting; later we'll improve on it.
Here's another familiar visualization: a scatter plot.
This one shows the death-rates among females versus males, where each circle represents the rates corresponding to one county.
The plot suggests that the rates are correlated and that there are some possible outliers, but otherwise, it's not remarkable.
These two plots are produced by the lattice functions histogram() and xyplot().
They are similar to base R graphics functions hist() and plot() respectively, but one important difference is in how the variables in the plot are specified.
The lattice functions take a formula and a data frame as inputs.
The left-hand side of the formula names variables on the y axis, and the right-hand side names variables on the x axis.
A histogram does not need the y axis to be specified, so the left-hand side remains blank in that case.
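To make the formula interface concrete, here is a minimal sketch of both calls. The course dataset isn't named here, so the data frame below, cancer_rates, is a made-up stand-in with synthetic values; only the column names rate.male and rate.female come from the video.

```r
library(lattice)  # a recommended package shipped with standard R installations

# Synthetic stand-in for the county-level data (hypothetical name, made-up values).
set.seed(1)
cancer_rates <- data.frame(rate.male = rnorm(200, mean = 250, sd = 40))
cancer_rates$rate.female <- 0.65 * cancer_rates$rate.male + rnorm(200, sd = 15)

# Histogram: the left-hand side stays blank, the x variable goes on the right.
hist_plot <- histogram(~ rate.male, data = cancer_rates)

# Scatter plot: y variable on the left of ~, x variable on the right.
scatter_plot <- xyplot(rate.female ~ rate.male, data = cancer_rates)

# lattice functions return "trellis" objects; printing one draws it.
if (interactive()) {
  print(hist_plot)
  print(scatter_plot)
}
```

Notice that rate.male and rate.female are never created as separate variables in the workspace; both calls look them up in the data frame passed as data.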
The main advantage of the formula interface is that it lets you describe the plot compactly: the variables are looked up in the data frame, so you don't have to make them visible in the workspace.
This is similar to modeling functions such as lm(), like in this call to fit a linear regression model with rate.female as response and rate.male as a predictor.
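As a rough illustration of that lm() call, here is a sketch that uses the same formula on a synthetic data frame. The name cancer_rates and its values are assumptions for this example, not the course data.

```r
# Synthetic stand-in for the county-level data (hypothetical name, made-up values).
set.seed(1)
cancer_rates <- data.frame(rate.male = rnorm(200, mean = 250, sd = 40))
cancer_rates$rate.female <- 0.65 * cancer_rates$rate.male + rnorm(200, sd = 15)

# Same formula as in the scatter plot: response on the left, predictor on the right.
fit <- lm(rate.female ~ rate.male, data = cancer_rates)

# The fitted intercept and slope.
coef(fit)
```

The formula rate.female ~ rate.male reads the same way in both places: in xyplot() it puts rate.female on the y axis, and in lm() it makes rate.female the response.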
The plots you just saw are very minimal. Such minimal plots are usually sufficient for exploration, but for presentation purposes, plots need to be more polished.
For example, here's a more elaborate version of the previous scatter plot. It has more descriptive labels, a reference grid, and a reference line along the "y equals x" diagonal.
This makes one point very clear that was hard to see in the earlier scatter plot.
In absolute terms, the death-rate in females is substantially lower than in males, for almost all counties.
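One possible way to build a plot like this is with a custom panel function, sketched below. This is not necessarily how the plot in the video was made; the data frame cancer_rates is synthetic, and the axis label text is made up for illustration.

```r
library(lattice)

# Synthetic stand-in for the county-level data (hypothetical name, made-up values).
set.seed(1)
cancer_rates <- data.frame(rate.male = rnorm(200, mean = 250, sd = 40))
cancer_rates$rate.female <- 0.65 * cancer_rates$rate.male + rnorm(200, sd = 15)

polished_plot <- xyplot(
  rate.female ~ rate.male,
  data = cancer_rates,
  xlab = "Death rate among males",    # illustrative label text
  ylab = "Death rate among females",  # illustrative label text
  panel = function(x, y, ...) {
    panel.grid(h = -1, v = -1)   # reference grid aligned with the axis ticks
    panel.abline(a = 0, b = 1)   # reference line along the y = x diagonal
    panel.xyplot(x, y, ...)      # the points, drawn last so they sit on top
  }
)

if (interactive()) print(polished_plot)
```

Points that fall below the diagonal are counties where the female rate is lower than the male rate, which is the pattern the reference line is meant to expose.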
Throughout the course, you'll learn how to customize your lattice plots in different ways.
Now it's your turn to draw some histograms and scatter plots.