
Python Tutorial: The importance of EDA: Anscombe's quartet

Author: DataCamp

Uploaded: 2020-04-11

Views: 1206

Description: Want to learn more? Take the full course at https://learn.datacamp.com/courses/st... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

---

In 1973, statistician Francis Anscombe published a paper that contained four fictitious x-y data sets, plotted here. He used these data sets to make an important point. That point becomes clear if we blindly go about doing parameter estimation on these data sets.

First, let's look at the average x-values of the four data sets. They are all the same. How about the average y-values? Again, all the same. And what if we do a linear regression on each of the data sets? They all have the same line!
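The check described above can be sketched in a few lines. This is a minimal, standard-library-only version, not necessarily how the course does it: the data values are those of Anscombe's quartet as commonly tabulated, and the names `quartet`, `fit_line`, and `summary` are made up for this illustration.

```python
from statistics import mean

# Anscombe's quartet. Data sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Collect (mean x, mean y, slope, intercept) for each data set.
summary = {}
for name, (x, y) in quartet.items():
    slope, intercept = fit_line(x, y)
    summary[name] = (mean(x), mean(y), slope, intercept)
    print(f"{name:>3}: mean x = {mean(x):.2f}, mean y = {mean(y):.2f}, "
          f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Rounded to two decimals, all four rows should agree: mean x of 9.00, mean y of 7.50, slope of 0.50, and intercept of 3.00.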

Surely some of the fits are less optimal than others. Let's look at the sum of the squares of the residuals. Oh my, they are all basically the same as well.
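To see how close the sums of squared residuals are, one could compute them directly. The sketch below uses only the standard library; `quartet`, `fit_line`, `sum_squared_residuals`, and `ssr` are illustrative names, and the data are Anscombe's quartet as commonly tabulated.

```python
from statistics import mean

# Anscombe's quartet. Data sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def sum_squared_residuals(x, y):
    """Sum of squared vertical distances between the data and the fitted line."""
    slope, intercept = fit_line(x, y)
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


ssr = {name: sum_squared_residuals(x, y) for name, (x, y) in quartet.items()}
for name, value in ssr.items():
    print(f"{name:>3}: sum of squared residuals = {value:.2f}")
```

The four values should differ only in the second decimal place, which is why this statistic alone cannot tell a good fit from a poor one here.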

Of course, Anscombe constructed the data sets so that this would happen. The point he was making is very important. You already have some powerful tools for statistical inference. You can compute summary statistics and optimal parameters, including linear regression parameters, and by the end of the course, you will be able to construct confidence intervals which quantify uncertainty about the parameter estimates. These are crucial skills for any data analysis, no doubt. But look before you leap!

This is a powerful reminder to do some graphical exploratory data analysis before you start computing and making judgments about your data. For example, this data set might be well modeled with a line, and the regression parameters will be meaningful. The same is true of this data set, but the outlier throws off the slope and intercept. After doing EDA, you should look into what is causing that outlier.
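How much a single outlier shifts the fit can be shown numerically. The sketch below assumes the data set with the outlier is the third one in Anscombe's paper; it flags the point with the largest residual and refits without it. The names `fit_line`, `outlier`, `slope_rest`, and `intercept_rest` are illustrative only.

```python
from statistics import mean

# Third data set of Anscombe's quartet (assumed to be the one with the outlier).
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Fit all points, then find the point farthest from the fitted line.
slope, intercept = fit_line(x3, y3)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x3, y3)]
worst = max(range(len(x3)), key=lambda i: abs(residuals[i]))
outlier = (x3[worst], y3[worst])

# Refit without that point, purely to show its influence on the parameters.
x_rest = [xi for i, xi in enumerate(x3) if i != worst]
y_rest = [yi for i, yi in enumerate(y3) if i != worst]
slope_rest, intercept_rest = fit_line(x_rest, y_rest)

print(f"all points:      slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"suspect point:   {outlier}")
print(f"without suspect: slope = {slope_rest:.3f}, intercept = {intercept_rest:.3f}")
```

Dropping the point here is only a diagnostic. The advice in the text still applies: find out what caused the outlier before deciding what to do with it.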

This data set might also have a linear relationship between x and y, but from the plot, you can conclude that you should try to acquire more data for intermediate x values to make sure that it does. And this data set is definitely not linear, and you need to choose another model.
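Both observations also leave traces that can be checked without a plot. The sketch below assumes the clearly non-linear data set is the second in Anscombe's paper and the one lacking intermediate x values is the fourth. It prints the sign of each residual from a straight-line fit, ordered by x, and the distinct x values present. The names `fit_line`, `signs`, and `distinct_x4` are illustrative.

```python
from statistics import mean

# Second data set (assumed non-linear) and x values of the fourth (assumed sparse in x).
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Residual signs ordered by x: a run of one sign in the middle and the
# opposite sign at both ends suggests curvature that a line cannot capture.
slope, intercept = fit_line(x2, y2)
signs = "".join(
    "+" if y - (slope * x + intercept) > 0 else "-"
    for x, y in sorted(zip(x2, y2))
)
print("residual signs for data set II:", signs)

# Distinct x values: with only two, the slope is pinned by a single point.
distinct_x4 = sorted(set(x4))
print("distinct x values in data set IV:", distinct_x4)
```

These checks are a supplement to looking at the plot, not a replacement for it.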

Explore your data first.
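A first look at the data does not need much tooling. A plotting library such as matplotlib would normally be used for this; the sketch below stays in the standard library and draws a rough text scatter plot of each data set on shared axes. The function `text_scatter`, the dictionary `panels`, and the axis limits are all choices made for this illustration.

```python
# Anscombe's quartet. Data sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def text_scatter(x, y, x_lim=(3, 20), y_lim=(2, 14), width=36, height=12):
    """Return a coarse scatter plot as a list of strings, top row first."""
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Map each point to the nearest character cell inside the axis limits.
        col = round((xi - x_lim[0]) / (x_lim[1] - x_lim[0]) * (width - 1))
        row = round((yi - y_lim[0]) / (y_lim[1] - y_lim[0]) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


# Draw all four panels with the same limits so their shapes are comparable.
panels = {name: text_scatter(x, y) for name, (x, y) in quartet.items()}
for name, rows in panels.items():
    print(f"Data set {name}")
    for line in rows:
        print("|" + line)
    print("+" + "-" * 36)
```

Points that fall in the same character cell overlap, so this is only good for judging overall shape: a straight band, a curve, a lone outlier, or a single vertical stack of points.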

I'll let you prove to yourself that these data sets give the same regression parameters. It will be good practice, and seeing is believing!

#DataCamp #PythonTutorial #StatisticalThinkinginPython #StatisticalThinkinginPythonPart 2
