Real Data Science USA Meetup, May 26, 2022 - Szilard Pafka: Best Algo for Tabular/Business Data?
Author: Real Data Science USA (formerly DataScience.LA)
Uploaded: 2022-05-26
Views: 305
Description:
Best Algorithm for Tabular/Business Data? Sorry, It’s Not Deep Learning…
Szilard Pafka, PhD
Chief Scientist, Epoch
Despite all the hype about deep learning and "AI", it is not publicized well enough that for the structured/tabular data widely encountered in business applications, it is actually another machine learning algorithm, the gradient boosting machine/gradient boosted decision trees (GBM/GBDT), that most often achieves the highest accuracy in supervised learning/prediction tasks. In this talk we'll provide plenty of evidence for the vast superiority of GBMs over deep learning on tabular/business data, including deep learning methods "specialized" for tabular data such as TabNet, TabTransformer or SAINT. Next, we will present some of the major open source GBM implementations, such as xgboost, h2o, lightgbm and catboost (all of them available from R and Python), and compare their main performance characteristics: training speed, memory footprint, scaling to multiple CPU cores, GPU implementations etc. While deep learning is certainly the best algorithm available for computer vision (and it has also shown some success in a few other rather specialized domains), in most business applications, where the data is most often tabular, gradient boosted decision trees are vastly superior to deep learning neural networks and should definitely be the algorithm of choice.
Bio:
Szilard studied physics in the 90s and obtained a PhD by using statistical methods to analyze the risk of financial portfolios. He worked in finance, then in 2006 he moved to become the Chief Scientist of a tech company in Santa Monica, California, doing everything data (analysis, modeling, data visualization, machine learning, data infrastructure etc.). He was the founder/organizer of several meetups in the Los Angeles area (R, data science etc.) and of the data science community website datascience.la for more than a decade, until he relocated to Texas in 2021. He is the author of a well-known machine learning benchmark on GitHub (1000+ stars), a frequent speaker at conferences (keynote/invited at KDD, R-finance, Crunch, eRum and contributed at useR!, PAW, EARL, H2O World, Data Science Pop-up, Dataworks Summit etc.), and he has developed and taught graduate data science and machine learning courses as a visiting professor at two universities (UCLA in California and CEU in Europe).
LinkedIn: / szilard
Twitter: / szilardpafka
Github: https://github.com/szilard/