Wessel Sandtke - Don’t judge a book by its cover: Using LLM created datasets to train models...

Author: PyData

Uploaded: 2023-11-22

Views: 509

Description: Don’t judge a book by its cover: Using LLM created datasets to train models that detect literary features

Existing book recommendation systems like Goodreads are based on correlating the reading habits of people. But what if you want a humorous book? Or a book that is set in 19th century Paris? Or a thriller, but without violence?
We build book recommendation systems for Dutch libraries based on more than a dozen features, ranging from historical setting to writing style to the characteristics of the main character. This allows us to tailor each recommendation to individual readers.
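
As a rough illustration of what feature-based recommendation can look like, the sketch below filters a small catalogue on explicit literary features instead of on other readers' habits. The `Book` class, the `recommend` function, the feature names, and the placeholder titles are all hypothetical and are not taken from the talk or from Bookarang's system.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A catalogue entry with detected literary features (hypothetical schema)."""
    title: str
    features: dict = field(default_factory=dict)


def recommend(books, required):
    """Return titles of books whose features match every required value."""
    return [
        book.title
        for book in books
        if all(book.features.get(name) == value for name, value in required.items())
    ]


# Placeholder catalogue; titles and feature values are made up for illustration.
catalogue = [
    Book("Book A", {"genre": "thriller", "violence": True, "setting": "modern"}),
    Book("Book B", {"genre": "thriller", "violence": False, "setting": "modern"}),
    Book("Book C", {"genre": "romance", "violence": False, "setting": "19th century Paris"}),
]

# "A thriller, but without violence"
thrillers_without_violence = recommend(catalogue, {"genre": "thriller", "violence": False})

# "A book that is set in 19th century Paris"
paris_books = recommend(catalogue, {"setting": "19th century Paris"})
```

Exact matching on a feature dictionary is the simplest possible design; a production system would more likely score and rank books on partially matching features.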

The recent developments in LLMs are an interesting area for us to explore to improve our recommendations. However, running LLMs in production is unfortunately not always feasible. The associated costs may be too high, and running code from third parties in your daily pipeline may be undesirable. And then there’s data privacy - or, in our case, intellectual copyright - to be considered as well.

So how can you reap the benefits of an LLM, without exposing yourself or your company to some of these major downsides?

We utilized LLMs to generate custom, tailor-made datasets for our literary feature detection models to train on. This allowed us to benefit from the high performance of large language models, without continued reliance on external parties such as OpenAI or Google.
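
A minimal sketch of this idea, under assumptions that do not come from the talk: the hypothetical `label_with_llm` function stands in for a one-off LLM labelling call (here faked with a keyword rule so the example runs offline), the passages and the single "humorous" feature are invented, and the small model is a pure-Python naive Bayes classifier chosen for brevity, not the model the speakers actually train.

```python
import math
from collections import Counter, defaultdict


def label_with_llm(passage):
    """Hypothetical stand-in for a one-off LLM labelling call.

    A real pipeline would prompt an LLM once per passage while building the
    dataset; a keyword rule fakes the answer here so the sketch runs offline.
    """
    funny_words = {"joke", "laughed", "silly", "giggled"}
    return "humorous" if funny_words & set(passage.lower().split()) else "serious"


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: label unlabelled passages once with the (stubbed) LLM.
unlabelled = [
    "he told a silly joke and everyone laughed",
    "she giggled at the silly hat",
    "the funeral was held in the rain",
    "the detective studied the cold rain",
]
labels = [label_with_llm(passage) for passage in unlabelled]

# Step 2: train a small in-house model on the LLM-created dataset.
model = NaiveBayes().fit(unlabelled, labels)

# Step 3: production inference uses only the small model, not the LLM.
prediction = model.predict("everyone laughed at the joke")
```

The point of the split is that the LLM is only needed while the dataset is being built; the daily pipeline then depends solely on the locally trained model.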

While you may think LLMs are not as effective for languages other than English, we’ve seen major improvements in several of our models.

In this talk, we’ll highlight:
A note on recommenders: Why does the Goodreads recommender not work for me, while Spotify’s Discover Weekly is so good?
Different methods of getting data from books
Iterative process of creating a dataset using an LLM and retraining our models
Some notes on intellectual property and evaluation of models.

Bio:
Wessel Sandtke
Typewriter repairman turned Machine Learning Engineer, now working for Bookarang, a Dutch startup that works with Dutch libraries to improve the recommendations for their members.
Wrote several picture books, but is not allowed to boost those in the recommendation system.

===

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

