Mastering Power Query: Remove Consecutive Duplicates Only in Your Dataset
Author: vlogize
Uploaded: 2025-04-02
Views: 2
Description:
Learn how to efficiently remove consecutive duplicates while retaining necessary duplicates in Power Query with a detailed guide.
---
This video is based on the question https://stackoverflow.com/q/73649206/ asked by the user 'Alan Treanor' ( https://stackoverflow.com/u/4021883/ ) and on the answer https://stackoverflow.com/a/73650335/ provided by the user 'Marcus' ( https://stackoverflow.com/u/16528000/ ) on the 'Stack Overflow' website. Thanks to these great users and the Stack Exchange community for their contributions.
Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: Power Query - Remove Consecutive Duplicates Only
Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.
If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
Mastering Power Query: Remove Consecutive Duplicates Only in Your Dataset
Cleaning up a dataset can be tedious, especially when the same value is repeated several times in a row. A common requirement is to remove those consecutive duplicates without losing legitimate repeats, where the same value reappears later after a different one. In this guide, we will explore a Power Query solution that does exactly that.
The Problem
Imagine you have a dataset of airport location codes such as London (LHR), Paris (CDG), and Rome (FCO). However, your dataset may have consecutive duplicates like this:
LHR, LHR, LHR, CDG
On the flip side, you may also have genuine duplicates where entries must be retained like:
LHR, CDG, LHR
To visualize, consider these example routes:
For LHR, LHR, CDG, FCO, FCO, you want to convert this to LHR-CDG-FCO.
For LHR, LHR, CDG, CDG, CDG, LHR, the desired output is LHR-CDG-LHR.
Your goal is to find a way to teach Power Query to distinguish between these consecutive duplicates and genuine duplicates while generating a clean, usable output.
The Solution
Power Query can do this by combining three functions in a custom column: Text.SplitAny, List.Accumulate, and Text.Combine. Follow this structured approach to clean up your data as desired.
Step-by-Step Guide
Use Text.SplitAny to Split Data:
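First, split each Route entry into a list of codes at its delimiter (a comma in the examples above) using the Text.SplitAny function. Note that Text.SplitAny treats every character in its separator argument as a delimiter in its own right, so pass only the characters you actually want to split on.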
Accumulate Unique Values:
With the help of List.Accumulate, you can loop through the split list and build a new list that skips any value equal to the one immediately before it.
Combining the Results:
Finally, you'll want to combine the collapsed list of entries back into a single string using Text.Combine, separating them with a hyphen (-).
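To make the three steps concrete, here is a minimal sketch that runs them one at a time on a single literal route, so each intermediate value can be inspected in the query editor. The step names (Split, Trimmed, Collapsed, Joined) are illustrative choices, and the Text.Trim step is an added assumption: it is only needed when the codes are separated by a comma followed by a space, as in the examples above.

```powerquery
let
    // One of the example routes from the text
    Route = "LHR, LHR, CDG, CDG, CDG, LHR",

    // Step 1: split on commas; every piece after the first keeps its leading space
    Split = Text.SplitAny(Route, ","),

    // Assumed extra step: trim each piece so " LHR" and "LHR" compare as equal
    Trimmed = List.Transform(Split, Text.Trim),

    // Step 2: append a code only when it differs from the last code already kept
    Collapsed = List.Accumulate(
        Trimmed,
        {},
        (state, current) =>
            if List.Last(state) = current
            then state
            else List.Combine({state, {current}})
    ),

    // Step 3: join the remaining codes with a hyphen
    Joined = Text.Combine(Collapsed, "-")
in
    Joined // "LHR-CDG-LHR"
```

Clicking each step in the Applied Steps pane shows how the list shrinks from six codes to three before it is joined.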
Formula for Implementation
You can implement the aforementioned steps by using this Power Query formula in a New Custom Column:
[[See Video to Reveal this Text or Code Snippet]]
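The snippet itself is only shown in the video, so what follows is not the original answer's formula. It is a minimal sketch assembled from the three steps described above, applied to a small sample table built from the example routes. The column name CleanRoute, the comma delimiter, and the Text.Trim call are assumptions; adjust them to match your data.

```powerquery
let
    // Sample table holding the two example routes in a column named Route
    Source = #table(
        type table [Route = text],
        {
            {"LHR, LHR, CDG, FCO, FCO"},
            {"LHR, LHR, CDG, CDG, CDG, LHR"}
        }
    ),

    // "CleanRoute" is a hypothetical name for the new custom column
    AddedCleanRoute = Table.AddColumn(
        Source,
        "CleanRoute",
        each Text.Combine(
            List.Accumulate(
                // Split on commas and trim the spaces around each code
                List.Transform(Text.SplitAny([Route], ","), Text.Trim),
                {},
                // Keep a code only if it differs from the last one kept
                (state, current) =>
                    if List.Last(state) = current
                    then state
                    else List.Combine({state, {current}})
            ),
            "-"
        ),
        type text
    )
in
    AddedCleanRoute
    // CleanRoute holds "LHR-CDG-FCO" and "LHR-CDG-LHR"
```

When using the Custom Column dialog instead of the Advanced Editor, enter only the expression that follows the each keyword; Power Query generates the surrounding Table.AddColumn step for you.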
Breakdown of the Functionality
Text.SplitAny: Splits your Route text into a list based on specified separators.
List.Accumulate: Iterates through your list, maintaining a state of what's been added.
It checks whether the current entry differs from the last entry in the accumulating list.
If identical, it ignores the entry. If not, it adds the new entry to the accumulating state with List.Combine.
Text.Combine: Finally, it rejoins the remaining entries into a single string with the chosen delimiter.
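The accumulator logic can also be examined in isolation. The sketch below pulls it out into a named function (KeepIfChanged is a hypothetical name) and calls it by hand, which makes the empty-list starting case visible: List.Last returns null for an empty list, and null is never equal to a text value, so the first entry is always added.

```powerquery
let
    // Hypothetical name for the accumulator function passed to List.Accumulate
    KeepIfChanged = (state as list, current as text) as list =>
        if List.Last(state) = current
        then state
        else List.Combine({state, {current}}),

    // Empty state: List.Last({}) is null, so the comparison is false and "LHR" is added
    First = KeepIfChanged({}, "LHR"),         // {"LHR"}
    // Same code again: the state is returned unchanged
    Repeated = KeepIfChanged(First, "LHR"),   // {"LHR"}
    // A different code is appended
    Changed = KeepIfChanged(Repeated, "CDG"), // {"LHR", "CDG"}
    // "LHR" returns after "CDG", so it counts as a genuine repeat and is kept
    Back = KeepIfChanged(Changed, "LHR")      // {"LHR", "CDG", "LHR"}
in
    Back
```

Because the comparison only looks at the last kept entry, earlier occurrences of the same code never cause a removal.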
Conclusion
By combining Text.SplitAny, List.Accumulate, and Text.Combine in Power Query, you can collapse runs of consecutive duplicates while keeping values that legitimately recur later in the sequence. This is useful for anyone preparing data for Power BI, and it keeps your output clean and true to the original order of your data.
Give this formula a try on your dataset, and soon enough, you'll be able to convert messy routes into structured and neat strings, ready for analysis and presentation.
If you have any questions or need further clarification about Power Query functions, feel free to leave a comment below!