How to clean and normalize unstructured dates in BigQuery
Автор: Data Religion
Загружено: 2022-12-06
Просмотров: 399
Описание:
Learn how to normalize multiple date formats on a single column in Google BigQuery. This ETL tutorial video also shows you how to write regex or regular expressions for date extraction.
Queries used in the video:
(1) Table Creation Statement Containing Unstructured Dates:
Replace the dataset name if required in your query
=======
CREATE OR REPLACE TABLE `etl.cakeshop`
AS
SELECT 'sugar' as item, '2022-09-03 18:34:23' as deliveryTime
UNION ALL SELECT 'flour', '2022-09-02 14:24:24'
UNION ALL SELECT 'tea', '08/03/2022 14:20:21'
UNION ALL SELECT 'butter', '8/03/2022 14:15:20'
UNION ALL SELECT 'cocoa', '7/03/2022 11:14:20'
UNION ALL SELECT 'essence', '2022-09-01 10:11:09'
UNION ALL SELECT 'cherries', '2022-09-01 08:11:09'
UNION ALL SELECT 'tutti frutti', '9/01/2022 11:14:20'
========
(2) Regular Expression for Matching and Extracting Unstructured Dates and Time
=======
SELECT *,
CASE
WHEN REGEXP_CONTAINS(deliveryTime, r'[0-9]{4}\-[0-9]{2}\-[0-9]{2}') THEN PARSE_DATE('%F',SUBSTR(deliveryTime,1,10))
WHEN REGEXP_CONTAINS(deliveryTime, r'[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}[\s]?') THEN PARSE_DATE("%m/%d/%Y", REGEXP_EXTRACT(deliveryTime, r'[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}[\s]?') )
END as persed_date,
PARSE_TIME("%T", REGEXP_EXTRACT(deliveryTIme, r'[\s][\d]+\:[\d]+\:[\d]+$')) AS persed_time
==========
Other Resources
Regular Expressions in BigQuery:
https://cloud.google.com/bigquery/docs/ref...
Date Functions in BigQuery:
https://cloud.google.com/bigquery/docs/ref...
Contact me: [email protected] & [email protected]
Повторяем попытку...
Доступные форматы для скачивания:
Скачать видео
-
Информация по загрузке: