Generate Fake Data using PySpark in 1 min
Author: GeekCoders
Uploaded: 2024-10-21
Views: 689
Description:
Learn more about the PySpark course: https://www.geekcoders.co.in/courses/...
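import farsante
from mimesis import Person, Address, Datetime

# mimesis providers; 'en' selects the English locale
p = Person('en')
ad = Address('en')
dt = Datetime()

# Pass the provider methods themselves (no parentheses); 100 is the row count
df = farsante.pyspark_df(
    [p.first_name, p.last_name, p.sex, p.age, ad.country, ad.country_code,
     ad.address, ad.city, ad.state, dt.year],
    100,
)
display(df)  # display() is a Databricks notebook helper; use df.show() elsewhere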
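The farsante call above receives the provider methods uncalled (p.first_name, not p.first_name()). A minimal sketch of why that matters, using only the standard library: a helper that calls each provider once per row. The helper name rows_from_providers, the stand-in providers, and the choice to take column names from the function names are all assumptions made for illustration; this is not farsante's actual implementation.

```python
import random

# Hypothetical helper: call every provider once per row.
def rows_from_providers(providers, num_rows):
    # Assumption of this sketch: column names come from the function names.
    columns = [f.__name__ for f in providers]
    rows = [tuple(f() for f in providers) for _ in range(num_rows)]
    return columns, rows

# Stand-in providers (hypothetical); in the snippet above these would be
# mimesis methods such as p.first_name.
rng = random.Random(0)

def first_name():
    return rng.choice(["Ana", "Ben", "Chen"])

def age():
    return rng.randint(18, 90)

columns, rows = rows_from_providers([first_name, age], 3)
print(columns)
print(rows)
```

Because the functions are passed rather than their results, each row gets a fresh value; passing first_name() instead would repeat one value in every row.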
You can also use the code below to generate the data with Faker:
from pyspark.sql import SparkSession
from faker import Faker
import random

# Initialize Faker and PySpark
fake = Faker()
spark = SparkSession.builder.appName("FakeData").getOrCreate()

# Function to generate fake data
def generate_fake_data(num_records):
    data = []
    for _ in range(num_records):
        data.append((
            fake.name(),
            fake.email(),
            fake.address(),
            fake.phone_number(),
            fake.date_of_birth(minimum_age=18, maximum_age=90).strftime("%Y-%m-%d"),
            random.randint(1000, 10000)  # random salary
        ))
    return data

# Number of fake records you want
num_records = 1000

# Generate the fake data
fake_data = generate_fake_data(num_records)

# Create PySpark DataFrame
columns = ["Name", "Email", "Address", "Phone", "Date_of_Birth", "Salary"]
df = spark.createDataFrame(fake_data, columns)

# Show some rows from the DataFrame
df.show(10, truncate=False)

# Stop Spark session (optional)
spark.stop()
#pyspark #spark #bigdata #databricks