We can dynamically build a string of rows and then generate a DataFrame from it.
However, Spark would treat the whole string as a single record, which would throw an error.
We need to split the lines on the delimiter. This can be done by writing a split function as shown below.
CREATE DATAFRAME
from pyspark.sql.functions import lit
# Create an RDD for the new id data
data_string = ""
for rw in baseline_row.collect():
    for i in range(24):
        hour = "h" + str(i + 1)
        hour_value = str(rw[hour])
        # Append one comma-terminated entry per hour
        data = 'Row(' + rw.id + ', "unique_id"),'
        data_string = data_string + data

# Dynamically generated data for hours
print(data_string)

rdds = spark_session.sparkContext.parallelize([data_string])
rdds.map(split_the_line).toDF().show()
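As a minimal, Spark-free sketch of the string-building step (the `rows` list of dicts below is a hypothetical stand-in for `baseline_row.collect()`, and the values are made up for illustration), the idea is to concatenate one comma-terminated entry per source row:

```python
# Hypothetical stand-in for baseline_row.collect(): plain dicts instead of Spark Rows
rows = [{"id": "101"}, {"id": "102"}]

data_string = ""
for rw in rows:
    # Append one comma-terminated entry per source row
    data_string = data_string + rw["id"] + ",unique_id,"

print(data_string)  # → 101,unique_id,102,unique_id,
```

The result is a single long string, which is why the next step has to split it back into individual fields before Spark can build a DataFrame from it.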
SPLIT FUNCTION
# Function to split the line on the comma delimiter
def split_the_line(x):
    return x.split(',')
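To see what the split function hands to `toDF()`, here is a small sketch that runs without Spark (the sample string is invented for illustration); it shows the single string being turned into a list of fields:

```python
def split_the_line(x):
    # Split one comma-delimited string into a list of fields
    return x.split(',')

line = "101,unique_id,102,unique_id"
print(split_the_line(line))  # → ['101', 'unique_id', '102', 'unique_id']
```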