Skip to content

Commit

Permalink
Create README.md
Browse files Browse the repository at this point in the history
Initial Readme
  • Loading branch information
zvijayakumar authored Feb 19, 2023
1 parent 28357c5 commit 779ca07
Showing 1 changed file with 66 additions and 0 deletions.
66 changes: 66 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
Spark Schema Generator from PostgreSQL Table Schema
This repository contains a Python script that generates a Spark StructType schema from a PostgreSQL table schema.

Requirements
To use the script, you will need:

Python 3.x
pyspark module
psycopg2 module
Access to a PostgreSQL database with the schema you want to generate a Spark schema for
Usage
Clone the repository to your local machine:

git clone https://github.com/yourusername/spark-postgres-schema-generator.git
Install the required modules:

pip install pyspark psycopg2
Open the generate_spark_schema.py file in your preferred text editor.

Update the following variables to match your PostgreSQL database connection details:


host = "localhost"
database = "mydb"
user = "myuser"
password = "mypassword"
Update the schema_name variable to match the name of the PostgreSQL schema you want to generate a Spark schema for:


python generate_spark_schema.py
The script will print the generated Spark schema to the console.

Supported PostgreSQL Data Types
The script supports the following PostgreSQL data types:

ARRAY
bigint
boolean
bytea
character
character varying
date
double precision
integer
json
name
numeric
oid
smallint
text
timestamp with time zone
timestamp without time zone


Notes
The script assumes that any PostgreSQL column with a NOT NULL constraint is required in the Spark schema, and any column without a NOT NULL constraint is nullable in the Spark schema.
The script assumes that any PostgreSQL array column contains string elements.
The script assumes that any PostgreSQL numeric column has a specified precision and scale. If your database uses the default precision and scale for numeric columns, you may need to modify the script to handle this case.
The script assumes that the PostgreSQL schema name you provide has access to the tables you want to generate a Spark schema for. If you need to generate a schema for tables in a different schema, you will need to update the script accordingly.

License
This project is licensed under the MIT License. See the LICENSE file for details.




0 comments on commit 779ca07

Please sign in to comment.