I was recently working on a project that produced a local GeoJSON file (exported from QGIS), and I wanted to import that file into Wherobots Cloud and SedonaDB. This seems like a common enough pattern that I thought I'd share the steps I used in case they help others.
First, upload your GeoJSON file. Wherobots Cloud includes free file storage and management based on AWS S3. These files are private to your user organization and accessible within the Wherobots Notebook environment via an S3 URL. I uploaded my file using the “File” tab in Wherobots Cloud.
Clicking on the file icon in the far right column next to the file name (in this case idaho_treestands.geojson) will copy the S3 URL to your clipboard.
Then in my Wherobots Notebook I saved that S3 URL as a variable:
S3_URL_TREES = "s3://<PATH TO YOUR GEOJSON FILE HERE>"
To import GeoJSON into SedonaDB we can use Spark’s built-in JSON import functionality if we first define the schema used by GeoJSON.
schema = "type string, name string, crs string, features array<struct<type string, geometry string, properties map<string, string>>>"
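To see where that schema comes from, here is a minimal sketch of a GeoJSON FeatureCollection built with Python's standard json module. The coordinates, the crs value, and the fid are invented for illustration and are not taken from my file. My understanding is that declaring geometry as a string in the schema makes Spark keep each geometry object as raw JSON text, which is the form ST_GeomFromGeoJson expects.

```python
import json

# A hypothetical one-feature FeatureCollection (all values invented for illustration).
sample = {
    "type": "FeatureCollection",
    "name": "idaho_treestands",
    "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}},
    "features": [
        {
            "type": "Feature",
            "properties": {"fid": 1},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[-116.5, 43.6], [-116.4, 43.6], [-116.4, 43.7], [-116.5, 43.6]]
                ],
            },
        }
    ],
}

# The top-level keys line up with the columns named in the schema string.
top_level_keys = sorted(sample.keys())
print(top_level_keys)

# Each feature carries type, geometry, and properties, matching the struct in the schema.
feature_keys = sorted(sample["features"][0].keys())
print(feature_keys)

# The geometry serialized as JSON text, the shape a "geometry string" column would hold.
geometry_text = json.dumps(sample["features"][0]["geometry"])
print(geometry_text)
```

If your export has a different top-level layout, the schema string would need to change to match.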
Now we can read the file using Spark’s JSON import functionality. In this case our file is multiline JSON; you might need to adjust that option depending on the format of your JSON:
from pyspark.sql.functions import expr

# Read the FeatureCollection, emit one row per feature, and parse each geometry
tree_df = (
    sedona.read.option("multiline", True)
    .json(S3_URL_TREES, schema=schema)
    .selectExpr("explode(features) as features")
    .select("features.*")
    .withColumn("geometry", expr("ST_GeomFromGeoJson(geometry)"))
    .withColumn("fid", expr("properties['fid']"))
    .drop("properties")
    .drop("type")
)
tree_df.createOrReplaceTempView("trees")
tree_df.show(5)
+--------------------+---+
|            geometry|fid|
+--------------------+---+
|POLYGON ((-116.52...|  1|
|POLYGON ((-116.54...|  2|
|POLYGON ((-116.12...|  3|
|POLYGON ((-116.16...|  4|
|POLYGON ((-116.09...|  5|
+--------------------+---+
only showing top 5 rows
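A note on the multiline option used in the read: the sketch below uses only the standard library to show the difference between a pretty-printed (multiline) JSON document and a single-line one. When every line is a complete JSON value, the file can be parsed line by line; a pretty-printed document cannot, which is why the multiline option matters for a file like mine. The sample record and the helper name parses_line_by_line are hypothetical.

```python
import json

# A hypothetical, empty FeatureCollection used only to compare the two layouts.
record = {"type": "FeatureCollection", "features": []}

pretty = json.dumps(record, indent=2)  # one document spread over several lines
compact = json.dumps(record)           # the same document on a single line


def parses_line_by_line(text):
    """Return True if every non-empty line is a complete JSON value on its own."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


pretty_ok = parses_line_by_line(pretty)
compact_ok = parses_line_by_line(compact)
print("pretty-printed parses line by line:", pretty_ok)
print("single-line parses line by line:", compact_ok)
```

Running a check like this on the first few lines of a file is one quick way to decide whether the multiline option is needed.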
In my case the only property I had (other than the geometry, which is stored separately from other properties) was fid, but if you have other properties to bring through, you can include them by chaining multiple .withColumn("<PROPERTY_HERE>", expr("properties['<PROPERTY_HERE>']")) calls.
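When there are several properties, writing each .withColumn call by hand gets repetitive. As a rough sketch, a small helper can generate the column name and SQL expression pairs for you. The helper name property_columns and the property names species and height_m are hypothetical; only fid exists in my file.

```python
def property_columns(names):
    """Build (column_name, SQL expression) pairs for GeoJSON properties."""
    return [(name, f"properties['{name}']") for name in names]


# Hypothetical property names for illustration; replace with the ones in your file.
columns = property_columns(["fid", "species", "height_m"])
for column_name, sql in columns:
    print(column_name, "->", sql)

# In a notebook, these pairs could drive the chain (sketch, not run here):
#   for column_name, sql in columns:
#       df = df.withColumn(column_name, expr(sql))
```

The loop would need to run before the properties column is dropped, since each expression reads from it. Because the schema declares properties as map<string, string>, every extracted value arrives as a string and may need a cast.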
There is more information on this in the Wherobots documentation, and the full code for my example is on GitHub.