
DataFrames conversion

Leonid Poliakov edited this page Jul 14, 2016 · 11 revisions

Reading classes from grid

GigaspacesClassRelation.buildScan(...):

  • infers schema from class
  • reads classes
  • parses the RDD[class] into RDD[Row]

| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| @SQLUserDefinedType(udt=MyUdt) | MyUdt | User object |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
| Scala case class (Product) | StructType | Row |
| Java class | StructType | Row |
| scala.Int | IntegerType | Int |
| java.lang.Integer | IntegerType | Int |
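
The class-to-Row mapping in the table above can be illustrated with a minimal, Spark-free sketch. Everything here is an assumption made for illustration: `Row`, `FieldType`, `IntegerType`, `StringType`, and `StructType` are toy stand-ins rather than the real Spark SQL classes, `ClassToRow`, `Person`, and `Address` are hypothetical names, and the schema is inferred from a sample value instead of from the class itself. It is not the code inside GigaspacesClassRelation.

```scala
// Toy stand-ins for Spark SQL types; NOT the real org.apache.spark.sql classes.
sealed trait FieldType
case object IntegerType extends FieldType
case object StringType extends FieldType
final case class StructType(fields: List[FieldType]) extends FieldType

final case class Row(values: List[Any])

// Hypothetical user classes stored in the space.
final case class Address(city: String, zip: Int)
final case class Person(name: String, age: Int, address: Address)

object ClassToRow {
  // Simplification: infer the type from a sample value, not from the class.
  def inferType(value: Any): FieldType = value match {
    case _: Int     => IntegerType // also matches a boxed java.lang.Integer
    case _: String  => StringType
    case p: Product => StructType(p.productIterator.map(inferType).toList)
    case other      =>
      throw new IllegalArgumentException(s"Unsupported value: $other")
  }

  // A case class (Product) becomes a Row; nested case classes become nested Rows.
  def toRow(p: Product): Row = Row(p.productIterator.map(convert).toList)

  private def convert(value: Any): Any = value match {
    case i: Int     => i
    case s: String  => s
    case p: Product => toRow(p)
    case other      => other
  }
}
```

In this sketch, `ClassToRow.toRow(Person("Ann", 30, Address("Kyiv", 1001)))` yields a Row whose third value is itself a Row, matching the StructType / Row line of the table. Note that Scala collections such as `List` are also `Product` instances, so a fuller version would need to handle them before the `Product` case.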

Reading documents from grid (saved by third-party app)

GigaspacesDocumentRelation.buildScan(...):

  • infers schema from descriptor
  • reads documents
  • parses the RDD[SpaceDocument] into RDD[Row]

| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
| Scala case class (Product) | StructType | Row |
| Java class | StructType | Row |
| java.lang.Integer | IntegerType | Int |

A third-party SpaceDocument can have nested Java/Scala fields.

Reading previously persisted documents from grid

GigaspacesDocumentRelation.buildScan(...):

  • reads schema stored in space in DataFrameSchema
  • reads documents
  • parses the RDD[SpaceDocument] into RDD[Row]

| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| DocumentProperties | StructType | Row |
| java.lang.Integer | IntegerType | Int |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |

Persisted documents do not have typed nested fields; nested values are read back as raw Rows.
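
As a rough illustration of reading a persisted document back with a stored schema, the sketch below models document properties as a plain `Map[String, Any]` and the stored schema as a list of field descriptors. All names (`Field`, `Row`, `PersistedDocumentToRow`) are hypothetical stand-ins, not the real SpaceDocument, DataFrameSchema, or Spark classes, and the handling of missing properties is an assumption.

```scala
// Toy stand-ins; none of these are the real Spark or GigaSpaces classes.
final case class Field(name: String, nested: Option[List[Field]] = None)
final case class Row(values: List[Any])

object PersistedDocumentToRow {
  // The stored schema fixes the column order; nested property maps
  // come back as raw Rows rather than typed objects.
  def toRow(schema: List[Field], properties: Map[String, Any]): Row =
    Row(schema.map { field =>
      (field.nested, properties.get(field.name)) match {
        case (Some(inner), Some(nested: Map[_, _])) =>
          toRow(inner, nested.asInstanceOf[Map[String, Any]])
        case (_, Some(value)) => value
        case (_, None)        => null // assumption: missing property maps to a null column
      }
    })
}
```

The nested `address` map in a document such as `Map("id" -> 7, "address" -> Map("city" -> "Kyiv"))` would come back as an inner Row, which is the "raw Rows" behaviour described above.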

Saving dataframes to grid

GigaspacesDocumentRelation.insert(...):

  • converts RDD[Row] into RDD[SpaceDocument]
  • saves with rdd.saveToGrid()
  • saves schema in DataFrameSchema

| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| DocumentProperties | StructType | Row |
| ??? | IntegerType | Int |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |

Read the table above from right to left: a DataFrame Shape is stored as a Shape field in the space.
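
The save direction (Row to document properties) can be sketched in the same spirit. This is a minimal illustration under stated assumptions: `Field`, `Row`, and `RowToDocument` are hypothetical names, a plain `Map[String, Any]` stands in for DocumentProperties, and the real `insert(...)` and `rdd.saveToGrid()` calls are not modelled.

```scala
// Toy stand-ins; not the real Spark Row or GigaSpaces DocumentProperties.
final case class Field(name: String, nested: Option[List[Field]] = None)
final case class Row(values: List[Any])

object RowToDocument {
  // Pair each schema field with the Row value at the same position.
  // A nested Row is stored as a nested property map.
  def toProperties(schema: List[Field], row: Row): Map[String, Any] = {
    val pairs: List[(String, Any)] = schema.zip(row.values).map {
      case (Field(name, Some(inner)), nested: Row) =>
        (name, toProperties(inner, nested))
      case (Field(name, _), value) =>
        (name, value)
    }
    pairs.toMap
  }
}
```

Keeping the field names in a separately stored schema is what allows the column order and nesting to be restored when the documents are read back.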