DataFrames conversion
Leonid Poliakov edited this page Jul 14, 2016 · 11 revisions
`GigaspacesClassRelation.buildScan(...)`:
- infers schema from class
- reads classes
- parses the `RDD[class]` into `RDD[Row]`
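The three steps above can be sketched roughly as follows. This is only an illustration, not the relation's actual code: `Person` is a hypothetical space class, a local `Seq` stands in for the `RDD` read from the space, and `Encoders.product` is just one way Spark offers to derive a schema from a case class (the relation may infer it differently). It assumes `spark-sql` is on the classpath.

```scala
import org.apache.spark.sql.{Encoders, Row}
import org.apache.spark.sql.types.StructType

// Hypothetical flat space class, used only for illustration.
case class Person(name: String, age: Int)

object ClassToRowSketch {
  // Step 1: infer the schema from the class.
  val schema: StructType = Encoders.product[Person].schema

  // Step 3: turn one object into a Row, one value per schema field.
  // Works for flat classes; nested objects would need their own conversion.
  def toRow(p: Person): Row = Row.fromTuple(p)

  def main(args: Array[String]): Unit = {
    // Step 2 stand-in: a local Seq instead of an RDD read from the space.
    val objects = Seq(Person("Ann", 30), Person("Bob", 25))
    val rows = objects.map(toRow)
    println(schema.treeString)
    rows.foreach(println)
  }
}
```

With a real `RDD[Person]`, the same `toRow` function could be passed to `rdd.map`.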
In Space | DataFrame Type | DataFrame Content |
---|---|---|
@SQLUserDefinedType(udt=MyUdt) | MyUdt | User object |
Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Scala case class (Product) | StructType | Row |
Java class | StructType | Row |
scala.Int | IntegerType | Int |
java.lang.Integer | IntegerType | Int |
`GigaspacesDocumentRelation.buildScan(...)`:
- infers schema from descriptor
- reads documents
- parses the `RDD[SpaceDocument]` into `RDD[Row]`
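A minimal sketch of the document-to-row step, under stated assumptions: a plain `Map[String, Any]` stands in for a `SpaceDocument`'s properties, and the hand-written `schema` stands in for whatever is inferred from the type descriptor. Both are illustrative and not taken from the connector's code.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DocumentToRowSketch {
  // Stand-in for a SpaceDocument: only its property map.
  type Doc = Map[String, Any]

  // Stand-in for the schema inferred from the type descriptor.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)))

  // Pick property values in schema order; a missing property becomes null.
  def toRow(doc: Doc, schema: StructType): Row =
    Row.fromSeq(schema.fields.toSeq.map(f => doc.getOrElse(f.name, null)))

  def main(args: Array[String]): Unit = {
    val docs: Seq[Doc] = Seq(
      Map("name" -> "Ann", "age" -> 30),
      Map("name" -> "Bob"))
    docs.map(toRow(_, schema)).foreach(println)
  }
}
```

Driving the conversion from the schema rather than from the document keeps the column order stable even when documents list their properties in a different order.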
In Space | DataFrame Type | DataFrame Content |
---|---|---|
Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Scala case class (Product) | StructType | Row |
Java class | StructType | Row |
java.lang.Integer | IntegerType | Int |
Third-party `SpaceDocument` can have nested Java/Scala fields.
`GigaspacesDocumentRelation.buildScan(...)`:
- reads schema stored in space in `DataFrameSchema`
- reads documents
- parses the `RDD[SpaceDocument]` into `RDD[Row]`
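How `DataFrameSchema` serializes the schema is not stated on this page. As one possible approach, Spark's own JSON representation of a `StructType` can be stored and later restored without loss. The sketch below assumes that representation; the field names are made up for illustration.

```scala
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

object SchemaRoundTripSketch {
  // Illustrative schema with one nested struct.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("owner", StructType(Seq(StructField("name", StringType))))))

  // What could be kept in the space next to the documents (assumption).
  val stored: String = schema.json

  // What a later read could rebuild before converting documents to Rows.
  val restored: StructType = DataType.fromJson(stored).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    println(stored)
    println(restored == schema)
  }
}
```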
In Space | DataFrame Type | DataFrame Content |
---|---|---|
DocumentProperties | StructType | Row |
java.lang.Integer | IntegerType | Int |
Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Persisted documents do not have typed nested fields, just raw `Row`s.
`GigaspacesDocumentRelation.insert(...)`:
- converts `RDD[Row]` into `RDD[SpaceDocument]`
- saves with `rdd.saveToGrid()`
- saves schema in `DataFrameSchema`
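The row-to-document conversion in the first step might look like the sketch below. It is an assumption-laden illustration: a `Map[String, Any]` stands in for both `SpaceDocument` and the nested `DocumentProperties`, the schema is passed in explicitly, and the actual save through `rdd.saveToGrid()` is not shown.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RowToDocumentSketch {
  // Stand-in for SpaceDocument / DocumentProperties: a property map.
  type Doc = Map[String, Any]

  // Pair each value with its field name; nested Rows become nested maps.
  def toDoc(row: Row, schema: StructType): Doc =
    schema.fields.zipWithIndex.map { case (field, i) =>
      val value: Any = (field.dataType, row.get(i)) match {
        case (nested: StructType, r: Row) => toDoc(r, nested)
        case (_, v)                       => v
      }
      field.name -> value
    }.toMap

  def main(args: Array[String]): Unit = {
    val address = StructType(Seq(StructField("city", StringType)))
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("address", address)))
    println(toDoc(Row(1, Row("Kyiv")), schema))
  }
}
```

Because the resulting document carries no type information for nested values, the schema has to be saved separately, which is what the last step of the list does.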
In Space | DataFrame Type | DataFrame Content |
---|---|---|
DocumentProperties | StructType | Row |
??? | IntegerType | Int |
Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Read the table above from right to left: a DataFrame `Shape` is stored as a `Shape` field in the space.