Source plugin: MongoDB [Spark]
Read data from MongoDB.
name | type | required | default value |
---|---|---|---|
readconfig.uri | string | yes | - |
readconfig.database | string | yes | - |
readconfig.collection | string | yes | - |
readconfig.* | string | no | - |
schema | string | no | - |
common-options | string | yes | - |
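readconfig.uri [string]: the MongoDB connection URI.
readconfig.database [string]: the MongoDB database to read from.
readconfig.collection [string]: the MongoDB collection to read from.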
readconfig.* [string]: Additional read parameters can be configured here; see the Input Configuration section of the MongoDB Spark Connector configuration documentation for details. To specify a parameter, prefix its original name with "readconfig.". For example, spark.mongodb.input.partitioner is set with readconfig.spark.mongodb.input.partitioner="MongoPaginateBySizePartitioner". If these optional parameters are not specified, the default values from the official MongoDB documentation are used.
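The prefix convention amounts to stripping "readconfig." from each option name before the remaining key and value are passed to the MongoDB Spark Connector. The Python sketch below illustrates that mapping under this assumption; the helper to_connector_options is hypothetical and is not the plugin's actual implementation.

```python
from __future__ import annotations

# Assumed prefix convention; hypothetical helper, not the plugin's own code.
PREFIX = "readconfig."


def to_connector_options(plugin_conf: dict[str, str]) -> dict[str, str]:
    """Map readconfig.* plugin options to MongoDB Spark Connector option names.

    Every key that starts with "readconfig." has the prefix removed; all other
    keys (plugin-level options such as schema) are skipped.
    """
    return {
        key[len(PREFIX):]: value
        for key, value in plugin_conf.items()
        if key.startswith(PREFIX)
    }


# Values taken from the example configuration; the URI is a placeholder.
conf = {
    "readconfig.uri": "mongodb://username:password@host:27017/mypost",
    "readconfig.database": "mydatabase",
    "readconfig.collection": "mycollection",
    "readconfig.spark.mongodb.input.partitioner": "MongoPaginateBySizePartitioner",
    "result_table_name": "mongodb_result_table",
}

for name, value in to_connector_options(conf).items():
    print(f"{name} = {value}")
```

In this sketch, keys without the prefix, such as schema and result_table_name, are treated as plugin options and left out of the connector options.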
schema [string]: MongoDB has no concept of a schema, so when Spark reads from MongoDB it samples the data to infer one. This inference can be slow and the result may be inaccurate; specifying the schema manually with this parameter avoids both problems. schema is a JSON string, for example {"name":"string","age":"integer","addrs":{"country":"string","city":"string"}}. Inside a double-quoted config value, the inner quotes must be escaped, as in the example below.
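Before putting a schema into the config, it can help to check that the string is valid JSON and to see which fields it declares. The Python sketch below does that; it assumes, based on the addrs example, that a nested JSON object describes a nested field. The helper flatten_schema is hypothetical: it does not reproduce how the plugin turns the string into a Spark schema, and it does not check which type names the plugin accepts.

```python
from __future__ import annotations

import json


def flatten_schema(schema: dict, prefix: str = "") -> list[tuple[str, str]]:
    """Return (field path, type name) pairs for a parsed schema object.

    A nested JSON object is treated as a nested field (an assumption), and its
    children are reported with dotted paths such as "addrs.city".
    """
    fields: list[tuple[str, str]] = []
    for name, value in schema.items():
        path = prefix + name
        if isinstance(value, dict):
            fields.extend(flatten_schema(value, path + "."))
        elif isinstance(value, str):
            fields.append((path, value))
        else:
            raise ValueError(f"field {path!r}: expected a type name or an object, got {value!r}")
    return fields


# The schema from the example; a single-quoted Python literal needs no backslash escapes.
schema_str = '{"name":"string","age":"integer","addrs":{"country":"string","city":"string"}}'

# json.loads raises json.JSONDecodeError if the string is not valid JSON.
for path, type_name in flatten_schema(json.loads(schema_str)):
    print(path, type_name)
```

Run as-is, this prints one line per field: name string, age integer, addrs.country string, addrs.city string.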
common-options [string]: common parameters shared by all source plugins; see Source Plugin for details.
Example:

```
mongodb {
    readconfig.uri = "mongodb://username:password@127.0.0.1:27017/mypost"
    readconfig.database = "mydatabase"
    readconfig.collection = "mycollection"
    readconfig.spark.mongodb.input.partitioner = "MongoPaginateBySizePartitioner"
    schema="{\"name\":\"string\",\"age\":\"integer\",\"addrs\":{\"country\":\"string\",\"city\":\"string\"}}"
    result_table_name = "mongodb_result_table"
}
```