-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docs] Add documentation for file-based metastore #24511
base: master
Are you sure you want to change the base?
[docs] Add documentation for file-based metastore #24511
Conversation
@steveburnett I have an issue here with some more details |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
For testing purposes Presto can be configured to use a local directory as a Hive | ||
Metastore. Set the following properties in ``etc/config.properties``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metastore concept applies only to Hive and Lakehouse Connectors (Iceberg, Delta, and Hudi), so we should probably mention it.
Additionally, these properties are catalog properties and must be added to etc/catalog/catalog_name.properties
valid only for the above connectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do!
@majetideepak I haven't used the file-based Hive metastore much myself, but did you encounter any issues when trying to create partitioned tables? Ideally, it should be possible and I can see that partition-specific metadata calls are implemented even for the FileHiveMetastore. |
@imjalpreet should this be in the hive.properties or config.properties? |
@ethanyzhang These should be in the catalog properties file. I also added a review comment above. |
Thanks for the additional information @majetideepak! With the new information in #19112, I am going to move this topic from where I initially put it in this PR as a small topic in Deploying Presto. I thought about moving it into the Hive Connector doc, but as it is relevant to "Hive and Lakehouse Connectors (Iceberg, Delta, and Hudi)" I think I will move it to its own page in /installation and include Deepak's instructions how to use it, which will be a big help to readers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to include this in the connectors section?
Right now, metastore properties are spread across different parts of the Hive Connector documentation, mainly in Hive Configuration Properties, Metastore Configuration Properties, and Glue Configuration Properties.
Perhaps we could extract the metastore-related documentation into a separate subsection, similar to Hive Security. This would make sense since metastore properties are relevant not just to Hive, but also to other connectors like Iceberg, Delta, and Hudi.
@imjalpreet I remember seeing some issue with partitions and file metastore. It could be due to my setup. I think its good to test this once before documenting. |
I use file bases HMS for development environment, just confirmed again that it supports Hive partitioned tables. Or maybe you encountered some specific problems when using partitioned tables with file bases HMS @majetideepak? |
Configuring a File-Based Metastore | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
For testing purposes Presto can be configured to use a local directory as a Hive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also mentioned that it is also very suitable for developing purpose?
@hantangwangd If you use it, then it's good. My issue was likely related to my setup. |
hive.metastore.catalog.dir=file:///<catalog-dir> | ||
|
||
Replace ``<catalog-dir>`` in the example with the path to a directory on the | ||
local filesystem that is relative to the Presto installation directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path need not be relative to the installation directory. It can be any path on the local filesystem.
@majetideepak I have used partitioned tables too recently with file based HMS. I didn't have any issues. |
03c832f
to
35be659
Compare
Hi everyone, thanks for your feedback! I took @imjalpreet's suggestion and moved this to a separate page in /connector, corrected the directory path that @nmahadevuni noted, added text describing the supported connectors and corrected the instructions for the properties file to the connector property file, and added usage examples from @majetideepak's #19112. I welcome everyone to review again and comment with new corrections and additions. |
|
||
For testing or developing purposes, Presto can be configured to use a filesystem | ||
directory as a Hive Metastore. This can be a directory on the local filesystem | ||
or a non-local file system such as Amazon S3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is currently supported only on local file system. S3 is not supported. Please remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sourced that from #19112, but if it's not supported then I can remove it, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified everything in #19112. Does not hurt to test again since its been a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it do not support HDFS or S3, but we can make some small changes to let it support all of them for test or development environment (that is to say, not safe in production environment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that helps!
If HDFS and S3 are not currently supported, then I will remove the reference to S3 and write the doc as "supported only on local file system". When the "changes [are made] to let it support all of them for test or development environment", I can update the doc then to add mention that HDFS and S3 are optional for test or dev only.
|
||
CREATE SCHEMA hive.warehouse; | ||
|
||
This query creates a folder as ``/data/hive_data/warehouse``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's specify that this happens if the user gave /data/hive_data/
as the path for hive.metastore.catalog.dir
for other file formats such as CSV, the user must either | ||
|
||
* manually specify the schema as shown in the example above | ||
* provide ``.prestoSchema`` and ``.prestoPermissions`` files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we verify this with a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this content from #19112, didn't test it personally. @majetideepak, does this work as described?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this. I also now remember that this is where partitioning gets tricky.
@steveburnett, @hantangwangd, @nmahadevuni |
Since we are mentioning one can provide |
35be659
to
a0a7ea5
Compare
Hi everyone! I revised based on the feedback from @nmahadevuni, @majetideepak, and @hantangwangd. PTAL when you can. Specific open questions I haven't addressed in this update:
|
@steveburnett Let's remove |
a0a7ea5
to
a915d6f
Compare
Done and done, thanks! PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work @steveburnett.
Thank you @steveburnett. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveburnett should we create a dedicated sub-section for Hive Metastore instead of just “File Hive Metastore”? We could consolidate all metastore-related information from the Hive connector documentation into this section to keep it organized and less scattered.
* :doc:`/connector/hudi` | ||
* :doc:`/connector/iceberg` | ||
|
||
Partitioned tables are supported when created in the file-based metastore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this can be removed since it's expected behaviour, we could have mentioned if it wasn't supported.
Description
Add how to configure Presto to use a file-based hms to installation/deployment.rst.
Motivation and Context
@ethanyzhang suggested this would be a good addition to the Presto documentation in an internal discussion that @nmahadevuni contributed the configuration in. I discussed where such information would fit best in the Presto documentation with @tdcmeehan.
Impact
Documentation. Readers wanting to try out Presto quickly can bypass the need for the steps in Configure Hive MetaStore.
Test Plan
Local doc build. Screenshot with existing text above and below included for context.
![Screenshot 2025-02-06 at 4 41 07 PM](https://private-user-images.githubusercontent.com/7013443/410666492-553762b1-5f10-49f1-8449-0604bb6803c9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxODM5NjUsIm5iZiI6MTczOTE4MzY2NSwicGF0aCI6Ii83MDEzNDQzLzQxMDY2NjQ5Mi01NTM3NjJiMS01ZjEwLTQ5ZjEtODQ0OS0wNjA0YmI2ODAzYzkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTBUMTAzNDI1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ODAxZGZkZjZhMDJiOTZmZDFiYWMxMzE5MzRjN2RjZjEyZGY5YWYyNmE1Y2VjODFlMWRjYjQ2OTJlYjhiMTBhNyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.ZHth-ffTkNWjUYr8at2FmliwBRG2d3oVs1H2OMV6UH4)
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.