-
Notifications
You must be signed in to change notification settings - Fork 114
Use indexes subdirectory for custom index system path #369
base: master
Are you sure you want to change the base?
Changes from 3 commits
d28e14f
2a47dc7
fec013a
4423b3b
4537281
a2833ef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,12 +19,14 @@ package com.microsoft.hyperspace.index | |
import org.apache.spark.sql.internal.SQLConf | ||
|
||
object IndexConstants { | ||
val INDEXES_DIR = "indexes" | ||
|
||
// Config used for setting the system path, which is considered as a "root" path for Hyperspace; | ||
// e.g, indexes are created under the system path. | ||
val INDEX_SYSTEM_PATH = "spark.hyperspace.system.path" | ||
|
||
// Config used for subdirectory name under the system path. | ||
val INDEX_DIR_NAME = "spark.hyperspace.system.indexDirName" | ||
val INDEX_DIR_NAME_DEFAULT = "indexes" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NOTE: will change the default dir name as "hyperspace". |
||
|
||
// Config used to set the number of buckets for the index. | ||
val INDEX_NUM_BUCKETS_LEGACY = "spark.hyperspace.index.num.buckets" | ||
val INDEX_NUM_BUCKETS = "spark.hyperspace.index.numBuckets" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,9 +63,16 @@ private[hyperspace] class PathResolver(conf: SQLConf, hadoopConf: Configuration) | |
* @return Hyperspace index system path. | ||
*/ | ||
def systemPath: Path = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little bit confused by now about what a system path is in Hyperspace, because the value of INDEX_SYSTEM_PATH and the return value of this function can be different, yet both are being called "system paths". Which one is the "system path"? Shouldn't we name either one differently? Maybe we can 1) improve the docstring, 2) change the config name from INDEX_SYSTEM_PATH to something like INDEX_SYSTEM_ROOT, and/or 3) use a more self-explanatory name than system path. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I think it's better to change the function name rather than the config string. So my suggestion is 1) keep "system path" as it is (w/o subdir) 2) give another name to "system path"/subdir path. How about cc @imback82 |
||
val defaultIndexesPath = | ||
new Path(conf.getConfString("spark.sql.warehouse.dir"), "indexes") | ||
new Path(conf.getConfString(IndexConstants.INDEX_SYSTEM_PATH, defaultIndexesPath.toString)) | ||
val indexDirName = | ||
conf.getConfString(IndexConstants.INDEX_DIR_NAME, IndexConstants.INDEX_DIR_NAME_DEFAULT) | ||
val indexSystemPath = conf.getConfString( | ||
IndexConstants.INDEX_SYSTEM_PATH, | ||
conf.getConfString("spark.sql.warehouse.dir")) | ||
if (indexDirName.isEmpty) { | ||
new Path(indexSystemPath) | ||
} else { | ||
new Path(indexSystemPath, indexDirName) | ||
} | ||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe simply DIR or LOCATION instead of DIR_NAME?