This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Use indexes subdirectory for custom index system path #369

Open · wants to merge 6 commits into base: master
Changes from 3 commits
@@ -19,12 +19,14 @@ package com.microsoft.hyperspace.index

 import org.apache.spark.sql.internal.SQLConf

 object IndexConstants {
-  val INDEXES_DIR = "indexes"

   // Config used for setting the system path, which is considered as a "root" path for Hyperspace;
   // e.g, indexes are created under the system path.
   val INDEX_SYSTEM_PATH = "spark.hyperspace.system.path"

+  // Config used for subdirectory name under the system path.
+  val INDEX_DIR_NAME = "spark.hyperspace.system.indexDirName"

Review comment:
nit: maybe simply DIR or LOCATION instead of DIR_NAME?

+  val INDEX_DIR_NAME_DEFAULT = "indexes"

Collaborator Author:
NOTE: will change the default dir name to "hyperspace".


   // Config used to set the number of buckets for the index.
   val INDEX_NUM_BUCKETS_LEGACY = "spark.hyperspace.index.num.buckets"
   val INDEX_NUM_BUCKETS = "spark.hyperspace.index.numBuckets"
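
For reference, a minimal sketch (not part of this PR) of how the two new configs above could be set on a Spark session. The path value and the spark session variable are illustrative assumptions:

    // Root ("system") path for Hyperspace; indexes are created under it.
    spark.conf.set("spark.hyperspace.system.path", "/data/hyperspace")
    // Optional subdirectory under the system path; an empty string means no subdirectory.
    spark.conf.set("spark.hyperspace.system.indexDirName", "indexes")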
13 changes: 10 additions & 3 deletions src/main/scala/com/microsoft/hyperspace/index/PathResolver.scala
@@ -63,9 +63,16 @@ private[hyperspace] class PathResolver(conf: SQLConf, hadoopConf: Configuration)
    * @return Hyperspace index system path.
    */
   def systemPath: Path = {

Review comment:
I'm a little bit confused by now about what a system path is in Hyperspace, because the value of INDEX_SYSTEM_PATH and the return value of this function can be different, yet both are being called "system paths". Which one is the "system path"? Shouldn't we name either one differently? Maybe we can 1) improve the docstring, 2) change the config name from INDEX_SYSTEM_PATH to something like INDEX_SYSTEM_ROOT, and/or 3) use a more self-explanatory name than system path.

Collaborator Author:
Good point. I think it's better to change the function name rather than the config string. So my suggestion is to 1) keep "system path" as it is (without the subdir), and 2) give another name to the "system path" plus subdir path.

How about indexLocationPath?

cc @imback82
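
For illustration only, a rough sketch of the split suggested in this thread: keep systemPath as the configured root (without the subdirectory) and add a separate indexLocationPath for root plus subdirectory. The name indexLocationPath is just the proposal above, not committed code, and the sketch assumes the existing conf: SQLConf field of PathResolver:

    /** Configured Hyperspace root, without any subdirectory. */
    def systemPath: Path =
      new Path(
        conf.getConfString(
          IndexConstants.INDEX_SYSTEM_PATH,
          conf.getConfString("spark.sql.warehouse.dir")))

    /** Location indexes are created under: systemPath plus the optional subdirectory. */
    def indexLocationPath: Path = {
      val indexDirName =
        conf.getConfString(IndexConstants.INDEX_DIR_NAME, IndexConstants.INDEX_DIR_NAME_DEFAULT)
      if (indexDirName.isEmpty) systemPath else new Path(systemPath, indexDirName)
    }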

-    val defaultIndexesPath =
-      new Path(conf.getConfString("spark.sql.warehouse.dir"), "indexes")
-    new Path(conf.getConfString(IndexConstants.INDEX_SYSTEM_PATH, defaultIndexesPath.toString))
+    val indexDirName =
+      conf.getConfString(IndexConstants.INDEX_DIR_NAME, IndexConstants.INDEX_DIR_NAME_DEFAULT)
+    val indexSystemPath = conf.getConfString(
+      IndexConstants.INDEX_SYSTEM_PATH,
+      conf.getConfString("spark.sql.warehouse.dir"))
+    if (indexDirName.isEmpty) {
+      new Path(indexSystemPath)
+    } else {
+      new Path(indexSystemPath, indexDirName)
+    }
   }
 }
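
To make the resolution above concrete, here is a small self-contained sketch that mirrors the new branching with hard-coded values; the paths are placeholders, not defaults from this PR:

    import org.apache.hadoop.fs.Path

    // Values that would normally come from the two configs.
    val indexSystemPath = "/warehouse/hyperspace" // spark.hyperspace.system.path
    val indexDirName = "indexes"                  // spark.hyperspace.system.indexDirName

    // Same branching as PathResolver.systemPath above.
    val resolved =
      if (indexDirName.isEmpty) new Path(indexSystemPath)
      else new Path(indexSystemPath, indexDirName)
    // resolved is /warehouse/hyperspace/indexes here; with indexDirName = ""
    // it would be /warehouse/hyperspace, which is what the test suites below configure.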

@@ -34,6 +34,7 @@ trait HyperspaceSuite extends SparkFunSuite with SparkInvolvedSuite {
     super.beforeAll()
     FileUtils.delete(systemPath)
     spark.conf.set(IndexConstants.INDEX_SYSTEM_PATH, systemPath.toUri.toString)
+    spark.conf.set(IndexConstants.INDEX_DIR_NAME, "")
     clearCache()
   }

@@ -71,6 +71,7 @@ class IndexCollectionManagerTest extends SparkFunSuite with SparkInvolvedSuite {
   override def beforeAll(): Unit = {
     super.beforeAll()
     spark.conf.set(IndexConstants.INDEX_SYSTEM_PATH, indexSystemPath)
+    spark.conf.set(IndexConstants.INDEX_DIR_NAME, "")
     when(mockFileSystemFactory.create(any[Path], any[Configuration])).thenReturn(mockFileSystem)

     indexCollectionManager = new IndexCollectionManager(