-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Select require_alias for OS bulk inserts from ISM Policy #3560
Conversation
@dlvenable I am not sure, if I understood your suggestion in #3342 correctly. This change will set the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KarstenSchnitter , Thank you for putting this PR together. I left a couple comments. Additionally, did you test that this solves #3342? We'd also want some unit tests to ensure the behavior. And ideally an integration test in OpenSearchSinkIT
, though that is not as important as the unit tests.
} else if (isEstimateBulkSizeUsingCompression) { | ||
LOG.warn("Estimate bulk request size using compression was enabled but request compression is disabled. " + | ||
"Estimating bulk request size without compression."); | ||
bulkRequestSupplier = () -> new JavaClientAccumulatingUncompressedBulkRequest(new BulkRequest.Builder()); | ||
bulkRequestSupplier = () -> new JavaClientAccumulatingUncompressedBulkRequest(new BulkRequest.Builder().requireAlias(requireAlias)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any situations in which the require_alias would vary from document to document? If so, we'd want this on the individual operations. I'm currently not setting any such situations, but I may be missing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://opensearch.org/docs/2.11/api-reference/document-apis/bulk/#url-parameters, require_alias
is a URL parameter of the bulk request. As such it cannot be changed for each document but only for the entire request.
If you wanted this setting to be depending on the document, you would have to group documents with different settings into different bulk requests. In any case, you would need a configuration on event level, that carries the alias configuration. That could either be the event or the record metadata. Both can easily be accessed in the OpenSearchSink#doOutput to create different bulk requests. This would be a much larger change, also requiring a decision on how to set the alias configuration for the event.
@@ -5,4 +5,5 @@ | |||
public interface IndexManager{ | |||
void setupIndex() throws IOException; | |||
String getIndexName(final String indexAlias) throws IOException; | |||
boolean checkISMEnabled() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are looking to expose this, it may make more sense to create a new method such as isIndexAlias
. It could internally hold a boolean and thus avoid unnecessary checks on the cluster. As it is now, by calling this method twice, Data Prepper will unnecessarily make two requests to the cluster to get cluster settings.
Also, I believe this method (checkISMEnabled
) is actually checking only the cluster. I believe the actual decision is a combination of two things: 1) The index is configured as an alias, and 2) the cluster supports ISM. e.g currentIndexUsesAlias && clusterHasIsmEnabled
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes to the logic to meet your comment. I am not sure, whether ISM really needs to be enabled, though. Please have a look at the new version. It also includes unit tests for isIndexAlias()
.
Thanks @KarstenSchnitter . It looks good. Would you like to make it "Ready for review?" |
This change requires an alias when writing to an aliased index. This avoids creation of an index without alias, when a previous existing alias and index was deleted. It increases robustness of DataPrepper's trace index against OS user interactions. Signed-off-by: Karsten Schnitter <[email protected]>
During OS sink initialization it is determined from OS, whether the configured index actually is an alias. If so, bulk request will require the index to always be an alias. The response is cached to avoid further requests. This also ensures, that the alias configuration is kept in the initially intended state. After all, this change is about to prevent an automatic index creation for a formerly existing alias. Signed-off-by: Karsten Schnitter <[email protected]>
Signed-off-by: Karsten Schnitter <[email protected]>
b9e3c79
to
05fcfe3
Compare
I fixed the checkstyle issue with the imports and rebased the branch. I will investigate the integration test failures based on this state. |
The integration test failures stem from an "Forbidden" response on the "isAlias" API. This is the underlying error message:
I can reproduce this locally with the integration test, but not with curl. For this I use:
This is the correct response for a non-existing alias. Note, that it is not a "FORBIDDEN" response. @dlvenable, do you have an idea, how to fix this? |
Note, that the failing test are the ones, that do not use the admin credentials but credentials with random uuids. I think the relevant code is in the OpenSearchSecurityAccessor: Lines 25 to 27 in 7869eb7
The configured roles do not allow the HEAD request on the alias. Would it be ok to change that, or do I need to find a way around the required roles? |
I tried adding |
The specific user used in some tests of OpenSerachSinkIT needs get permissions on all aliases to test for their existence. Another bug with determining the alias name is fixed as well. As a final result, the DataPrepper OpenSearch user requires write access to the indices and now additionally read access to the aliases. This can be a change for self-managed indices. Signed-off-by: Karsten Schnitter <[email protected]>
There are still integration tests failures for OpenDistro versions. This might be connected to the fact, that "require_alias" was only introduced in Elasticsearch 7.10. This might lead to errors in the bulk insert operations in earlier versions. Is there a way to detect the OS/ES version, so that the new behavior can be optional? |
That is a good find on the code failing prior to ES 7.10. We do need to continue supporting versions from 6.8 and beyond. We do not currently have a mechanism in Data Prepper to check the version. That is something we probably should add. Would you be able to include it in this PR? |
For me I could do a |
The `require_alias` parameter for bulk requests was only introduced with ES 7.10. Since DataPrepper needs to be compatible down to 6.8, the parameter should not be used in earlier OD versions. This change will apply the parameter only when OpenSearch is detected as target. Signed-off-by: Karsten Schnitter <[email protected]>
For checking the OS version, the test user needs an additional permission. Signed-off-by: Karsten Schnitter <[email protected]>
I implemented a detection of OpenSearch, so that the |
// Try to get the OpenSearch version. This fails on older OpenDistro versions, that do not support | ||
// `require_alias` as a bulk API parameter. All OpenSearch versions do, as this was introduced in | ||
// ES 7.10. | ||
openSearchClient.info(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dlvenable from this object, you can get the OpenSearch version as a String. For this feature, it is enough to detect OpenSearch. For other use-cases, you might want to extend this detection, so that you can ask for a required minimum version. But this requires parsing the version string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thank you for your work on this. And adding the error checking should help for users who also do not have permissions to the info API presently.
Description
This change sets the
require_alias
property on OpenSearch bulk requests when writing to an aliased index. This avoids creation of an index without alias, when a previous existing alias and index was deleted. It increases robustness of DataPrepper's trace index against OS user interactions.Issues Resolved
Resolves #3342
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.