-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19232: [ABFS][FNSOverBlob] Implementing Ingress Support with various Fallback Handling #7272
base: trunk
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some thoughts around production code.
Will do another round of review for test code,.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
Outdated
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
Outdated
Show resolved
Hide resolved
...op-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Show resolved
Hide resolved
.../java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidIngressServiceException.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlock.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlockStatus.java
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsErrors.java
Outdated
Show resolved
Hide resolved
|
||
/** | ||
* Creates or retrieves an existing Azure ingress handler based on the service type and provided parameters. | ||
* <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like xml tag is not closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which tag is not closed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<p>
tag in comments
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some Suggestions for test code improvement.
hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
Outdated
Show resolved
Hide resolved
.../java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidIngressServiceException.java
Show resolved
Hide resolved
...ools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsCustomEncryption.java
Outdated
Show resolved
Hide resolved
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/AbstractAbfsIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsCustomEncryption.java
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending Test Code Review
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/utils/AzcopyExecutionException.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/utils/AzcopyExecutionException.java
Show resolved
Hide resolved
...oop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsPaginatedDelete.java
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java
Outdated
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsHttpClientRequestExecutor.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFilesystemAcl.java
Outdated
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Show resolved
Hide resolved
@@ -1140,7 +1387,7 @@ public boolean checkIsDir(AbfsHttpOperation result) { | |||
public boolean checkUserError(int responseStatusCode) { | |||
return (responseStatusCode >= HttpURLConnection.HTTP_BAD_REQUEST | |||
&& responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR | |||
&& responseStatusCode != HttpURLConnection.HTTP_CONFLICT); | |||
&& responseStatusCode != HTTP_CONFLICT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do same changes for above two codes (HTTP_INTERNAL_ERROR, HTTP_BAD_REQUEST) as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted this change
this.outputStream = outputStream; | ||
this.offset = offset; | ||
DataBlocks.BlockFactory blockFactory = outputStream.getBlockManager().getBlockFactory(); | ||
long blockCount = outputStream.getBlockManager().getBlockCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are using these variables (blockCount, blockSize) only at one place, it would be better to call it inplace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to store them as variables for better readability
+ "exists and its resource type is invalid for this operation."; | ||
public static final String BLOB_OPERATION_NOT_SUPPORTED = "Blob operation is not supported."; | ||
public static final String INVALID_APPEND_OPERATION = "The resource was created or modified by the Azure Blob Service API " | ||
+ "and cannot be appended to by the Azure Data Lake Storage Service API"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo error: and cannot be appended by the Azure Data Lake Storage Service API
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the error message comes from the backend
boolean hasActiveBlockDataToUpload() { | ||
AzureBlockManager blockManager = getBlockManager(); | ||
AbfsBlock activeBlock = blockManager.getActiveBlock(); | ||
return blockManager.hasActiveBlock() && activeBlock.hasData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can blockManager and activeBlock be null in any of the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasActiveBlock checks for null for activeBlock and blockManager can not be null
if (hasActiveBlock()) { | ||
clearActiveBlock(); | ||
} | ||
getBlockManager().clearActiveBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not have check on hasActiveBlock before clearActiveBlock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
@VisibleForTesting | ||
AbfsClient getClient() { | ||
synchronized AbfsClient getClient() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the need of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was reported as a spot bug hence I added it but dont see the need for it hence reverting
AbfsBlobClient blobClient = abfsOutputStream.getClientHandler().getBlobClient(); | ||
final AbfsRestOperation op = blobClient | ||
.getBlockList(abfsOutputStream.getPath(), tracingContext); | ||
committedBlockIdList = op.getResult().getBlockIdList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A check on op
and op.getResult
before calling getBlockIdList is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
BlobAppendRequestParameters blobParams = new BlobAppendRequestParameters(blockToUpload.getBlockId(), getETag()); | ||
reqParams.setBlobParams(blobParams); | ||
AbfsRestOperation op; | ||
long threadId = Thread.currentThread().getId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get Thread.currentThread().getId() inplace as we are only using it once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
TracingContext tracingContext) throws IOException { | ||
TracingContext tracingContextAppend = new TracingContext(tracingContext); | ||
long threadId = Thread.currentThread().getId(); | ||
String threadIdStr = String.valueOf(threadId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, if threadIf is getting used at only one place, can we keep it inplace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
IOUtils.closeStreams(uploadData, activeBlock); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove 1 extra empty line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
AbfsRestOperation op; | ||
TracingContext tracingContextAppend = new TracingContext(tracingContext); | ||
long threadId = Thread.currentThread().getId(); | ||
String threadIdStr = String.valueOf(threadId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
throws IOException { | ||
TracingContext tracingContextFlush = new TracingContext(tracingContext); | ||
if (tracingContextFlush.getIngressHandler().equals(EMPTY_STRING)) { | ||
tracingContextFlush.setIngressHandler("DFlush"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create constant for DFlush and use it whereever required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
// Perform the upload within a performance tracking context. | ||
try (AbfsPerfInfo perfInfo = new AbfsPerfInfo( | ||
dfsClient.getAbfsPerfTracker(), | ||
"writeCurrentBufferToService", "append")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create constant for append?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present, added it here
} | ||
try { | ||
TracingContext tracingContextFlush = new TracingContext(tracingContext); | ||
tracingContextFlush.setIngressHandler("FBFlush"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, constant for FBFlush
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
// Perform the upload within a performance tracking context. | ||
try (AbfsPerfInfo perfInfo = new AbfsPerfInfo( | ||
getClient().getAbfsPerfTracker(), | ||
"writeCurrentBufferToService", "append")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, constant for append.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed Latest Iteration
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidIngressServiceException.java
Show resolved
Hide resolved
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemCreate.java
Outdated
Show resolved
Hide resolved
.../hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemLease.java
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
============================================================
|
Description of PR :
This Pr is in correlation to the series of work done under Parent Jira: [HADOOP-19179]
(https://issues.apache.org/jira/browse/HADOOP-19179)
Jira for this Patch: https://issues.apache.org/jira/browse/HADOOP-19232
Scope of this task is to refactor the AbfsOutputStream class to handle the ingress for DFS and Blob endpoint effectively.
Production code changes :
The
AbfsOutputStream
class is crucial for handling the data being written to Azure Storage. Its primary responsibilities include:New Additions
The new additions introduce a more modular and flexible approach to managing data ingress (data being written to storage), catering to both Azure Data Lake Storage (ADLS) and Azure Blob Storage.
AzureIngressHandler
The
AzureIngressHandler
is a new parent class designed to encapsulate common logic for data ingress operations. It simplifies the process of writing data to Azure Storage by providing a unified interface. This class has two specialized child classes:AzureDfsIngressHandler:
AzureBlobIngressHandler:
blockId
.AbfsBlock and AbfsBlobBlock
Data is managed in discrete blocks to improve efficiency and manageability.
AbfsBlock:
AbfsBlobBlock:
AbfsBlock
tailored for Blob Storage.blockId
for each block, which is necessary for the Blob Storage API.Block Managers
To manage these data blocks, new manager classes have been introduced. These classes handle the lifecycle of blocks, including creation, appending, and flushing.
AzureBlockManager:
AzureDFSBlockManager:
AbfsBlock
instances for DFS.AzureBlobBlockManager:
AbfsBlobBlock
instances for Blob Storage.blockId
.Integration with AbfsOutputStream
The
AbfsOutputStream
class has been updated to incorporate the new ingress flow logic, enhancing its ability to handle data writes to both DFS and Blob Storage. Here’s how it integrates:Configuration Selection:
AbfsOutputStream
reads the configuration parameterfs.azure.ingress.service.type
to determine whether the user has configured the system to useBLOB
orDFS
for data ingress.Handler Initialization:
AbfsOutputStream
initializes the appropriate handler (AzureBlobIngressHandler
orAzureDfsIngressHandler
).Buffering Data:
AbfsOutputStream
, it is buffered into blocks (AbfsBlock
for DFS orAbfsBlobBlock
for Blob Storage).Managing Blocks:
AzureDFSBlockManager
orAzureBlobBlockManager
) manages the lifecycle of these blocks, ensuring that data is correctly created, appended, and flushed.Block Id Management (Blob Specific):
AzureBlobBlockManager
ensures that each block has a uniqueblockId
, adhering to the requirements of the Blob Storage API.Detailed Flow
Creating Data Blocks:
AbfsOutputStream
, it is divided into blocks (AbfsBlock
for DFS orAbfsBlobBlock
for Blob Storage).Appending Data:
AzureBlobIngressHandler
orAzureDfsIngressHandler
).Flushing Data:
Lifecycle Management:
AzureDFSBlockManager
andAzureBlobBlockManager
) oversee the lifecycle of blocks, handling retries, errors, and ensuring data integrity.Test Code Changes: