[Kernel] Minor refactor to DeltaLogActionUtils; add CloseableIterator takeWhile and other helpful methods #4097
@@ -23,7 +23,10 @@
import io.delta.kernel.internal.util.Utils;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
@@ -36,6 +39,31 @@
@Evolving
public interface CloseableIterator<T> extends Iterator<T>, Closeable {

  /**
   * Represents the result of applying the filter condition in the {@link
   * #breakableFilter(Function)} method of a {@link CloseableIterator}. This enum determines how
   * each element in the iterator should be handled.
   */
  enum BreakableFilterResult {
Is this a common paradigm? It seems like

Are you saying that we can implement breakableFilter by using a combination of both filter + takeWhile? One of my tests has an example of breakableFilter: we want to INCLUDE 1, EXCLUDE 2, INCLUDE 3, and BREAK at 4. After thinking about this for > 30 seconds, maybe there is a way to implement this using filter + takeWhile, but it doesn't jump out at me. I think the semantics I lay out here are simpler than using

Wouldn't the test case just be? They seem equivalent to me.

Equivalently the snapshot manager code would probably be

I actually kind of like it, since it nicely delineates the filter mechanism versus the terminate-the-loop (takeWhile) mechanism. But they honestly are super similar, so I don't feel too strongly about it.

I actually still prefer the current version. When they are separate, I need to understand 4 return values: 2 for the filter, and 2 for the takeWhile. It also introduces some coupling: the

The good news is that this CloseableIterator API doesn't preclude your suggestion, i.e., other areas in the code are free to do .filter and then .takeWhile if we decide later on that that is best there. Overall, I'm inclined to stick with what I have now. Happy to refactor it later, but really I'd like to unblock @huan233usc's PR.

Sounds good

I guess I just hesitate to add a new concept of this breakable filter when it can be distilled into two existing things. But I agree it's easy to understand, so not a big deal.

    /**
     * Indicates that the current element should be included in the resulting iterator produced by
     * {@link #breakableFilter(Function)}.
     */
    INCLUDE,

    /**
     * Indicates that the current element should be excluded from the resulting iterator produced by
     * {@link #breakableFilter(Function)}.
     */
    EXCLUDE,

    /**
     * Indicates that the iteration should stop immediately and that no further elements should be
     * processed by {@link #breakableFilter(Function)}.
     */
    BREAK
  }

  /**
   * Returns true if the iteration has more elements. (In other words, returns true if next would
   * return an element rather than throwing an exception.)
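The review thread on `BreakableFilterResult` debates whether one three-way mapper is equivalent to a `filter` stage followed by a `takeWhile` stage. The following is a minimal sketch under stated assumptions: it needs Java 9+ for `Stream.takeWhile`, an eager loop stands in for the lazy `breakableFilter`, and the class and method names are hypothetical. For the thread's INCLUDE 1, EXCLUDE 2, INCLUDE 3, BREAK at 4 case, both formulations yield `[1, 3]`. When an element meets both the exclude condition and the stop condition, however, the two-stage version only matches if `takeWhile` runs before `filter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class BreakableVsTwoStage {
  // Local copy of the enum; not imported from Delta Kernel.
  enum BreakableFilterResult { INCLUDE, EXCLUDE, BREAK }

  // Eager loop with the same INCLUDE / EXCLUDE / BREAK semantics as breakableFilter.
  static <T> List<T> breakable(List<T> input, Function<T, BreakableFilterResult> mapper) {
    List<T> out = new ArrayList<>();
    for (T x : input) {
      BreakableFilterResult r = mapper.apply(x);
      if (r == BreakableFilterResult.BREAK) {
        break; // stop; later elements are never examined
      }
      if (r == BreakableFilterResult.INCLUDE) {
        out.add(x);
      }
    }
    return out;
  }

  // The review-thread case: INCLUDE 1, EXCLUDE 2, INCLUDE 3, BREAK at 4.
  static List<Integer> reviewCaseBreakable() {
    return breakable(Arrays.asList(1, 2, 3, 4), x ->
        x >= 4 ? BreakableFilterResult.BREAK
            : x == 2 ? BreakableFilterResult.EXCLUDE
            : BreakableFilterResult.INCLUDE);
  }

  static List<Integer> reviewCaseTwoStage() {
    return Arrays.asList(1, 2, 3, 4).stream()
        .filter(x -> x != 2)
        .takeWhile(x -> x < 4)
        .collect(Collectors.toList());
  }

  // A case where stage order matters: BREAK on negatives, EXCLUDE evens.
  static final List<Integer> INPUT = Arrays.asList(1, 2, -2, 3);

  static List<Integer> orderCaseBreakable() {
    return breakable(INPUT, x ->
        x < 0 ? BreakableFilterResult.BREAK
            : x % 2 == 0 ? BreakableFilterResult.EXCLUDE
            : BreakableFilterResult.INCLUDE);
  }

  static List<Integer> filterThenTakeWhile() {
    // -2 is removed by filter, so takeWhile never sees the stop condition.
    return INPUT.stream().filter(x -> x % 2 != 0).takeWhile(x -> x >= 0)
        .collect(Collectors.toList());
  }

  static List<Integer> takeWhileThenFilter() {
    return INPUT.stream().takeWhile(x -> x >= 0).filter(x -> x % 2 != 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(reviewCaseBreakable()); // [1, 3]
    System.out.println(reviewCaseTwoStage());  // [1, 3]
    System.out.println(orderCaseBreakable());  // [1]
    System.out.println(filterThenTakeWhile()); // [1, 3]
    System.out.println(takeWhileThenFilter()); // [1]
  }
}
```

In the single-mapper form, the precedence between `BREAK` and `EXCLUDE` is written in one place; in the two-stage form it is encoded by the order of the calls.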
@@ -91,23 +119,81 @@ public void close() throws IOException {
    };
  }

  /**
   * Returns a new {@link CloseableIterator} that includes only the elements of this iterator for
   * which the given {@code mapper} function returns {@code true}.
   *
   * @param mapper A function that determines whether an element should be included in the resulting
   *     iterator.
   * @return A {@link CloseableIterator} that includes only the filtered elements of this
   *     iterator.
   */
  default CloseableIterator<T> filter(Function<T, Boolean> mapper) {
    return breakableFilter(
        t -> {
          if (mapper.apply(t)) {
            return BreakableFilterResult.INCLUDE;
          } else {
            return BreakableFilterResult.EXCLUDE;
          }
        });
  }

  /**
   * Returns a new {@link CloseableIterator} that includes elements from this iterator as long as
   * the given {@code mapper} function returns {@code true}. Once the mapper function returns {@code
   * false}, the iteration is terminated.
   *
   * @param mapper A function that determines whether to include an element in the resulting
   *     iterator.
   * @return A {@link CloseableIterator} that stops iteration when the condition is not met.
   */
  default CloseableIterator<T> takeWhile(Function<T, Boolean> mapper) {
    return breakableFilter(
        t -> {
          if (mapper.apply(t)) {
            return BreakableFilterResult.INCLUDE;
          } else {
            return BreakableFilterResult.BREAK;
          }
        });
  }
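`filter` and `takeWhile` differ only in how they translate a `false` from `mapper`: `filter` maps it to `EXCLUDE` and keeps scanning, while `takeWhile` maps it to `BREAK` and stops. The sketch below isolates just those two translations as standalone functions. The class and method names are hypothetical, and the enum is redeclared locally rather than imported from Delta Kernel.

```java
import java.util.function.Function;

class FilterVsTakeWhileAdapters {
  // Local copy of the enum; not imported from Delta Kernel.
  enum BreakableFilterResult { INCLUDE, EXCLUDE, BREAK }

  // The translation filter(...) hands to breakableFilter(...): false means "skip and keep going".
  static <T> Function<T, BreakableFilterResult> filterAdapter(Function<T, Boolean> mapper) {
    return t -> mapper.apply(t) ? BreakableFilterResult.INCLUDE : BreakableFilterResult.EXCLUDE;
  }

  // The translation takeWhile(...) hands to breakableFilter(...): false means "stop iterating".
  static <T> Function<T, BreakableFilterResult> takeWhileAdapter(Function<T, Boolean> mapper) {
    return t -> mapper.apply(t) ? BreakableFilterResult.INCLUDE : BreakableFilterResult.BREAK;
  }

  public static void main(String[] args) {
    Function<Integer, Boolean> isEven = x -> x % 2 == 0;
    for (int x : new int[] {2, 5}) {
      System.out.println(x + ": filter=" + filterAdapter(isEven).apply(x)
          + ", takeWhile=" + takeWhileAdapter(isEven).apply(x));
    }
    // 2: filter=INCLUDE, takeWhile=INCLUDE
    // 5: filter=EXCLUDE, takeWhile=BREAK
  }
}
```

Because `filter` never produces `BREAK`, it always drains the source; `takeWhile` can end iteration at the first failing element.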

  /**
   * Returns a new {@link CloseableIterator} that applies a {@link BreakableFilterResult}-based
   * filtering function to determine whether elements of this iterator should be included or
   * excluded, or whether the iteration should terminate.
   *
   * @param mapper A function that determines the filtering action for each element: include,
   *     exclude, or break.
   * @return A {@link CloseableIterator} that applies the specified {@link
   *     BreakableFilterResult}-based logic.
   */
  default CloseableIterator<T> breakableFilter(Function<T, BreakableFilterResult> mapper) {
    CloseableIterator<T> delegate = this;
    return new CloseableIterator<T>() {
      T next;
      boolean hasLoadedNext;
      boolean shouldBreak = false;

      @Override
      public boolean hasNext() {
        if (shouldBreak) {
          return false;
        }
        if (hasLoadedNext) {
          return true;
        }
        while (delegate.hasNext()) {
          final T potentialNext = delegate.next();
          final BreakableFilterResult result = mapper.apply(potentialNext);
          if (result == BreakableFilterResult.INCLUDE) {
            next = potentialNext;
            hasLoadedNext = true;
            return true;
          } else if (result == BreakableFilterResult.BREAK) {
            shouldBreak = true;
            return false;
          }
        }
        return false;
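The hunk is cut off after `hasNext()`, so the sketch below pairs that loop with an assumed `next()` (not shown in the diff, and not the PR's actual code) over a plain `java.util.Iterator`. It illustrates three properties of the look-ahead design: a repeated `hasNext()` does not re-apply `mapper`, `EXCLUDE`d elements are skipped inside a single `hasNext()` call, and the element that triggers `BREAK` is pulled from the delegate while everything after it is left untouched. All names are hypothetical and `close()` handling is omitted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyBreakableFilterSketch {
  // Local copy of the enum; not imported from Delta Kernel.
  enum BreakableFilterResult { INCLUDE, EXCLUDE, BREAK }

  // hasNext() mirrors the loop in the diff; next() is an assumed completion.
  static <T> Iterator<T> breakableFilter(
      Iterator<T> delegate, Function<T, BreakableFilterResult> mapper) {
    return new Iterator<T>() {
      T next;
      boolean hasLoadedNext;
      boolean shouldBreak = false;

      @Override
      public boolean hasNext() {
        if (shouldBreak) return false;
        if (hasLoadedNext) return true; // already buffered: do not re-apply mapper
        while (delegate.hasNext()) {
          final T potentialNext = delegate.next();
          final BreakableFilterResult result = mapper.apply(potentialNext);
          if (result == BreakableFilterResult.INCLUDE) {
            next = potentialNext;
            hasLoadedNext = true;
            return true;
          } else if (result == BreakableFilterResult.BREAK) {
            shouldBreak = true;
            return false;
          }
          // EXCLUDE: drop this element and keep scanning
        }
        return false;
      }

      @Override
      public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasLoadedNext = false; // hand off the buffered element
        return next;
      }
    };
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6).iterator();
    Iterator<Integer> it = breakableFilter(source, x -> {
      calls.incrementAndGet();
      if (x >= 4) return BreakableFilterResult.BREAK;
      return x == 2 ? BreakableFilterResult.EXCLUDE : BreakableFilterResult.INCLUDE;
    });

    it.hasNext();
    it.hasNext();
    System.out.println(calls.get());    // 1: the second hasNext() reuses the buffered 1
    System.out.println(it.next());      // 1
    System.out.println(it.next());      // 3 (2 was excluded along the way)
    System.out.println(it.hasNext());   // false: 4 triggered BREAK
    System.out.println(calls.get());    // 4: mapper saw 1, 2, 3, 4
    System.out.println(source.next());  // 5: 4 was consumed, 5 and 6 were never pulled
  }
}
```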
@@ -160,4 +246,26 @@ public void close() throws IOException {
      }
    };
  }

  /**
   * Collects all elements from this {@link CloseableIterator} into a {@link List}.
   *
   * <p>This method iterates through all elements of the iterator, storing them in an in-memory
   * list. Once iteration is complete, the iterator is automatically closed to release any
   * underlying resources.
   *
   * @return A {@link List} containing all elements from this iterator.
   * @throws UncheckedIOException If an {@link IOException} occurs while closing the iterator.
   */
  default List<T> toInMemoryList() {
    final List<T> result = new ArrayList<>();
    try (CloseableIterator<T> iterator = this) {
      while (iterator.hasNext()) {
        result.add(iterator.next());
      }
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to close the CloseableIterator", e);
    }
    return result;
  }
}
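`toInMemoryList` relies on try-with-resources over `this`, so the iterator is closed once the loop finishes, and an `IOException` from `close()` is rethrown as `UncheckedIOException`. A minimal sketch of that behavior, using a hypothetical local interface `MiniCloseableIterator` that copies only this default method, and a hypothetical `TrackingIterator` that records the close call and can simulate a close failure:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for CloseableIterator, reduced to the default method from the diff.
interface MiniCloseableIterator<T> extends Iterator<T>, Closeable {
  default List<T> toInMemoryList() {
    final List<T> result = new ArrayList<>();
    try (MiniCloseableIterator<T> iterator = this) {
      while (iterator.hasNext()) {
        result.add(iterator.next());
      }
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to close the CloseableIterator", e);
    }
    return result;
  }
}

class ToInMemoryListDemo {
  // Hypothetical source that records whether close() was called.
  static final class TrackingIterator<T> implements MiniCloseableIterator<T> {
    private final Iterator<T> inner;
    private final boolean failOnClose;
    boolean closed = false;

    TrackingIterator(List<T> items, boolean failOnClose) {
      this.inner = items.iterator();
      this.failOnClose = failOnClose;
    }

    @Override
    public boolean hasNext() {
      return inner.hasNext();
    }

    @Override
    public T next() {
      return inner.next();
    }

    @Override
    public void close() throws IOException {
      closed = true;
      if (failOnClose) {
        throw new IOException("simulated close failure");
      }
    }
  }

  public static void main(String[] args) {
    TrackingIterator<String> ok = new TrackingIterator<>(Arrays.asList("a", "b", "c"), false);
    System.out.println(ok.toInMemoryList()); // [a, b, c]
    System.out.println(ok.closed);           // true

    try {
      new TrackingIterator<>(Arrays.asList("x"), true).toInMemoryList();
    } catch (UncheckedIOException e) {
      System.out.println(e.getMessage()); // Failed to close the CloseableIterator
    }
  }
}
```

Because try-with-resources closes the resource even if `hasNext()` or `next()` throws, callers of `toInMemoryList` do not need their own try-with-resources around the iterator.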
Translate this logic verbatim to use the breakableFilter API instead.