-
Notifications
You must be signed in to change notification settings - Fork 21
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
Fixes #546. --------- Co-authored-by: Sutou Kouhei <[email protected]>
- Loading branch information
Showing
32 changed files
with
5,126 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,3 +17,4 @@ | |
|
||
.gitmodules | ||
dataset/src/test/resources/data/student.csv | ||
docs/Makefile |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= -W | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = source | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Documentation | ||
|
||
Build with Sphinx. | ||
|
||
```bash | ||
cd docs | ||
pip install -r requirements.txt | ||
make html | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
furo==2024.8.6 | ||
myst-parser==4.0.0 | ||
Sphinx==8.1.3 | ||
sphinx-autobuild==2024.10.3 | ||
sphinx-basic-ng==1.0.0b2 | ||
sphinxcontrib-applehelp==2.0.0 | ||
sphinxcontrib-devhelp==2.0.0 | ||
sphinxcontrib-htmlhelp==2.1.0 | ||
sphinxcontrib-jsmath==1.0.1 | ||
sphinxcontrib-qthelp==2.0.0 | ||
sphinxcontrib-serializinghtml==2.0.0 |
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
.. Licensed to the Apache Software Foundation (ASF) under one | ||
.. or more contributor license agreements. See the NOTICE file | ||
.. distributed with this work for additional information | ||
.. regarding copyright ownership. The ASF licenses this file | ||
.. to you under the Apache License, Version 2.0 (the | ||
.. "License"); you may not use this file except in compliance | ||
.. with the License. You may obtain a copy of the License at | ||
.. http://www.apache.org/licenses/LICENSE-2.0 | ||
.. Unless required by applicable law or agreed to in writing, | ||
.. software distributed under the License is distributed on an | ||
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
.. KIND, either express or implied. See the License for the | ||
.. specific language governing permissions and limitations | ||
.. under the License. | ||
Java Algorithms | ||
=============== | ||
|
||
Arrow's Java library provides algorithms for some commonly-used | ||
functionalities. The algorithms are provided in the ``org.apache.arrow.algorithm`` | ||
package of the ``algorithm`` module. | ||
|
||
Comparing Vector Elements | ||
------------------------- | ||
|
||
Comparing vector elements is the basic for many algorithms. Vector | ||
elements can be compared in one of the two ways: | ||
|
||
1. **Equality comparison**: there are two possible results for this type of comparisons: ``equal`` and ``unequal``. | ||
Currently, this type of comparison is supported through the ``org.apache.arrow.vector.compare.VectorValueEqualizer`` | ||
interface. | ||
|
||
2. **Ordering comparison**: there are three possible results for this type of comparisons: ``less than``, ``equal to`` | ||
and ``greater than``. This comparison is supported by the abstract class ``org.apache.arrow.algorithm.sort.VectorValueComparator``. | ||
|
||
We provide default implementations to compare vector elements. However, users can also define ways | ||
for customized comparisons. | ||
|
||
Vector Element Search | ||
--------------------- | ||
|
||
A search algorithm tries to find a particular value in a vector. When successful, a vector index is | ||
returned; otherwise, a ``-1`` is returned. The following search algorithms are provided: | ||
|
||
1. **Linear search**: this algorithm simply traverses the vector from the beginning, until a match is | ||
found, or the end of the vector is reached. So it takes ``O(n)`` time, where ``n`` is the number of elements | ||
in the vector. This algorithm is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#linearSearch``. | ||
|
||
2. **Binary search**: this represents a more efficient search algorithm, as it runs in ``O(log(n))`` time. | ||
However, it is only applicable to sorted vectors. To get a sorted vector, | ||
one can use one of our sorting algorithms, which will be discussed in the next section. This algorithm | ||
is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#binarySearch``. | ||
|
||
3. **Parallel search**: when the vector is large, it takes a long time to traverse the elements to search | ||
for a value. To make this process faster, one can split the vector into multiple partitions, and perform the | ||
search for each partition in parallel. This is supported by ``org.apache.arrow.algorithm.search.ParallelSearcher``. | ||
|
||
4. **Range search**: for many scenarios, there can be multiple matching values in the vector. | ||
If the vector is sorted, the matching values reside in a contiguous region in the vector. The | ||
range search algorithm tries to find the upper/lower bound of the region in ``O(log(n))`` time. | ||
An implementation is provided in ``org.apache.arrow.algorithm.search.VectorRangeSearcher``. | ||
|
||
Vector Sorting | ||
-------------- | ||
|
||
Given a vector, a sorting algorithm turns it into a sorted one. The sorting criteria must | ||
be specified by some ordering comparison operation. The sorting algorithms can be | ||
classified into the following categories: | ||
|
||
1. **In-place sorter**: an in-place sorter performs the sorting by manipulating the original | ||
vector, without creating any new vector. So it just returns the original vector after the sorting operations. | ||
Currently, we have ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter`` for in-place | ||
sorting in ``O(nlog(n))`` time. As the name suggests, it only supports fixed width vectors. | ||
|
||
2. **Out-of-place sorter**: an out-of-place sorter does not mutate the original vector. Instead, | ||
it copies vector elements to a new vector in sorted order, and returns the new vector. | ||
We have ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter.FixedWidthOutOfPlaceVectorSorter`` | ||
and ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter.VariableWidthOutOfPlaceVectorSorter`` | ||
for fixed width and variable width vectors, respectively. Both algorithms run in ``O(nlog(n))`` time. | ||
|
||
3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer | ||
vector, which correspond to indices of vector elements in sorted order. With the index vector, one can | ||
easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k`` th | ||
smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``, | ||
which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type. | ||
|
||
Other Algorithms | ||
---------------- | ||
|
||
Other algorithms include vector deduplication, dictionary encoding, etc., in the ``algorithm`` module. |
Oops, something went wrong.