Cannot learn from large numbers of interactions #719

Open
MariaZharova opened this issue Nov 5, 2024 · 0 comments
Hi there!
I'm trying to fit a LightFM model on a fairly large dataset: it contains ~13 million items and ~38.5 million users. The main problem is that the number of interactions is more than 3 billion, which exceeds the signed 32-bit integer limit of 2^31 - 1. I get the following error when calling the fit method:

File ~/.cache/pypoetry/virtualenvs/complementary-items-GrOWNq8P-py3.10/lib/python3.10/site-packages/lightfm/lightfm.py:684, in LightFM._run_epoch(self, item_features, user_features, interactions, sample_weight, num_threads, loss)
    677 """
    678 Run an individual epoch.
    679 """
    681 if loss in ("warp", "bpr", "warp-kos"):
    682     # The CSR conversion needs to happen before shuffle indices are created.
    683     # Calling .tocsr may result in a change in the data arrays of the COO matrix,
--> 684     positives_lookup = CSRMatrix(
    685         self._get_positives_lookup_matrix(interactions)
    686     )
    688 # Create shuffle indexes.
    689 shuffle_indices = np.arange(len(interactions.data), dtype=np.int32)

File lightfm/_lightfm_fast_openmp.pyx:167, in lightfm._lightfm_fast_openmp.CSRMatrix.__init__()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

I guess this problem is caused by the use of the int type for the number of interactions in the Cython-generated file _lightfm_fast_openmp.c. Is there a plan to widen the data type to long?
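
For reference, here is a minimal sketch of the size check I have in mind. It only does the arithmetic; it does not call LightFM or SciPy. The helper name `fits_int32_index` is hypothetical, and it rests on my assumption that the sparse index arrays fall back to 64-bit integers once the interaction count or either matrix dimension no longer fits in a signed 32-bit int, which would explain the 'int' vs 'long' mismatch above.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def fits_int32_index(n_users, n_items, n_interactions):
    """Hypothetical helper: True if all three counts fit in a signed 32-bit int.

    Assumption: the CSR index arrays can stay int32 only when the number of
    stored interactions and both matrix dimensions are <= INT32_MAX.
    """
    return max(n_users, n_items, n_interactions) <= INT32_MAX


# Sizes from this issue: ~38.5M users, ~13M items, >3B interactions.
print(fits_int32_index(38_500_000, 13_000_000, 3_000_000_000))
# A dataset with the same shape but 2B interactions would still fit.
print(fits_int32_index(38_500_000, 13_000_000, 2_000_000_000))
```

With the sizes above, the user and item counts are well within range, so the interaction count alone would be what pushes the indices past 32 bits.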
