Help me with my use case plz #38

Open
Sherafgan opened this issue Feb 20, 2017 · 9 comments

Comments

@Sherafgan

Could you please help me with some starter code for my use case?

I want to store key-value pairs in the vector similarity database, where the key is a sentence ID and the value is a vector. Examples:
id_1 [0.06284283101558685, 0.046207964420318604, 0.0053909290581941605, ...]
id_2 [0.006631242576986551, 0.08234132081270218, -0.0787612572312355, ...]

Then, given a vector, I want the IDs of the top n most similar vectors.

@mountain
Member

Sorry for the late reply. If I understand correctly, you want to get the top n similar vectors within the same vector set. If so, follow the steps below:

  1. Decide which parameter values to use
  • the dimension of the vectors; assume 10 here
  • the name of the vector set; assume 'vector' here
  2. Setup
> bmk b10 t1 t2 t3 t4 t5 t6 t7 t8 t9 t0
> vmk b10 vector
> rmk vector vector cosinesq

  3. Fill data

> vadd vector 1 0.11 0.112 0.1123...
> vadd vector 2 0.21 0.212 0.2123...

Note that the numbers 1 and 2 here are the vectors' ids inside simbase; you can set up a map between your own IDs and these ids.

  4. Retrieve result

Look up the inner id for your ID via the map (say it is 1234), then issue the command:

> rrec vector 1234 vector

Hope the above instructions help.
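
The steps above can be sketched from Python as follows. This is a minimal sketch under stated assumptions, not an official simbase client: it assumes a client object exposing `execute_command` (as redis-py's `redis.Redis` does), and `IdMap`, `setup_commands`, `add_vector` and `similar` are hypothetical names introduced for illustration. Only the commands `bmk`, `vmk`, `rmk`, `vadd` and `rrec` come from the steps above, and the reply of `rrec` is returned unparsed because its format is not described here.

```python
class IdMap:
    """Two-way map between your own IDs (e.g. 'id_1') and the integer ids used inside simbase."""

    def __init__(self):
        self._to_inner = {}
        self._to_outer = {}

    def assign(self, outer_id):
        """Return the inner id for outer_id, allocating the next free integer if it is new."""
        if outer_id not in self._to_inner:
            inner_id = len(self._to_inner) + 1  # start at 1, as in the example above
            self._to_inner[outer_id] = inner_id
            self._to_outer[inner_id] = outer_id
        return self._to_inner[outer_id]

    def inner(self, outer_id):
        """Look up an already assigned inner id (raises KeyError if unknown)."""
        return self._to_inner[outer_id]

    def outer(self, inner_id):
        """Translate an inner id back to your own ID."""
        return self._to_outer[int(inner_id)]


def setup_commands(basis, components, vector_set, score="cosinesq"):
    """Build the three setup commands from step 2 as argument tuples."""
    return [
        ("bmk", basis, *components),             # one argument per basis component
        ("vmk", basis, vector_set),              # vector set under that basis
        ("rmk", vector_set, vector_set, score),  # recommend within the same set
    ]


def add_vector(client, ids, vector_set, outer_id, vector):
    """Step 3: store one vector under the inner id mapped to outer_id."""
    inner_id = ids.assign(outer_id)
    client.execute_command("vadd", vector_set, inner_id, *vector)
    return inner_id


def similar(client, ids, vector_set, outer_id):
    """Step 4: ask for recommendations; the raw reply is returned as-is."""
    return client.execute_command("rrec", vector_set, ids.inner(outer_id), vector_set)
```

With redis-py installed, `client = redis.Redis(host='localhost', port=7654)` could be passed as `client`, and each tuple from `setup_commands` sent with `client.execute_command(*cmd)`. Keeping the map on the client side means it has to be persisted separately if the IDs must survive a restart.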

@Sherafgan
Author

The first two steps are okay, and the others should be too, thank you! I want to perform a mass insertion as with Redis, but something goes wrong. I have looked at the issues regarding Redis but have not figured it out so far, so I would very much appreciate your help.

The vectors have 300 dimensions, so I have set up b300. I have a batch file of about 60.7MB with the following commands:
vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...

If I run this batch file with
cat batch.file | redis-cli -p 7654 --pipe
I get
Error writing to the server: Connection reset by peer
If I run it with
cat batch.file; sleep 60 | redis-cli -p 7654 --pipe
I get
All data transferred. Waiting for the last reply...
No replies for 30 seconds: exiting.
errors: 1, replies: 0

@mountain
Member

Pipe mode of the Redis protocol is not implemented, so I think issuing the commands one at a time, as below, will work:

redis-cli vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
redis-cli vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...
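
Since pipe mode is not available, an existing batch file can also be replayed one command per request from a script. The following is a minimal sketch, assuming every line of the file has the whitespace-separated form `vadd <set> <id> <components...>` and assuming a client object that exposes `execute_command` (as redis-py's `redis.Redis` does). `parse_command` and `replay_batch` are hypothetical helper names, not part of simbase or redis-py.

```python
def parse_command(line):
    """Split one batch-file line into command arguments; return None for blank lines."""
    parts = line.split()
    return parts or None


def replay_batch(client, lines):
    """Send each non-blank line as its own command and return how many were sent."""
    sent = 0
    for line in lines:
        args = parse_command(line)
        if args is None:
            continue  # skip empty lines, e.g. a trailing newline at end of file
        client.execute_command(*args)
        sent += 1
    return sent
```

With redis-py installed, this could be driven by `client = redis.Redis(host='localhost', port=7654)` and `replay_batch(client, open('batch.file'))`. It trades the throughput of pipelining for one round trip per vector, which is slower but avoids a mode the server does not support.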

@mountain
Member

mountain commented Feb 23, 2017

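Alternatively, you can use Python, for example:

import redis

# Port 7654 is the simbase port used earlier in this thread
dest = redis.Redis(host='localhost', port=7654)
with open('csvdatafile.txt') as data:
    for idx, line in enumerate(data):
        # Strip only the newline; line[:-1] would cut a digit if the last line has none
        line = line.rstrip('\n')
        components = line.split(',')
        # enumerate starts at 0, so the first vector gets id 0
        dest.execute_command('vadd', 'vector', idx, *components)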

@Sherafgan
Author

Sherafgan commented Feb 27, 2017

Great, the Python script helped. I changed it a bit, and it looks like this now:

import redis

dest = redis.Redis(host='localhost', port=7654)
with open('tmpFiles/t300.txt') as t300:
    for idx, line in enumerate(t300):
        line = line[:-1]
        b = line.split(' ')
print("Setting vector dimensions (b300): " + dest.execute_command('bmk', 'b300', line))
print("And the name (video) of vector set with b300 dimension: " + dest.execute_command('vmk', 'b300', 'video'))
print("Setting recommender (video->video): " + dest.execute_command('rmk', 'video', 'video', 'cosinesq'))
with open('tmpFiles/batch2.txt') as data:
    for idx, line in enumerate(data):
        line = line[:-1]
        components = line.split(',')
        print("ID:" + str(idx+1) + ": " + dest.execute_command('vadd', 'video', idx+1, *components))

It executed successfully, but when I try to get a vector with vget video 1,
I get (error) Unknown server error!
When I try rrec video 1 video,
I get (empty list or set), although I should get vector ids.
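
For comparison with the `bmk b10 t1 t2 ...` example earlier in the thread, where every component name is its own argument, here is a minimal sketch that builds the `bmk` arguments for an arbitrary dimension and checks each vector's length before it is sent. It does not explain the server error; it only helps rule out malformed input on the client side. The component names `t1` through `tN`, the comma separator, and the helper names `basis_args` and `check_vectors` are assumptions made for illustration.

```python
def basis_args(basis, dimension, prefix="t"):
    """Build ['bmk', basis, 't1', ..., 'tN'] with one argument per component."""
    return ["bmk", basis] + [f"{prefix}{i}" for i in range(1, dimension + 1)]


def check_vectors(lines, dimension, sep=","):
    """Return (line_number, component_count) for every non-blank line of the wrong length."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no vector
        count = len(line.split(sep))
        if count != dimension:
            bad.append((number, count))
    return bad
```

The list from `basis_args('b300', 300)` could be sent with `dest.execute_command(*args)`, and an empty result from `check_vectors(open('tmpFiles/batch2.txt'), 300)` would indicate that every line has exactly 300 components.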

@mountain
Member

Could you paste the error from the log file? It is in the log directory.

@Sherafgan
Author

Sherafgan commented Feb 27, 2017

2017-02-27 19:24:35 INFO  SimEngineImpl:313 - loading basis[b300]
2017-02-27 19:24:36 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
	at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
	at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
	at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
	at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
	at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
	... 5 more
2017-02-27 20:07:13 INFO  SimEngineImpl:313 - loading basis[b300]
2017-02-27 20:07:13 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
	at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
	at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
	at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
	at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
	at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
	... 5 more

@Sherafgan
Author

@mountain any guess? :) Could it be that 300 dimensions are too many for a basis, or that the floating-point values of the vectors have too many digits?

@Sherafgan
Author

I tried simbase with 10-dimensional vectors and it works. With vectors of more than 10 dimensions (I tried 11d, 13d, 14d, 15d, 25d and 50d), I get the following error:

2017-03-04 20:43:03 INFO  SimEngineImpl:385 - basis[b11] created
2017-03-04 20:43:03 INFO  SimEngineImpl:460 - vectorset[video] created under basis[b11]
2017-03-04 20:43:03 INFO  SimEngineImpl:727 - creating recommendation[video_video] with funcscore[cosinesq]
2017-03-04 20:43:03 INFO  SimEngineImpl:740 - recommendation[video_video] created with funcscore[cosinesq]
2017-03-04 20:43:03 ERROR SimEngineImpl:56 - 
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.rescore(DenseVectorSet.java:276)
	at com.guokr.simbase.store.Recommendation.processDenseChangedEvt(Recommendation.java:129)
	at com.guokr.simbase.store.Recommendation.onVectorAdded(Recommendation.java:208)
	at com.guokr.simbase.store.DenseVectorSet.add(DenseVectorSet.java:152)
	at com.guokr.simbase.engine.SimBasis.vadd(SimBasis.java:153)
	at com.guokr.simbase.engine.SimEngineImpl$14.invoke(SimEngineImpl.java:513)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
