Describe the bug
When performing similarity search using FAISS (Facebook AI Similarity Search), the results often come back as raw, low-level data that isn't easily readable or useful to a human.
For example, the output might look something like this:
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: The position of the item in the search results (1 is the most similar).
Distance: A numerical value indicating how close the result is to the query. With an L2 index, lower distances mean more similarity.
Text: A string of seemingly random symbols and letters. It looks like raw vector data or some internal representation rather than the original text.
In short, the search output is not interpretable as it stands. It appears to need an additional step that translates each result back into a readable form, such as a mapping from the returned neighbor to the original string.
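For context, `index.search(query, k)` in FAISS returns only two arrays: distances and integer positions of the neighbors. Any text in the output has to come from a lookup done by the calling code. The sketch below shows one way that lookup could be written. The `format_results` helper and the `texts` list are hypothetical names, not part of FAISS or of the code in this report, and the sketch assumes `texts` holds the original chunks in the same order in which their embeddings were added to the index.

```python
def format_results(distances, indices, texts):
    """Turn one query's FAISS-style search output into readable lines.

    distances and indices are the rows for a single query, for example
    D[0] and I[0] from `D, I = index.search(query, k)`.
    texts is a hypothetical list of the original chunks, in insertion order.
    """
    lines = []
    for rank, (dist, idx) in enumerate(zip(distances, indices), start=1):
        idx = int(idx)
        if idx < 0 or idx >= len(texts):
            # FAISS fills missing neighbors with -1; also guard against
            # a texts list that is shorter than the index.
            text = "<no matching text>"
        else:
            text = texts[idx]
        lines.append(f"Rank: {rank}, Distance: {float(dist):.4f}, Text: {text}")
    return lines


# Stand-in data so the sketch runs without FAISS installed.
texts = ["FAISS builds vector indexes.", "Llama is a language model.", "Ubuntu is an OS."]
example = format_results([0.12, 0.87, 1.5], [2, 0, -1], texts)
for line in example:
    print(line)
```

If the text printed this way is still garbled, the stored chunks themselves are likely damaged before they ever reach the index, for example by reading a binary file as text.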
Minimal reproducible example
""" Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
"""
import time

import faiss
import numpy as np


def create_faiss_index():
    start_time = time.time()  # Start time for measuring the function's execution time
    embeddings_file = '/var/www/html/zsapiens/llama-models/models/data/embeddings.npy'

    # Check if embeddings file exists and is loaded correctly
    try:
        embeddings = np.load(embeddings_file, allow_pickle=True)
        if embeddings is None or embeddings.shape[0] == 0:
            raise ValueError("Embeddings file is empty or not loaded correctly.")
        print(f"Loaded {embeddings.shape[0]} embeddings.")
        print(f"Embeddings shape: {embeddings.shape}")  # Add this line to print the shape
    except Exception as e:
        print(f"Error loading embeddings: {e}")
        print("Regenerating embeddings...")
        # Regenerate embeddings if the file doesn't exist or is empty
        embeddings = create_embeddings()  # Replace with your actual embedding generation function
        np.save(embeddings_file, embeddings)
        print(f"Embeddings saved to {embeddings_file}.")

    # Create FAISS index on CPU
    try:
        index = faiss.IndexFlatL2(embeddings.shape[1])  # Assuming L2 distance metric
        index.add(embeddings)
        print(f"Added {embeddings.shape[0]} embeddings to the FAISS index.")
    except Exception as e:
        print(f"Error creating FAISS index: {e}")
        return

    # Serialize the FAISS index on the CPU
    try:
        index_file = '/var/www/html/zsapiens/llama-models/models/data/faiss_index.index'
        faiss.write_index(index, index_file)
        print(f"FAISS index created and saved to {index_file}.")
    except Exception as e:
        print(f"Error saving FAISS index: {e}")
        return

    end_time = time.time()  # End time
    execution_time = end_time - start_time  # Calculate execution time
    print(f"create_faiss_index executed in {execution_time:.2f} seconds.")
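The function above writes only the vectors and the index to disk; nothing in it stores the texts the vectors were computed from. One possible way to keep the two aligned is a JSON sidecar file saved next to the index and checked against the index size on load. This is a sketch under assumptions: `save_texts`, `load_texts`, and the file name `texts.json` are hypothetical, and `expected_count` is meant to be the number of vectors in the index (for a FAISS index, `index.ntotal`).

```python
import json
import os
import tempfile


def save_texts(texts, path):
    """Write the original chunks in insertion order, as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(texts, f, ensure_ascii=False)


def load_texts(path, expected_count):
    """Read the chunks back and verify they line up with the index size."""
    with open(path, "r", encoding="utf-8") as f:
        texts = json.load(f)
    if len(texts) != expected_count:
        raise ValueError(
            f"Found {len(texts)} texts but expected {expected_count}; "
            "the sidecar file and the index are out of sync."
        )
    return texts


# Round trip in a temporary directory so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = os.path.join(tmp, "texts.json")  # hypothetical file name
    save_texts(["first chunk", "second chunk"], sidecar)
    restored = load_texts(sidecar, expected_count=2)

print(restored)
```

Writing and reading with an explicit `encoding="utf-8"` also rules out one common source of garbled text, a mismatch between the encoding used to save the chunks and the one used to read them.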
Output
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
Rank: 3, Distance: 1.6708736419677734, Text: K tL L (M 6M PM M M M 6N xN N N N MO XO O O O O P P P JQ ^Q Q R R S S S S S S S S T T T U JU U U U 1V 2V V W }W W W X X X UY bY Y Y Y >Z Z ?[ [ [
Rank: 4, Distance: 1.6997497081756592, Text: ` [ d m B | h n Y k w B H t \ w ( W & > 9 u ~ 6 u / 7
Rank: 5, Distance: 1.7031402587890625, Text: < c z W H D M % 0 " - 8 C N Y d o z ; .
Additional context
Expected: The output should contain human-readable data (e.g., nearest neighbor texts, objects, or descriptions).
Actual: The output contains raw vector data (e.g., a string of random symbols) which isn't interpretable without further processing.
Runtime Environment
Model: llama-3.1-8b
OS: Ubuntu 22.04
GPU VRAM: 48 GB (NVIDIA RTX A6000)
Number of GPUs: 1
GPU Make: NVIDIA RTX A6000
FAISS Version: (e.g., faiss-gpu)