Chroma DB
Set up ChromaDB for storing image embeddings, enabling semantic search and image retrieval.
Getting started
First, install the required dependencies.
pip install -U picachain
Prepare images
To get started, we need a set of images to act as our knowledge base, which we will push to the vector database.
Images for Search
# list of images
from PIL import Image
images = [Image.open('img1.png'), Image.open('img2.png')]
Query image
# load and display the query image
import matplotlib.pyplot as plt

img = Image.open("img.png")
plt.imshow(img)
Import dependencies
# import dependencies
from picachain.chains.image.search import ImageSearchChain
from picachain.datastore import ChromaStore
from picachain.embedding import ClipEmbedding
from picachain.retriever import ImageRetriever
Create vectorstore
Picachain makes it easy to initiate a vector store index in ChromaDB.
# initiate vectorstore
datastore = ChromaStore("test-collection")
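ChromaStore presumably manages a ChromaDB collection under the hood. For intuition, here is a minimal sketch of what an equivalent collection looks like in plain chromadb. The client calls are standard chromadb API, but how picachain uses them internally is an assumption, and the vectors below are placeholders only.

# Sketch: a "test-collection" in plain chromadb (illustration, not picachain internals)
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="db/") to persist
collection = client.get_or_create_collection("test-collection")
collection.add(
    ids=["img1", "img2"],
    embeddings=[[0.1] * 512, [0.2] * 512],  # placeholder vectors; real ones come from CLIP
)
hits = collection.query(query_embeddings=[[0.1] * 512], n_results=1)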
Prepare the CLIP Embedding
To create embeddings, we use the CLIP model by OpenAI. To read more about CLIP, check out OpenAI's official documentation.
embedding = ClipEmbedding()
ClipEmbedding is responsible for vectorizing the images.
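For intuition on what this vectorization involves, here is a rough sketch of embedding an image with CLIP via Hugging Face's transformers library. This illustrates the general technique, not picachain's actual implementation; the checkpoint name and the 512-dimensional output apply to openai/clip-vit-base-patch32 specifically.

# Sketch: image vectorization with CLIP (illustration, not picachain internals)
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(images=Image.open("img1.png"), return_tensors="pt")
with torch.no_grad():
    features = model.get_image_features(**inputs)  # tensor of shape (1, 512)
features = features / features.norm(dim=-1, keepdim=True)  # normalize for cosine similarity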
Initiate Retriever
Now that our vector store and embedding are initialized, we can build the image retriever.
retriever = ImageRetriever(datastore, embedding, images)
Create Image Search Chain
The next step is to create an ImageSearchChain that takes a query image and retrieves the top_k most similar images from our vector store.
img_chain = ImageSearchChain.from_image(retriever, embedding, img) # img is query image
result = img_chain.similar_images(top_k=3) # retrieve the top 3 similar images
Plot the images
import matplotlib.pyplot as plt
num_images = len(result)
fig, axes = plt.subplots(1, num_images, figsize=(12, 4))
if num_images == 1:
    axes = [axes]  # plt.subplots returns a single Axes (not an array) when only one is requested
for i, (image, score) in enumerate(result):  # `image` avoids shadowing the query `img`
    axes[i].imshow(image)
    axes[i].set_title(f'Score: {score}')
    axes[i].axis('off')
plt.tight_layout()
plt.show()