CLIP-as-service

A scalable, low-latency service for embedding images and text, suitable for neural search and multimodal applications.

Author: Jina AI

Since: 2018-11-12

Overview

CLIP-as-service is a scalable, low-latency service developed by Jina AI for generating vector representations of images and text. It supports gRPC, HTTP and WebSocket protocols and can run on PyTorch, ONNX Runtime or TensorRT backends. The project focuses on embedding and reranking capabilities for neural search and multimodal retrieval, and is designed to scale horizontally for large datasets and high concurrency.

Key features

Cross-modal embeddings for both images and text.
Multiple inference backends (PyTorch, ONNX Runtime, TensorRT) for performance and compatibility.
Non-blocking duplex streaming, horizontal scaling and automatic load balancing for production deployments.
Integration with Jina and DocArray for building neural-search pipelines.
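
To illustrate what cross-modal embeddings make possible, the following is a minimal sketch of text-to-image search by cosine similarity. It does not call CLIP-as-service: the three-dimensional vectors and file names are made-up stand-ins for embeddings a server would return, and the helper names `cosine` and `search` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real image embeddings (assumed values).
image_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
# Toy vector standing in for the embedding of a text query such as "a photo of a cat".
text_query = [0.8, 0.2, 0.1]

results = search(text_query, image_index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because text and images share one vector space, the same comparison works in either direction; a production system would typically replace the linear scan with an approximate nearest-neighbor index.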

Use cases

CLIP-as-service is suited to building multimodal retrieval, image search, text-to-image search, visual reasoning and reranking services. Typical applications include media asset search, content moderation, recommendation and image-text retrieval systems.

Technical details

The project is built around CLIP models and uses an asynchronous client-server design to deliver high throughput and low latency. Pluggable backends allow trade-offs between accuracy and performance across different hardware configurations, which makes large-scale online inference deployments practical.
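
The benefit of an asynchronous client can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the project's actual client: `fake_encode` is a hypothetical stand-in for a network call to an embedding server, and `encode_all`, `batch_size` and `max_in_flight` are names invented here. It splits the inputs into batches and keeps several requests in flight at once while preserving input order.

```python
import asyncio

async def fake_encode(batch):
    """Hypothetical stand-in for a request to an embedding server."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    # Return one dummy 1-dimensional vector per input (its string length).
    return [[float(len(item))] for item in batch]

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def encode_all(items, batch_size=2, max_in_flight=4):
    """Encode all items, with at most max_in_flight batches pending at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def worker(batch):
        async with semaphore:
            return await fake_encode(batch)

    # gather returns results in the order the batches were submitted.
    results = await asyncio.gather(*(worker(b) for b in batches(items, batch_size)))
    return [vec for batch_result in results for vec in batch_result]

inputs = ["a cat", "a dog", "a red car", "sky", "tree"]
embeddings = asyncio.run(encode_all(inputs))
print(embeddings)
```

Overlapping requests in this way hides per-request latency, which is the general reason a non-blocking design raises throughput; the batch size and concurrency limit shown are arbitrary and would need tuning for a real backend.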

Related Resources

Jina Serve

DeepResearch (Node implementation)

Pixeltable