Overview
CLIP-as-service is a scalable, low-latency service developed by Jina AI for generating vector representations (embeddings) of images and text. It supports the gRPC, HTTP, and WebSocket protocols and can run on PyTorch, ONNX Runtime, or TensorRT backends. The project focuses on embedding and reranking for neural search and multimodal retrieval, and is designed to scale horizontally to large datasets and high concurrency.
Key features
- Cross-modal embeddings for both images and text.
- Multiple inference backends (PyTorch, ONNX Runtime, TensorRT) for performance and compatibility.
- Non-blocking duplex streaming, horizontal scaling and automatic load balancing for production deployments.
- Integration with Jina and DocArray for building neural-search pipelines.
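Because images and text are embedded into one shared vector space, cross-modal search reduces to nearest-neighbor lookup by cosine similarity. A minimal sketch with hypothetical, hand-written 4-dimensional embeddings (real CLIP vectors are 512-dimensional or larger, and would come from the service rather than a literal dict):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical image embeddings; a real deployment would get these from CLIP.
image_index = {
    "cat.jpg":   [0.9, 0.1, 0.0, 0.1],
    "beach.jpg": [0.1, 0.8, 0.3, 0.0],
    "city.jpg":  [0.0, 0.2, 0.9, 0.2],
}
# Hypothetical text embedding, e.g. for the query "a photo of a cat".
query_embedding = [0.85, 0.15, 0.05, 0.1]

# Rank images by similarity to the text query, best match first.
ranked = sorted(image_index,
                key=lambda k: cosine(query_embedding, image_index[k]),
                reverse=True)
print(ranked[0])  # → cat.jpg
```

The same scoring step doubles as a reranker: score a candidate list returned by a cheaper first-stage retriever and sort by similarity.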
Use cases
CLIP-as-service suits multimodal retrieval, image search, text-to-image search, visual reasoning, and reranking workloads, for example media asset search, content moderation, recommendation, and image-text retrieval systems.
Technical details
The project centers on CLIP models, paired with an asynchronous client-server design that delivers high throughput at low latency. Pluggable backends allow trade-offs between accuracy and performance across hardware configurations, enabling large-scale online inference deployments.
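The throughput benefit of an asynchronous design comes from serving many in-flight requests at once and running the model on micro-batches rather than single inputs. The standard-library sketch below is a hypothetical illustration of that pattern, not the project's actual implementation; the length-based "embedding" stands in for a real model call:

```python
import asyncio

async def micro_batch_server(queue, batch_size=4, timeout=0.01):
    # Collect requests into micro-batches so the model runs once per batch,
    # trading a small latency budget (timeout) for higher throughput.
    while True:
        batch = [await queue.get()]
        try:
            while len(batch) < batch_size:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
        except asyncio.TimeoutError:
            pass  # flush a partial batch when no more requests arrive in time
        for text, future in batch:
            # Stand-in for a real model call: "embed" a text by its length.
            future.set_result([float(len(text))])

async def encode(queue, text):
    # Non-blocking client call: enqueue the request and await its future.
    future = asyncio.get_running_loop().create_future()
    await queue.put((text, future))
    return await future

async def main():
    queue = asyncio.Queue()
    server = asyncio.create_task(micro_batch_server(queue))
    # Concurrent requests are served without blocking one another.
    results = await asyncio.gather(
        *(encode(queue, t) for t in ["cat", "a beach", "city at night"]))
    server.cancel()
    return results

print(asyncio.run(main()))  # → [[3.0], [7.0], [13.0]]
```

Backpressure, batch size, and the flush timeout are the main knobs in this pattern: a larger batch raises throughput on GPU backends, while a shorter timeout keeps tail latency low.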