Detailed Introduction
Tensor Fusion is a virtualization and pooling solution for GPU clusters, designed to improve cluster utilization and reduce inference latency through fine-grained resource allocation and shared memory and compute. It targets high-density, multi-tenant inference environments, offering dynamic scheduling and autoscaling so that long-lived inference services and agent clusters can run on the same physical infrastructure.
Main Features
- Dynamic GPU pooling: partition physical GPUs into shareable virtual pools that are allocated to inference tasks on demand (see the pooling sketch after this list).
- Low-latency inference path: optimized context loading and memory reuse to cut cold starts and model-switch overhead (see the cache sketch below).
- Autoscaling & scheduling: scale and schedule tasks in real time based on load and priority (see the scaling sketch below).
- Multi-model and multi-tenant support: strong isolation and concurrency handling for LLM and agent workloads; per-tenant quotas appear in the pooling sketch below.
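The pooling model can be pictured as a small allocator. The following sketch is illustrative only; names such as VirtualPool and the best-fit placement policy are assumptions, not Tensor Fusion's API. It shows fractional GPU allocation from a shared pool with a simple per-tenant quota, covering the pooling and multi-tenant bullets above.

```python
# Hypothetical sketch of fractional GPU pooling with per-tenant quotas.
# VirtualPool and its policy are illustrative, not Tensor Fusion's API.
from dataclasses import dataclass, field

@dataclass
class GPU:
    gpu_id: str
    total_mem_gb: float
    free_mem_gb: float

@dataclass
class VirtualPool:
    gpus: list[GPU]
    tenant_quota_gb: float                       # per-tenant memory cap
    used_by_tenant: dict[str, float] = field(default_factory=dict)

    def allocate(self, tenant: str, mem_gb: float) -> str | None:
        """Reserve mem_gb on some GPU for tenant; return the GPU id or None."""
        used = self.used_by_tenant.get(tenant, 0.0)
        if used + mem_gb > self.tenant_quota_gb:
            return None                          # tenant quota exceeded
        # Best fit: pick the GPU with the least free memory that still fits,
        # keeping large contiguous capacity available for big requests.
        candidates = [g for g in self.gpus if g.free_mem_gb >= mem_gb]
        if not candidates:
            return None
        gpu = min(candidates, key=lambda g: g.free_mem_gb)
        gpu.free_mem_gb -= mem_gb
        self.used_by_tenant[tenant] = used + mem_gb
        return gpu.gpu_id
```

A release path would reverse both bookkeeping steps; it is omitted here to keep the sketch short.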
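Memory reuse on the inference path is essentially a residency cache: models whose weights are already loaded are served warm, and the least-recently-used model is evicted under memory pressure. A minimal sketch, assuming a hypothetical load_weights callable:

```python
# Minimal sketch of memory reuse for model switching: keep recently used
# model weights resident and evict least-recently-used ones under pressure.
# load_weights is a hypothetical placeholder for the actual weight loader.
from collections import OrderedDict

class ModelCache:
    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0
        self._cache: OrderedDict[str, tuple[object, float]] = OrderedDict()

    def get(self, model_id: str, size_gb: float, load_weights):
        if model_id in self._cache:              # warm hit: no reload
            self._cache.move_to_end(model_id)
            return self._cache[model_id][0]
        while self.used_gb + size_gb > self.capacity_gb and self._cache:
            _, (_, evicted_gb) = self._cache.popitem(last=False)
            self.used_gb -= evicted_gb           # evict the coldest model
        weights = load_weights(model_id)         # cold path: pay the load once
        self._cache[model_id] = (weights, size_gb)
        self.used_gb += size_gb
        return weights
```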
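Load-based autoscaling decisions often reduce to targeting a fixed number of in-flight requests per replica. The heuristic below, including the target and the 70% hysteresis threshold, is an assumption for illustration, not Tensor Fusion's actual policy:

```python
# Sketch of a load-based scale decision: target a fixed number of in-flight
# requests per replica, with hysteresis to avoid flapping. All thresholds
# here are illustrative assumptions.
import math

def desired_replicas(in_flight: int, current: int,
                     target_per_replica: int = 8,
                     min_replicas: int = 1, max_replicas: int = 64) -> int:
    raw = math.ceil(in_flight / target_per_replica)
    # Hysteresis: hold the current count if one fewer replica would already
    # be running above 70% of the per-replica target.
    if raw < current and in_flight > 0.7 * target_per_replica * (current - 1):
        raw = current
    return max(min_replicas, min(max_replicas, raw))
```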
Use Cases
- Large-scale LLM inference platforms, where pooling raises concurrent throughput and lowers operating cost.
- Service-oriented multi-model deployments that require hot model switching and memory reuse.
- Hybrid edge-cloud deployments that need an efficient inference runtime for long-running agents.
Technical Features
- Kernel- and user-space cooperative scheduling to minimize context-switch overhead (the user-space half is sketched after this list).
- Kubernetes integration, compatible with common schedulers and autoscaling components (see the pod sketch below).
- Memory sharding and reuse techniques to improve memory efficiency and reduce fragmentation (see the bucket-pool sketch below).
- Observability interfaces for monitoring GPU utilization, memory usage, and inference latency (see the metrics sketch below).
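Cooperative scheduling trades preemption for explicit yield points: a task gives up control at a safe boundary, so a switch costs a function call rather than a full preemptive context switch. The generator-based sketch below shows only the user-space half of the idea and is an illustration, not the project's scheduler:

```python
# User-space half of cooperative scheduling, sketched with generators:
# tasks yield at safe points (e.g., between kernel launches), so switching
# is a cheap function call instead of a preemptive context switch.
from collections import deque

def run_round_robin(tasks):
    """tasks: iterable of generators that yield at cooperative points."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)            # run until the task's next yield point
            queue.append(task)    # still alive: rotate to the back
        except StopIteration:
            pass                  # finished: drop it

def worker(name, steps):
    for _ in range(steps):
        # ... launch one GPU kernel / do one unit of work here ...
        yield                     # cooperative switch point

run_round_robin([worker("a", 3), worker("b", 2)])
```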
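On the Kubernetes side, fractional GPUs are typically surfaced as an extended resource that pods request like any other resource. The sketch below uses the real kubernetes Python client, but the resource name tensor-fusion.ai/vgpu and the container image are placeholders, not confirmed names:

```python
# Hedged sketch: requesting a fractional GPU via a Kubernetes extended
# resource. "tensor-fusion.ai/vgpu" is an assumed name for illustration,
# not a confirmed Tensor Fusion resource name. Requires a reachable cluster.
from kubernetes import client, config

config.load_kube_config()
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-infer-demo"),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name="server",
            image="my-registry/llm-server:latest",     # placeholder image
            resources=client.V1ResourceRequirements(
                limits={"tensor-fusion.ai/vgpu": "1"}  # assumed resource name
            ),
        )
    ]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```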
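One common way to curb fragmentation when buffers are constantly allocated and freed is size-bucketed reuse: round each request up to a power-of-two bucket and recycle freed buffers on per-bucket free lists, trading a little internal fragmentation for far less external fragmentation. The sketch below uses host bytearrays as a stand-in for device memory and names a technique by analogy, not Tensor Fusion's actual allocator:

```python
# Sketch of size-bucketed buffer reuse. bytearray stands in for a GPU
# memory allocation; the technique, not the backing store, is the point.
from collections import defaultdict

class BucketPool:
    def __init__(self):
        self.free = defaultdict(list)   # bucket size -> reusable buffers

    @staticmethod
    def _bucket(nbytes: int) -> int:
        b = 1
        while b < nbytes:               # round up to the next power of two
            b <<= 1
        return b

    def alloc(self, nbytes: int) -> bytearray:
        b = self._bucket(nbytes)
        if self.free[b]:
            return self.free[b].pop()   # reuse: no new allocation
        return bytearray(b)             # stand-in for a device allocation

    def release(self, buf: bytearray):
        self.free[len(buf)].append(buf) # recycle instead of freeing
```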
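Observability interfaces of this kind are commonly exposed as a Prometheus scrape endpoint. The sketch below uses the real prometheus_client library; the metric names, port, and the random sample values are illustrative, not the project's actual metric schema:

```python
# Sketch of an observability endpoint using prometheus_client. Metric names
# and sample values are illustrative, not Tensor Fusion's actual schema.
import random
import time
from prometheus_client import Gauge, Histogram, start_http_server

gpu_util = Gauge("gpu_utilization_ratio", "GPU utilization", ["gpu"])
mem_used = Gauge("gpu_memory_used_bytes", "GPU memory in use", ["gpu"])
latency = Histogram("inference_latency_seconds", "End-to-end latency")

start_http_server(9400)        # scrape target at :9400/metrics
while True:
    # In a real exporter these values would come from NVML or the runtime;
    # random samples here just keep the sketch self-contained.
    gpu_util.labels(gpu="0").set(random.random())
    mem_used.labels(gpu="0").set(random.randint(0, 16 << 30))
    with latency.time():       # observe the duration of one request
        time.sleep(0.01)       # stand-in for an inference call
    time.sleep(1)
```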