Modal
Serverless GPU compute platform for ML workloads. Run Python functions on H100s, A100s, or T4s with zero infrastructure management. Pay per second, scale to zero when idle, deploy models as web endpoints.
Why Modal?
Running ML inference or training without managing GPU servers
Batch processing large datasets with GPU acceleration
Deploying Python-based ML models as scalable API endpoints
Signal Breakdown
What drives the Trust Score
Download Trend
Last 12 months
Tradeoffs & Caveats
Know before you commit
TypeScript-only stack — Modal is Python-first (JS support is limited)
Need persistent GPU VMs for interactive training sessions — use Lambda Labs or RunPod
Tiny ML workloads — the cold start overhead is noticeable for very small tasks
Pricing
Free tier & paid plans
$30/mo compute credit
$2.48/GPU-hr (H100) · $0.90/GPU-hr (A10G)
Pay per second of GPU usage, scale to zero when idle
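Per-second billing makes cost easy to estimate: divide the hourly rate by 3600 and multiply by the seconds a container is actually running. A minimal sketch, using the rates quoted above (check Modal's pricing page for current numbers):

```python
# USD per GPU-hour, from the pricing section above
HOURLY_RATES = {"H100": 2.48, "A10G": 0.90}

def job_cost(gpu: str, seconds: float) -> float:
    """Cost in USD for keeping one GPU busy for `seconds` (per-second billing)."""
    return HOURLY_RATES[gpu] / 3600 * seconds

# A 90-second inference burst on an A10G:
print(round(job_cost("A10G", 90), 4))           # 0.0225
# 10,000 such bursts per month:
print(round(job_cost("A10G", 90) * 10_000, 2))  # 225.0
```

Because idle containers scale to zero, the monthly bill tracks the second column, not a 24/7 instance price.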
Alternative Tools
Other options worth considering
Hugging Face
The GitHub of machine learning — 500k+ models, datasets, and Spaces. The hub for open-source AI with inference APIs, model hosting, and the transformers library powering most of the ML ecosystem.
Often Used Together
Complementary tools that pair well with Modal
Learning Resources
Docs, videos, tutorials, and courses
Get Started
Repository and installation options
View on GitHub
github.com/modal-labs/modal-client
pip install modal
Quick Start
Copy and adapt to get going fast
import modal

app = modal.App("fastapi-inference")
image = modal.Image.debian_slim().pip_install("torch", "transformers", "fastapi")

@app.cls(gpu="A10G", image=image, container_idle_timeout=300)
class Model:
    @modal.build()
    def download_model(self):
        # Runs at image build time so the weights are baked into the image
        from transformers import AutoModelForCausalLM, AutoTokenizer
        AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
        AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    @modal.enter()
    def load_model(self):
        # Runs once per container start, before any requests are served
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1", device=0)

    @modal.web_endpoint()
    def generate(self, prompt: str) -> dict:
        return {"text": self.pipe(prompt, max_new_tokens=200)[0]["generated_text"]}

Code Examples
Common usage patterns
Batch image processing
Process thousands of images in parallel on GPUs
import modal

app = modal.App("batch-image-process")
image = modal.Image.debian_slim().pip_install("pillow", "torch", "torchvision", "requests")

@app.function(gpu="T4", image=image)
def process_image(image_url: str) -> dict:
    from PIL import Image
    import requests
    import torchvision.transforms as T

    img = Image.open(requests.get(image_url, stream=True).raw)
    transform = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor()])
    tensor = transform(img)
    return {"shape": list(tensor.shape), "url": image_url}

@app.local_entrypoint()
def main():
    urls = ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]
    # .map fans out one container per input and runs them in parallel
    results = list(process_image.map(urls))
    print(results)

Scheduled ML pipeline
Run a nightly fine-tuning job on a schedule
import modal

app = modal.App("nightly-finetune")
image = modal.Image.debian_slim().pip_install("torch", "transformers", "datasets", "peft")
volume = modal.Volume.from_name("model-checkpoints", create_if_missing=True)

@app.function(
    gpu="H100",
    image=image,
    volumes={"/checkpoints": volume},
    schedule=modal.Cron("0 2 * * *"),  # 2am daily
    timeout=3600,
)
def nightly_finetune():
    # ... your training code, producing a `model` object
    model.save_pretrained("/checkpoints/latest")
    volume.commit()  # persist the checkpoint files to the shared volume

Community Notes
Real experiences from developers who've used this tool