Birthday Attack
Applied AI · San Francisco

Production AI for media discovery — recommendation, retrieval, and inference at scale.

Birthday Attack is an applied-AI startup designing and operating production machine learning workloads: hybrid recommendation models, embedding retrieval, and inference infrastructure tuned for real-time serving.

See our work →
Get in touch
252M
Interactions modeled
3.5M
Users embedded
155K
Titles indexed
<100ms
Inference latency

Anigraph — our flagship platform.

A production recommendation engine for anime, combining collaborative-filtering embeddings, content-based signals from a large review corpus, and a curated knowledge graph over creators, studios, and franchises.

In Production / anigraph.xyz
Anigraph

Hybrid recommender pairing an XSimGCL graph neural network trained on hundreds of millions of user–title interactions with an XGBoost learning-to-rank reranker. NLP pipelines extract descriptive signals from the review corpus; an enrichment layer uses LLM inference for taxonomy and tagging. Served behind a low-latency Go backend with hot-reload embedding caches.
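The two-stage shape described above, retrieval over learned embeddings followed by a reranking pass, can be sketched minimally. Everything here is a toy stand-in: the random vectors substitute for trained XSimGCL embeddings, and a hand-weighted linear blend substitutes for the XGBoost learning-to-rank model.

```python
# Two-stage recommender sketch: embedding retrieval, then reranking.
# Toy stand-ins throughout: random vectors instead of trained XSimGCL
# embeddings, a fixed linear blend instead of an XGBoost LTR model.
import numpy as np

def retrieve(user_vec, title_vecs, k):
    """Stage 1: top-k candidate titles by dot-product similarity."""
    scores = title_vecs @ user_vec
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

def rerank(candidates, features):
    """Stage 2: rescore candidates with a per-title feature (e.g. a
    review-derived signal), standing in for the learned reranker."""
    rescored = [(i, 0.7 * s + 0.3 * float(features[i])) for i, s in candidates]
    return sorted(rescored, key=lambda t: -t[1])

rng = np.random.default_rng(0)
user = rng.normal(size=8)                 # one user embedding
titles = rng.normal(size=(100, 8))        # 100 title embeddings
feats = rng.uniform(size=100)             # toy content-based signal
ranked = rerank(retrieve(user, titles, k=20), feats)
```

The split matters operationally: retrieval must be cheap enough to scan the full catalog, while the reranker can afford richer features because it only sees the short candidate list.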

Visit site →

Where we go deep.

We build and ship production AI systems end-to-end — from model training through inference serving and the surrounding data infrastructure.

— 01

Recommendation Models

Hybrid recommender systems combining graph neural networks, embedding retrieval, and learning-to-rank rerankers — tuned for catalogs where structure and curation matter.

— 02

Inference Infrastructure

Low-latency model serving: in-memory embedding indices, hot-reload pipelines, and serving stacks engineered for sub-100ms response times under production load.
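The hot-reload pattern mentioned above can be sketched as a reference swap: readers always score against a complete snapshot of the index, and a loader replaces the whole matrix atomically so serving never pauses for a model drop. This is a minimal illustration, not the production Go implementation.

```python
# In-memory embedding index with atomic hot-reload (illustrative sketch).
import threading
import numpy as np

class EmbeddingCache:
    def __init__(self, matrix):
        self._lock = threading.Lock()
        self._matrix = matrix

    def reload(self, new_matrix):
        """Swap in a freshly trained embedding matrix in one step;
        in-flight queries keep using the old snapshot they grabbed."""
        with self._lock:
            self._matrix = new_matrix

    def top_k(self, query, k):
        with self._lock:
            m = self._matrix        # take a consistent snapshot
        scores = m @ query          # score outside the lock
        return np.argsort(-scores)[:k]

cache = EmbeddingCache(np.eye(4))
hits = cache.top_k(np.array([1.0, 0.0, 0.0, 0.0]), k=2)

# Simulate a new model drop: reload, then serve from the fresh index.
fresh = np.zeros((4, 4))
fresh[3, 0] = 5.0
cache.reload(fresh)
hits_after = cache.top_k(np.array([1.0, 0.0, 0.0, 0.0]), k=1)
```

Keeping the lock only around the reference swap and snapshot grab, not the scoring itself, is what keeps tail latency flat during reloads.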

— 03

LLM Enrichment

LLM-driven content enrichment: classification, structured extraction, taxonomy generation, and tagging pipelines that augment downstream recommendation and search.
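One reason tagging pipelines like this stay useful downstream is output validation: model responses are parsed and checked against a controlled taxonomy before anything is written, so search and recommendation never ingest free-form labels. A minimal sketch, with a stubbed model response and a made-up toy taxonomy:

```python
# Validate LLM tag output against a controlled taxonomy (toy example;
# the taxonomy, title, and response string here are all hypothetical).
TAXONOMY = {"action", "drama", "mecha", "slice-of-life"}

def enrich(title, raw_llm_output):
    """Parse comma-separated tags from a model response and keep only
    those present in the taxonomy."""
    tags = {t.strip().lower() for t in raw_llm_output.split(",")}
    return {"title": title, "tags": sorted(tags & TAXONOMY)}

record = enrich("Example Title", "Mecha, Action, invented-label")
```

Anything the model hallucinates outside the taxonomy (here, `invented-label`) is silently dropped rather than propagated.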

How we work.

A small team, deep ownership, and a bias toward shipping models that actually run in production.

01

Models that ship

Every model we train is built to be served — we measure offline metrics and online latency from day one.

02

Hybrid by default

Collaborative filtering rarely wins on its own. We blend graph structure, content signals, and curation.

03

Latency is a feature

Inference is engineered, not assumed. We optimize end-to-end paths, not just model forward passes.

04

Own the stack

From data ingestion through the serving layer, we build systems we can reason about and operate.

Let's talk.

Working on production AI? Looking to partner on a recommendation or retrieval system? We take on a small number of engagements each year.

Email addycb@bdayatk.com
Based San Francisco, CA
Hours Mon–Fri, 9–6 PT
Replies within 1–2 business days.