TIGER
Definition
TIGER (Transformer Index for GEnerative Recommenders)
TIGER (Rajput et al., Recommender Systems with Generative Retrieval, NeurIPS 2023) is the canonical Generative Retrieval model for next-item recommendation. It works in two stages: (1) an offline tokenizer (RQ-VAE) maps each item’s content embedding to a short tuple of discrete codeword indices — its Semantic ID; (2) a seq2seq Transformer (encoder–decoder) reads the user’s history of semantic IDs and autoregressively generates the semantic ID of the next item, token by token. The generated ID is then looked up in the catalogue. This replaces “score every candidate over a fixed pool” with “decode the target identifier.”
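To make the data flow concrete, here is a minimal sketch of the representation the two stages share. The catalogue, item names, and code values are made up for illustration; only the shape (item to code tuple, history to flat token list, generated tuple back to item) follows the definition above.

```python
from typing import Dict, List, Optional, Tuple

SemanticID = Tuple[int, ...]

# Hypothetical toy catalogue: item name -> Semantic ID (all values invented).
item_to_sid: Dict[str, SemanticID] = {
    "running_shoes": (5, 23, 55),
    "trail_shoes": (5, 23, 12),
    "yoga_mat": (5, 7, 40),
    "lipstick": (9, 2, 31),
}
# Reverse map used to resolve a generated ID back to a catalogue item.
sid_to_item: Dict[SemanticID, str] = {sid: item for item, sid in item_to_sid.items()}


def flatten_history(history: List[str]) -> List[int]:
    """Concatenate the Semantic IDs of the history items into one token list."""
    return [code for item in history for code in item_to_sid[item]]


def lookup(generated: SemanticID) -> Optional[str]:
    """Resolve a generated Semantic ID; returns None if no item has that ID."""
    return sid_to_item.get(generated)


tokens = flatten_history(["running_shoes", "yoga_mat"])  # encoder input tokens
```

Note that `lookup` can return `None`: nothing in this representation forces a generated tuple to correspond to a real item.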
Intuition
Generate the address, don't scan the warehouse
Classical Sequential Recommendation (e.g. SASRec) encodes the history into a state $h$ and scores every catalogue item $i$ via $\mathrm{softmax}_i(h^\top v_i)$, where $v_i$ is the item's embedding: a softmax over millions of atomic IDs whose embedding table grows linearly with the catalogue. TIGER instead gives each item a structured “address” of codes $(c_1, \dots, c_L)$, where related items share a coarse prefix. The model then spells out the next item’s address one code at a time, exactly like a language model predicting the next token. Because only $L \cdot K$ code tokens exist ($K$ codewords at each of $L$ levels), they combine into $K^L$ possible IDs: capacity is decoupled from vocabulary size, the embedding table stays tiny, and a brand-new item gets a decodable address from its content alone (warm cold-start), without ever having been clicked.
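The capacity argument is simple arithmetic, sketched below. The numbers (3 levels, 256 codewords per level, a 10-million-item catalogue, 64-dimensional embeddings) are illustrative assumptions, not figures taken from the paper.

```python
def sid_capacity(num_levels: int, codebook_size: int) -> dict:
    """Token vocabulary size vs. number of addressable IDs for a code tuple."""
    return {
        "token_vocab": num_levels * codebook_size,      # rows in the token embedding table
        "addressable_ids": codebook_size ** num_levels,  # distinct tuples that can be spelled
    }


def embedding_params(num_rows: int, dim: int) -> int:
    """Parameter count of an embedding table with num_rows rows of width dim."""
    return num_rows * dim


stats = sid_capacity(num_levels=3, codebook_size=256)
atomic_table = embedding_params(num_rows=10_000_000, dim=64)        # one row per item
token_table = embedding_params(num_rows=stats["token_vocab"], dim=64)  # one row per code token
```

Under these assumed sizes, 768 code tokens address about 16.8 million tuples, and the token table has 49,152 parameters against 640 million for the atomic-ID table.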
Mathematical Formulation
TIGER has two distinct objectives, one per stage.
Stage 1 — RQ-VAE tokenizer (build the Semantic ID)
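Encode the item content embedding $x$ to a latent $z = \mathrm{Enc}(x)$, then residual-quantize it over $L$ codebooks. Starting from $r_1 = z$, at each level $d = 1, \dots, L$ pick the nearest codeword, record its index, and pass the residual on:
$$c_d = \arg\min_{k} \lVert r_d - e^{(d)}_k \rVert^2, \qquad r_{d+1} = r_d - e^{(d)}_{c_d}$$
The Semantic ID is the index tuple $(c_1, \dots, c_L)$; the quantized latent is $\hat{z} = \sum_{d=1}^{L} e^{(d)}_{c_d}$, decoded back to $\hat{x} = \mathrm{Dec}(\hat{z})$. The tokenizer is trained with
$$\mathcal{L}_{\text{tok}} = \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{rqvae}}, \qquad \mathcal{L}_{\text{recon}} = \lVert x - \hat{x} \rVert^2, \qquad \mathcal{L}_{\text{rqvae}} = \sum_{d=1}^{L} \Big( \lVert \mathrm{sg}[r_d] - e^{(d)}_{c_d} \rVert^2 + \beta \lVert r_d - \mathrm{sg}[e^{(d)}_{c_d}] \rVert^2 \Big)$$
where:
- $x$: item content embedding (TIGER uses a Sentence-T5 vector over title/brand/category)
- $e^{(d)}_k$: the $k$-th codeword vector in codebook $d$ ($K$ codewords per level)
- $c_d$: selected index at level $d$ (coarse → fine as $d$ grows)
- $\mathcal{L}_{\text{recon}}$: reconstruction loss; $\mathcal{L}_{\text{rqvae}}$: codebook/commitment loss with stop-gradient $\mathrm{sg}[\cdot]$ and the straight-through estimator (VQ-style); $\beta$ weights the commitment term
- $L$: ID length, i.e. the number of codebook levels (the toy figure uses small illustrative values of $L$ and $K$); a trailing token is appended for collision handling so each tuple maps to one item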
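The quantization loop (nearest codeword, record index, subtract, repeat) can be sketched in plain Python. This covers only the lookup with fixed codebooks; training the encoder, decoder, and codebooks is out of scope here. The 2-D codebooks and the latent are hand-picked toy values, not learned ones.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    z: Vector, codebooks: List[List[Vector]]
) -> Tuple[Tuple[int, ...], List[float]]:
    """Return (Semantic ID, quantized latent) for latent z, one codebook per level."""
    residual = list(z)
    quantized = [0.0] * len(z)
    codes = []
    for codebook in codebooks:
        # Nearest codeword to the current residual at this level.
        idx = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        codes.append(idx)
        # Accumulate the chosen codeword and pass the remaining residual on.
        quantized = [q + e for q, e in zip(quantized, codebook[idx])]
        residual = [r - e for r, e in zip(residual, codebook[idx])]
    return tuple(codes), quantized


# Toy codebooks (assumed values): coarse, medium, fine.
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0], [4.0, -4.0]],
    [[1.0, 0.0], [0.0, -1.0], [-1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5]],
]

codes, quantized = residual_quantize([4.5, 3.0], CODEBOOKS)
```

For this toy input the levels pick `[4, 4]`, then `[0, -1]`, then `[0.5, 0]`, so the codes are `(1, 1, 0)` and the quantized latent reproduces the input exactly. Real latents generally leave a non-zero final residual.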
Stage 2 — Autoregressive generation (the recommender)
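Given the user history $H = (i_1, \dots, i_t)$ (fed to the model as a flat sequence of the history items’ semantic-ID tokens), the next item’s ID $(z_1, \dots, z_L) = \mathrm{SID}(i_{t+1})$ is decoded one code at a time:
$$p_\theta\big(\mathrm{SID}(i_{t+1}) \mid H\big) = \prod_{l=1}^{L} p_\theta(z_l \mid H, z_{<l})$$
trained with next-token cross-entropy (teacher forcing):
$$\mathcal{L}_{\text{gen}} = -\sum_{l=1}^{L} \log p_\theta(z_l \mid H, z_{<l})$$
At inference the likelihood doubles as a ranking score, $s(i \mid H) = \log p_\theta\big(\mathrm{SID}(i) \mid H\big)$, and a ranked list is produced by beam search over the $L$ decoding steps.
where:
- $z_l$: the $l$-th codebook token of the target item $i_{t+1}$ (so each item spans $L$ decoder positions, not one); it is the Stage 1 index at level $l$, not the continuous latent
- $\theta$: parameters of the T5-style Transformer Model (bidirectional encoder + autoregressive decoder)
- the code-token vocabulary (on the order of $L \cdot K$ entries) replaces the BPE vocabulary; output is item identifiers, not natural language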
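The factorization above means an item's score is the sum of per-step log-probabilities of its codes. The sketch below uses a hypothetical hard-coded table in place of the trained decoder. It ignores the history entirely and uses 2 levels with 2 codes each, purely to keep the arithmetic checkable.

```python
import math
from typing import Callable, Dict, Tuple

Prefix = Tuple[int, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[int, float]:
    """Hypothetical stand-in for p_theta(z_l | H, z_<l); the history H is ignored."""
    table = {
        (): {0: 0.75, 1: 0.25},
        (0,): {0: 0.5, 1: 0.5},
        (1,): {0: 0.25, 1: 0.75},
    }
    return table[prefix]


def sid_log_likelihood(
    sid: Prefix, next_token_probs: Callable[[Prefix], Dict[int, float]]
) -> float:
    """Sum of per-step log-probabilities; doubles as the ranking score at inference."""
    total = 0.0
    for l, token in enumerate(sid):
        probs = next_token_probs(tuple(sid[:l]))  # condition on the codes decoded so far
        total += math.log(probs[token])
    return total


def cross_entropy_loss(
    target_sid: Prefix, next_token_probs: Callable[[Prefix], Dict[int, float]]
) -> float:
    """Teacher-forced next-token loss for one target item."""
    return -sid_log_likelihood(target_sid, next_token_probs)


score = sid_log_likelihood((0, 1), toy_next_token_probs)
loss = cross_entropy_loss((0, 1), toy_next_token_probs)
```

Because each step is a proper distribution, the probabilities of all $K^L$ tuples sum to one, including tuples that correspond to no catalogue item.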
Key Properties / Variants
- Two-stage, frozen tokenizer: semantic-ID construction is an offline preprocessing step; once trained the RQ-VAE is frozen and the generator predicts indices, never the continuous embeddings. The decoder of the RQ-VAE is discarded for serving.
- Hierarchical prefixes: earlier codes are coarse (e.g. broad category), later codes refine the residual. Items sharing the first code $c_1$ are coarsely similar; the full tuple disambiguates. This prefix tree is what enables cold-start generalization and controllable/diverse retrieval.
- Validity is not free: of the $K^L$ possible code sequences ($K$ codewords per level, $L$ levels) only a tiny fraction are real items, so naive decoding can emit non-existent IDs. TIGER-style systems use Trie-Constrained Decoding: store all valid catalogue IDs in a trie and apply a logit mask so only on-path tokens are allowed. (Complementary fix: reward validity during RL.)
- Decoding pathologies: popular prefixes dominate beam search (popularity amplification), top-ranked items share prefixes (homogeneity), and each recommendation costs $L$ sequential decoding steps + trie lookup (latency).
- Empirical result: on Amazon Sports/Beauty/Toys, RQ-VAE semantic IDs beat Random IDs and LSH-based IDs, and TIGER outperforms SASRec / S³-Rec / P5 baselines on Recall@K and NDCG@K — establishing item tokenization as a modelling choice, not mere preprocessing.
- Architecture variants: TIGER uses an encoder–decoder (T5-style, “read history fully, then write”); decoder-only successors (HSTU, OneRec, GPTRec) treat [history || target SID] as one stream and scale to longer histories.
- Tokenizer variants / successors: beyond reconstruction-only RQ-VAE, CoST adds a contrastive objective, LETTER adds semantic + collaborative + diversity regularizers, ActionPiece makes tokens context-aware, and LC-Rec tunes an LLM over the semantic IDs. RQ-KMeans, R-VQ, and product-quantization (VQ-Rec) are competing tokenizers; the best choice depends on embedding space and task.
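The trie constraint and the beam-search behaviour described in the bullets above can be sketched together. This is a simplified illustration, not the paper's implementation: the scorer is a hypothetical fixed distribution that ignores both history and prefix, and off-path tokens are skipped rather than masked in logit space and renormalized.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def build_trie(valid_sids: Sequence[Prefix]) -> dict:
    """Nested-dict trie holding every valid catalogue Semantic ID."""
    trie: dict = {}
    for sid in valid_sids:
        node = trie
        for code in sid:
            node = node.setdefault(code, {})
    return trie


def allowed_tokens(trie: dict, prefix: Prefix) -> List[int]:
    """Tokens that keep the prefix on a path to at least one valid ID."""
    node = trie
    for code in prefix:
        node = node.get(code)
        if node is None:
            return []
    return sorted(node)


def constrained_beam_search(
    next_token_logprobs: Callable[[Prefix], Dict[int, float]],
    trie: dict,
    num_levels: int,
    beam_size: int,
) -> List[Tuple[Prefix, float]]:
    """Beam search over num_levels steps, expanding only on-trie tokens."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(num_levels):
        candidates = []
        for prefix, score in beams:
            logprobs = next_token_logprobs(prefix)
            for token in allowed_tokens(trie, prefix):
                candidates.append((prefix + (token,), score + logprobs[token]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def toy_logprobs(prefix: Prefix) -> Dict[int, float]:
    """Hypothetical scorer that always prefers token 0, whatever the prefix."""
    return {0: math.log(0.5), 1: math.log(0.3), 2: math.log(0.2)}


VALID = [(0, 1), (0, 2), (1, 1), (2, 0)]
results = constrained_beam_search(toy_logprobs, build_trie(VALID), num_levels=2, beam_size=2)
```

Unconstrained greedy decoding under this scorer would emit `(0, 0)`, which is not in the catalogue. With the trie, the beam returns `(0, 1)` and `(0, 2)`: both valid, and both sharing the popular prefix `0`, which is the homogeneity effect noted above.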
Algorithm: TIGER (two-stage generative retrieval)
──────────────────────────────────────────────────
STAGE 1 — Offline tokenization (per catalogue item i)
  x_i ← ContentEncoder(title, brand, category)    # Sentence-T5 embedding
  z_i ← RQVAE_Encoder(x_i)
  r ← z_i
  for d = 1 .. L:
    c[d] ← argmin_k || r - e[d,k] ||^2            # nearest codeword
    r ← r - e[d, c[d]]                            # pass residual on
  SID(i) ← (c[1], ..., c[L])  (+ extra token if collision)
  train RQ-VAE by min ||x_i - Decoder(sum_d e[d,c[d]])||^2 + L_rqvae
  freeze tokenizer; build trie of all valid SIDs
STAGE 2 — Train the seq2seq recommender
  for each user history (i_1,...,i_t -> i_{t+1}):
    input ← flatten(SID(i_1) ... SID(i_t))        # t * L tokens
    target ← SID(i_{t+1})                         # L tokens
    minimize - sum_l log p_theta(z_l | input, z_<l)   # teacher forcing
INFERENCE — recommend next items
  encode user history -> context
  beam search (size B) over L steps, masked by trie   # valid SIDs only
  emit B complete SIDs -> map back to items
  filter (drop already-seen, dedup, business rules) -> ranked list
Connections
- Special case of: Generative Recommendation / Generative Retrieval (SID-based, generate the identifier)
- Tokenizer: RQ-VAE (residual quantization producing Semantic IDs); contrast with Atomic Item IDs and full-text IDs
- Decoder objective: Autoregressive Generation with next-token cross-entropy, Beam Search, Trie-Constrained Decoding
- Replaces the score-and-rank skeleton of: SASRec, Sequential Recommendation
- Recommendation analogue of generative IR: DSI, GENRE (generate a document identifier)
- Item-tokenization ladder context: see Item Tokenization / Item ID Tokenization
- Scaling sibling route (native, not borrowed): HSTU, Large Recommendation Models (LRM)