Plackett-Luce Model

Definition

Plackett-Luce (PL) Model

The Plackett-Luce model is a probability distribution over permutations (rankings) of a set of items, parameterized by a real-valued score per item. It defines — the probability of producing ranking — as a sequence of softmax choices made without replacement: at each rank position you pick the next item with probability proportional to among the items not yet placed.

In Listwise LTR it is the probabilistic foundation of ListNet and ListMLE: instead of scoring documents independently (pointwise) or in pairs (pairwise), PL gives a single coherent distribution over the whole ranking, which we can then fit by maximum likelihood / cross-entropy.

Intuition

Think of repeatedly drawing items from a bag, but biased: an item with a higher score is more likely to be drawn next. Once drawn, it is removed, and the remaining items renormalize.

  • It is the softmax extended to ranking. A single softmax answers “which item is best?”; PL answers “which item is best, then which of the rest is best, then…” — applying softmax to a shrinking candidate set at every step.
  • The model is a generative story for a ranking, so a model that outputs scores implicitly defines a distribution over all orderings. Training = make the correct ranking probable under this distribution.
  • It satisfies Luce’s choice axiom (independence of irrelevant alternatives): the relative odds of choosing over at any step depend only on and , not on what other items are present.

Mathematical Formulation

Full ranking probability. For a permutation of items with scores :

where:

  • — the item placed at rank position
  • — score of that item (in IR, the model’s relevance score for the document)
  • — normalizer over the items still remaining at step (positions through ); the already-placed items are dropped from the denominator
  • the product runs over all choice steps (the last factor is always )

Top-1 / first-place probability. The marginal probability that item is ranked first is a plain softmax over all items:

where:

  • — the (positive) “strength” of item ; PL is often written with strengths , so
  • — partition over the full candidate set

Worked example (3 documents, ranking ):

Use in losses. The two main listwise losses are likelihoods under PL:

  • ListMLE maximizes the likelihood of the ground-truth ranking (labels sorted descending), i.e. minimizes :
  • ListNet keeps only the top-1 PL marginal and minimizes cross-entropy between the label-softmax and the score-softmax :

where is the relevance label of document and the model’s predicted score.

Key Properties / Variants

  • Generative sampling procedure (how to draw a ranking from PL):
Algorithm: Sample a ranking from Plackett-Luce(s)
──────────────────────────────────────────────────
Input: scores s_1..s_n
Remaining R ← {1, ..., n}
For k = 1 to n:
    # softmax over the items still in R
    For each i in R:  p_i ← exp(s_i) / Σ_{j in R} exp(s_j)
    Sample item m ~ Categorical(p)      # pick next-ranked item
    π(k) ← m
    R ← R \ {m}                          # sample without replacement
Return ranking π = (π(1), ..., π(n))
  • Softmax = top-1 PL. A single softmax is exactly the first step of PL; full PL is “softmax with replacement removed”, applied times.
  • Scale by exponentiation, shift-invariant. Adding a constant to every score leaves unchanged (the cancels in each ratio), so scores are only identifiable up to an additive constant.
  • Luce’s choice axiom / IIA. Pairwise odds are independent of the other alternatives — a known limitation when context matters.
  • Factorial blow-up. There are permutations, so the full distribution is intractable for large (e.g. ). Practical fixes: ListNet’s top-1 truncation, or top- PL (only model the first choice steps).
  • Ties / multiple valid orders. When several items share a label, any ordering among them is a valid ground truth; ListMLE handles this since the loss decomposes per step.
  • Relation to softmax policies in RL. The same Gibbs/softmax form underlies the Softmax Policy used in Policy Gradient methods; ranking under PL can be seen as a sequential (without-replacement) softmax policy over documents.
  • Complexity. Evaluating is given a sorted list (precompute suffix sums of ), plus to sort by label.

Connections

Appears In