Fairness in Recommendation

Definition

Fairness in Recommendation

A recommender is a multi-stakeholder system: it must serve users (consumers) and items/providers (e.g. artists, sellers, job candidates). Fairness in recommendation is the absence of systematic, unjustified disadvantage to a protected group on either side, arising from biased data or the ranking process [Ekstrand et al., 2022]. Two complementary sides:

  • User fairness — recommendation quality (accuracy) should not differ across user groups (grouped by gender, region, activity level).
  • Item / provider fairness — exposure (attention) should be distributed fairly across item groups (grouped by popularity, category, brand), not concentrated on already-popular items.

Intuition

Why ranking amplifies tiny biases

Two mechanisms make recommendation unfair even from “neutral” data:

  1. Long-tail exacerbation. Interaction data follows a steep popularity curve. Because the top-K list is limited, the algorithm keeps re-recommending already-popular items, starving the long tail of any exposure (a feedback loop / Filter Bubble).
  2. Position bias. Users pay sharply decreasing attention to deeper ranks (a browsing model). So a small difference in relevance becomes a large difference in exposure. In the canonical job-seeker example [Singh & Joachims, 2018], a 0.03 gap in average relevance between two candidate groups produced a 0.32 gap in average exposure (probability of interview).

Fairness metrics therefore compare exposure (or accuracy) across groups, not raw relevance.
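
The position-bias amplification can be checked with a few lines of arithmetic. The sketch below is illustrative only: the six relevance scores are assumed values chosen to give a 0.03 gap in group means, and the logarithmic discount 1/log2(1 + rank) is assumed as the browsing model.

```python
import math

def log_discount(rank):
    """Logarithmic browsing model: attention paid to a 1-based rank."""
    return 1.0 / math.log2(1 + rank)

# Assumed relevance scores: group A averages 0.81, group B averages 0.78.
candidates = [("A", 0.82), ("A", 0.81), ("A", 0.80),
              ("B", 0.79), ("B", 0.78), ("B", 0.77)]

# Rank purely by relevance, as an unconstrained recommender would.
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)

relevance = {"A": [], "B": []}
exposure = {"A": [], "B": []}
for rank, (group, rel) in enumerate(ranked, start=1):
    relevance[group].append(rel)
    exposure[group].append(log_discount(rank))

def mean(values):
    return sum(values) / len(values)

relevance_gap = mean(relevance["A"]) - mean(relevance["B"])
exposure_gap = mean(exposure["A"]) - mean(exposure["B"])
print(f"relevance gap: {relevance_gap:.2f}")
print(f"exposure gap:  {exposure_gap:.2f}")
```

Under these assumptions the relevance gap prints as 0.03 and the exposure gap as 0.32: group A takes ranks 1 to 3, and the discount falls steeply over exactly those ranks.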

Mathematical Formulation

Item fairness rests on exposure: an item’s exposure is the attention it accrues, weighted by a browsing model that decays with rank position (logarithmic, geometric, or cascade decay). For a group $G$,

$$\mathrm{Exp}(G) = \sum_{i \in G} b(\mathrm{rank}(i)), \qquad b(k) = \frac{1}{\log_2(1 + k)} \ \text{(logarithmic model)}$$

where:

  • $b(k)$ — position discount; same shape as the NDCG discount, encoding decreasing user attention at deeper ranks.
  • $\mathrm{rank}(i)$ — position of item $i$ in the recommendation list.
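
As a rough sketch of how group exposure might be computed, the snippet below sums a position discount over one ranked list and buckets it by item group. The function names, the `patience` parameter of the geometric model, and the toy items are all assumptions made for illustration; the cascade model is omitted because it also needs per-item click probabilities.

```python
import math

def log_discount(rank):
    """Logarithmic decay, the same shape as the NDCG discount."""
    return 1.0 / math.log2(1 + rank)

def geometric_discount(rank, patience=0.8):
    """Geometric decay; `patience` is an assumed probability that the
    user continues from one rank to the next."""
    return patience ** (rank - 1)

def group_exposure(ranked_items, item_group, discount=log_discount):
    """Sum the position discount of every ranked item, per group."""
    totals = {}
    for rank, item in enumerate(ranked_items, start=1):
        group = item_group[item]
        totals[group] = totals.get(group, 0.0) + discount(rank)
    return totals

# Hypothetical list: two popular items ranked above two niche items.
ranked_items = ["i1", "i2", "i3", "i4"]
item_group = {"i1": "popular", "i2": "popular", "i3": "niche", "i4": "niche"}

log_exp = group_exposure(ranked_items, item_group)
geo_exp = group_exposure(ranked_items, item_group, geometric_discount)
print(log_exp)
print(geo_exp)
```

In an evaluation over many users, the per-list totals would typically be accumulated across all recommendation lists before groups are compared.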

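User-side fairness — User Group Fairness (UGF, lower is better) [Li et al., 2021]:

$$\mathrm{UGF} = \left| \frac{1}{|G_1|} \sum_{u \in G_1} M(L_u) - \frac{1}{|G_2|} \sum_{u \in G_2} M(L_u) \right|$$

where:

  • $G_1, G_2$ — two user groups (e.g. advantaged vs. disadvantaged).
  • $M(L_u)$ — a quality metric for user $u$'s recommendation list $L_u$ (e.g. F1@10, NDCG).
  • UGF is the absolute gap in mean performance between groups; $\mathrm{UGF} = 0$ is perfectly fair.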
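
A minimal sketch of UGF, assuming per-user quality scores have already been computed. The function name `ugf`, the NDCG values, and the activity-based split are hypothetical.

```python
def ugf(metric_by_user, group_a, group_b):
    """Absolute gap between the two groups' mean recommendation quality."""
    mean_a = sum(metric_by_user[u] for u in group_a) / len(group_a)
    mean_b = sum(metric_by_user[u] for u in group_b) / len(group_b)
    return abs(mean_a - mean_b)

# Hypothetical per-user NDCG@10 scores, split by activity level.
ndcg = {"u1": 0.50, "u2": 0.40, "u3": 0.20, "u4": 0.30}
active, inactive = ["u1", "u2"], ["u3", "u4"]

gap = ugf(ndcg, active, inactive)
print(f"UGF = {gap:.2f}")
```

Here the active users average 0.45 and the inactive users 0.25, so the gap is 0.20; a value of 0 would mean both groups receive the same average quality.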

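Item-side fairness goals (choose by goal × groups). Statistical parity — equal exposure regardless of merit:

$$\mathrm{DP} = \frac{\mathrm{Exp}(G_1)}{\mathrm{Exp}(G_2)}, \qquad \mathrm{MinMaxRatio} = \frac{\min_g \mathrm{Exp}(g)}{\max_g \mathrm{Exp}(g)}, \qquad \mathrm{MMF} = \min_g \frac{\mathrm{Exp}(g)}{w_g}$$

where:

  • $\mathrm{Exp}(g)$ — exposure of item group $g$.
  • DP (Demographic Parity, two groups) — exposure ratio; $\mathrm{DP} = 1$ is parity.
  • MinMaxRatio (higher is better, multi-group) — worst-to-best exposure ratio.
  • MMF (Max-Min Fairness, higher is better, multi-group) — weight-normalized exposure of the most disadvantaged group; $w_g$ = group size or a quality-based value.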
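
The three statistical-parity metrics can be sketched directly from per-group exposure totals. The function names, the exposure numbers, and the use of group size as the MMF weight are assumptions for illustration.

```python
def demographic_parity(exposure, g1, g2):
    """Two-group exposure ratio; 1.0 means parity."""
    return exposure[g1] / exposure[g2]

def min_max_ratio(exposure):
    """Worst-off group's exposure divided by the best-off group's."""
    return min(exposure.values()) / max(exposure.values())

def max_min_fairness(exposure, weight):
    """Weight-normalized exposure of the most disadvantaged group."""
    return min(exposure[g] / weight[g] for g in exposure)

# Hypothetical totals: the head group is small but collects most exposure.
exposure = {"head": 6.0, "mid": 3.0, "tail": 1.0}
group_size = {"head": 10, "mid": 30, "tail": 60}

dp = demographic_parity(exposure, "head", "tail")
mmr = min_max_ratio(exposure)
mmf = max_min_fairness(exposure, group_size)
print(dp, mmr, mmf)
```

With these numbers DP is 6.0 (far from parity), MinMaxRatio is 1/6, and MMF is 1/60, set by the tail group, which has the lowest exposure per item.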

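Equality of opportunity — exposure should be proportional to utility/merit (relevance offline):

$$\mathrm{EUR} = \frac{\mathrm{Exp}(G^{+}) / U(G^{+})}{\mathrm{Exp}(G^{-}) / U(G^{-})}, \qquad \mathrm{RUR} = \frac{\mathrm{CTR}(G^{+}) / U(G^{+})}{\mathrm{CTR}(G^{-}) / U(G^{-})}$$

$$\mathrm{EEL} = \lVert \epsilon - \epsilon^{*} \rVert_2^2, \qquad \mathrm{IAA} = \sum_i \lvert A_i - R_i \rvert$$

where:

  • $\mathrm{Exp}(G)$ — exposure of group $G$; $U(G)$ — utility (summed relevance); $G^{+}, G^{-}$ — advantaged/disadvantaged. EUR/RUR target a ratio of 1 (RUR replaces $\mathrm{Exp}$ with realized click-through $\mathrm{CTR}$).
  • EEL (lower is better, Expected Exposure Loss) — squared distance between system exposure $\epsilon$ and target exposure $\epsilon^{*}$.
  • IAA (lower is better, Inequity of Amortized Attention) — distance between attention $A_i$ and predicted relevance $R_i$ per item.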
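
A hedged sketch of the merit-based metrics follows. It assumes EUR is the ratio of exposure-per-utility between the advantaged and disadvantaged groups, EEL is a squared Euclidean distance, and IAA is an L1 distance over attention and relevance that have each been normalized to sum to 1. All names and numbers are hypothetical.

```python
def exposure_utility_ratio(exposure, utility, adv, dis):
    """Exposure per unit of utility, advantaged over disadvantaged; target 1."""
    return (exposure[adv] / utility[adv]) / (exposure[dis] / utility[dis])

def expected_exposure_loss(system, target):
    """Squared distance between system and target exposure vectors."""
    return sum((s - t) ** 2 for s, t in zip(system, target))

def inequity_of_attention(attention, relevance):
    """L1 distance between per-item attention and relevance shares."""
    return sum(abs(a - r) for a, r in zip(attention, relevance))

# Hypothetical group totals: similar utility, very different exposure.
exposure = {"adv": 2.13, "dis": 1.17}
utility = {"adv": 2.43, "dis": 2.34}
eur = exposure_utility_ratio(exposure, utility, "adv", "dis")

# Hypothetical per-item shares (each vector sums to 1).
system_exposure = [0.5, 0.3, 0.2]
target_exposure = [0.4, 0.4, 0.2]
eel = expected_exposure_loss(system_exposure, target_exposure)
iaa = inequity_of_attention(system_exposure, target_exposure)
print(eur, eel, iaa)
```

RUR would use the same ratio with click-through totals in place of `exposure`, which requires logged interaction data rather than a browsing model.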

Key Properties / Variants

  • Goal × groups taxonomy (which metric to report):
    • Statistical parity → DP, MinMaxRatio, MMF.
    • Equality of opportunity → EUR, RUR, EEL, IAA.
    • Two groups → DP, EUR, RUR. Multiple groups → MinMaxRatio, MMF, EEL, IAA.
  • Three intervention stages (where you inject fairness):
    • Pre-processing — debias the data before training (causal, probabilistic mapping). Not supported for the recommendation task in FairDiverse.
    • In-processing — modify the training objective:
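      • Re-weight / re-sample: up-weight the disadvantaged group’s loss. Weighted loss $L = \sum_i w_i \, \ell_i$ with per-sample weights $w_i$ set by group (e.g. inverse-propensity weighting, dual-mirror descent for MMF).
      • Regularizer: add a fairness penalty, $L = L_{\text{relevance}} + \lambda \cdot L_{\text{fairness}}$, where $\lambda$ trades accuracy for fairness.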
      • Prompt-based: fairness-aware prompts for LLM-based recommenders.
    • Post-processing — re-rank the output list to meet constraints (greedy/knapsack heuristics; learning-based fair scores). Cost is measured by Utility Loss $= \mathrm{Utility}(\text{ranked}) - \mathrm{Utility}(\text{re-ranked})$.
  • Trade-offs. Fairness vs. accuracy is a multi-objective problem: constraints can lower accuracy and even reinforce stereotypes if low-utility items are force-promoted. But win-win cases exist (e.g. repeat-biased next-basket methods can be both more accurate and more item-fair).
  • Societal stakes. Under-exposure (providers leave the platform), eroded user trust, echo chambers/polarization, and economic inequality (e.g. fewer high-paying job ads shown to women).
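
To make the re-weighting idea concrete, here is a small sketch that up-weights an under-represented item group inside a binary cross-entropy loss. It uses inverse group frequency as a simple stand-in for the inverse-propensity weights mentioned above; the function names and toy data are assumptions, not any toolkit's API.

```python
import math

def inverse_frequency_weights(groups):
    """Weight each group by n / (num_groups * group_count), so rarer
    groups receive larger weights."""
    counts = {}
    for g in groups:
        counts[g] = counts.get(g, 0) + 1
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}

def weighted_bce(probs, labels, groups, weights):
    """Weighted mean of per-sample binary cross-entropy."""
    total, norm = 0.0, 0.0
    for p, y, g in zip(probs, labels, groups):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[g] * loss
        norm += weights[g]
    return total / norm

# Hypothetical batch: three head-item interactions, one tail-item interaction.
groups = ["head", "head", "head", "tail"]
probs = [0.9, 0.8, 0.7, 0.4]   # predicted click probabilities
labels = [1, 1, 1, 1]          # all observed positives

weights = inverse_frequency_weights(groups)
uniform = {"head": 1.0, "tail": 1.0}
plain_loss = weighted_bce(probs, labels, groups, uniform)
fair_loss = weighted_bce(probs, labels, groups, weights)
print(weights, plain_loss, fair_loss)
```

Because the poorly predicted tail sample now carries weight 2.0 against 2/3 for each head sample, the re-weighted loss is larger than the plain one, and its gradient would push the model harder toward fitting the tail group.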
Algorithm: In-processing fair training (re-weight) vs Post-processing re-rank
─────────────────────────────────────────────────────────────────────────────
# (A) In-processing: fairness embedded in the loss
for each batch:
    L_relevance = ranking_loss(scores, labels)        # e.g. BPR / BCE
    L_fairness  = group_disparity(Exposure(groups))   # e.g. MMF gap
    L = L_relevance + λ · L_fairness                   # λ = fairness strength
    θ ← θ − α ∇θ L
 
# (B) Post-processing: re-rank a fixed relevance list under a fairness goal
ranked ← sort items by relevance                       # original list
re_ranked ← greedy_select(ranked, constraint=MMF/DP)   # inject minority items
report  Utility_Loss = Utility(ranked) − Utility(re_ranked)
        + fairness metric (DP / MMF / EEL ...)
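
Part (B) can be made runnable with a deliberately simple greedy rule. The sketch below is an assumption-laden stand-in for the MMF/DP constraints: it reserves a minimum number of top-k slots per group (`min_per_group`, a hypothetical parameter) and measures utility as discounted relevance.

```python
import math

def utility(items, relevance):
    """Discounted relevance of a ranked list (logarithmic discount)."""
    return sum(relevance[i] / math.log2(1 + r)
               for r, i in enumerate(items, start=1))

def greedy_rerank(ranked, item_group, k, min_per_group):
    """Fill k slots in relevance order, switching to quota-owed groups
    once the remaining slots are all needed to meet the quotas."""
    remaining = list(ranked)          # already sorted by relevance
    chosen, counts = [], {}
    while len(chosen) < k and remaining:
        slots_left = k - len(chosen)
        owed = {g: max(0, q - counts.get(g, 0))
                for g, q in min_per_group.items()}
        pool = remaining
        if sum(owed.values()) >= slots_left:
            # Only items from groups still below quota are eligible.
            pool = [i for i in remaining if owed.get(item_group[i], 0) > 0]
            pool = pool or remaining  # fall back if no such item exists
        pick = pool[0]
        chosen.append(pick)
        remaining.remove(pick)
        counts[item_group[pick]] = counts.get(item_group[pick], 0) + 1
    return chosen

# Hypothetical catalogue: three head items outrank two tail items.
relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
item_group = {"a": "head", "b": "head", "c": "head", "d": "tail", "e": "tail"}
ranked = sorted(relevance, key=relevance.get, reverse=True)

k = 3
re_ranked = greedy_rerank(ranked, item_group, k, {"tail": 1})
utility_loss = utility(ranked[:k], relevance) - utility(re_ranked, relevance)
print(re_ranked, round(utility_loss, 3))
```

With a quota of one tail item in the top 3, the list becomes a, b, d and the utility loss is 0.05: item c (0.7) is swapped for item d (0.6) at rank 3, where the discount is 0.5. The fairness metric of choice would be reported alongside this loss.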

Connections

Appears In