User-Item Interaction Matrix

Definition

User-Item Interaction Matrix

The user-item interaction matrix (or ratings matrix) is an $m \times n$ matrix $R$ recording observed interactions between $m$ users and $n$ items. Each entry $r_{ui}$ holds user $u$'s feedback on item $i$: an explicit rating (e.g., stars), a signed preference ($\pm 1$), or a binary implicit signal (clicked / not clicked). It is the core data structure of Collaborative Filtering: predictions leverage the collective user-item interaction data of a large pool of users, rather than item content.

Formally, given users $\mathcal{U} = \{u_1, \dots, u_m\}$ and items $\mathcal{I} = \{i_1, \dots, i_n\}$, the matrix $R$ tabulates which items each user interacted with so the recommender can find the unseen items most pertinent to a given user $u$.

Intuition

A mostly-empty grid we have to fill in

Picture a spreadsheet with users down the rows and items across the columns. Most cells are blank: any one user has touched only a tiny fraction of the catalog. The recommendation task is exactly predicting the missing entries: estimate how much user $u$ would like the items they have not yet seen, then rank those by predicted value to produce a Top-N Recommendation list.

The matrix is “collaborative” because the blanks are filled by borrowing signal from other users/items: if Lucy and Eric rate the same movies similarly, Lucy’s known ratings predict Eric’s unknown ones. The structure of the observed entries (who liked what) is the only input pure CF needs — no text, no images, no item metadata.
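To make the predict-then-rank step concrete, here is a minimal sketch in plain Python. The function name `top_n`, the item ids, and the predicted scores are all hypothetical; how the scores are produced is left to whichever predictor is used.

```python
def top_n(predicted, seen, n=3):
    """Return the n unseen item ids with the highest predicted score."""
    # Keep only items the user has not interacted with yet.
    unseen = [(score, item) for item, score in predicted.items() if item not in seen]
    # Sort by score descending; ties are broken by item id for determinism.
    unseen.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in unseen[:n]]


# Hypothetical predicted scores for one user (illustrative values only).
predicted = {"A": 4.5, "B": 3.1, "C": 4.9, "D": 2.0, "E": 4.7}
seen = {"C"}  # already rated, so it must not be recommended again

print(top_n(predicted, seen, n=2))  # ['E', 'A']
```

Item C has the highest score but is excluded because the user has already seen it; only the blank cells compete for a place in the list.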

Mathematical Formulation

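The interaction matrix is the object on which all CF predictors operate. With explicit ratings $r_{ui}$:

$$
R \in \mathbb{R}^{m \times n}, \qquad R_{ui} = \begin{cases} r_{ui} & \text{if } (u, i) \in \Omega \\ ? & \text{otherwise} \end{cases}
$$

where:

  • $m$: number of users; $n$: number of items
  • $\Omega$: set of observed (user, item) pairs; $|\Omega| \ll mn$ (the matrix is sparse)
  • $r_{ui}$: observed feedback (a star rating, a signed like ($\pm 1$), or an implicit signal)
  • $R_{ui} = ?$: a missing entry; recommendation = predicting $\hat{r}_{ui}$ for these cells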
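As a rough illustration of these definitions, the sketch below builds a small matrix from a list of observed (user, item, rating) triples and measures how sparse it is. The triples, the movie ids, and the use of `None` for a missing entry are assumptions made for the example; a real system would normally use a sparse storage format instead of a dense grid.

```python
# Hypothetical observed triples (user, item, rating): the observed set.
ratings = [
    ("lucy", "m1", 5), ("lucy", "m2", 3),
    ("eric", "m1", 4), ("eric", "m3", 2),
    ("ana", "m2", 4),
]

# Map user and item ids to row and column positions.
users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
u_idx = {u: k for k, u in enumerate(users)}
i_idx = {i: k for k, i in enumerate(items)}

# Dense grid (users x items) with None marking a missing entry.
R = [[None] * len(items) for _ in users]
for u, i, r in ratings:
    R[u_idx[u]][i_idx[i]] = r

# Sparsity = fraction of cells with no observed feedback.
observed = sum(cell is not None for row in R for cell in row)
sparsity = 1 - observed / (len(users) * len(items))

for name, row in zip(users, R):
    print(name, row)
print(f"observed={observed}, sparsity={sparsity:.2f}")
```

Even in this 3 × 3 toy grid, 4 of the 9 cells are blank; in real catalogs the blank fraction is far higher.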

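The memory-based read (User-based Rating Prediction) fills a blank $\hat{r}_{ui}$ by averaging column $i$'s ratings over $u$'s nearest neighbors $N_i(u)$ (users who did rate $i$):

$$
\hat{r}_{ui} = \frac{1}{|N_i(u)|} \sum_{v \in N_i(u)} r_{vi}
$$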

The model-based read (Matrix Factorization) approximates the whole matrix by a low-rank product of latent user/item factors:

$$
R \approx P Q^\top, \qquad \hat{r}_{ui} = p_u^\top q_i
$$

where $P \in \mathbb{R}^{m \times k}$ (each row $p_u$ a user factor), $Q \in \mathbb{R}^{n \times k}$ (each row $q_i$ an item factor), and $k$ is the number of latent concepts. A rating is reconstructed as the dot product of the corresponding row of $P$ and row of $Q$; in the lecture’s rank-2 toy example the two latent dimensions turn out interpretable (“history” vs. “romance”).
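The dot-product reconstruction can be sketched with hand-picked rank-2 factors. The numbers, the movie ids, and the names `P`, `Q`, and `predict` below are hypothetical and are not the lecture's example; they only show how two latent dimensions, read here as "history" and "romance", combine into a predicted rating. Learning the factors from data is not shown.

```python
# Hypothetical rank-2 factors; each vector is (history, romance).
P = {  # user factors
    "lucy": [1.0, 0.0],  # cares only about history
    "eric": [0.0, 1.0],  # cares only about romance
    "ana": [0.5, 0.5],   # mixed taste
}
Q = {  # item factors
    "war_documentary": [5.0, 1.0],
    "love_story": [1.0, 5.0],
}


def predict(user, item):
    """Reconstruct one rating as the dot product of a user and an item factor."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


# Fill in the whole matrix: every cell gets a value, observed or not.
R_hat = {u: {i: predict(u, i) for i in Q} for u in P}

for user, row in R_hat.items():
    print(user, row)
```

With these factors, lucy gets 5.0 for the documentary and 1.0 for the love story, eric gets the reverse, and ana gets 3.0 for both.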

Key Properties / Variants

  • Sparsity. The dominant practical feature: nearly all of the cells are missing, which drives the choice of algorithm and causes the cold-start problem (a new user/item is an all-blank row/column). See Data Sparsity and Cold Start Problem.
  • Feedback type. Entries encode either Explicit Feedback (numeric ratings; a blank means “not rated”) or Implicit Feedback (clicks/plays/purchases; a blank is ambiguous: disinterest or simply not-yet-seen). Implicit matrices are usually treated as binary positives, with negatives sampled from the unobserved cells.
  • Asymmetry of the two reads. Slicing by rows gives User-based Collaborative Filtering (similar users); slicing by columns gives Item-based Collaborative Filtering (similar items).
  • Missing-not-at-random. Observed entries are biased — popular items and active users are over-represented — so the blanks are not a random sample. This connects to Popularity Bias and to evaluating beyond accuracy.
  • Generalization by neural models. Neural Collaborative Filtering also takes one-hot user/item indices into $R$ but replaces the dot product with a learned non-linear function; classic MF is a special case of it.
  • Filling-in procedure (memory-based prediction for one blank cell):
Predict R[u, i] for a missing entry:
──────────────────────────────────────
  candidates ← { v : R[v, i] is observed, v ≠ u }
  for each v in candidates:
      s[v] ← similarity( row_u(R), row_v(R) )   # over co-rated items
  N ← top-k users v by s[v]                       # nearest neighbors N_i(u)
  R_hat[u, i] ← (1 / |N|) * Σ_{v in N} R[v, i]    # (optionally weight by s[v])
  return R_hat[u, i]
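
The procedure above could be written in plain Python roughly as follows. This is a sketch under stated assumptions: the note leaves `similarity` unspecified, so cosine similarity over co-rated items is used here as one common choice; `None` marks a missing entry; the function names and the toy matrix are hypothetical; and the unweighted average is used, not the similarity-weighted variant.

```python
import math


def cosine_on_corated(row_u, row_v):
    """Cosine similarity restricted to items both users rated (None = missing)."""
    pairs = [(a, b) for a, b in zip(row_u, row_v) if a is not None and b is not None]
    if not pairs:
        return 0.0  # no overlap: treat as dissimilar
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_user_knn(R, u, i, k=2):
    """Predict R[u][i] as the mean rating of the k most similar users who rated i."""
    candidates = [v for v in range(len(R)) if v != u and R[v][i] is not None]
    if not candidates:
        return None  # nobody else rated item i, so no prediction is possible
    ranked = sorted(candidates, key=lambda v: cosine_on_corated(R[u], R[v]), reverse=True)
    neighbours = ranked[:k]
    return sum(R[v][i] for v in neighbours) / len(neighbours)


# Hypothetical 4-user x 4-item matrix; user 0 has not rated item 2.
R = [
    [5, 3, None, 1],
    [4, 3, 4, 1],
    [1, 1, 2, 5],
    [5, 4, 5, None],
]

print(predict_user_knn(R, u=0, i=2, k=2))  # 4.5
```

Users 1 and 3 are the closest to user 0 on their co-rated items, so the prediction is the mean of their ratings for item 2, (4 + 5) / 2 = 4.5. The guard for an empty candidate set covers a case the pseudocode leaves implicit: an item no other user has rated.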

Connections

Appears In