Binary Independence Model (BIM)

The Binary Independence Model is a classic probabilistic model for information retrieval (IR). It makes two fundamental assumptions:

  1. Binary: Documents and queries are represented as binary incidence vectors (a term is either present or absent).
  2. Independence: The presence of one term is independent of the presence of any other term, given the relevance or non-relevance of the document.
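To make the binary assumption concrete, here is a minimal sketch of building incidence vectors. The function name `incidence_vector`, the whitespace tokenizer, and the toy vocabulary are assumptions chosen for illustration; they do not come from any particular library.

```python
def incidence_vector(text, vocabulary):
    """Return a 0/1 list: 1 if the vocabulary term occurs in the text, else 0."""
    # Assumption: lowercase whitespace tokenization is good enough for a toy example.
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]


# Hypothetical vocabulary and documents.
vocabulary = ["binary", "independence", "model", "ranking"]
doc_a = "binary independence model"
doc_b = "binary binary binary independence model model"

vec_a = incidence_vector(doc_a, vocabulary)
vec_b = incidence_vector(doc_b, vocabulary)

print(vec_a)
print(vec_a == vec_b)
```

Both documents map to the same vector, because the representation records only whether a term occurs and discards how often it occurs.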

Retrieval Status Value (RSV)

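The BIM ranks documents by a score that is rank-equivalent to the log-odds of relevance. Only terms that occur in both the query $q$ and the document $d$ contribute:

$$\mathrm{RSV}_d = \sum_{t \in q \cap d} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)}$$

where:

  • $p_t$: probability that term $t$ is present in a relevant document.
  • $u_t$: probability that term $t$ is present in a non-relevant document.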
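A sketch of the standard derivation shows where the per-term weight comes from. The notation is an assumption made for this sketch: $\vec{x}$ is the document's incidence vector, $x_t$ and $q_t$ mark the presence of term $t$ in the document and query, $p_t$ and $u_t$ are the probabilities that $t$ is present in a relevant and a non-relevant document, and $O(\cdot)$ denotes odds. The last step additionally assumes that $p_t = u_t$ for terms that do not occur in the query, so those terms cancel.

```latex
\begin{align*}
O(R \mid \vec{x}, q)
  &= O(R \mid q) \cdot \prod_{t} \frac{P(x_t \mid R = 1, q)}{P(x_t \mid R = 0, q)} \\
  &= O(R \mid q) \cdot \prod_{t:\, x_t = 1} \frac{p_t}{u_t}
       \cdot \prod_{t:\, x_t = 0} \frac{1 - p_t}{1 - u_t} \\
  &= O(R \mid q) \cdot \prod_{t:\, x_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)}
       \cdot \prod_{t:\, q_t = 1} \frac{1 - p_t}{1 - u_t}
\end{align*}
```

The first and last factors are identical for every document under a given query, so they do not affect the ranking. Taking the logarithm of the middle product turns it into a sum of per-term weights over the terms shared by the query and the document.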

Counting Evidence

BIM treats terms as clues. If a term is very likely to appear in “good” docs and very unlikely in “bad” docs, seeing that term in a document is strong evidence for relevance. By assuming independence, we can simply add up the “weight of evidence” for every matching term to get a final score.
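The "add up the evidence" idea can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it uses the standard BIM term weight log[p(1 - u) / (u(1 - p))], where p and u are the probabilities that the term is present in a relevant and a non-relevant document. The function names and the probability values are made up for the example.

```python
import math


def term_weight(p, u):
    """Weight of evidence for one term: log of the odds ratio p(1-u) / (u(1-p))."""
    return math.log((p * (1 - u)) / (u * (1 - p)))


def rsv(query_terms, doc_terms, probs):
    """Sum the weights of terms that appear in both the query and the document."""
    shared = set(query_terms) & set(doc_terms)
    return sum(term_weight(*probs[t]) for t in shared if t in probs)


# Hypothetical (p, u) estimates per term, invented for illustration.
probs = {
    "probabilistic": (0.8, 0.1),  # common in relevant docs, rare elsewhere
    "retrieval": (0.6, 0.3),      # mildly discriminative
    "the": (0.9, 0.9),            # equally likely everywhere
}

query = ["probabilistic", "retrieval", "the"]
doc_1 = ["the", "probabilistic", "model"]
doc_2 = ["the", "retrieval", "system"]

score_1 = rsv(query, doc_1, probs)
score_2 = rsv(query, doc_2, probs)
print(score_1 > score_2)
```

A term that is equally likely in relevant and non-relevant documents, such as "the" here, has weight zero and contributes nothing. A strongly discriminative term such as "probabilistic" dominates the score, so `doc_1` ranks above `doc_2`.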

Key Properties

  • Simple but foundational: It ignores term frequency (TF) and document length, which makes it less effective than BM25 on its own.
  • Basis for BM25: BM25 was created by extending BIM with TF and length normalization.
  • Probability Ranking Principle: BIM is a direct implementation of the principle that a system should rank documents by their probability of relevance.
