Binary Independence Model (BIM)

The Binary Independence Model is a classic probabilistic model for information retrieval (IR). It makes two fundamental assumptions:

Binary: Documents and queries are represented as binary incidence vectors (a term is either present or absent).

Independence: The presence of one term is independent of the presence of any other term, given the relevance or non-relevance of the document.
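The binary assumption can be illustrated with a minimal sketch. The whitespace tokenizer, the vocabulary, and the two documents are assumptions made for illustration, not part of the model's definition. Repeated occurrences of a term leave no trace in the resulting vector:

```python
def incidence_vector(text, vocabulary):
    """Return a 0/1 list marking which vocabulary terms occur in text."""
    # Toy tokenizer (an assumption): lowercase and split on whitespace.
    present = set(text.lower().split())
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical vocabulary and documents, for illustration only.
vocabulary = ["binary", "model", "retrieval", "vector"]
doc_a = "binary model binary model binary"
doc_b = "Binary model"

vec_a = incidence_vector(doc_a, vocabulary)
vec_b = incidence_vector(doc_b, vocabulary)
print(vec_a, vec_b)
```

Both documents map to the same vector, because only presence or absence is recorded.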

Retrieval Status Value (RSV)

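The BIM ranks documents by a score that is rank-equivalent to the log-odds of relevance, summing a log odds ratio for each term shared by the query and the document: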

$RSV = \sum_{t \in q \cap d} \log \frac{p_{t} (1 - u_{t})}{u_{t} (1 - p_{t})}$

where:

$p_{t} = P(x_{t} = 1 \mid R)$: probability that term $t$ is present in a relevant document.

$u_{t} = P(x_{t} = 1 \mid \bar{R})$: probability that term $t$ is present in a non-relevant document.
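As a worked example with made-up values (the numbers and the choice of natural logarithm are assumptions), take a term with $p_{t} = 0.6$ and $u_{t} = 0.2$:

$$\log \frac{p_{t} (1 - u_{t})}{u_{t} (1 - p_{t})} = \log \frac{0.6 \times 0.8}{0.2 \times 0.4} = \log 6 \approx 1.79$$

A term that is equally likely in relevant and non-relevant documents ($p_{t} = u_{t}$) gives $\log 1 = 0$ and so does not affect the ranking.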

Counting Evidence

BIM treats terms as clues. If a term is very likely to appear in “good” docs and very unlikely in “bad” docs, seeing that term in a document is strong evidence for relevance. By assuming independence, we can simply add up the “weight of evidence” for every matching term to get a final score.
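The additive scoring described above can be sketched as follows. The function names and the probability values are hypothetical. In practice $p_{t}$ and $u_{t}$ must be estimated, and this sketch assumes they lie strictly between 0 and 1.

```python
import math


def term_weight(p_t, u_t):
    """Log odds ratio for one term: log[p_t(1 - u_t) / (u_t(1 - p_t))]."""
    return math.log(p_t * (1.0 - u_t) / (u_t * (1.0 - p_t)))


def rsv(query_terms, doc_terms, p, u):
    """Sum the term weights over terms present in both query and document."""
    matching = set(query_terms) & set(doc_terms)
    return sum(term_weight(p[t], u[t]) for t in matching)


# Hypothetical probabilities, assumed to lie strictly between 0 and 1.
p = {"binary": 0.8, "model": 0.5, "retrieval": 0.6}
u = {"binary": 0.1, "model": 0.5, "retrieval": 0.3}

query = ["binary", "model", "retrieval"]
doc_1 = ["binary", "model", "binary"]  # matches "binary" and "model"
doc_2 = ["model", "retrieval"]         # matches "model" and "retrieval"

score_1 = rsv(query, doc_1, p, u)
score_2 = rsv(query, doc_2, p, u)
print(round(score_1, 3), round(score_2, 3))
```

Here doc_1 outranks doc_2 because "binary" is far more likely in relevant than in non-relevant documents under these made-up values. The repeated "binary" in doc_1 counts only once, since matching is done on sets of terms.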

Key Properties

Simple but foundational: It ignores term frequency (TF) and document length, which generally makes it less effective than BM25 when used on its own.
Basis for BM25: BM25 was created by extending BIM with TF and length normalization.
Probability Ranking Principle: BIM is a direct implementation of the principle that a system should rank documents by their probability of relevance.

Connections

Evolved into: BM25
Contrast: Vector Space Model (geometric), Language Model for IR (generative)
Assumptions: Term independence (similar to Naive Bayes).

Appears In

IR-L03 - Retrieval Models
