IR-A01 - Unsupervised Retrieval

Jun 06, 2026

IR-A01: Assignment 1 — Unsupervised Retrieval

Overview

Implement term-based matching approaches and evaluation metrics. The assignment is done in pairs and graded PASS/FAIL; passing requires at least 80% of the tests to pass.

What You Implement

Text Preprocessing

Tokenization: Split text into terms
Lowercasing, stop word removal, and stemming (NLTK)
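
A minimal sketch of the preprocessing steps above, assuming a regex tokenizer and a tiny hand-written stop list. The names `STOP_WORDS`, `tokenize`, and `preprocess` are hypothetical and not taken from the assignment skeleton, which may prescribe its own tokenizer and stop word list.

```python
import re

# Tiny illustrative stop list (an assumption); the assignment may supply its own.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def tokenize(text):
    """Split text into alphanumeric terms."""
    return re.findall(r"[A-Za-z0-9]+", text)


def preprocess(text, stop_words=STOP_WORDS, stem=None):
    """Tokenize, lowercase, drop stop words, then optionally stem each term."""
    terms = [t.lower() for t in tokenize(text)]
    terms = [t for t in terms if t not in stop_words]
    if stem is not None:
        # `stem` is any callable mapping a term to its stem.
        terms = [stem(t) for t in terms]
    return terms


print(preprocess("The Indexing of Documents is fun"))
# ['indexing', 'documents', 'fun']
```

To stem with NLTK, pass `stem=nltk.stem.PorterStemmer().stem`. Keeping the stemmer as a parameter makes it easy to test the pipeline with and without stemming.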

Indexing

Build an Inverted Index with term frequencies and document lengths
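
One possible layout for such an index, sketched under the assumption that documents arrive as already-preprocessed term lists keyed by document ID. The function name `build_index` and the nested-dictionary structure are illustrative choices, not the required interface.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index from {doc_id: [terms]}.

    Returns (index, doc_lengths), where index maps term -> {doc_id: tf}
    and doc_lengths maps doc_id -> number of terms in the document.
    """
    index = defaultdict(dict)
    doc_lengths = {}
    for doc_id, terms in docs.items():
        doc_lengths[doc_id] = len(terms)
        # Counter yields the term frequency of each distinct term.
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return dict(index), doc_lengths


docs = {
    "d1": ["cat", "sat", "mat", "cat"],
    "d2": ["dog", "sat"],
}
index, doc_lengths = build_index(docs)
print(index["cat"])   # {'d1': 2}
print(index["sat"])   # {'d1': 1, 'd2': 1}
print(doc_lengths)    # {'d1': 4, 'd2': 2}
```

Storing document lengths alongside the postings matters because both BM25 and query likelihood normalize by document length at scoring time.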

Retrieval Methods

TF-IDF Search: TF-IDF weighted cosine similarity
BM25 Search: Okapi BM25 with $k_{1}$ and $b$ parameters
QL Search: Query likelihood with Dirichlet smoothing
NaiveQL Search: QL without smoothing (for comparison)
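
To make two of these scoring functions concrete, here is a hedged sketch of BM25 and query likelihood over a toy index. It uses one common BM25 variant (an IDF with +1 inside the logarithm) and assumed defaults of `k1=1.5`, `b=0.75`, and `mu=1000`; the course's exact formulas and defaults may differ. The names `INDEX`, `DOC_LENGTHS`, `bm25_score`, and `ql_score` are hypothetical. Setting `mu=0` removes the smoothing, which corresponds to the unsmoothed comparison model.

```python
import math

# Toy index in an assumed layout: term -> {doc_id: term frequency}.
INDEX = {
    "cat": {"d1": 2},
    "sat": {"d1": 1, "d2": 1},
    "mat": {"d1": 1},
    "dog": {"d2": 1},
}
DOC_LENGTHS = {"d1": 4, "d2": 2}


def bm25_score(query, doc_id, index, doc_lengths, k1=1.5, b=0.75):
    """Score one document with a common Okapi BM25 variant."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    score = 0.0
    for term in query:
        postings = index.get(term, {})
        tf = postings.get(doc_id, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = len(postings)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Length normalization: b controls how strongly long documents are penalized.
        norm = k1 * (1.0 - b + b * doc_lengths[doc_id] / avg_len)
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score


def ql_score(query, doc_id, index, doc_lengths, mu=1000.0):
    """Log query likelihood with Dirichlet smoothing; mu=0 gives the unsmoothed model."""
    collection_len = sum(doc_lengths.values())
    score = 0.0
    for term in query:
        postings = index.get(term, {})
        tf = postings.get(doc_id, 0)
        cf = sum(postings.values())  # collection frequency of the term
        prob = (tf + mu * cf / collection_len) / (doc_lengths[doc_id] + mu)
        if prob == 0.0:
            return float("-inf")  # a zero-probability term zeroes the whole likelihood
        score += math.log(prob)
    return score


query = ["cat", "sat"]
for doc_id in DOC_LENGTHS:
    print(doc_id,
          bm25_score(query, doc_id, INDEX, DOC_LENGTHS),
          ql_score(query, doc_id, INDEX, DOC_LENGTHS),
          ql_score(query, doc_id, INDEX, DOC_LENGTHS, mu=0.0))
```

With `mu=0.0`, document `d2` scores negative infinity because it does not contain "cat". This is the failure mode that smoothing is meant to avoid, and it is why the unsmoothed variant is useful as a comparison point.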

Evaluation

Implement Precision, Recall, MAP, NDCG, MRR
Evaluate all retrieval methods and compare results
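
The per-query metrics can be sketched as follows, assuming a ranking is a list of document IDs, binary relevance is a non-empty set of relevant IDs, and NDCG uses graded gains with a log2 rank discount. The function names are hypothetical, and the assignment's tests may expect different signatures or a different NDCG gain function.

```python
import math


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in ranking[:k] if d in relevant) / len(relevant)


def average_precision(ranking, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranking, gains, k):
    """NDCG@k with graded gains {doc_id: grade} and a log2 rank discount."""
    def dcg(grades):
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))
    actual = dcg([gains.get(d, 0) for d in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


ranking = ["d3", "d1", "d2", "d4"]
relevant = {"d1", "d2"}
print(precision_at_k(ranking, relevant, 2))   # 0.5
print(recall_at_k(ranking, relevant, 2))      # 0.5
print(reciprocal_rank(ranking, relevant))     # 0.5
print(average_precision(ranking, relevant))
print(ndcg_at_k(ranking, {"d1": 1, "d2": 1}, 3))
```

MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all queries in the benchmark.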

Key Implementation Notes

Allowed libraries: nltk, numpy, and matplotlib only (no sklearn or gensim)
All implementation goes in the modules/ directory, between the BEGIN/END SOLUTION tags
Helper methods described in docstrings are also tested, so implement them
Benchmarking uses the MS MARCO dataset

Resources

SEIRiP Sections: 2.3, 4.1-4.3, 5.3, 5.6-5.7, 6.2, 7, 8

Related Lectures

IR-L02 - IR Fundamentals (indexing, preprocessing)
IR-L03 - Retrieval Models (BM25, QL, TF-IDF)
IR-L04 - Evaluation (metrics)
