In the latest installment of the Generative AI lecture series, Constantine Caramanis demonstrates practical applications of the InfoNCE loss and contrastive learning. The session focuses on fine-tuning BERT and MiniLM models on the SciFact dataset, a collection of scientific claims and related passages, to improve the quality of the embeddings they produce for semantic search.
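To make the objective concrete, here is a minimal sketch of in-batch InfoNCE, the standard contrastive loss for embedding fine-tuning. This is a generic NumPy illustration, not the notebook's actual training code; the function name, temperature value, and embedding dimension are assumptions.

```python
import numpy as np

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """In-batch InfoNCE: row i of the similarity matrix holds the logits
    for query i, and the matching document (the diagonal) is the positive;
    every other document in the batch serves as a negative."""
    # L2-normalize so dot products become cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = (q @ d.T) / temperature          # (batch, batch) similarity matrix
    # cross-entropy with targets on the diagonal
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 384))             # e.g. MiniLM embeddings are 384-dim
d = q + 0.1 * rng.normal(size=(8, 384))   # positives close to their queries
print(info_nce_loss(q, d))
```

Minimizing this loss pulls each claim's embedding toward its related passage while pushing it away from the other passages in the batch, which is what sharpens the model for retrieval.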
Before and after training, performance is measured using NDCG and Recall metrics, showing quantitative gains from domain-specific fine-tuning. The lecture includes a live Colab notebook where viewers can follow along with the implementation.
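For readers unfamiliar with the ranking metric, the following is a small generic sketch of NDCG@k (not the lecture's evaluation code, which likely uses a standard retrieval benchmark library):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results:
    each relevance score is discounted by the log of its rank."""
    return sum(rel / math.log2(rank + 2)        # ranks are 0-indexed here
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG: the system ranking's DCG divided by the ideal DCG
    (the DCG of the same scores sorted best-first)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# One relevant passage ranked third out of five retrieved results
print(ndcg_at_k([0, 0, 1, 0, 0], k=5))  # → 0.5
```

NDCG rewards placing relevant passages near the top of the ranking, while Recall@k simply checks whether the relevant passages appear anywhere in the top k.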
Key concepts covered: InfoNCE, Contrastive Loss, embeddings, fine-tuning, and similarity matrices.