NLP @ NUP (Spring 2025)

This intensive course introduces the foundational methods, tools, and building blocks proven in modern natural language processing (NLP) applications.

  • Instructor: Dr. Dmitry Ustalov
  • Time: Fridays at 1:00pm EET/EEST (aka 12:00pm CET/CEST)
  • Location: Online (Google Meet)

This course is organized in partnership by Neapolis University Pafos, Constructor University Bremen, and JetBrains.

Neapolis University Pafos
Constructor University
JetBrains Academy

Topics

Introduction
Course Logistics. History of Field. Text Processing. Language Resources.
Evaluation
Problem of Evaluation. Offline Evaluation. Decision Support Systems. Classification Curves. Statistical Significance. Inter-Rater Agreement. Ablations and Red Teaming.
N-Grams
Zipf's Law. Out-of-Vocabulary Words. Tokenization. N-Grams and Smoothing. Perplexity.
Information Retrieval
Search Problem. Inverted Index. Vector Space Model. Boolean Retrieval. Ranked Retrieval. Learning-to-Rank. TREC.
Embeddings
Distributional Semantics. Pointwise Mutual Information. Latent Semantic Analysis. Word Embeddings. Similarity, Analogies, and Lexical Semantics. Vector Search.
Transformer
Attention. Transformer. BERT and RoBERTa. GPT-1 and GPT-2. Not Transformer.
Large Language Models (LLMs)
Pre-Training, Fine-Tuning, Alignment. Low-Rank Adaptation and Quantization. Prompting. Retrieval Augmented Generation (RAG). Leaderboards.

Classes

 #  Topic                   Format   Date
 1  Introduction            Lecture  2025-02-14
 2  Evaluation              Lecture  2025-02-21
 3  N-Grams                 Lecture  2025-02-28
 4  Information Retrieval   Lecture  2025-03-07
 5  Information Retrieval   Seminar  2025-03-14
 6  Embeddings              Lecture  2025-03-21
 7  Transformer             Lecture  2025-03-28
 8  Embeddings              Seminar  2025-04-04
 9  Large Language Models   Lecture  2025-04-11
10  Large Language Models   Seminar  2025-04-25
11  Invited Speaker         Lecture  2025-05-02
12  Wrapping Up             Seminar  2025-05-09

Assignments

 #  Topic               Seminar Date  Deadline
 1  Search Engine       2025-03-14    2025-04-03
 2  Search Engine II    2025-04-04    2025-04-24
 3  Question Answering  2025-04-25    2025-05-08

Assignments are available only to enrolled students. Solutions should be submitted to Kaggle by the end of the deadline day in the Anywhere on Earth (AoE) time zone. Please grant the course staff read access to the notebooks containing your solutions.

Grading

  • The course contains three assignments, and you must complete all three to pass
  • Assignments are graded automatically using the Kaggle leaderboard
  • Your solutions must score higher than the baseline scores set by course staff
  • The use of large language models (LLMs) to complete the assignments is permitted, but you are expected to be able to explain every single line of your code

Resources