NLP @ NUP (Spring 2025)
This intensive course introduces the foundational methods, tools, and building blocks behind modern natural language processing (NLP) applications.
- Instructor: Dr. Dmitry Ustalov
- Time: Fridays at 1:00pm EET/EEST (aka 12:00pm CET/CEST)
- Location: Online (Google Meet)
This course is organized in partnership between Neapolis University Pafos, Constructor University Bremen, and JetBrains.
Topics
- Introduction
- Course Logistics. History of Field. Text Processing. Language Resources.
- Evaluation
- Problem of Evaluation. Offline Evaluation. Decision Support Systems. Classification Curves. Statistical Significance. Inter-Rater Agreement. Ablations and Red Teaming.
- N-Grams
- Zipf's Law. Out-of-Vocabulary Words. Tokenization. N-Grams and Smoothing. Perplexity.
- Information Retrieval
- Search Problem. Inverted Index. Vector Space Model. Boolean Retrieval. Ranked Retrieval. Learning-to-Rank. TREC.
- Embeddings
- Distributional Semantics. Pointwise Mutual Information. Latent Semantic Analysis. Word Embeddings. Similarity, Analogies, and Lexical Semantics. Vector Search.
- Transformer
- Attention. Transformer. BERT and RoBERTa. GPT and GPT-2. Transformer in 2025. Not Transformer.
- Large Language Models (LLMs)
- Pre-Training, Fine-Tuning, Alignment. Low-Rank Adaptation and Quantization. Prompting. Retrieval-Augmented Generation (RAG). Leaderboards.
Classes
№ | Topic | Format | Date |
---|---|---|---|
1 | Introduction | Lecture | 2025-02-14 |
2 | Evaluation | Lecture | 2025-02-21 |
3 | N-Grams | Lecture | 2025-02-28 |
4 | Information Retrieval | Lecture | 2025-03-07 |
5 | Information Retrieval | Seminar | 2025-03-14 |
6 | Embeddings | Lecture | 2025-03-21 |
7 | Transformer | Lecture | 2025-03-28 |
8 | Embeddings | Seminar | 2025-04-04 |
9 | Embeddings | Seminar | 2025-04-11 |
10 | Large Language Models | Lecture | 2025-04-25 |
11 | Large Language Models | Seminar | 2025-05-02 |
12 | Invited Speaker | Lecture | 2025-05-09 |
13 | Wrapping Up | Seminar | 2025-05-23 |
Assignments
№ | Topic | Seminar Date | Deadline |
---|---|---|---|
1 | Search Engine | 2025-03-14 | 2025-04-03 |
2 | Search Engine II | 2025-04-04 | 2025-05-01 |
3 | Question Answering | 2025-05-02 | 2025-05-22 |
Assignments are available only to enrolled students. Solutions must be submitted to Kaggle by the end of the deadline day in the Anywhere on Earth (AoE, UTC-12) time zone. Please grant the course staff (Olga, Mikhail, and Dmitry) read access to the notebooks with your solutions.
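To see what an AoE deadline means in other time zones, here is a minimal sketch that converts the end of a deadline day to UTC. It assumes that "end of the deadline day" means midnight at the close of that day at a fixed UTC-12 offset; the helper name `aoe_deadline_utc` is hypothetical, and the cutoff enforced on Kaggle remains the authoritative one.

```python
from datetime import date, datetime, time, timedelta, timezone

# AoE ("Anywhere on Earth") is a fixed UTC-12 offset with no daylight saving.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_utc(deadline_day: date) -> datetime:
    """Return, in UTC, the first instant after the deadline day ends in AoE.

    Assumption: a submission counts as on time if made before this instant.
    """
    # Midnight at the start of the following day, expressed in AoE.
    end_of_day_aoe = datetime.combine(
        deadline_day + timedelta(days=1), time.min, tzinfo=AOE
    )
    return end_of_day_aoe.astimezone(timezone.utc)


# Deadline days copied from the assignments table above.
for day in (date(2025, 4, 3), date(2025, 5, 1), date(2025, 5, 22)):
    print(day.isoformat(), "->", aoe_deadline_utc(day).isoformat())
```

Under this assumption, each deadline falls at 12:00 UTC on the day after the listed date; call `.astimezone()` on the result to display it in your local time zone.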
Grading
- The course has three assignments, all of which must be completed to pass
- Assignments are auto-graded via the Kaggle leaderboard
- Your solutions must exceed the baseline scores set by course staff
- Your work must be your own
- You may use large language models (LLMs) for coding, but you must be able to fully explain every line of your code
Resources
- Pierogue corpus (also available on Kaggle)
- Jurafsky & Martin, Speech and Language Processing (3rd ed. draft)
- WikiText-WordLevel tokenizer