Text Analytics 2025/2026

TXA (635AA), 6 CFU
Graduate Programs in Data Science & Business Informatics and in Digital Humanities WDB-LM, WFU-LM

General Information

Academic Year 2025-2026, first semester

  • WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
  • WHEN:
    • Thursday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
    • Friday, 11:00-13:00 - Fib L1 (Polo Fibonacci B)
  • OFFICE HOURS: Tuesday, 14:00-16:00 - room 288, Dept. of Computer Science (by appointment)
  • WHAT: Catalogue: TXA Programme 2025-2026 - 635AA.

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge otherwise locked in textual form. The main objectives of the course are:

  • Learning essential techniques, algorithms, and models used in natural language processing.
  • Understanding the architectures of typical text analytics applications and the libraries for building them.
  • Developing expertise in the design, implementation, and evaluation of applications that exploit the analysis, interpretation, and transformation of texts.

Exam modalities

Students will be assessed through a written exam and an oral exam. Each student will be evaluated based on their ability to discuss the course content using appropriate terminology and to practically apply natural language processing techniques.

Attending Students

The exam consists of a written test and an oral test. For the written part, attending students can choose between a traditional written exam, with questions on the topics covered in the course, and a group project carried out during the course period, designed to assess the ability to plan and implement a text analysis task agreed upon with the instructor. In both cases, the oral test follows. Students who have completed the project are required to present and discuss it and to answer individual questions that verify each member’s understanding. All students, regardless of the chosen option, are then examined on the remaining topics of the course.

Non-Attending Students

The exam consists of a written test and an oral test. Non-attending students will be required to answer questions and solve exercises during a written exam.
No materials other than the slides, the notebooks, and the selected chapters from the books are required.


Slides & Materials

Students are required to study the slides, actively work through and experiment with the notebooks, and read the selected chapters indicated in the schedule below.

Date Lecture Slides Material / Reference
25/09/2025 Introduction to the course, NLP & Text Analytics. 1 - Introduction to the Text Analytics course J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1.
26/09/2025 Introduction to Python, Text Indexing, Regex 2 - Introduction to Python & Text Indexing
2/10/2025 Probability Refresher 3 - Reminds on Probability
9/10/2025 Probabilistic Language Models 4 - Probabilistic Language Models Notebook Probabilistic Language Models
10/10/2025 Probabilistic Language Models 5 - Practice on Probabilistic Language Models
16/10/2025 Text Indexing 6 - Text Indexing (part. 2)
17/10/2025 Text Indexing 7 - Text Indexing (part. 3) D. Jurafsky, J.H. Martin. Chps. 17 (excluding sections 17.4 and 17.5), 19 (Introduction and section 19.1 only), 22 (excluding sections 22.4 and 22.5.19)
23/10/2025 Text Indexing 8 - Text Indexing (part. 4)
24/10/2025 Lesson Cancelled
30/10/2025 Text Indexing 9 - Text Indexing (part. 5)
31/10/2025 Vector Space Models 10 - VSM (Correct Version) Notebook VSM; D. Jurafsky, J.H. Martin. Chp. 6
6/11/2025 Machine Learning for Text Analytics 11 - Machine Learning for Text Analytics (part. 1) (Recommended reading, but not required: D. Jurafsky, J.H. Martin. Chp. 4)
7/11/2025 Machine Learning for Text Analytics 12 - Machine Learning for Text Analytics (part. 2) (correct version) Notebook Supervised Classification
13/11/2025 Student project presentations First presentation: How to
14/11/2025 Topic Modeling 13 - Topic Modeling
20/11/2025 Topic Modeling and Optimization 13.1 - Practice on Topic Modeling and Optimization Notebook Optimization
21/11/2025 A primer on Neural Networks 14 - A primer on Neural Networks
27/11/2025 NNs Types and Characteristics 15 - NNs Types and Characteristics Notebooks
28/11/2025 (Static) Embeddings 16 - (Static) Embeddings
4/12/2025 Student project presentations Second presentation: How to
5/12/2025 Contextual Embeddings & Neural Language Models 17 - Contextual Embeddings & Neural Language Models
9/12/2025 Neural Language Models & Transformer 18 - Neural Language Models - Transformer D. Jurafsky, J.H. Martin. Chp. 11,
11/12/2025 Dialogue Systems 19 - Dialogue Systems
12/12/2025 Practice on BERT - Instructions for Final Exam [Teacher present, but classrooms closed due to strike] Final Exam: How to