Text Analytics 2025/2026
TXA (635AA), 6 CFU
Graduate Programs in Data Science & Business Informatics and in Digital Humanities WDB-LM, WFU-LM
General Information
Academic Year 2025-2026, first semester
- WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
- WHEN:
- Thursday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
- Friday, 11:00-13:00 - Fib L1 (Polo Fibonacci B)
- OFFICE HOURS: Tuesday 14:00-16:00 - room 288 @ Dpt. Computer Science (by appointment)
- WHAT: Catalogue: TXA Programme 2025-2026 - 635AA.
Objectives
The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:
- Learning essential techniques, algorithms, and models used in natural language processing.
- Understanding of the architectures of typical text analytics applications and of libraries for building them.
- Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts.
Exam modalities
Students will be assessed through a written exam and an oral exam. Each student will be evaluated based on their ability to discuss the course content using appropriate terminology and to practically apply natural language processing techniques.
Attending Students
The exam consists of a written test and an oral test. For the written part, attending students can choose between a traditional written exam, with questions on the topics covered in the course, or a group project carried out during the course period, designed to assess the ability to plan and implement a text analysis task agreed upon with the instructor. In both cases, the oral test follows. Students who have completed the project are required to present and discuss it, and to answer individual questions that verify each member's understanding. All students, regardless of the chosen option, are further examined on the remaining topics of the course.
Non-Attending Students
The exam consists of a written test and an oral test. Non-attending students will be required to answer questions and solve exercises during a written exam.
No materials other than the slides, notebook experimentation, and the selected chapters from the books are required.
Slides & Materials
Students are required to study the slides, actively work through and experiment with the notebooks, and read the selected chapters from:
- D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
- S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.
- J. Eisenstein. Introduction to Natural Language Processing. MIT Press, 2019.
- Zhai and Massung. Text Data Management and Analysis. Morgan & Claypool Publishers, 2016.
| Date | Lecture | Slides | Material / Reference |
|---|---|---|---|
| 25/09/2025 | Introduction to the course, NLP & Text Analytics. | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1. |
| 26/09/2025 | | 2 - Introduction to Python & Text Indexing | |
| 2/10/2025 | Review of Probability | 3 - Reminds on Probability | |
| 9/10/2025 | Probabilistic Language Models | 4 - Probabilistic Language Models | Notebook Probabilistic Language Models |
| 10/10/2025 | Probabilistic Language Models | 5 - Practice on Probabilistic Language Models | |
| 16/10/2025 | Text Indexing | 6 - Text Indexing (part. 2) | |
| 17/10/2025 | Text Indexing | 7 - Text Indexing (part. 3) | D. Jurafsky, J.H. Martin. Chps. 17 (excluding chs. 17.4 and 17.5), 19 (Introduction and ch. 19.1 only), 22 (excluding chapters 22.4 and 22.5.19) |
| 24/10/2025 | Text Indexing | 8 - Text Indexing (part. 4) | |
| 25/10/2025 | Lesson Cancelled | | |