Text Analytics 2025/2026

TXA (635AA), 6 CFU
Graduate Programs in Data Science & Business Informatics and in Digital Humanities WDB-LM, WFU-LM

General Information

Academic Year 2025-2026, first semester

  • WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
  • WHEN:
    • Thursday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
    • Friday, 11:00-13:00 - Fib L1 (Polo Fibonacci B)
  • OFFICE HOURS: Tuesday, 14:00-16:00 - room 288 @ Dept. of Computer Science (by appointment)
  • WHAT: Catalogue: TXA Programme 2025-2026 - 635AA.

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:

  • Learning essential techniques, algorithms, and models used in natural language processing.
  • Understanding the architectures of typical text analytics applications and the libraries used to build them.
  • Gaining expertise in the design, implementation, and evaluation of applications that exploit the analysis, interpretation, and transformation of texts.

Exam modalities

Students will be assessed through a written exam and an oral exam. Each student will be evaluated based on their ability to discuss the course content using appropriate terminology and to practically apply natural language processing techniques.

Attending Students

The exam consists of a written test and an oral test. For the written part, attending students can choose between a traditional written exam, with questions on the topics covered in the course, and a group project carried out during the course period, designed to assess the ability to plan and implement a text analysis task agreed upon with the instructor. In both cases, the oral test follows. Students who have completed the project are required to present and discuss it, and each group member answers individual questions to verify their understanding. All students, regardless of the chosen option, are further examined on the remaining topics of the course.

Non-Attending Students

The exam consists of a written test and an oral test. Non-attending students will be required to answer questions and solve exercises during a written exam.
No materials are required other than the slides, the notebooks, and the selected chapters from the books.


Slides & Materials

Students are required to study the slides, actively work through and experiment with the notebooks, and read the selected chapters listed in the schedule below:

Date | Lecture | Slides | Material / Reference
25/09/2025 | Introduction to the course, NLP & Text Analytics | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Ch. 1.
26/09/2025 | Introduction to Python; Text Indexing; Regex | 2 - Introduction to Python & Text Indexing | -
2/10/2025 | Review of Probability | 3 - Reminds on Probability | -
9/10/2025 | Probabilistic Language Models | 4 - Probabilistic Language Models | Notebook: Probabilistic Language Models
10/10/2025 | Probabilistic Language Models | 5 - Practice on Probabilistic Language Models | -
16/10/2025 | Text Indexing | 6 - Text Indexing (part 2) | -
17/10/2025 | Text Indexing | 7 - Text Indexing (part 3) | D. Jurafsky, J.H. Martin. Chs. 17 (excluding Secs. 17.4 and 17.5), 19 (Introduction and Sec. 19.1 only), 22 (excluding Secs. 22.4 and 22.5.19)
24/10/2025 | Text Indexing | 8 - Text Indexing (part 4) | -
25/10/2025 | Lesson cancelled | - | -