Text Analytics 2024/2025

TXA (635AA), 6 CFU
Graduate Programs in Data Science & Business Informatics and in Digital Humanities (WDB-LM, WFU-LM)

General Information

Academic Year 2024-2025, first semester

  • WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
  • WHEN:
    • Thursday, 9:00-11:00 - Fib M1 (Polo Fibonacci B)
    • Friday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
  • OFFICE HOURS: Monday, 14:00-16:00 - room 288 @ Dept. of Computer Science (by appointment)
  • WHAT: Catalogue: TXA Programme 2024-2025 - 635AA.

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:

  • Learning the essential techniques, algorithms, and models used in natural language processing.
  • Understanding the architectures of typical text analytics applications and the libraries for building them.
  • Gaining expertise in the design, implementation, and evaluation of applications that exploit the analysis, interpretation, and transformation of texts.

Exam Modalities

Students will be assessed through a written exam and an oral exam. Each student will be evaluated based on their ability to discuss the course content using appropriate terminology and to practically apply natural language processing techniques.

Attending Students

The exam consists of a written test and an oral test. Attending students can choose between two options for the written part: a written exam, with questions on the topics covered in the course, or a group project. The group project must be carried out during the course period. Its goal is to assess the students’ skills in designing and implementing a text analysis task agreed upon with the teacher.

Non-Attending Students

The exam consists of a written test and an oral test. Non-attending students will be required to answer questions and solve exercises during a written exam.
No materials are required other than the slides, the notebooks, and the selected chapters from the books.


Slides & Materials

Students are required to study the slides, actively work through and experiment with the notebooks, and read the selected chapters from:

  • D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  • S. Bird, E. Klein, E. Loper. Natural Language Processing with Python. O'Reilly Media, 2009.
  • J. Eisenstein. Introduction to Natural Language Processing. MIT Press, 2019.
  • C. Zhai, S. Massung. Text Data Management and Analysis. Morgan & Claypool Publishers, 2016.

Alternate

Date | Lecture | Slides | Material / Reference
19/09/2024 | Introduction to the course, NLP & Text Analytics | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Ch. 1
20/09/2024 | Introduction to Python | 2 - Introduction to Python | Notebook Introduction to Python
26/09/2024 | Review of Probability | 3 - Review of Probability |
27/09/2024 | Probabilistic Language Models | 4 - Probabilistic Language Models | Notebook Probabilistic Language Models
03/10/2024 | Probabilistic Language Models practice | |
04/10/2024 | Text Indexing | |
17/10/2024 | Text Indexing | | Notebook Linguistic Annotation with NLTK & Collocations with Gensim
18/10/2024 | Text Indexing | 6 - Text Indexing (part 3) | Notebook Linguistic annotation with Stanza - spaCy
24/10/2024 | Text Indexing - Vector Space Models | 6 - Text Indexing (part 4) | D. Jurafsky, J.H. Martin. Ch. 6
25/10/2024 | VSM Practice and Introduction to ML | 10 - Introduction to ML | Notebook SVM
07/11/2024 | Student project presentations: proposal, brainstorming, discussion | |
08/11/2024 | Student project presentations: proposal, brainstorming, discussion; Machine Learning for Text Analytics | 12 - ML for Text Analytics |
14/11/2024 | Machine Learning for Text Analytics | 13 - ML for Text Analytics | Notebook Classification
15/11/2024 | Experimental Protocols | 14 - Experimental Protocols | Notebook Optimization
21/11/2024 | Topic Modeling | 15 - Topic Modeling |
22/11/2024 | Topic Modeling | | Notebooks Topic Modeling (Gensim & Sklearn)
28/11/2024 | A primer on Neural Networks | 17 - A primer on Neural Networks |
29/11/2024 | General strike | |
05/12/2024 | Student project presentations | |
06/12/2024 | Student project presentations | |
12/12/2024 | A primer on Neural Networks | 18 - A primer on Neural Networks (part 2) | Notebooks Classification with NNs
13/12/2024 | General strike | |
19/12/2024 | Neural Language Models | |
20/12/2024 | Advanced Topics | | D. Jurafsky, J.H. Martin. Chs. 20, 24; Notebooks BERT