Text Analytics 2024/2025
TXA (635AA), 6 CFU
Graduate Programs in Data Science & Business Informatics and in Digital Humanities WDB-LM, WFU-LM
General Information
Academic Year 2024-2025, first semester
- WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
- WHEN:
  - Thursday, 9:00-11:00 - Fib M1 (Polo Fibonacci B)
  - Friday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
- OFFICE HOURS: Monday, 14:00-16:00 - room 288 @ Dept. of Computer Science (by appointment)
- WHAT: Catalogue: TXA Programme 2024-2025 - 635AA.
Objectives
The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:
- Learning essential techniques, algorithms, and models used in natural language processing.
- Understanding the architectures of typical text analytics applications and the libraries for building them.
- Gaining expertise in the design, implementation, and evaluation of applications that analyze, interpret, and transform texts.
Exam modalities
Students will be assessed through a written exam and an oral exam. Each student will be evaluated based on their ability to discuss the course content using appropriate terminology and to practically apply natural language processing techniques.
Attending Students
The exam consists of a written test and an oral test. Attending students can choose between two options for the written part: a written exam, with questions on the topics covered in the course, or a group project. The group project must be carried out during the course period. Its goal is to assess the students’ skills in designing and implementing a text analysis task agreed upon with the teacher.
Non-Attending Students
The exam consists of a written test and an oral test. Non-attending students will be required to answer questions and solve exercises during a written exam.
No materials are required other than the slides, the notebooks, and the selected chapters from the books.
Slides & Materials
Students are required to study the slides, actively work through and experiment with the notebooks, and read the selected chapters from:
- D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
- S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.
- J. Eisenstein. Introduction to Natural Language Processing. MIT Press, 2019.
- Zhai and Massung. Text Data Management and Analysis. Morgan & Claypool Publishers, 2016.
Alternate
| Date | Lecture | Slides | Material / Reference |
|---|---|---|---|
| 19/09/2024 | Introduction to the course, NLP & Text Analytics. | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1. |
| 20/09/2024 | Introduction to Python | 2 - Introduction to Python | Notebook Introduction to Python |
| 26/09/2024 | Review of Probability | 3 - Reminds on Probability | |
| 27/09/2024 | Probabilistic Language Models | 4 - Probabilistic Language Models | Notebook Probabilistic Language Models |
| 03/10/2024 | Probabilistic Language Models | practice | |
| 04/10/2024 | Text Indexing | | |
| 17/10/2024 | Text Indexing | Notebook Linguistic Annotation with NLTK & Collocations with Gensim | |
| 18/10/2024 | Text Indexing | 6 - Text Indexing (part. 3) | Notebook Linguistic annotation with Stanza - Spacy |
| 24/10/2024 | Text Indexing - Vector Space Models | 6 - Text Indexing (part. 4) | D. Jurafsky, J.H. Martin. Chp. 6 |
| 25/10/2024 | VSM Practice and Introduction to ML | 10 - Introduction to ML | Notebook SVM |
| 07/11/2024 | Student project presentations: proposal, brainstorming, discussion. | | |
| 08/11/2024 | | 12 - ML for Text Analytics | |
| 14/11/2024 | Machine Learning for Text Analytics | 13 - ML for Text Analytics | Notebook Classification |
| 15/11/2024 | Experimental Protocols | 14 - Experimental Protocols | Notebook Optimization |
| 21/11/2024 | Topic Modeling | 15 - Topic Modeling | |
| 22/11/2024 | Topic Modeling | Notebooks Topic Modeling (Gensim & Sklearn) | |
| 28/11/2024 | A primer on Neural Networks | 17 - A primer on Neural Networks | |
| 29/11/2024 | General strike | | |
| 05/12/2024 | Student project presentations | | |
| 06/12/2024 | Student project presentations | | |
| 12/12/2024 | A primer on Neural Networks | 18 - A primer on Neural Networks (part. 2) | Notebooks Classification with NNs |
| 13/12/2024 | General strike | | |
| 19/12/2024 | Neural Language Models | | |
| 20/12/2024 | Advanced Topics | | |