TXA 24/25

Text Analytics

Graduate Programs in Data Science & Business Informatics and in Digital Humanities
WDB-LM, WFU-LM


Academic Year 2024-2025, first semester
  • WHERE: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa
  • WHEN:
    • Thursday, 9:00-11:00 - Fib M1 (Polo Fibonacci B)
    • Friday, 14:00-16:00 - Fib M1 (Polo Fibonacci B)
  • OFFICE HOURS: Monday, 14:00-16:00 (by appointment) - room 288 @ Dpt. Computer Science
  • WHAT: Course Catalogue: TXA Programme 2024-2025 - 635AA.

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge otherwise locked in textual form. The main objectives of the course are:

  • Learning essential techniques, algorithms, and models used in natural language processing.
  • Understanding the architectures of typical text analytics applications and the libraries used to build them.
  • Gaining expertise in the design, implementation, and evaluation of applications that exploit the analysis, interpretation, and transformation of texts.

Slides & Materials

Textbooks

It is mandatory to read selected chapters from:

  • D. Jurafsky, J.H. Martin. Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  • S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.

Slides

Date | Lecture | Slides | Material / Reference
19/09/2024 | Introduction to the course, NLP & Text Analytics | 1 - Introduction to the Text Analytics course | J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1.
20/09/2024 | Introduction to Python | 2 - Introduction to Python | Notebook Introduction to Python
26/09/2024 | Reminds on Probability | 3 - Reminds on Probability | -
27/09/2024 | Probabilistic Language Models | 4 - Probabilistic Language Models | Notebook Probabilistic Language Models
03/10/2024 | Probabilistic Language Models practice | - | -
04/10/2024 | Text Indexing | - | -
17/10/2024 | Text Indexing | - | -
18/10/2024 | Text Indexing | - | -
24/10/2024 | Text Indexing - Vector Space Models | - | D. Jurafsky, J.H. Martin. Chp. 6
25/10/2024 | VSM Practice and Introduction to ML | - | -
07/11/2024 | Student project presentations: proposal, brainstorming, discussion | - | -
08/11/2024 | Student project presentations: proposal, brainstorming, discussion; Machine Learning for Text Analytics | - | -
14/11/2024 | Machine Learning for Text Analytics | - | -
15/11/2024 | Experimental Protocols | - | -
21/11/2024 | Topic Modeling | - | -
22/11/2024 | Topic Modeling | - | -
28/11/2024 | A primer on Neural Networks | - | -
29/11/2024 | General strike | - | -
05/12/2024 | Student project presentations | - | -
06/12/2024 | Student project presentations | - | -
12/12/2024 | A primer on Neural Networks | - | -
13/12/2024 | General strike | - | -
19/12/2024 | Neural Language Models | - | -
20/12/2024 | Advanced Topics | - | D. Jurafsky, J.H. Martin. Chps. 20, 24; Notebooks BERT

Syllabus

The course will cover the following topics:
  • Background: Natural Language Processing, Information Retrieval and Machine Learning
  • Mathematical background: Probability, Statistics and Algebra
  • Linguistic essentials: words, lemmas, morphology, Part of Speech (PoS), syntax
  • Basic text processing: regular expressions, tokenisation
  • Data collection: scraping
  • Basic modelling: collocations, language models
  • Introduction to Machine Learning: theory and practical tips
  • Libraries and tools: NLTK, spaCy, Keras, PyTorch
  • Classification/Clustering
  • Sentiment Analysis/Opinion Mining
  • Information Extraction/Relation Extraction/Entity Linking
  • Transfer learning
  • Quantification
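
To give a flavour of two of the topics above (regular-expression tokenisation and language models), here is a minimal Python sketch using only the standard library. It is an illustration, not course material: the function names `tokenize` and `bigram_model`, the token pattern, and the toy corpus are all assumptions made for this example. It estimates bigram probabilities by maximum likelihood, with no smoothing and no sentence-boundary handling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression.

    The pattern is an assumption for this sketch: runs of letters/digits,
    optionally followed by an apostrophe suffix (e.g. "don't").
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_model(tokens):
    """Return a function computing the maximum-likelihood estimate P(word | prev).

    P(word | prev) = count(prev, word) / count(prev as a bigram's first element).
    No smoothing is applied, so unseen bigrams get probability 0.
    Sentence boundaries are ignored, so bigrams may span two sentences.
    """
    context_counts = Counter(tokens[:-1])            # tokens that have a successor
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # adjacent token pairs

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate available
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Toy corpus (hypothetical), just to exercise the two functions.
corpus = "The cat sat on the mat. The cat ate."
tokens = tokenize(corpus)
prob = bigram_model(tokens)

print(tokens)
print(prob("the", "cat"))  # "the" is followed by "cat" in 2 of its 3 occurrences
```

In practice, libraries such as NLTK or spaCy provide more robust tokenisers, and a usable language model needs smoothing to handle unseen word pairs.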

Learning Outcomes

Goals

Learning essential techniques, algorithms, and models used in natural language processing. Understanding the architectures of typical text analytics applications and of libraries for building them. Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts.

Assessment criteria of knowledge

The student will be assessed on the demonstrated ability to discuss the course contents using the appropriate terminology and to apply natural language processing techniques.

Skills

The student will be able to design, implement and evaluate applications that exploit the analysis, interpretation, and transformation of texts.

Verification of learning

Attending students will be asked to participate in a group project aimed at assessing skills in the design and implementation of a text analytics task agreed upon with the teacher.

Non-attending students will be asked to solve exercises during a written exam and oral discussion.

Behaviors

Students will be able to analyze a text processing problem, select appropriate methods to solve it, and implement a working solution. They will be aware of several issues related to the processing of text, including the reliability of results when applications involve human-annotated (subjective) data.

Assessment criteria of behaviors

The behavior of students will be assessed during project development and/or at the written/oral exam.

Required skills

Useful prerequisites:

  • Coding (Python)
  • Probability theory
  • Information theory
Copyright © Laura Pollacci.