



Syllabus

Text Data in Economics

QUAPEC Summer School 2022

 

Teaching Team

Instructor: Elliott Ash, ashe@ethz.ch
TA: Claudia Marangon, claudia.marangon@gess.ethz.ch

 

Schedule

Lectures: XX

TA office hours: XX

 

Important Links

Course code repository (bit.ly/NLP-git)

Problem Set

 

Learning Objectives:

LO1. Implement and evaluate text-as-data methods.

LO2. Evaluate the use of text-analysis tools in economics research.

LO3. Plan a research project using text data.

 

Course Format

●     8 lectures on Zoom (12 hours)

●     In-person workshopping of student project papers

 

Assignments:

●     Problem set

●     Referee report on one of the course readings

●     Research proposal on a text-data project (first and second draft)

 

 

 


 

 

Topic Outline and Main Economics Readings

 

  1. Overview
    1. Gentzkow, Kelly, and Taddy, "Text as Data."
    2. Ash and Hansen, "Text Algorithms in Economics."
  2. Dictionaries (Macro)
    1. Baker, Bloom, and Davis (2016), "Measuring Economic Policy Uncertainty."
    2. Hassan, Hollander, Van Lent, and Tahoun, "Firm-Level Political Risk: Measurement and Effects."
  3. Dictionaries (Micro)
    1. Michalopoulos and Xue (2019), "Folklore."
    2. Enke (2020), "Moral Values and Voting."
    3. Djourelova, "Media Persuasion through Slanted Language: Evidence from the Coverage of Immigration."
    4. Truffa and Wong (2021).
    5. Advani, Ash, Cai, and Rasul (2022), "Race-Related Research in Economics."
  4. Document Distance
    1. Kelly, Papanikolaou, Seru, and Taddy, "Measuring Technological Innovation over the Long Run."
    2. Cagé, Hervé, and Viaud, "The Production of Information in an Online World."
  5. Topic Models
    1. Hansen, McMahon, and Prat, "Transparency and Deliberation within the FOMC: A Computational Linguistics Approach."
    2. Ash, Morelli, and Vannoni.
  6. Supervised Learning
    1. Gentzkow and Shapiro (2010), "What Drives Media Slant? Evidence from U.S. Daily Newspapers."
    2. Gentzkow, Shapiro, and Taddy (2019), "Measuring Group Differences in High-Dimensional Choices: Method and Application to Congressional Speech."
    3. Widmer, Galletta, and Ash, "Media Slant Is Contagious."
  7. Word Embeddings (2 classes)
    1. Ash, Chen, and Ornaghi (2022), "Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts."
    2. Gennaro and Ash, "Emotion and Reason in Political Language" (2021) and "Transparency and Emotionality in Politics: Evidence from C-SPAN" (2022).
    3. Ash, Gennaro, Hangartner, and Stampi-Bombelli (2022).
    4. Kozlowski, Taddy, and Evans (2019), "The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings."
  8. Syntactic and Semantic Parsing
    1. Ash, Gauthier, and Widmer, "Text Semantics Capture Political and Economic Narratives."

 

Learning Materials

Books

●     Natural Language Processing with Python ("NLTK Book").

○     Available at https://www.nltk.org/book/.

○     A classic treatment of traditional NLP tools.

●     Aurelien Geron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (2019)

○     Available online; should be accessible with an academic account using an ETH email.

β—‹     A great practical book for machine learning and deep learning in Python, but not NLP-focused. We will use material from Chapters 2-4, 7-11, 13, and 15-17.

β—‹     The deep learning chapters use Keras + TensorFlow.


●     Yoav Goldberg, Neural Network Methods for Natural Language Processing (2017)

○     Available online (email me if this doesn't work).

β—‹     A more advanced theoretical treatment of neural networks with an NLP focus, but already somewhat dated. We will use material from Chapters 1-17 and 19.

●     Jurafsky and Martin, Speech and Language Processing (3rd ed., 2019).

○     Available at https://web.stanford.edu/~jurafsky/slp3/.

β—‹     The standard theory text on computational linguistics.

 

Programming

Python is probably the best option for NLP and is the language used by most data scientists. All of the sample code is in Python, but you are welcome to use another programming language.

●     New to Python?


●     New to Machine Learning?

○     Read the Geron book, Chapters 1-7.

●     New to Text Mining / NLP?

○     Read the NLTK Book, Chapters 1-5.

●     Lists of papers with replication repos.

●     Want to use R instead?

○     quanteda is popular for text analysis among political scientists.


 

Python Libraries

pip install pandas seaborn scikit-learn tensorflow nltk gensim flair spacy transformers

●     Basics:

○     pandas: data loading and management

○     seaborn: visualization

○     scikit-learn: general purpose Python ML library

○     tensorflow: deep learning library

●     NLP Necessities:

○     nltk: standard NLP tools

○     gensim: topic models and embeddings

○     spacy: tokenization, NER, syntactic parsing, word vectors

○     flair: sentiment analysis and some other tools

○     transformers: transformer architectures

●     Specialized tools:

○     allennlp: library of models for semantic role labeling, entailment, question answering, etc.

○     : library of embeddings

○     spacy-transformers: interface from spaCy to Hugging Face
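
As a quick smoke test after installing, here is a minimal sketch (not from the course repo; the texts are illustrative) that exercises two of the libraries above: spaCy for tokenization and scikit-learn for a TF-IDF document-term matrix.

# Smoke test for the NLP stack installed above.
# Assumes the small English spaCy model: python -m spacy download en_core_web_sm
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["Monetary policy was tightened.", "The court reversed the ruling."]

nlp = spacy.load("en_core_web_sm")
print([tok.text for tok in nlp(texts[0])])   # spaCy tokenization

X = TfidfVectorizer().fit_transform(texts)   # sparse document-term matrix
print(X.shape)                               # (n_documents, vocabulary_size)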

 


 

References

Yellow highlighting indicates required reading
Blue highlighting indicates recommended methods reading

 

Reference (Overview):

●     Gentzkow, Kelly, and Taddy, "Text as Data."

●     Goldberg, Ch. 1

●     NLTK book, Chapters 1, 2, 4

●     Grimmer and Stewart, "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts."

●     Raschka.

 

Reference (Dictionary Methods):

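To make the dictionary approach concrete, here is a hedged sketch in the spirit of the Baker-Bloom-Davis EPU index: flag documents containing at least one term from each category list. The term lists below are illustrative toys, not the official EPU dictionaries.

# Dictionary-method sketch: joint keyword matching, EPU-style.
uncertainty_terms = {"uncertain", "uncertainty"}           # toy list, not the official one
policy_terms = {"congress", "regulation", "legislation"}   # toy list, not the official one

def epu_style_flag(text):
    tokens = set(text.lower().split())
    return bool(tokens & uncertainty_terms) and bool(tokens & policy_terms)

docs = ["Uncertainty over new regulation weighed on markets.",
        "Stocks rose on strong earnings."]
print([epu_style_flag(d) for d in docs])   # [True, False]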

 

Reference (Tokenization):

●     Goldberg, Ch. 6

●     NLTK book, Chapters 3, 5, 7, 8

●     Denny and Spirling, "Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It."
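
To make the preprocessing choices these readings debate concrete, here is a minimal NLTK sketch; each step (lowercasing, dropping punctuation and stopwords, stemming) is a researcher choice, cf. Denny and Spirling on why those choices matter.

# Illustrative preprocessing pipeline with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)        # tokenizer data
nltk.download("stopwords", quiet=True)    # stopword lists

stops = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    tokens = word_tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stops]

print(preprocess("The committees were debating the new regulations."))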

 

 

Reference (Dimensionality Reduction):

●     Geron, Chapters 8-9

●     Gillis, The Why and How of Nonnegative Matrix Factorization.
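
As one standard recipe (latent semantic analysis via truncated SVD; NMF is an alternative covered by Gillis), here is a minimal scikit-learn sketch on a toy corpus.

# Dimensionality reduction of a document-term matrix via truncated SVD.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["central bank raised rates", "the bank cut interest rates",
        "the court heard the appeal", "judges ruled on the appeal"]
X = TfidfVectorizer().fit_transform(docs)          # sparse (n_docs x vocab)
Z = TruncatedSVD(n_components=2).fit_transform(X)  # dense (n_docs x 2)
print(Z.round(2))   # nearby rows correspond to topically similar documents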

Methods (Document Distance):

●     Lee et al.

●     Brandon Rose, "Document Clustering with Python."
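
The basic recipe behind text-similarity measures like the patent-innovation index is cosine similarity between document vectors; here is a hedged sketch with TF-IDF vectors in scikit-learn (toy documents).

# Document distance: pairwise cosine similarity of TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["a method for wireless signal transmission",
        "wireless transmission of radio signals",
        "a chemical process for refining oil"]
X = TfidfVectorizer().fit_transform(docs)
S = cosine_similarity(X)      # (3 x 3) similarity matrix
print(S.round(2))             # docs 0 and 1 score high; doc 2 is far from both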

Methods (Topic Models)

●     Prabhakaran, Topic Modeling with Gensim (Python).

●     Quinn et al, "How to Analyze Political Attention with Minimal Assumptions and Costs."

●     Roberts et al, "Structural Topic Models for Open-Ended Survey Responses."

●     Christian Fong and Justin Grimmer, "Discovery of Treatments from Text Corpora."
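
For a concrete starting point, here is a minimal gensim LDA sketch; the corpus and hyperparameters are illustrative only.

# Toy LDA topic model with gensim.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [["inflation", "prices", "wages"], ["court", "judge", "appeal"],
         ["prices", "inflation", "policy"], ["judge", "ruling", "court"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words counts
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                        # top words per topic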

 

Reference (Machine Learning):

●     Goldberg Ch. 2, 7


●     NLTK book, chapter 6

●     Geron, Chapters 2-4, 7
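
For a concrete baseline, the sketch below shows the standard scikit-learn recipe these chapters build up to: TF-IDF features feeding a regularized logistic regression. The toy texts and labels are illustrative only.

# Supervised text classification: TF-IDF + logistic regression.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["tax cuts for families", "government spending is too high",
               "invest in public healthcare", "expand social programs"]
train_labels = ["right", "right", "left", "left"]   # toy party labels

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)
print(clf.predict(["more funding for public schools"]))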

 

 

Overview (Deep Learning for NLP)

●     Sebastian Ruder.

Reference (Neural Nets):


●     Goldberg, Ch. 3-5

●     Geron, Chapters 10-11

●     Leslie Smith.

●     Chris Olah.

●     Baldi and Sadowski, Understanding Dropout.

Reference (Embedding Layers):

●     Goldberg, Ch. 8

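
Here is a minimal Keras sketch of a trainable embedding layer feeding a classifier (the deep learning materials use Keras + TensorFlow, as noted above); vocabulary size, sequence length, and dimensions are placeholders.

# Embedding layer demo: token ids -> learned vectors -> pooled classifier.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                   # 100 token ids per document
    tf.keras.layers.Embedding(10000, 64),           # id -> 64-dim learned vector
    tf.keras.layers.GlobalAveragePooling1D(),       # average vectors over tokens
    tf.keras.layers.Dense(1, activation="sigmoid")  # binary document label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()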

References (RNNs):

●     Geron Ch. 15-17

●     Goldberg, Ch. 14-17

●     Sutskever, Vinyals, and Le, Sequence to Sequence Learning with Neural Networks.

●     Michael Nguyen.

●     Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks.

●     Chang and Masterson.
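
As a hedged sketch of the recurrent architecture these readings cover (cf. Geron Ch. 15-16): the LSTM reads the embedded token sequence and its final hidden state feeds the output layer. Shapes are placeholders.

# Recurrent text classifier in Keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),           # sequences of 100 token ids
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.LSTM(32),               # final hidden state summarizes the text
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()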

Reference (Model Interpretation)

●     Ribeiro, Singh, and Guestrin, "Why Should I Trust You?": Explaining the Predictions of Any Classifier.

Applications (MLP):

●     Vamossy.

●     Meursault.

Applications (RNN):

●     [short] Iyyer et al.

●     Ash et al.

 

Reference (Word Embeddings):

●     Spirling and Rodriguez, Word Embeddings: What Works, What Doesn't, and How to Tell the Difference for Applied Research.

●     Goldberg, Ch. 10-11

●     Yoav Goldberg and Omer Levy, "word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method."

●     Piero Molino.

●     Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger, "From Word Embeddings to Document Distances."

●     Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski.

●     Allen and Hospedales.

●     Ruder.

●     Peters, Ruder, and Smith, To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks.

●     Bojanowski et al, Enriching Word Vectors with Subword Information.

●     Antoniak and Mimno, Evaluating the Stability of Embedding-based Word Similarities.

●     Ash, Chen, and Ornaghi, Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts.

●     Hamilton, Clark, Leskovec, and Jurafsky (2016), Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora.
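
To fix ideas, here is a minimal gensim word2vec sketch; a real application would train on a large corpus rather than this toy one.

# Train a toy word2vec model and query nearest neighbors.
from gensim.models import Word2Vec

sentences = [["judge", "ruled", "on", "the", "appeal"],
             ["the", "court", "heard", "the", "appeal"],
             ["inflation", "raised", "consumer", "prices"]] * 100  # toy corpus
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)
print(model.wv.most_similar("court", topn=3))   # neighbors in embedding space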


Contextualized Word Embeddings:

●     Peters et al, Deep Contextualized Word Representations (ELMo).

 

Reference (Syntactic Parsing):

●     NLTK Book, Chapter 8 (Analyzing Sentence Structure).

●     Jurafsky and Martin, Chapters 12-15, 20

Reference (Semantic Role Labeling):

●     Jurafsky and Martin.

 

Tools (Syntactic Parsing):

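One readily available option is spaCy's pretrained dependency parser; a minimal sketch (assumes en_core_web_sm is downloaded):

# Dependency parsing with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The senator proposed a controversial new bill.")
for tok in doc:
    print(tok.text, tok.dep_, tok.head.text)   # token, dependency label, head word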


 

Tools for Document Embeddings

●     Ruder.

References (Document Embeddings):

●     Arora, Liang, and Ma, "A Simple but Tough-to-Beat Baseline for Sentence Embeddings."

●     Doc2Vec:

○     Le and Mikolov, "Distributed Representations of Sentences and Documents."

●     Wu et al.

●     Bhatia, Lau, and Baldwin.

●     InferSent

●     USE:

○     Cer et al, Universal Sentence Encoder.

○     Yang et al.

●     Clark, Celikyilmaz, and Smith, Sentence Mover's Similarity: Automatic Evaluation for Multi-Sentence Texts.

References (Attention / Transformers)

●     Bloem, Transformers from Scratch.

●     Ruder.

●     Geron, Chapter 16

●     Goldberg, Ch. 17

●     Vaswani et al, Attention Is All You Need.

 

●     BERT:

○     Devlin et al, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

○     Reimers and Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

○     Nie et al.

○     Hassan et al.
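
As one concrete recipe (mean-pooling BERT's final hidden states into a sentence vector, in the spirit of Sentence-BERT), here is a hedged sketch with the transformers library; it assumes PyTorch is installed alongside it.

# Extract a sentence embedding from pretrained BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The central bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                   # mean-pool over tokens
print(embedding.shape)                           # torch.Size([1, 768])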

 

Reference (Language Models):

●     Goldberg, Ch. 9, 17


Reference (Autoregressive Models)

●     Shree.

●     Radford et al, Language Models are Unsupervised Multitask Learners (GPT-2).

●     Brown et al, Language Models are Few-Shot Learners (GPT-3).

●     Open-sourced GPT-3 replications.

Reference (Conditioned Text Generation):

●     Dathathri et al, Plug and Play Language Models: A Simple Approach to Controlled Text Generation.

●     Keskar et al, CTRL: A Conditional Transformer Language Model for Controllable Generation.

 

Reference (Sequence-to-Sequence Transformers)

●     Lewis et al, BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.

●     Raffel et al, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5).

●     Narang et al.

 

Reference (Coreference Resolution):

●     Jurafsky and Martin.


Reference (Discourse):

●     Jurafsky and Martin.

Reference (Dialogue):

●     Li et al.

●     Luo et al.

 

Big Bird

●     Zaheer et al, Big Bird: Transformers for Longer Sequences.

Reference (Information Extraction):

●     NLTK Book, Chapter 7 (Extracting Information from Text).

●     Jurafsky and Martin.

●     Angeli et al.

●     Qin et al.

Reference (Knowledge Graphs)

●     Nickel.

●     Joulin et al.

●     Lehmann et al, DBpedia: A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia.

●     Yao et al.

Reference (Summarization):

●     Gabriel et al.

●     See et al, Get To The Point: Summarization with Pointer-Generator Networks.

●     Stiennon et al, Learning to Summarize from Human Feedback.

Reference (Question Answering):

●     Jurafsky and Martin.

Reference (Claim Checking):

●     Vlachos, e-FEVER.


 

Reference (Legal AI)

●     Zhong et al, How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence.

Methods (Causal Inference with Text):

●     Keith et al, Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates.

●     Wood-Doughty et al, Challenges of Using Text Classifiers for Causal Inference.

●     Egami, Fong, Grimmer, Roberts, and Stewart, How to Make Causal Inferences Using Texts.

 


 

Additional Applications

 

Complexity in Text:

●     Katz and Bommarito, "Measuring the Complexity of the Law: The United States Code."

●     Katz et al.

●     Benoit, Munger, and Spirling (2017), "Measuring and Explaining Political Sophistication through Textual Complexity."

●     [short] Louis and Nenkova (2013).

 

Dictionary Methods:

●     Michalopoulos and Xue (2019), Folklore.

●     Baker, Bloom, and Davis (2016), Measuring Economic Policy Uncertainty.

●     Djourelova, Media Persuasion through Slanted Language: Evidence from the Coverage of Immigration.

●     Enke (2020), Moral Values and Voting.

●     Cao et al.

Tokens/N-grams:

●     Gentzkow and Shapiro (2010), What Drives Media Slant? Evidence from U.S. Daily Newspapers.

●     Ash, Morelli, and Van Weelden, Elections and Divisiveness: Theory and Evidence.

 

Document Distance:

●     Kelly, Papanikolaou, Seru, and Taddy, Measuring Technological Innovation over the Long Run.

●     Hoberg and Phillips, Text-Based Network Industries and Endogenous Product Differentiation.

 

Topic Models

●     Barron, Huang, Spang, and DeDeo, Individuals, Institutions, and Innovation in the Debates of the French Revolution. [has appendix]

●     Hansen, McMahon, and Prat, Transparency and Deliberation within the FOMC: A Computational Linguistics Approach.

●     Ash, Morelli, and Vannoni.

 

 

Text Classification:

●     Osnabrugge, Ash, and Morelli, Cross-Domain Topic Classification for Political Texts.

●     Widmer, Galletta, and Ash, Media Slant Is Contagious.

●     Gentzkow, Shapiro, and Taddy (2019), "Measuring Group Differences in High-Dimensional Choices: Method and Application to Congressional Speech."

●     Kelly et al.

●     Peterson and Spirling, Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems.

 

Word Embeddings:

●     Ash, Chen, and Ornaghi, Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts.

●     Gennaro and Ash, Emotion and Reason in Political Language.

●     Caliskan et al, "Semantics Derived Automatically from Language Corpora Contain Human-like Biases."

●     Bolukbasi et al, Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

●     Kozlowski, Taddy, and Evans (2019), The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.

●     Stoltz and Taylor, Concept Mover's Distance.

●     [short] Gillani and Levy.

●     Garg et al (2018), Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes. [includes appendix]

●     [short] Lucy et al.

●     Thompson et al. [includes appendix]

●     Rheault and Cochrane, Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.

●     Nyarko and Sanga.

 

Syntactic Parsing:

●     Hoyle et al.

●     [short] Ash, Jacobs, MacLeod, Naidu, and Stammbach.

●     Vannoni, Ash, and Morelli, Measuring Discretion and Delegation in Legislative Texts.

●     Michael Webb, The Impact of Artificial Intelligence on the Labor Market.

 

Semantic Role Labeling:

●     Ash, Gauthier, and Widmer, Mining narratives from large text corpora

●     Fetzer.

 

Information Extraction:

●     [short] Surdeanu et al (2011).

●     [short] Jurafsky and Chambers.

●     [short] Clark, Ji, and Smith.

●     [short] Bamman and Smith.

●     [short] Wyner and Peters.

●     [short] Xia and Ding.

 

Document Embeddings:

●     [short] Demszky et al (2019), Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings.

●     [short] Dai, Olah, and Le, Document Embedding with Paragraph Vectors.

●     Ash and Chen (2018), Case Vectors: Spatial Representations of the Law Using Document Embeddings.

●     Galletta, Ash, and Chen.

●     Ash, Chen, and Naidu (2020), "Ideas Have Consequences: The Impact of Law and Economics on American Justice."

●     [short] Tong et al.

Transformer Classification:

●     Bingler et al.

●     Pei and Jurgens.

 

Language Models:

●     [short] Peric, Mijic, Stammbach, and Ash.

●     Kreps, McCain, and Brundage, All the News That's Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation.

●     [short] Peng et al.

●     [short] Nadeem, Bethke, and Reddy, StereoSet: Measuring Stereotypical Bias in Pretrained Language Models.

 

Local Semantics

●     [short] Ross et al.

●     Prabhakaran et al.

 

Global Semantics

●     [short] Stammbach and Ash.

●     Chen et al.

●     Vold and Conrad.

 

Causal Inference with Text:

●     Margaret Roberts, Brandon Stewart, and Richard Nielsen, "Adjusting for Confounding with Text Matching."

●     [short] Veitch et al, Adapting Text Embeddings for Causal Inference.

●     All the papers in Table 1.

●     Zeng et al.

 

Argument Mining

●     [short] Subramanian et al.

 

Quote Extraction

●     Newell et al.

 
