Social Sciences Research Methods Centre

 

Module Overview

Please read this module overview carefully before making a booking.

Lecturers: Dr Rolf Fredheim (CRASSH)
Time / Venue: Tuesdays, 16:00-18:00 (Titan Room 2)
Overview: The internet is a rich source of data for the humanities and social sciences, but most of the information appears chaotic. In this course we will explore how to programmatically access information stored online, typically in HTML, and turn it into neat, tabulated data ready for analysis. The course is made up of four tutorials, designed to build the tools needed to collect different types of data effectively. The uses of web scraping are diverse: in this course we will use the programming language R, first to collect data directly from newspapers and then to access live data streams through APIs (YouTube, Facebook, Google Maps, Wikipedia). Together these sessions will give students the skills needed to use web scraping in their own research. Slides from last year’s sessions may be consulted here: http://fredheir.github.io/WebScraping.
Prerequisites: Familiarity with R and an interest in online data collection. Students should be comfortable with the RStudio interface and should, if possible, attend one of the department’s ‘Introduction to R’ sessions. Formal programming knowledge and an understanding of HTML are not expected.
Assessment: Not applicable
Textbook: No readings are assigned, but students should ensure they are comfortable with the basics of R. These are covered in the first ten videos of Roger Peng’s ‘Computing for Data Analysis’ course, available on YouTube.

 

Teaching schedule

Session | Topic(s) to be covered | Date
1 | Introduction to web scraping: working with text in R | 17 February
2 | Scraping newspaper content and metadata | 24 February
3 | APIs, machine readable sources, and scaling up data collection | 10 March