Ben Companjen 🟥
@bencomp@code4lib.social · 2 weeks ago

Edit: https://data.mysociety.org/datasets/uk-hansard/ is what we will probably use instead of scraping. Thanks, #fediverse!

Do I know anyone (who knows someone else) who is working on #Hansard online? I'm trying to help a researcher get texts from UK parliamentary debates around a set of topics. We'd like to get permission to scrape, but her scraper gets blocked and her emails go unanswered.

Boosts welcome.

#ukpol #WebScraping

https://hansard.parliament.uk/

Datasets and APIs

UK - ParlParse formatted Hansard Speeches and Questions

XML files containing debates in the main chambers (from 1918) and in Westminster Hall from the start of the 2001 parliament (Commons) or 1999 reform (Lords). Speeches and the speaker are labelled w...
Ben Companjen 🟥
@bencomp@code4lib.social replied · 2 weeks ago

Scraping need not be the only way to get the texts, of course. If she could send a list of search terms somewhere and get the results back, that would be perfect.
AFAICT Hansard's search does not understand "OR" boolean connectors.
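If the texts come from downloaded ParlParse XML files instead, the "OR" over a set of topic terms can be done locally. Below is a minimal sketch, not a tested tool. It assumes the files use `<speech>` elements with `id` and `speakername` attributes and keep the spoken text in `<p>` children; those element and attribute names are an assumption about the format, not something stated above. The function names and the directory path in the usage line are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def compile_terms(terms):
    """Build one case-insensitive, whole-word regex matching ANY of the terms (a local OR)."""
    if not terms:
        # An empty alternation would match at every word boundary, i.e. everything.
        raise ValueError("need at least one search term")
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def matching_speeches(root, pattern):
    """Yield (id, speakername, text) for each <speech> whose text matches the pattern.

    Assumed layout: <speech id="..." speakername="..."><p>...</p></speech>.
    """
    for speech in root.iter("speech"):
        # itertext() also collects text nested in inline tags such as <i>.
        text = " ".join("".join(p.itertext()).strip() for p in speech.iter("p"))
        if pattern.search(text):
            yield speech.get("id"), speech.get("speakername"), text


def scan_directory(directory, terms):
    """Run the OR search over every *.xml file in a directory (hypothetical layout)."""
    pattern = compile_terms(terms)
    for path in sorted(Path(directory).glob("*.xml")):
        # ET.parse reads bytes, so the file's own encoding declaration is honoured.
        yield from matching_speeches(ET.parse(path).getroot(), pattern)
```

Usage would look like `for speech_id, speaker, text in scan_directory("debates/", ["housing", "rent"]): ...`. Compiling all terms into one regex means each speech is scanned once, whatever the number of terms.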
