Web Scraping Flashcards

1
Q

HTTP

A

• HTTP (Hypertext Transfer Protocol) follows a request/response model: a client sends a request and a server sends back a response. The client is usually a web browser and the server is usually a remote computer, but both can run on the same machine.
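
The request/response cycle can be sketched with Python's standard library alone. This is a minimal illustration, not a production server: the handler name `HelloHandler`, the helper `fetch_once`, and the response text are all made up for this card. Client and server run on the same computer here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch_once():
    """Start a local server, send one GET request, return (status, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: send the request, then read the response.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read().decode())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

A status code of 200 in the response means the server handled the request successfully.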

2
Q

HTML

A
  • A markup language for describing the structure and formatting of a document so that it can be encoded and sent over the Internet.
  • Inspired by GML (Generalized Markup Language, IBM, 1969) and SGML (Standard Generalized Markup Language, standardized by the International Organization for Standardization, ISO, in 1986). “As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry.”
  • HTML was developed from SGML at CERN (European Organization for Nuclear Research) between 1989 and 1993, then became an official worldwide standard.
3
Q

API

A
  • An API (Application Programming Interface) is software speaking to other software. It specifies what inputs and outputs should look like and what happens when an operation fails.
  • Ex: A shipping company could run an API where users send weight and location information to the company's server, and the server returns a price estimate.
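
The shipping example can be sketched as a single function with a fixed input format, a fixed output format, and a defined failure response. Everything here is hypothetical: the field names `weight_kg` and `destination`, the rate table, and the function `estimate_price` are invented for illustration and do not describe any real shipping API.

```python
import json

# Made-up rates per kilogram, for illustration only.
RATE_PER_KG = {"domestic": 2.0, "international": 5.5}


def estimate_price(request_json: str) -> str:
    """Take a JSON request string and return a JSON response string."""
    try:
        data = json.loads(request_json)
        weight = float(data["weight_kg"])
        rate = RATE_PER_KG[data["destination"]]
    except (ValueError, KeyError, TypeError):
        # Failure case: malformed JSON, missing field, or unknown destination.
        return json.dumps({"ok": False, "error": "invalid request"})
    if weight <= 0:
        return json.dumps({"ok": False, "error": "weight must be positive"})
    return json.dumps({"ok": True, "price": round(weight * rate, 2)})


if __name__ == "__main__":
    print(estimate_price('{"weight_kg": 3, "destination": "domestic"}'))
    print(estimate_price('{"weight_kg": 3}'))
```

The caller only needs to know the request and response formats, not how the price is computed. In practice such a function would typically sit behind an HTTP endpoint.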
4
Q

BeautifulSoup

A

• Initially published by Leonard Richardson in 2004, BeautifulSoup is a Python library for parsing HTML, i.e. organizing and searching through its contents. Its primary feature is the ability to search a document by tag.
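
Searching by tag might look like the sketch below. It assumes the `beautifulsoup4` package is installed (imported as `bs4`). The sample HTML and the helper name `extract` are made up for this card.

```python
from bs4 import BeautifulSoup

# Made-up sample page, for illustration only.
HTML = """
<html><body>
<h1>Fruit prices</h1>
<ul>
  <li class="item">Apple</li>
  <li class="item">Pear</li>
</ul>
<a href="https://example.com/more">More</a>
</body></html>
"""


def extract(html):
    """Return the page heading, the list items, and all link targets."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text()                      # first <h1> tag
    items = [li.get_text() for li in soup.find_all("li")]   # every <li> tag
    links = [a["href"] for a in soup.find_all("a")]         # href of every <a>
    return title, items, links


if __name__ == "__main__":
    print(extract(HTML))
```

`find` returns the first matching tag and `find_all` returns every match, which covers most simple scraping tasks.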
