AtBS 11 Web Scraping Flashcards

1
Q

How do you open Google in a browser from the Python shell?

A

import webbrowser

webbrowser.open('https://www.google.com')

2
Q

What module lets you easily download files from the Web?

A

The requests module.

It is not part of the standard library, so run pip install requests once before using it.

3
Q

Syntax to download a webpage?

A

res = requests.get('URL')

pg 237

4
Q

How do you check whether a requests download worked?

A

res.status_code == requests.codes.ok

This evaluates to True if the download succeeded.

5
Q

How to find the length of a requests download?

A

len(res.text)

6
Q

When saving a web page to a file, how must the file be opened, and why?

A

Open the file in write binary mode ('wb'), even if the page is plain text.

Writing the raw bytes keeps the Unicode encoding of the text intact.

pg 239
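
A minimal sketch of why binary mode is safe, using only the standard library. The sample string and the temporary file path are made up for illustration, and UTF-8 is an assumed encoding; a real page's bytes would come from the download itself.

```python
import os
import tempfile

# Hypothetical page text containing non-ASCII characters.
page_text = 'Café menu: crêpes 5 €'
page_bytes = page_text.encode('utf-8')  # assumed encoding for this sketch

# Write the raw bytes in 'wb' mode; Python does no encoding translation.
path = os.path.join(tempfile.mkdtemp(), 'page.html')
with open(path, 'wb') as out_file:
    out_file.write(page_bytes)

# Read the bytes back and decode them with the same encoding.
with open(path, 'rb') as in_file:
    saved_bytes = in_file.read()

round_trip = saved_bytes.decode('utf-8')
print(round_trip == page_text)
```

Because the bytes on disk are identical to the bytes that were written, decoding them later recovers every character.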

7
Q

What does res stand for?

A

Response

It is the Response object returned by requests.get('URL').

8
Q

What does res.iter_content(100000) do?

A

It yields the download in chunks of up to 100,000 bytes each, so the whole file does not have to be handled at once.

9
Q

What are the steps to download a web page and save it to the hard drive?

A

import requests

res = requests.get('URL')

save_file = open('SaveFileName', 'wb')

for chunk in res.iter_content(100000):
    save_file.write(chunk)

save_file.close()
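
The same steps can be wrapped in a function, sketched here under some assumptions. The names save_in_chunks and FakeResponse are hypothetical: FakeResponse only imitates the iter_content() method of a real Response so that the example runs without a network connection.

```python
import os
import tempfile


def save_in_chunks(res, path, chunk_size=100000):
    """Write a Response-like object's content to path, one chunk at a time."""
    bytes_written = 0
    # The with statement closes the file even if a write fails.
    with open(path, 'wb') as out_file:
        for chunk in res.iter_content(chunk_size):
            bytes_written += out_file.write(chunk)
    return bytes_written


class FakeResponse:
    """Hypothetical stand-in for a requests Response (offline demo only)."""

    def __init__(self, content):
        self.content = content

    def iter_content(self, chunk_size):
        # Yield successive slices of at most chunk_size bytes.
        for start in range(0, len(self.content), chunk_size):
            yield self.content[start:start + chunk_size]


demo_path = os.path.join(tempfile.mkdtemp(), 'page.html')
demo_size = save_in_chunks(FakeResponse(b'x' * 250000), demo_path)
print(demo_size)
```

With a real download, the object returned by requests.get('URL') would be passed in place of the FakeResponse.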

10
Q

What does res.raise_for_status() do?

A

It raises an exception if there was an error downloading the webpage and does nothing if the download succeeded.

pg 238
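
A small sketch of how the check might be wrapped in try/except. The helper name describe_download is hypothetical, and the Response objects are built by hand with a status code set directly, which is an assumption made only so the example runs offline; normally they would come from requests.get().

```python
import requests


def describe_download(res):
    """Return a short status string for a requests Response object."""
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f'download failed: {exc}'
    return 'download succeeded'


# Offline demonstration: hand-built responses instead of real downloads.
missing = requests.Response()
missing.status_code = 404

found = requests.Response()
found.status_code = 200

print(describe_download(missing))
print(describe_download(found))
```

Catching the exception lets a program report the failure and carry on instead of crashing.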

11
Q

How do you import the Beautiful Soup module?

A

import bs4

12
Q

What are the steps to create a BeautifulSoup object from a webpage?

A

import requests, bs4

res = requests.get('URL')

res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
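
To try this without a download, a BeautifulSoup object can also be built from a plain string. The HTML below is a made-up example, and 'html.parser' (Python's built-in parser) is one possible parser choice, not the only one.

```python
import bs4

# Hypothetical HTML standing in for res.text from a real download.
example_html = (
    '<html><head><title>Example Page</title></head>'
    '<body><p id="greeting">Hello, world</p></body></html>'
)

# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = bs4.BeautifulSoup(example_html, 'html.parser')

title_text = soup.title.string            # text inside the <title> tag
greeting = soup.find('p', id='greeting')  # first <p> with that id
greeting_text = greeting.get_text()

print(type(soup))
print(title_text)
print(greeting_text)
```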
