MOZ HOW SEARCH ENGINES WORK: CRAWLING, INDEXING, AND RANKING chapter 2 Flashcards

1
Q

What are search engines and what do they do?

A

Search engines are answer machines. They exist to discover, understand, and organize the internet’s content in order to offer the most relevant results for the questions searchers are asking.

2
Q

How do search engines work?

A

Search engines work through three primary functions:

Crawling: Scour the Internet for content, looking over the code/content for each URL they find.

Indexing: Store and organize the content found during the crawling process. Once a page is in the index, it’s in the running to be displayed as a result to relevant queries.

Ranking: Provide the pieces of content that will best answer a searcher’s query, which means that results are ordered by most relevant to least relevant.
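
The three functions above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the in-memory PAGES "web", the a.example URLs, and the term-count scoring are all hypothetical, and real search engines use far more sophisticated signals.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("search engines crawl the web",
                   ["a.example/seo", "a.example/about"]),
    "a.example/seo": ("seo helps search engines rank pages for search queries",
                      ["a.example/"]),
    "a.example/about": ("about our team", []),
    "a.example/orphan": ("search search search", []),  # nothing links here
}

def crawl(seed):
    """Crawling: discover pages breadth-first by following links from a seed."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Indexing: store content as an inverted index, word -> {url: count}."""
    index = defaultdict(dict)
    for url in urls:
        for word in re.findall(r"[a-z]+", PAGES[url][0].lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def rank(index, query):
    """Ranking: order URLs by summed term counts, most relevant first."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z]+", query.lower()):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda u: (-scores[u], u))

crawled = crawl("a.example/")
index = build_index(crawled)
results = rank(index, "search engines")
print(results)
```

Note that the orphan page never enters the index, because no crawled page links to it, even though its text would match the query.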

3
Q

What is search engine crawling?

A

Crawling is the discovery process in which search engines send out a team of robots (known as crawlers or spiders) to find new and updated content.

4
Q

How is content discovered?

A

Content is discovered by following links, regardless of its format (PDF, image, blog post, etc.).
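
As a rough illustration of link-based discovery, the sketch below pulls link targets out of a page with Python's standard html.parser module. The LinkExtractor class, the sample HTML, and the yourdomain.com and cdn.example.com URLs are made up for the example; a real crawler does much more (fetching, deduplication, politeness, rendering).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

# Sample page linking to a blog post, a PDF, and an image.
SAMPLE_HTML = """
<a href="/blog/post-1">Blog post</a>
<a href="files/guide.pdf">PDF guide</a>
<a href="https://cdn.example.com/logo.png">Image</a>
"""

extractor = LinkExtractor("https://yourdomain.com/")
extractor.feed(SAMPLE_HTML)
links = extractor.links
print(links)
```

Each extracted URL becomes a candidate for the crawler to visit next, whatever kind of file it points to.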

5
Q

What is a search engine index?

A

A search engine index is a huge database of all the content the search engine has discovered and deems good enough to serve up to searchers.

6
Q

What is search engine ranking?

A

The ordering of search results by relevance is known as ranking. In general, you can assume that the higher a website is ranked, the more relevant the search engine believes that site is to the query.

7
Q

How can you check how many of your website’s pages are in the index?

A

Head to Google and type “site:yourdomain.com” into the search bar. This will return results Google has in its index for the site specified. However, the number Google displays isn’t exact. For more accurate results you can use Google Search Console.

8
Q

What are robots.txt files?

A

Robots.txt files are publicly accessible files located in the root directory of websites (e.g., yourdomain.com/robots.txt). They suggest which parts of your site search engines should and shouldn’t crawl, as well as the speed at which they crawl your site.
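
To make this concrete, here is a minimal sketch that parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the ExampleBot user agent, and the yourdomain.com URLs are assumptions chosen for illustration, and individual crawlers may interpret or ignore some directives differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /private/ and suggest a 5-second crawl delay.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a (made-up) crawler may fetch particular URLs.
can_crawl_blog = parser.can_fetch("ExampleBot", "https://yourdomain.com/blog/")
can_crawl_private = parser.can_fetch("ExampleBot", "https://yourdomain.com/private/page")

# Suggested delay between requests, in seconds (None if not specified).
delay = parser.crawl_delay("ExampleBot")

print(can_crawl_blog, can_crawl_private, delay)
```

Because the file is only a set of suggestions, a well-behaved crawler is expected to run a check like this before each request, but nothing technically forces it to.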

9
Q

How does Googlebot treat robots.txt files?

A

If Googlebot can’t find a robots.txt file for a site, it proceeds to crawl the site.

If Googlebot finds a robots.txt file for a site, it will usually abide by the suggestions and proceed to crawl the site.

If Googlebot encounters an error while trying to access a site’s robots.txt file and can’t determine if one exists or not, it won’t crawl the site.
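
The three cases above amount to a small decision table. The sketch below encodes only what this card states; the RobotsResult enum and crawl_decision function are hypothetical names, and this is not Googlebot's actual implementation.

```python
from enum import Enum

class RobotsResult(Enum):
    """Possible outcomes of trying to fetch a site's robots.txt (illustrative)."""
    NOT_FOUND = "not_found"  # no robots.txt exists for the site
    FOUND = "found"          # robots.txt was retrieved successfully
    ERROR = "error"          # could not determine whether one exists

def crawl_decision(result):
    """Map the robots.txt fetch outcome to the behavior described on the card."""
    if result is RobotsResult.NOT_FOUND:
        return "crawl the site"
    if result is RobotsResult.FOUND:
        return "crawl the site, usually abiding by the suggestions"
    return "do not crawl the site"

for outcome in RobotsResult:
    print(outcome.name, "->", crawl_decision(outcome))
```

The practical takeaway is that a robots.txt that errors out can be worse for crawling than having no file at all.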

10
Q

How can you help Googlebot find your important pages?

A

Ask yourself this: Can the bot crawl through your website, and not just to it?

Search engine crawlers can’t see past login pages, can’t use search forms, and can’t read text inside images very well, so it’s always best to add text within the markup of your webpage.

A crawler also can’t rely on people searching your site to surface pages. Instead, it needs a path of links on your own site to guide it from page to page.

11
Q

What are the common navigation mistakes that can keep crawlers from seeing all of your site?

A

Having a mobile navigation that shows different results than your desktop navigation.

Any type of navigation where the menu items are not in the HTML, such as JavaScript-enabled navigations. Google has gotten much better at crawling and understanding JavaScript, but it’s still not a perfect process. The more reliable way to ensure something gets found, understood, and indexed by Google is to put it in the HTML.

Personalization, or showing unique navigation to a specific type of visitor versus others, which could appear to be cloaking to a search engine crawler.

Forgetting to link to a primary page on your website through your navigation — remember, links are the paths crawlers follow to new pages!

12
Q

What is information architecture and what’s the best version of it?

A

Information architecture is the practice of organizing and labeling content on a website to improve efficiency and findability for users. The best information architecture is intuitive, meaning that users shouldn’t have to think very hard to flow through your website or to find something.

13
Q

What is a sitemap?

A

A sitemap is a list of URLs on your site that crawlers can use to discover and index your content.
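
As a minimal sketch of what such a list can look like, the code below builds a small XML sitemap with Python's standard xml.etree.ElementTree module. The build_sitemap helper and the yourdomain.com URLs are assumptions for illustration; the urlset, url, and loc elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages to list.
sitemap_xml = build_sitemap([
    "https://yourdomain.com/",
    "https://yourdomain.com/blog/post-1",
])
print(sitemap_xml)
```

A file like this is typically saved at the site root and submitted through a tool such as Google Search Console so crawlers can find pages that internal links might miss.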
