Web technologies Flashcards
(29 cards)
What is the main function of a crawler?
To discover web pages by following links from one webpage to another, systematically visiting pages on the web
How does a crawler work?
- It starts with a set of seed URLs and visits other pages linked from those URLs
- It follows rules and guidelines established by website owners
- Once a crawler reaches a webpage, it fetches the HTML content of the page
- The crawler examines the HTML structure and retrieves information, such as text content and headings
- The HTML that was retrieved is broken down into individual components
- This process involves identifying elements, tags and attributes that hold valuable information
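The fetch-and-parse steps above can be sketched in Python — a minimal illustration using the standard library's html.parser; the sample page and its links are invented for the example, and a real crawler would fetch the HTML over HTTP from a seed URL:

```python
from html.parser import HTMLParser

# Minimal parser that collects links and headings from fetched HTML,
# mirroring the crawler steps described above.
class PageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []        # URLs found in <a href="..."> attributes
        self.headings = []     # text content of <h1>-<h6> elements
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())

# Stands in for HTML fetched from a seed URL.
sample_html = '<h1>Example</h1><p>See <a href="/about">about</a> and <a href="/news">news</a>.</p>'
parser = PageParser()
parser.feed(sample_html)
print(parser.links)     # → ['/about', '/news']  (the pages the crawler would visit next)
print(parser.headings)  # → ['Example']
```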
What is indexing?
- The data extracted by the crawler is indexed. This involves storing the data in a structured manner in the search engine's database
- The index allows for quick retrieval and ranking of relevant web pages in response to user queries
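A toy inverted index illustrates both points — a minimal sketch in which each word maps to the set of pages containing it, so a query is answered by set lookups rather than a scan of every page (the example documents are invented):

```python
from collections import defaultdict

# Build a toy inverted index: each word maps to the set of page IDs
# containing it, enabling quick retrieval at query time.
pages = {
    "page1": "web crawlers discover pages",
    "page2": "search engines rank pages",
}

index = defaultdict(set)
for page_id, text in pages.items():
    for word in text.lower().split():
        index[word].add(page_id)

# Query time: intersect the sets of pages for each query word.
def search(query):
    results = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*results) if results else set()

print(search("rank pages"))  # → {'page2'}
```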
How are web pages ranked?
When a user enters a query, the search engine searches the index for matching pages and returns the results it believes are the highest quality and most relevant to the user's query
What are the benefits of crawling and indexing?
- Improved search results: more relevant and up-to-date results
- Efficient retrieval: can search the index to produce results quickly
- Ranking & relevance: rankings are determined by various algorithms
- Freshness & updates: crawlers periodically revisit indexed pages to update the index
What does PageRank do?
Web pages are evaluated and ranked by the PageRank algorithm based on their perceived relevance and importance
What are the key elements of the PageRank algorithm?
- Link analysis
- Link weight distribution
- Iterative calculation
- Damping factor
What is link analysis in the PageRank algorithm?
- The PageRank algorithm analyses the structure of links between pages on the web
- Web pages are given importance by the algorithm, which considers the quantity and quality of inbound links from other pages
- Each link acts as a “vote” for the target page, with the voting weight determined by the quality of the linking page
- Webpages that have more “high-quality” links pointing towards them are deemed to be more important and hence ranked higher
What is the link weight distribution in the PageRank algorithm?
- The importance of a webpage is determined by the PageRank algorithm, which takes into account the total number of votes it has
- The algorithm distributes a page's weight by sharing a portion of its importance with each outgoing link
- Hence, pages of higher quality (many pages linking to them and few pages they link to) are given greater importance
What is the iterative calculation in the PageRank algorithm?
- The PageRank algorithm uses an iterative calculation process. At the beginning, every webpage is given the same value
- In subsequent iterations, the significance of every webpage is re-evaluated by considering the weighted significance of inbound links
- The process repeats until the rankings become stable
What is the damping factor in the PageRank algorithm?
- The damping factor is a value between 0 and 1 (usually 0.85)
- It represents the probability that a user keeps following links; the remaining probability (usually 0.15) is the chance of jumping to a random page instead
- It makes the model more realistic
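The elements above — link analysis, weight distribution, iteration and damping — can be combined in a short sketch. This is a simplified illustration on a made-up three-page link graph, not the production algorithm:

```python
# Simplified PageRank power iteration on a tiny invented link graph.
# links maps each page to the pages it links out to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

damping = 0.85
pages = list(links)
n = len(pages)
rank = {p: 1.0 / n for p in pages}   # every page starts with the same value

for _ in range(50):                  # iterate until the rankings stabilise
    new_rank = {}
    for page in pages:
        # Each page linking here "votes" by splitting its own rank
        # evenly across its outgoing links (link weight distribution).
        incoming = sum(rank[q] / len(links[q]) for q in pages if page in links[q])
        # The damping factor models a user who keeps clicking links with
        # probability 0.85 and jumps to a random page otherwise.
        new_rank[page] = (1 - damping) / n + damping * incoming
    rank = new_rank

print({p: round(r, 3) for p, r in rank.items()})
```

C ends up ranked highest because it receives links from both other pages, matching the "votes" intuition above.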
What factors influence the PageRank?
- Relevance
- User engagement
- Authority and trust
- Content freshness
- Mobile-friendliness
Limitations and evolving nature of the PageRank algorithm
- Although the PageRank algorithm is important in search engines, it is not the only factor that determines webpage rankings
- Search engines combine it with many other algorithms and signals to provide varied, high-quality results
What is server side processing?
- Involves running code and carrying out operations on the server instead of on the client's device or browser
- Web development often utilises server side programming languages such as PHP, Python or Java to handle incoming requests and process data
What is PHP?
- PHP (PHP: Hypertext Preprocessor) is a server side scripting language specifically designed for web development
- PHP focuses mainly on completing tasks on the server
Examples of server side processing with PHP
- Data retrieval and manipulation: PHP is capable of interacting with databases, processing data and generating dynamic content
- Server operations: completing tasks that are not accessible by the client, e.g. retrieving and displaying content from a database
- Form processing: form submissions and processing submitted data
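Form processing of the kind described can be sketched server-side — shown here in Python rather than PHP, using only the standard library; the form fields and validation rules are invented for the example:

```python
from urllib.parse import parse_qs

# Server-side handling of a form submission: the raw body of a POST
# request (application/x-www-form-urlencoded) is parsed and validated
# on the server, without trusting anything the client sent.
def process_form(body: str) -> dict:
    fields = parse_qs(body)
    username = fields.get("username", [""])[0].strip()
    email = fields.get("email", [""])[0].strip()
    if not username or "@" not in email:
        return {"ok": False, "error": "invalid submission"}
    # In a real application the validated data would now be stored in a
    # database and a response page generated dynamically.
    return {"ok": True, "username": username}

print(process_form("username=alice&email=alice%40example.com"))  # → {'ok': True, 'username': 'alice'}
print(process_form("username=&email=nope"))                      # → {'ok': False, 'error': 'invalid submission'}
```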
Benefits of server side processing
- Improved security measures can be implemented, ensuring secure management of sensitive data
- Uses the resources of the server to perform advanced calculations
- Consistent behaviour across different devices and browsers
- Can be easily scaled by adding more servers
Drawbacks of server side processing
- Multiple simultaneous requests can slow overall server processing, due to increased server load
- Latency due to client communicating with server, leading to increased response times
- Relies on availability and reliability of the server
- Limits real-time interactivity and responsiveness
- May require more complex development and setup
What is client side processing?
- Involves running code or processing tasks on the user's device, usually within the browser, instead of on the server
- Enables users to have an interactive and dynamic experience without constantly requesting data from the server
- Primarily done using JavaScript
Client side processing with JavaScript
- JavaScript allows developers to modify web content and manage user interactions without requiring server requests
Examples of client side processing using JavaScript
- Form validation: allows validation of user input in real time, which means users can receive instant feedback without the need for a server roundtrip
- Developers can modify the DOM (Document Object Model) to make dynamic changes to a webpage's content and structure
- Communication with the server happens in the background, allowing content to be updated dynamically without a full page refresh
Benefits of client side processing
- Enhanced user experience: eliminates the need for frequent server requests and page reloads
- Server load is reduced - improved scalability
- Inputs can be instantly validated and feedback can be provided in real time
- Webpage content is updated dynamically: more engaging browsing experiences
- Offline functionality
Drawbacks of client side processing
- Potential security risk, as code and data are visible to users
- Compatibility across devices and browsers may vary
- Can hurt page load time, due to webpages requiring substantial processing power on the client
- Heavily dependent on JavaScript: if the user's device does not support JavaScript, the webpage won't work
- Intellectual property is at risk
Client side processing vs server side processing
- Client side processing is better for tasks that require immediate feedback, real-time interactions and dynamic user interfaces within the browser
- Server side processing is better for tasks that involve accessing databases, handling sensitive data and complex business logic
What are the main HTML tags?
- <h1> to <h6>: heading tags in decreasing importance. <h1> is the most important and hence displayed the biggest; <h6> is displayed the smallest
- <p>: used for a paragraph of text. Each paragraph is separated by a line
- <ol>: ordered list. Displays a numbered list
- <ul>: unordered list. Displays a bulletpointed list
- <li>: used to define each value in a list
- <html>: the root element that contains the whole page
- <head>: contains information about the page, such as its title
- <title>: sets the title of the page shown in the browser tab
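The tags fit together in a minimal page, sketched below (illustrative only; the `<body>` element, which wraps the visible content, is included for completeness even though it is not listed above):

```html
<html>
  <head>
    <title>Example page</title>
  </head>
  <body>
    <h1>Main heading</h1>
    <p>A paragraph of text.</p>
    <ol>
      <li>First numbered item</li>
      <li>Second numbered item</li>
    </ol>
    <ul>
      <li>A bulleted item</li>
    </ul>
  </body>
</html>
```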