Exam 1 Prep Flashcards
(38 cards)
Q: What are the 5 common methods for collecting raw data?
A: Public data, data from an existing product, human-in-the-loop, brute force, buying data.
Q: Why is data labeling important?
A: It allows the machine learning model to understand the data and make accurate predictions, like identifying objects in images.
Q: What is the difference between simple and advanced data labeling?
A: Simple labeling tags objects (e.g., drawing boxes around cars), while advanced labeling tags every pixel in an image (e.g., marking pixels as “road” or “pedestrian”).
Q: Name three ways to speed up data labeling.
A: External annotation services, internal annotation teams, and using tools like supervised prediction or active learning.
Q: What is “human-in-the-loop” data collection?
A: It’s when humans help a system gather data and guide its learning, like a person controlling an autonomous robot until it’s more capable.
Q: What is the main message Hans Rosling is trying to convey in the TED Talk?
A: The world is more developed than people realize, and many countries have made significant progress in health, income, and life expectancy. The global divide is not as sharp as often portrayed.
Q: What does Hans Rosling use to present data in his TED Talk?
A: He uses dynamic, animated data visualizations (interactive graphs) that show the progress of countries over time. This helps make complex data more accessible and engaging.
Q: What misconception does Rosling address about the world’s countries?
A: He challenges the idea that countries are either “developed” or “underdeveloped” (rich vs. poor). Instead, he shows how many countries have advanced and now fall into a middle-income bracket.
Q: How does Rosling emphasize global interconnectedness in his talk?
A: He shows that the progress in one country (e.g., improvements in health or education) can influence other countries, demonstrating how the world is interconnected and how global development is possible.
Q: What role does data visualization play in Rosling’s presentation?
A: Data visualization makes it easier for the audience to understand complex trends and see the changes over time. It helps turn raw numbers into an engaging story and brings clarity to the data.
Q: What does Rosling believe data can help do?
A: Data can help correct misconceptions, inspire optimism, and promote a more accurate understanding of global progress and development.
Q: What question does Rosling ask the audience during the talk?
A: He asks the audience to guess the life expectancy and income of countries at different points in history to challenge their assumptions and engage them in the topic.
Q: What is one key takeaway about the future of global development?
A: While challenges still exist, global development has made significant progress, and with continued effort that progress is likely to continue.
Q: How does data shape people’s worldview?
A: Properly presented data can change how people perceive the world. It can break stereotypes, correct false assumptions, and give a more hopeful, nuanced view of global development.
Q: What is the significance of data science in the context of Rosling’s talk?
A: Data science allows us to analyze and visualize data to uncover patterns, trends, and insights that can help people make informed decisions and understand global issues better.
Q: What is the main difference between data science, hacking, and statistics?
A: Data science combines practical tool knowledge (hacking) with theoretical understanding (statistics). Hacking alone focuses on quick coding, while statistics alone focuses on mathematical modeling without the coding.
Q: What is Drew Conway’s Venn diagram used to explain?
A: It explains the hybrid nature of data science, involving three core areas: hacking skills, math/statistics knowledge, and substantive (domain) expertise. Data science exists where all three intersect; machine learning is the overlap of hacking skills and math/statistics knowledge.
Q: What key skills are often associated with data scientists?
A: Statistics, data munging (parsing, scraping, formatting data), machine learning, and data visualization.
Q: Why is there debate over whether data science is just a rebranding of statistics?
A: Some argue that traditional statistics already covers much of what data science does, making the term “data science” a rebranding of established fields, while others see it as a necessary evolution due to new technological tools and needs.
Q: What is the role of a social scientist in data science?
A: Social scientists are valuable in data science, especially when analyzing human behavior or solving problems related to social phenomena. Their skills in asking questions and understanding context complement data analysis.
Q: What makes data science a team effort?
A: Data science requires a wide range of skills (programming, statistics, communication, etc.), making it impractical for one person to master everything. Teams with varied expertise work best.
Q: Why is “data scientist” a job title mostly found in industry, not academia?
A: The role of a data scientist emerged in tech companies (like LinkedIn and Facebook) to tackle complex data problems, but it hasn’t yet become an official academic title, though it may evolve in the future.
Q: What is the primary role of a data scientist in industry?
A: Data scientists in industry extract meaning from data, clean and transform it, build models, perform exploratory data analysis, and communicate insights to decision-makers. They bridge the gap between technical analysis and practical application.
Q: What are the essential components of a data science team?
A: A data science team should include members with expertise in statistics, machine learning, computer science, data visualization, and domain knowledge to tackle diverse aspects of data problems.