What Data Scientists Do Flashcards
(28 cards)
What example did Dr. Murtaza Haider investigate to demonstrate the role of a data scientist?
Dr. Haider found a relationship between unexpected bad weather and the number of public transit complaints in Toronto.
How can data scientists help tackle environmental challenges like water toxicity?
By using artificial neural networks, data scientists can help predict algae blooms and safeguard ecosystems.
What did Norman White build that simplified intricate problems across departments?
He built a recommendation engine.
What educational tools does Dr. White use to teach future data scientists?
Python notebooks, Unix, Linux, relational databases, and tools like Pandas.
What educational backgrounds does Dr. Vincent Granville list as necessary for a data scientist?
Algebra, calculus, and training in probability and statistics.
What is the difference between a statistician and a data scientist according to Dr. Granville?
A data scientist uses statistics, but is not only a statistician.
What is statistical regression used for?
To show the probable relationship between two variables, such as distance driven and gas used.
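As a minimal sketch of the card above, the snippet below fits a straight line to distance driven versus gas used with ordinary least squares. The numbers are made up for illustration, and `fit_line` is a hypothetical helper name, not something from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data, not real measurements.
distance_km = [50, 100, 150, 200, 250]
gas_litres = [4.1, 7.9, 12.2, 15.8, 20.0]

slope, intercept = fit_line(distance_km, gas_litres)

# Estimated gas use for a 300 km trip under the fitted line.
predicted_gas = slope * 300 + intercept
print(f"litres per km: {slope:.4f}, predicted for 300 km: {predicted_gas:.2f}")
```

The slope is the estimated gas used per unit of distance. The fit describes a probable relationship in the sample, not an exact rule.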
What machine learning algorithm is mentioned in the text for processing big data?
Nearest neighbor.
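A minimal sketch of the nearest neighbor idea, assuming points are compared by Euclidean distance and the single closest labeled example decides the answer. The data and the function name `nearest_neighbor` are hypothetical; big-data versions rely on indexing and distributed computation that this sketch leaves out.

```python
import math


def nearest_neighbor(query, examples):
    """Return the label of the labeled example closest to `query`."""
    _, label = min(examples, key=lambda ex: math.dist(query, ex[0]))
    return label


# Made-up labeled points: (coordinates, label).
examples = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

label_near_a = nearest_neighbor((1.2, 1.1), examples)
label_near_b = nearest_neighbor((5.5, 5.0), examples)
print(label_near_a, label_near_b)
```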
Why should the term ‘big data’ be used with caution?
Because what counts as big data is constantly evolving due to innovation.
What tools have expanded the possibilities for handling big data?
Tools like Hadoop and software advancements have expanded the limits for handling data.
What sets a data scientist apart according to Dr. Patel?
Their ability to unlock insights and convey compelling narratives to stakeholders.
What types of data do data scientists work with?
Data from a wide variety of sources, including video, audio, and text (structured and unstructured).
What are some common data formats used by data scientists?
Delimited text files, spreadsheets, XML, PDFs, and JSON.
What quality does Rachel Schutt highlight as making a data scientist exceptional?
Curiosity.
What skills and roles does a data scientist combine, according to Rachel Schutt?
A blend of computer scientist, software engineer, and statistician.
What defines a data scientist’s prowess according to Rachel Schutt?
Their ability to transform unstructured solutions into structured insights.
What are comma-separated values (CSV) and tab-separated values (TSV)?
Commonly used formats for storing tabular data as plain text, where either a comma or a tab separates each value.
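To illustrate, here is a small sketch that reads the same table once as CSV and once as TSV using Python's standard `csv` module. The sample table and the helper name `read_table` are assumptions made for the example.

```python
import csv
import io

# Made-up sample data: the same table stored with two different separators.
csv_text = "item,quantity\napples,3\npears,5\n"
tsv_text = "item\tquantity\napples\t3\npears\t5\n"


def read_table(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_table(csv_text, ",")
tsv_rows = read_table(tsv_text, "\t")

# Only the separator differs; the parsed rows are identical.
print(csv_rows == tsv_rows)
```

Note that every value comes back as a string, so numeric columns need an explicit conversion.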
What are data file types?
Computer file configurations designed to store data in a specific way.
What is a data format?
How data is encoded so it can be stored within a data file type.
What is data visualization?
A visual representation of data, such as a graph, that presents the data in a readily understandable way and makes trends easier to see.
What is a delimited text file?
A plain text file where a specific character separates the data values.
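The separating character does not have to be a comma or a tab. The sketch below, using made-up rows, writes and reads pipe-delimited text with Python's `csv` module and shows one practical wrinkle: a value that contains the delimiter is wrapped in quotes so it survives the round trip.

```python
import csv
import io

# Made-up rows; the last value deliberately contains the delimiter.
rows = [
    ["name", "note"],
    ["sensor_a", "ok"],
    ["sensor_b", "low|battery"],
]

# Write the rows as pipe-delimited text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()

# Read the text back using the same delimiter.
parsed = list(csv.reader(io.StringIO(text), delimiter="|"))
print(parsed == rows)
```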
What is Extensible Markup Language (XML)?
A language designed to structure, store, and enable data exchange between various technologies.
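A minimal sketch of reading XML with Python's standard `xml.etree.ElementTree` module. The element names, attribute, and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document: nested tags give the data its structure.
xml_text = """
<readings>
  <reading station="north"><level>3.2</level></reading>
  <reading station="south"><level>7.8</level></reading>
</readings>
""".strip()

root = ET.fromstring(xml_text)

# Map each station attribute to its numeric level.
levels = {
    reading.get("station"): float(reading.findtext("level"))
    for reading in root.findall("reading")
}
print(levels)
```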
What is Hadoop?
An open-source framework designed to store and process large datasets across clusters of computers.
What is JavaScript Object Notation (JSON)?
A data format, compatible with various programming languages, that lets two applications exchange structured data.
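As a small sketch of that exchange, the snippet below serializes a Python dictionary to JSON text and parses it back with the standard `json` module. The record contents are made up, and in practice the sender and receiver would be separate applications.

```python
import json

# Made-up record on the sending side.
record = {"id": 7, "tags": ["audio", "text"], "valid": True, "score": None}

# Serialize to a JSON string that any JSON-aware language can read.
payload = json.dumps(record)

# On the receiving side, parse the text back into native data structures.
received = json.loads(payload)
print(payload)
print(received == record)
```

Python's `True` and `None` are written as JSON `true` and `null`, which is part of what makes the text language-neutral.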