Exam Practice Flashcards

1
Q

What is Big Data?

A

Four pillars:

1) Information
2) Technology
3) Impact
4) Methods

Big Data is the Information Asset

2
Q

What dimensions underlie Big Data?

A

1) Volume: quantity of available data
2) Velocity: rate at which data is collected/recorded
3) Veracity: quality and applicability of data
4) Variety: different types of data available

3
Q

What is Gartner's description of Big Data?

A

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making

4
Q

Why do we call it “Big” Data?

A

Because the data resources exceed the capabilities of traditional computing environments

5
Q

What are the drivers of Big Data?

A

Non-exhaustive list:

  • Increased data volumes
  • Rapid acceleration of data growth
  • Growing variation in data types for analysis
  • Alternate and unsynchronized methods for facilitating data delivery
  • Rising demand for real-time integration of analytical results
6
Q

What is the process for piloting technologies to determine their feasibility and business value and for engaging business sponsors and socializing the benefits of a selected technique?

A

1) channel the energy and effort of test-driving big data technologies
2) determine whether those technologies add value
3) devise a communication strategy for sharing the message with the right people within the organisation

7
Q

What must happen to bring big data analytics into an organization's system development life cycle to enable its use?

A

1) develop tactics for technologists, data management professionals and business stakeholders to work together
2) migrate the Big Data projects into the production environment in a controlled and managed way

8
Q

How to assess value of Big Data?

A

1) feasibility: does the organisational setup permit new and emerging technologies?
2) reasonability: are the resource requirements within capacity?
3) value: do the results warrant the investment?
4) integrability: are there any impediments within the organisation?
5) sustainability: are the maintenance costs manageable?

9
Q

What is Hadoop? And mention the three important layers.

A

Apache Hadoop is a collection of open-source software utilities for distributed storage and processing of Big Data using the MapReduce programming model

Important layers:

1) Hadoop Distributed File System (HDFS)
2) MapReduce
3) YARN: job scheduling and cluster management

10
Q

How can organisations plan to support Big Data?

A

Get the people right (Business Evangelists, Technical Evangelists, Business Analysts, Big Data Application Architect, Application Developer, Program Manager, Data Scientists)

11
Q

What is parallel computing?

A

Type of computation where many calculations are carried out simultaneously. Problems can be broken into pieces and solved at the same time.

Parallelism has long been employed in high-performance computing (multi-core processors).

12
Q

What is distributed computing?

A

Model in which components of a software system are shared among multiple computers to improve efficiency and performance.

For example, in the typical distribution using the 3-tier model, user interface processing is performed in the PC at the user’s location, business processing is done in a remote computer, and database access and processing is conducted in another computer that provides centralized access for many business processes. Typically, this kind of distributed computing uses the client/server communications model.

13
Q

What is the Big Data Landscape 2016?

A
Infrastructure (e.g. Hadoop)
Analytics (e.g. Statistical Computing)
Applications (e.g. Sales and Marketing)
Cross-Infrastructure/Analytics (e.g. Google)
Open Source
Data Sources (e.g. Apple Health)
14
Q

What is the Big Data Landscape of 2019?

A
Infrastructure
Analytics and Machine Learning
Applications - Enterprise
Applications - Industry
Cross-Infrastructure/Analytics
Open Source
Data Sources
Data Resources
15
Q

What is the Big Data framework?

A

Analytical applications that combine the means for developing and implementing algorithms, which must access, consume and manage data

16
Q

What is encompassed in a technological ecosystem?

A
  • Scalable storage
  • Computing platform
  • Data management environment
  • Application development framework
  • Scalable analytics
  • Project management processes and tools
17
Q

Describe row-oriented data

A

Traditional database systems employ a row-oriented layout: the values of each row are laid out consecutively in memory

The entire record must be read to access the required attributes

18
Q

Describe column-oriented data

A

Values are stored by column, one column per variable
Columns can be stored separately

Reduced latency when accessing individual attributes compared to a row-oriented layout

19
Q

What are the key differences between row- and column-oriented data?

A

Four dimensions of comparison:

1) Access performance: column-oriented is faster than row-oriented
2) Speed of joins and aggregation: column-oriented has lower access latency than row-oriented
3) Suitability to compression: with column-oriented storage you can compress the data to decrease storage needs while maintaining high performance; it is difficult to apply compression to row-oriented storage for a performance gain
4) Data load speed: in a row-oriented layout all values of a record are stored together, which prevents parallel loading; in a column-oriented layout the columns can be segregated and loaded in parallel (e.g. on multiple cores) using a separate thread per column

20
Q

Describe tools and techniques for data management.

A

1) Processing capability: processing nodes often incorporate multiple cores so that tasks can run simultaneously
2) Memory: holds the data currently being processed on the node and generally has an upper limit per node
3) Storage: provides persistence of data; it is the place where datasets and databases are kept, ready to be accessed
4) Network: the communication infrastructure between nodes, allowing for information exchange

21
Q

What is a cluster in data architecture?

A

A collection of interconnected nodes

22
Q

Mention architecture cluster types from class.

A
  • Fully connected network topology (all-to-all)
  • Common bus topology (sequence, one-to-next)
  • Mesh network topology (some-to-some)
  • Star network topology (one-to-many)
  • Ring network topology (neighbor-to-neighbor)
23
Q

Describe in detail the three layers of Hadoop.

A

1) HDFS
Enables the storage of large files by distributing the data among a pool of data nodes. An HDFS file appears to be one file, even though it is split into blocks ("chunks") that are stored on individual data nodes. HDFS provides a level of fault tolerance through data replication.

2) MapReduce
Used to write applications which process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes). It is also fault-tolerant.

3) YARN
Handles job scheduling and cluster resource management.

24
Q

What is the value proposition of HDFS in Hadoop?

A

1) Decreasing the cost of specialty large-scale storage systems
2) Providing the ability to rely on commodity components
3) Enabling the ability to deploy using cloud-based services
4) Reducing system management costs

25
Q

Describe the MapReduce framework in Hadoop, and give an example of where MapReduce can be used.

A

Two steps:

1) Map: describes the computation or analysis applied to a set of input key/value pairs to produce a set of intermediate key/value pairs
2) Reduce: the sets of values associated with the intermediate key/value pairs output by the Map operation are combined per key to provide the results

  • MapReduce is a series of basic operations applied in a sequence to small chunks of many datasets
  • Combines both data and computational independence
  • It is fault-tolerant
  • Can be used to count the number of occurrences of each word in a corpus (breaking down each document, paragraph and sentence; see slide 22 in L2 and the sketch below).
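
A minimal Python sketch of the word-count idea (illustrative only, not Hadoop's actual Java API): the map step emits (word, 1) pairs and the reduce step sums them per key.

```python
from collections import defaultdict

# Toy word count in the MapReduce style (sketch, not Hadoop itself).
documents = ["big data is big", "data needs processing"]

def map_phase(doc):
    # Emit an intermediate (key, value) pair for every word.
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    # Combine all values that share the same intermediate key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

intermediate = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(intermediate))  # {'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'processing': 1}
```
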
26
Q

What is cloud computing?

A

Cloud computing refers to where the data is processed: on remote, on-demand computing infrastructure rather than on local machines

27
Q

What is data mining?

A

An art and science of discovering knowledge, insights and patterns in data

28
Q

Why data mining?

A
  • To recognize hidden value in data
  • To effectively gather quality data and efficiently process it

29
Q

Outline steps in a typical data mining process.

A

1) Understand the application domain
2) Identify data sources and select target data
3) Pre-process: cleaning, attribute selection
4) Data mining to extract patterns or models
5) Post-process: identifying interesting or useful patterns
6) Incorporate the process into the real-world task

30
Q

Name common mistakes around data mining.

A

1) Wrong problem for mining
2) Not having sufficient time for data acquisition
3) Only focus on aggregated results
4) Being sloppy with the procedure
5) Ignoring suspicious findings
6) Running mining algorithms repeatedly and blindly
7) Naively believing in the data

31
Q

What method do we use for estimating a linear relationship statistically?

A

Ordinary Least Squares

It is called "least squares" because it minimizes the sum of squared errors

32
Q

What is pseudo out-of-sampling testing?

A

Split the dataset in two, estimate the model on one part, and then make predictions on the other
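
A small Python sketch of pseudo out-of-sample testing on made-up data (the variable names and data-generating values are assumptions for illustration): fit on one half, predict on the held-out half.

```python
import numpy as np

# Hypothetical data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)

# Split: estimate on the first half, predict on the held-out second half.
x_train, x_test = x[:100], x[100:]
y_train, y_test = y[:100], y[100:]

beta, intercept = np.polyfit(x_train, y_train, deg=1)   # slope, intercept
y_pred = intercept + beta * x_test
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(f"out-of-sample RMSE: {rmse:.3f}")
```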

33
Q

How do we compute beta in a single variable OLS?

How do we compute the intercept in a single variable OLS?

A
beta = Cov[X, Y] / Var[X]
intercept = mean(Y) − beta · mean(X) (plug the estimated slope and the sample means into the fitted line)
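
A quick numeric check of these formulas in Python on simulated data (the true values 3.0 and 2.0 are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 3.0 + 2.0 * x + rng.normal(size=500)

# Slope: Cov[X, Y] / Var[X]; intercept: mean(Y) - beta * mean(X).
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - beta * x.mean()
print(beta, intercept)  # close to 2.0 and 3.0
```
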
34
Q

What are the model assumptions behind OLS?

A

1) Linearity and additivity
2) Statistical independence of errors
3) Homoscedasticity: constant variance of the errors
4) Normality of the error distribution

35
Q

How to diagnose violation of linearity?

A

Plot observed vs. predicted values

36
Q

How to diagnose violation of independence (concern in time series)?

A

Run the Durbin–Watson test on the residuals and check whether they exhibit autocorrelation

37
Q

What statistical tests exist for assessing normality in OLS?

A

Shapiro-Wilk, Kolmogorov-Smirnov, Jarque-Bera, Anderson-Darling
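
A sketch of how these tests can be run in Python with SciPy, here on synthetic stand-in residuals (in practice you would pass your model's residuals).

```python
import numpy as np
from scipy import stats

residuals = np.random.default_rng(2).normal(size=300)  # stand-in for model residuals

print(stats.shapiro(residuals))                                      # Shapiro-Wilk
print(stats.kstest(residuals, "norm",
                   args=(residuals.mean(), residuals.std(ddof=1))))  # Kolmogorov-Smirnov
print(stats.jarque_bera(residuals))                                  # Jarque-Bera
print(stats.anderson(residuals, dist="norm"))                        # Anderson-Darling
```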

38
Q

Why do we need a Generalized Linear Model (GLM)? And what are its three components?

A

If the response variable does not follow a normal (Gaussian) distribution, OLS is not appropriate.

Composed of:

1) Random component: associated with the dependent variable and its probability distribution
2) Systematic component: identifies the selected covariates through a linear predictor
3) Link function: the function applied to E[Y] such that it equals the systematic component (e.g. a log transformation)

39
Q

What is logistic regression? (Mention the nature of the response variable and derive if needed.)

A

Binary response variable: a GLM in which the logit link, log(p / (1 − p)) = Xβ, maps the probability of success onto the linear predictor
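
A minimal scikit-learn sketch on simulated data (the coefficients 0.5 and 2.0 are made up for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary outcome: P(y=1) rises with x via the logistic link.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 1))
p = 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)   # roughly recover 0.5 and 2.0
print(model.predict_proba(X[:3]))      # predicted class probabilities
```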

40
Q

Mention key pros and cons of the panel regression model.

A

Pros:

  • suitable for studying dynamics
  • minimize bias
  • time fixed effects to control for unobserved variables

Cons:

  • data collection
  • unwanted correlation
  • complexity
41
Q

Explain the difference between Random Effects and Fixed Effects Panel Regression.

A

FE: removes time-invariant characteristics
RE: variation across subjects is random and uncorrelated with the predictors

42
Q

Types of classification of data?

A

Supervised learning: the computer is trained on labelled examples so it can classify new observations

Clustering (unsupervised): no labels are available; the grouping is discovered from scratch every time

43
Q

Mention five families of clustering methods.

A

1) Hierarchical methods (agglomerative and divisive)
2) Partitioning methods (non-hierarchical)
3) Fuzzy methods
4) Density-based methods
5) Model-based methods

44
Q

Mention metric types for describing distances between vectors.

A

1) Euclidean distance: square root of the sum of squared differences between components

2) Manhattan distance: sum of the absolute differences between components

45
Q

What are the pros and cons of hierarchical methods for clustering?

A

Pros:

  • No apriori information needed about number of clusters required
  • Easy to implement
  • Very replicable

Cons:

  • Not very efficient computationally (the dissimilarity matrix grows quadratically with the number of observations)
  • Based on a predetermined dissimilarity matrix
  • No objective function is minimized
  • The dendrogram is not the best tool for choosing the optimum number of clusters
  • Hard to treat non-convex shapes
46
Q

What are the pros and cons of partitioning methods (non-hierarchical) for clustering?

A

Pros:

  • k-means is relatively efficient (roughly O(t·k·n) for t iterations, k clusters and n observations)
  • Easy to implement and understand
  • Totally replicable

Cons:

  • PAM does not scale well for large datasets
  • Applicable only when mean is defined
  • Need to specify k in advance
  • k-means is unable to handle noisy data and outliers
  • Not suitable to discover clusters with non-convex shapes
47
Q

Describe the two types of hierarchical clusters.

A

1) Agglomerative: each observation starts as its own cluster; iteratively, the most similar clusters are merged until a single cluster (the root) remains
2) Divisive: the inverse of agglomerative; it begins with a single root and the most heterogeneous clusters are divided until each observation forms its own cluster

48
Q

What are the most famous partitioning methods?

A

1) k-means: each cluster is represented by the center of the cluster (see the sketch below)
2) k-medoids (PAM): each cluster is represented by one of the points in that cluster
3) CLARA: suitable when large datasets are analyzed
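
A minimal k-means sketch with scikit-learn on synthetic data (PAM and CLARA are not in scikit-learn, so only k-means is shown here).

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(5, 1, size=(100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # each cluster represented by its center
print(km.labels_[:10])       # cluster assignment per observation
```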

49
Q

How can we test for cluster tendencies?

A

Hopkins test for spatial randomness of the data

50
Q

Mention three ways for computing the optimal number of clusters.

A

1) Silhouette
2) Gap-statistic
3) Within Sum of Squares

51
Q

What is the Naive Bayes approach to supervised learning?

A
  • Probabilistic machine learning algorithm
  • The term “naive” refers to the assumption that the features going into the model are independent
  • Very fast and scalable

Key concepts: 1) conditional probability, 2) Bayes' rule (see the sketch below)
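
A small Gaussian Naive Bayes sketch on made-up data (the feature values are arbitrary).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: two features, binary class label.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(3, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

nb = GaussianNB().fit(X, y)            # assumes features are independent given the class
print(nb.predict([[0, 0], [3, 3]]))    # -> [0 1]
print(nb.predict_proba([[1.5, 1.5]]))  # posterior probabilities via Bayes' rule
```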

52
Q

Why is it easier to “view” standardized residuals instead of regular ones?

A

Because we can then interpret them in terms of standard deviations: if the errors are normal (Gaussian), roughly 95% of the standardized residuals should lie within ±2 standard deviations

53
Q

Do we have to perform variable selection manually in a multi-variate regression?

A

No; we can run a stepwise regression, which selects variables by minimizing the AIC

54
Q

What is a quantile regression?

A

Whereas the method of least squares results in estimates of the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating either the conditional median or other quantiles of the response variable

55
Q

What are the key ingredients of Model Performance?

A

1) Generalization: ability of a model to be applied to previously unseen data
2) Overfitting: model is tailored to the training data at the expense of generalization to test data

56
Q

More complexity in a model means?

A

A highly complex model is very accurate on the training data
A model with low complexity is not very accurate

The more complex the model is, the more overfitting you get

57
Q

What is overfitting?

A

In statistics, overfitting is “the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably”

58
Q

Mention a holdout evaluation technique.

A

Cross-validation: performs multiple splits and systematically swaps out samples for testing

59
Q

General method to avoid overfitting?

A

Test data shall always be kept independent of model building

Some overfitting will always be present, hence the need for complexity control

60
Q

Specific method to avoid overfitting?

A

Regularization (penalization): add a penalty term to the regression objective. Well-known examples are Ridge (L2-norm penalty) and Lasso (L1-norm penalty)
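
A short scikit-learn sketch contrasting the two penalties on simulated data (the alpha values are arbitrary choices).

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))                   # 10 candidate predictors
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)   # only the first one matters

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty can set coefficients exactly to zero
print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```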

61
Q

How do we assess model performance (accuracy)?

A

Confusion matrix. Contingency table of observed classes vs. predicted classes.

(Remember we need to count the errors of false positives and false negatives separately!)
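
A minimal confusion-matrix sketch in Python (the labels below are made up).

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows: observed classes, columns: predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()   # false positives and false negatives counted separately
print(cm)
print("accuracy:", (tp + tn) / cm.sum())
```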

62
Q

Describe the silhouette method for determining optimal cluster.

A

The average silhouette method measures the quality of a clustering operation: it tells us how well each data object lies within its cluster. A high average silhouette width indicates good clustering. The method calculates the mean silhouette of the observations for different values of k; the optimal number of clusters is the k that maximizes the average silhouette over the candidate values of k.
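
A sketch of the average silhouette method with scikit-learn on synthetic blobs: the k with the highest average silhouette is preferred.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Average silhouette width for each candidate k; pick the k that maximizes it.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```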

63
Q

Describe the gap statistic method for determining optimal cluster.

A

In 2001, researchers at Stanford University (R. Tibshirani, G. Walther and T. Hastie) published the gap statistic method. It can be applied to any clustering method, such as k-means or hierarchical clustering.
The gap statistic compares the total intra-cluster variation for different values of k with its expected value under a null reference distribution of the data. The reference datasets are produced by Monte Carlo simulation: for each variable in the dataset, we take the range between min(x_i) and max(x_i) and draw values uniformly from that interval.

64
Q

Describe the within sum of squares (WSS) method for determining optimal cluster.

A

By measuring the total intra-cluster variation, one can evaluate the compactness of the clusters. We can then proceed to define the optimal number of clusters as follows:

  1. Run the clustering algorithm for several values of k, e.g. varying k from 1 to 10 clusters.
  2. For each k, calculate the total within-cluster sum of squares (wss).
  3. Plot wss against the number of clusters k.

The plot indicates the appropriate number of clusters for our model: the location of a bend or "knee" marks the optimum number of clusters.
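
A sketch of the WSS/elbow computation with scikit-learn (synthetic data; KMeans' inertia_ attribute is the total within-cluster sum of squares).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

# Total within-cluster sum of squares (inertia_) for k = 1..10;
# plotting these values and looking for the "knee" suggests the optimal k.
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 11)]
print(wss)
```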

65
Q

Mention algorithms for testing model accuracy.

A

1) LDA
2) KNN
3) SVM
4) CART
5) Random Forest
6) Naive Bayes

66
Q

What is the definition of a coordinate system?

A

Combination of a set of position scales and their relative geometric arrangement.

A Cartesian coordinate system is invariant under linear transformations of the position scales (e.g. a change of units).

67
Q

Mention a non-linear coordinate system.

A

Logarithmic scale. Multiplication on a log scale looks like addition on a linear scale.

68
Q

What are polar coordinates?

A

Axes are curved
We specify positions via an angle and a radial distance from the origin

Useful for data of a periodic nature

69
Q

What are geospatial coordinates?

A

Locations on the globe, specified in degrees of latitude and longitude (e.g. degrees North and West)

Plotting them directly on Cartesian axes is misleading!

70
Q

Name charts for representing amounts.

A
  • Grouped bars
  • Stacked bars

71
Q

Name charts for representing distributions.

A
  • Boxplots
  • Violin plots
  • Jittered points
  • Sina plot
  • Stacked histograms
  • Overlapping densities
  • Ridgeline plot
72
Q

Name charts for representing proportions.

A
  • Mosaic plot
  • Tree map
  • Parallel sets
  • Multiple pie charts
  • Grouped bars
  • Stacked bars
  • Stacked densities
73
Q

Name charts for representing relationships.

A
  • Line graph
  • Connected scatter plot
  • Smooth line graph
  • Density contours
  • 2D bins
  • Hex bins
  • Correlogram
74
Q

Name charts for representing uncertainty.

A
  • Error bars
  • 2D error bars
  • Confidence band
  • Graded confidence band
75
Q

Mention differences between structured and unstructured data.

A

Structured data comprises clearly defined data types whose patterns make them easily searchable

Unstructured data is everything else.

76
Q

What is Natural Language Processing (NLP)?

A

The application of computational techniques to the analysis and synthesis of natural language and speech.
In NLP, the data (a collection of texts) is referred to as a corpus.

77
Q

What is deep learning?

A

Branch of Machine Learning that makes use of a specific type of architectures or models (like Neural Networks) to solve learning tasks

78
Q

What is the difference between Computational Linguistics and Natural Language Processing?

A

CL is a theoretical field that develops computational methods to answer scientific questions from the point of view of linguists

NLP is dedicated to give solutions to engineering problems related to natural language, focusing on people

79
Q

Describe a typical NLP workflow.

A

1) Collection of documents
2) Preprocess those documents such that we can do exploratory data analysis
3) Represent relevant features in some usable vector space
4) Apply a suitable model

80
Q

What are NLP challenges?

A

1) Ambiguity: choice of word vs. meaning of word
2) Synonymy
3) Syntax: structure of sentences
4) Coreference: e.g. concepts deduced implicitly
5) Normalization of words: break words to the core
6) Representation (transform words into vectors)
7) Style (e.g. irony and sarcasm)

81
Q

What is regex?

A

Regular expressions (regex) are a pattern-matching language for processing textual data (e.g. cleaning it). They allow you to search for patterns in text and manipulate the matches, for example to clear out "filler" words.
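
A small Python regex sketch (the example text and "filler" words are made up).

```python
import re

text = "The   results were, um, quite good, you know, overall."

# Search for a pattern: any word ending in "sults".
print(re.findall(r"\b\w*sults\b", text))           # ['results']

# Manipulate matches: drop filler words and collapse repeated whitespace.
cleaned = re.sub(r"\b(um|you know)\b,?\s*", "", text)
cleaned = re.sub(r"\s+", " ", cleaned)
print(cleaned)                                     # "The results were, quite good, overall."
```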

82
Q

What is the Bag of Words NLP technique?

A

Represent each document simply by the frequency of each word it contains, ignoring word order
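
A minimal bag-of-words sketch in plain Python.

```python
from collections import Counter

doc = "big data needs big storage and big compute"

# Bag of words: keep only word frequencies, discard order.
bow = Counter(doc.lower().split())
print(bow)  # Counter({'big': 3, 'data': 1, 'needs': 1, 'storage': 1, 'and': 1, 'compute': 1})
```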

83
Q

What is keyness in NLP?

A

A measure associated with features that occur differently across different categories

84
Q

What is lexical dispersion in NLP?

A

An informative measure which communicates where the term has been used in the text (often displayed in x-ray plots)

85
Q

Describe lemmatization and stemming.

A

Lemmatization: groups together the inflected forms of a word (e.g. "running" and "ran" are mapped to the base form "run") so that they are treated as equal

Stemming: cuts a word back to its base or root form (stem), e.g. by stripping suffixes (for verbs, roughly the infinitive)
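
A small sketch of stemming with NLTK's Porter stemmer (lemmatization, e.g. with NLTK's WordNetLemmatizer, additionally requires the WordNet data to be downloaded, so it is only mentioned here).

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["running", "ran", "studies", "studying"]:
    print(word, "->", stemmer.stem(word))
# running -> run, ran -> ran (irregular forms are where lemmatization helps),
# studies -> studi, studying -> studi
```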

86
Q

Mention ways to achieve syntactic parsing (analysis of structure of text for grammar).

A

1) Part-of-Speech Tagging (POS)
2) Dependency Parsing
3) Named Entity Recognition (NER)

87
Q

What ways exist to compute the tone of a text?

A

1) Lexicon based (e.g. pure words counting based on hash tables of positive and negative words)
2) Deep learning

88
Q

How can we detect hidden patterns in the corpus?

A

Using topic modelling (a dimension-reduction tool: it reduces the corpus to a small number of topics being discussed)

89
Q

How to determine optimal number of topics?

A

1) By perplexity (not rigorous)

2) By a chi-square test (rigorous)

90
Q

What is topic modelling?

A

A form of unsupervised learning in which the topic categories of the texts are unknown. Techniques such as LDA find words that appear together and group them into topics. The researcher decides on the number of topics without prior information.
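
A minimal LDA sketch with scikit-learn on four made-up documents; choosing two topics is the researcher's decision, as noted above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["stocks fell on inflation fears", "bond yields rose sharply",
        "the team won the cup final", "a late goal decided the match"]

X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # per-document topic proportions
```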

91
Q

What is the relationship between AI, ML, DL?

A

Artificial Intelligence contains Machine Learning contains Deep Learning

92
Q

What is AI?

A

Effort to automate intellectual tasks normally performed by humans

93
Q

What is Machine Learning?

A

Data and Answers –> Machine Learning –> Rules

It is trained rather than explicitly programmed
To do ML, we need three things;
1) Input data points (e.g. images)
2) Examples of the expected output (e.g. images of cats)
3) A way to measure performance of the algorithm (feedback loop = learning)

Based on these, it uses statistics to determine which output is most likely the correct one and improves from the feedback

94
Q

What is Deep Learning?

A

A specific subfield of Machine Learning.

The learning process in DL is taken further, through successive layers of increasingly meaningful representations. The layered representations are almost always learned through models called neural networks.

The number of layers that contribute to a model is called the depth.

DL is a multistage way to learn data representations

95
Q

What is a Neural Network?

A

A computing system made up of a number of simple, highly interconnected processing elements, which process information by using their dynamic state response to external inputs.

A mathematical framework for learning representation from data

96
Q

How do we define the number of weights in a two-layered NNET? How many biases? How many parameters?

A

Number of weights = for each layer, (number of inputs to the layer) × (number of nodes in the layer), summed across layers
Number of biases = number of hidden nodes + number of output nodes
Number of parameters = weights + biases
(see the worked example below)
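
A worked example for a hypothetical fully connected network with 3 inputs, one hidden layer of 4 nodes and 2 outputs (the layer sizes are assumptions for illustration).

```python
# Hypothetical fully connected network: 3 inputs -> 4 hidden nodes -> 2 outputs.
layer_sizes = [3, 4, 2]

weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))  # 3*4 + 4*2 = 20
biases = sum(layer_sizes[1:])                                                          # 4 + 2 = 6
parameters = weights + biases                                                          # 26
print(weights, biases, parameters)
```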

97
Q

What is the loss function in NNET?

A

Measures how far the output is from what we expect; this is used as a feedback signal to readjust the weights so that the next iteration yields an output closer to expectations (the backpropagation algorithm)

98
Q

Mention some DL achievements.

A
  • Near-human level image classification
  • Near-human level speech recognition
  • Near-human level handwriting transcription
  • Improved machine translation
  • Digital assistants
  • Near-human level autonomous driving
  • Ability to answer natural-language questions
99
Q

What is the purpose of the activation function in NNET?

A

To check the output value Y and decide whether downstream neurons should consider this neuron activated or not.

100
Q

Give a few examples on activation functions and their corresponding characteristics.

A

1) Step function: binary on/off output
2) Linear function: identity, no non-linearity
3) Sigmoid: squashes values into (0, 1)
4) Tanh: squashes values into (−1, 1)
5) ReLU: max(0, x)

(see the sketch below)
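
A small NumPy sketch of these activation functions.

```python
import numpy as np

def step(x):    return np.where(x >= 0, 1.0, 0.0)   # binary on/off
def linear(x):  return x                             # identity, no non-linearity
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1)
def tanh(x):    return np.tanh(x)                    # squashes to (-1, 1)
def relu(x):    return np.maximum(0.0, x)            # max(0, x)

x = np.array([-2.0, 0.0, 2.0])
for f in (step, linear, sigmoid, tanh, relu):
    print(f.__name__, f(x))
```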

101
Q

What is the role of bias in Neural Networks?

A

The bias node in a NN is a node that is always on

It is analogous to the intercept in a regression model

If a NNET does not have a bias node in a given layer, it will not be able to produce output in the next layer that is shifted away from zero on a linear scale

102
Q

What type of NNET exist?

A

1) Feedforward Neural Networks (single and multi-layer perceptron)
2) Convolutional Neural Networks (Evolution of MLP with less computational needs)
3) Recurrent Neural Networks (have feedback loops; trained with backpropagation through time; often applied to sequential tasks)

103
Q

What is the Feedforward Neural Network?

A

A feedforward neural network is an artificial neural network where connections between the units do not form a cycle. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

We can distinguish two types of feedforward neural networks: 1) Single-layer, 2) Multi-layer

104
Q

What is the Convolutional Neural Network?

A

Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. In a convolutional neural network (CNN, ConvNet, shift-invariant or space-invariant network), the unit connectivity pattern is inspired by the organization of the visual cortex: units respond to stimuli in a restricted region of space known as the receptive field. Receptive fields partially overlap, covering the entire visual field, and the unit response can be approximated mathematically by a convolution operation. CNNs are variations of multilayer perceptrons that use minimal preprocessing. They are widely applied in image and video recognition, recommender systems and natural language processing, and they require large amounts of data to train on.

105
Q

What is the Recurrent Neural Networks?

A

In recurrent neural network (RNN), connections between units form a directed cycle (they propagate data forward, but also backwards, from later processing stages to earlier stages). This allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and other general sequence processors.

106
Q

What is Web Scraping?

A

Web scraping is a technique for converting the data present in unstructured format (HTML tags) over the web to the structured format which can easily be accessed and used. This means that you are going to build a data.frame or a corpus at the end of the day.
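
A hedged sketch of the idea with requests, BeautifulSoup and pandas (the URL is hypothetical; any real page containing an HTML table would do).

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical page; replace with a real URL that contains an HTML table.
url = "https://example.com/some-table-page"
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")]

# Structured end product: a data frame built from the scraped rows.
df = pd.DataFrame(rows[1:], columns=rows[0]) if rows else pd.DataFrame()
print(df.head())
```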

107
Q

Provide an overview of some approaches for supervised learning.

A
  • Heuristic
  • Model-based (Naive Bayes)
  • Binary decision
  • Optimisation based
108
Q

Name methods for finding the best classifier.

A
  • Accuracy: % of correct predictions
  • AccuracyInf: a lower bound on the mean accuracy
  • AccuracyCV: the mean of the accuracies obtained under the resampling scheme
  • AccuracyPAC: a highly probable bound on the accuracy, obtained by subtracting the standard deviation