Project Cycle: Data Acquisition Flashcards

1
Q

Define data.

A

Data is a piece of information, or facts and statistics collected together for reference or analysis.

2
Q

Define training data.

A

It is the primary dataset that is fed into the system for the purpose of developing and training it.

3
Q

Give some examples where the machine takes in different training data.

A
  • Text categorization: the input is a sentence and the target is the topic of the sentence.
  • Image recognition: the input is an image, which is analysed.
  • Sentiment analysis: the input is a sentence or phrase from social media feeds such as Twitter or Facebook, or from customer reviews on websites or surveys.
  • Spam detection: the input is an email or text message, which is classified as spam or not spam.
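
As a rough illustration of what "input" and "target" mean in these examples, training data for spam detection can be pictured as a list of (input, target) pairs. The messages, the labels and the helper name split_inputs_and_targets below are made up for this sketch and do not come from any real dataset or library.

```python
# Hypothetical labelled examples: each pair is (input text, target label).
spam_training_data = [
    ("Win a free prize now", "spam"),
    ("Meeting moved to 3 pm", "not spam"),
    ("Claim your free reward", "spam"),
    ("Lunch tomorrow?", "not spam"),
]


def split_inputs_and_targets(examples):
    """Separate the inputs the machine reads from the targets it should learn."""
    inputs = [text for text, _ in examples]
    targets = [label for _, label in examples]
    return inputs, targets


inputs, targets = split_inputs_and_targets(spam_training_data)
print(inputs)
print(targets)
```

The same shape applies to the other examples: only the kind of input (a sentence, an image) and the kind of target (a topic, a sentiment) change.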
4
Q

What is the validating data set?

A

It is also called the secondary data set.

This data is used to check whether the newly developed model correctly identifies the data when making predictions.

5
Q

What does the validating step ensure?

A

This step makes sure that the new model has not become specific to the primary dataset's values when making predictions.
If it has, corrections and tweaks are made to the project.
The primary and secondary data sets are then re-run through the model until the desired accuracy is achieved.
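
A minimal sketch of this idea, using only the Python standard library: the records are shuffled and divided into a primary (training) part and a secondary (validating) part, and a large gap between the two accuracies is treated as a sign that the model has become too specific to the primary data. The function names, the 20% validation share and the 0.10 gap threshold are assumptions chosen for illustration, not values given in the cards.

```python
import random


def split_dataset(records, validation_fraction=0.2, seed=0):
    """Shuffle the records and return (training, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def looks_too_specific(training_accuracy, validation_accuracy, max_gap=0.10):
    """Flag a model that does much better on primary data than on secondary data."""
    return (training_accuracy - validation_accuracy) > max_gap


records = list(range(100))  # stand-in for 100 labelled records
training, validation = split_dataset(records)
print(len(training), len(validation))
print(looks_too_specific(0.99, 0.70))
print(looks_too_specific(0.90, 0.88))
```

When the check flags a problem, the model is adjusted and both data sets are run through it again, which matches the re-run loop described above.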

6
Q

Define testing data.

A

It is the final dataset, which paves the way for the machine model to enter the real world and start making predictions.

7
Q

How does testing data differ from training and validating data?

A

All primary and secondary data come with relevant label tags.
The testing data is the final dataset, which gives the model no help in the form of tags.
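
The difference can be pictured with two records, sketched below under stated assumptions: the field names "text" and "label" are hypothetical, and the predict function is a toy keyword rule standing in for a trained model, not a real one.

```python
# A primary/secondary record carries a label tag; a testing record does not.
labelled_example = {"text": "Win a free prize now", "label": "spam"}
testing_example = {"text": "Free prize waiting for you"}


def predict(text):
    """Toy stand-in for a trained model: flags any message that mentions 'free'."""
    return "spam" if "free" in text.lower() else "not spam"


# The model has to produce its own tag for the testing record.
prediction = predict(testing_example["text"])
print(prediction)
```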

8
Q

Define data warehousing.

A

Data is typically collected in bulk from various sources and in various formats. Storing this data is called data warehousing.

9
Q

Define data features.

A

Data features are the factors and parameters that affect the problem directly or indirectly.

10
Q

What should be the characteristics of training data?

A

For an AI project to be efficient, the training data needs to be relevant and authentic. Data plays an important part in an AI project because it forms the base on which the project is built. Therefore, the data acquired should be authentic, reliable and correct.

11
Q

What should be the characteristics of our data sources?

A

It is necessary to find a reliable source of data from which authentic information can be taken. At the same time, we should make sure that the data we collect is open-sourced and not someone’s property. Extracting private data can be an offence.

12
Q

What are the most reliable and authentic data sources?

A

Among the most reliable and authentic sources of information are the open-sourced websites hosted by the government. These government portals have general information collected in a suitable format, which can be downloaded and used wisely.
Examples: data.gov.in, india.gov.in

13
Q

Give examples of some data sources.

A
  • cameras
  • sensors
  • surveys
  • observations
  • web scraping
  • application programming interfaces (APIs)
14
Q

Define system map.

A

It is a tool used to infer relationships between the different data features.

  • The data features are put in circles.
  • The direction of the relationship is conveyed by the direction of the arrowhead.
  • The nature of the relationship is conveyed by the + or - sign. A ‘+’ sign indicates that two features are directly related, while a ‘-’ sign indicates that two features are inversely related.
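
One possible way to write a system map down, sketched here as an assumption rather than a standard notation: each arrow becomes a (from, to) pair and its sign is stored next to it. The feature names are invented for illustration only.

```python
# Each key is an arrow (from feature, to feature); each value is the sign on it.
# The features below are hypothetical.
system_map = {
    ("study hours", "exam score"): "+",
    ("screen time", "sleep hours"): "-",
    ("sleep hours", "exam score"): "+",
}


def describe(source, target, sign):
    """Turn one signed arrow into a plain sentence."""
    relation = "directly" if sign == "+" else "inversely"
    return f"{source} -> {target}: {relation} related"


descriptions = [describe(s, t, sign) for (s, t), sign in system_map.items()]
for line in descriptions:
    print(line)
```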