Identify common types of computer vision solution Flashcards

Question 1

Q

Computer Vision

Answer

A

The goal of computer vision is often to extract meaning, or at least actionable insights, from images. This requires machine learning models that are trained to recognize features based on large volumes of existing images.

Convolutional neural networks (CNNs): use filters to extract numeric feature maps from images, then feed the feature values into a deep learning model to generate a label prediction.
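The filter step described above can be sketched in plain Python. This is only an illustration: the 4x4 image, the 2x2 filter values, and the function name `convolve2d` are made up for the example, and a real CNN learns its filter values during training rather than having them written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the image patch under the filter.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical filter that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is largest exactly where the dark-to-bright edge sits, which is the kind of numeric feature a later layer can use to predict a label.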

Question 2

Q

Transformers

Answer

A

A transformer is a type of neural network architecture.

The transformer architecture is known for its ability to capture long-range dependencies in data and its parallelizable structure, making it highly efficient for training on large datasets.
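The mechanism usually credited with capturing long-range dependencies is attention, where every position in a sequence is compared with every other position. The sketch below shows scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: the token vectors are invented, and the learned projection matrices, multiple heads, and other layers of a real transformer are left out.

```python
import math


def softmax(values):
    """Turn raw scores into weights that sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Compare this position with every position, however far apart.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical sequence of three 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens, tokens, tokens)
for vector in contextual:
    print([round(x, 3) for x in vector])
```

Each iteration of the outer loop is independent of the others, which is why the computation can be run in parallel across positions.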

Question 3

Q

Multi-modal

Answer

A

Multi-modal models combine information from different modalities, such as text, images, and audio, to enhance performance in tasks that involve multiple types of data.

Question 4

Q

Microsoft Florence model

Answer

A

Florence is a pre-trained general model (a foundation model) on which you can build multiple adaptive models for specialist tasks.

-It includes both a language encoder and an image encoder.

Adaptive models built on it can perform tasks such as:

-Image classification: Identifying to which category an image belongs.
-Object detection: Locating individual objects within an image.
-Captioning: Generating appropriate descriptions of images.
-Tagging: Compiling a list of relevant text tags for an image.
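One common way to use a paired image encoder and language encoder is to map both images and text into the same vector space and compare them. The sketch below ranks candidate tags for an image by cosine similarity. It illustrates the general idea only and is not a description of Florence's internals: the three-dimensional embeddings are invented, whereas real encoders produce much larger vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical output of an image encoder for one photo.
image_embedding = [0.9, 0.1, 0.2]

# Hypothetical outputs of a language encoder for candidate tags.
tag_embeddings = {
    "fruit": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
    "dog": [0.2, 0.1, 0.9],
}

# Rank tags from most to least similar to the image.
ranked = sorted(
    tag_embeddings,
    key=lambda tag: cosine_similarity(image_embedding, tag_embeddings[tag]),
    reverse=True,
)
print(ranked)
```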

Question 5

Q

Azure AI Vision

Answer

A

Microsoft’s Azure AI Vision service provides prebuilt and customizable computer vision models that are based on the Florence foundation model and provide various powerful capabilities.

With Azure AI Vision, you can create sophisticated computer vision solutions quickly and easily, taking advantage of “off-the-shelf” functionality for many common computer vision scenarios while retaining the ability to create custom models using your own images.

Azure AI Vision supports multiple image analysis capabilities, including:

-Optical character recognition (OCR): extracting text from images.
-Generating captions and descriptions of images.
-Detection of thousands of common objects in images.
-Tagging visual features in images.

Question 6

Q

Image Classification

Answer

A

An image classification model is used to predict the category, or class, of an image. For example, you could train a model to determine which type of fruit is shown in an image.
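The final step of a classifier like the fruit example can be sketched as follows. This assumes a model that has already produced one raw score per class. The scores, labels, and the function name `predict_class` are hypothetical, and softmax is one common way (not the only one) to turn scores into probabilities.

```python
import math


def predict_class(scores, labels):
    """Convert raw class scores to probabilities and pick the most likely label."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    probabilities = [e / total for e in exps]
    best_index = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities


labels = ["apple", "banana", "orange"]
scores = [1.2, 3.4, 0.3]  # hypothetical model output for one image

label, probabilities = predict_class(scores, labels)
print(label)
for name, p in zip(labels, probabilities):
    print(f"{name}: {p:.3f}")
```

The model returns a single class for the whole image, which is what distinguishes classification from object detection.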

Question 7

Q

Object Detection

Answer

A

Object detection models detect and classify objects in an image, returning bounding box coordinates to locate each object.
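A detection result can be pictured as a list of labeled boxes. The sketch below uses a hypothetical `Detection` record with a left/top/width/height box and keeps only detections above a confidence threshold. The field names, values, and the 0.5 threshold are assumptions for illustration and do not represent the response format of any particular service.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # predicted class of the object
    confidence: float  # model confidence between 0 and 1
    left: int          # x coordinate of the box's left edge, in pixels
    top: int           # y coordinate of the box's top edge, in pixels
    width: int
    height: int


def confident_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical model output for one image.
detections = [
    Detection("apple", 0.92, 10, 20, 100, 110),
    Detection("banana", 0.81, 150, 40, 200, 90),
    Detection("orange", 0.30, 5, 5, 40, 40),
]

kept = confident_detections(detections)
for d in kept:
    print(f"{d.label} ({d.confidence:.0%}) at x={d.left}, y={d.top}, size {d.width}x{d.height}")
```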

Question 8

Q

Face Detection, Analysis & Recognition

Answer

A

Face Detection involves identifying regions of an image that contain a human face, typically by returning bounding box coordinates that form a rectangle around the face.

With Face Analysis, machine learning models can be trained to return additional information about a detected face, such as the location of facial features like the nose, eyes, eyebrows, and lips.

Facial Recognition is a further application of facial analysis, in which a machine learning model is trained to identify known individuals from their facial features.

Question 9

Q

Face Analysis on Azure

Answer

A

Microsoft Azure provides several services that you can use to detect and analyze faces:

-Azure AI Vision, which offers face detection and some basic face analysis, such as returning the bounding box coordinates around each detected face.

-Azure AI Video Indexer, which you can use to detect and identify faces in a video.

-Azure AI Face, which offers pre-built algorithms that can detect, recognize, and analyze faces.

The Limited Access policy requires customers to submit an intake form to access additional Azure AI Face service capabilities, including:

-The ability to compare faces for similarity.
-The ability to identify named individuals in an image.

-The Face resource provides face detection capabilities, which you can try out in Vision Studio.
-The locations of detected faces are indicated by the coordinates of a rectangular bounding box.

Question 10

Q

Optical Character Recognition (OCR)

Answer

A

The foundation of processing text in images is optical character recognition (OCR), in which a model can be trained to recognize individual shapes as letters, numerals, punctuation, or other elements of text.

-The Azure AI Vision service handles the extraction of text from images.
-You can think of the Read API as an OCR engine that powers text extraction from images, PDFs, and TIFF files.
-Results are arranged in pages, lines, and words.
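The pages, lines, and words hierarchy can be pictured as nested data. The sketch below walks such a structure and rebuilds plain text. The dictionary layout and field names (`pages`, `lines`, `words`, `text`, `confidence`) are assumptions chosen for illustration and are not the exact Read API response schema.

```python
# Hypothetical OCR result arranged as pages -> lines -> words.
ocr_result = {
    "pages": [
        {
            "page_number": 1,
            "lines": [
                {"words": [
                    {"text": "Hello", "confidence": 0.99},
                    {"text": "world", "confidence": 0.97},
                ]},
                {"words": [
                    {"text": "Invoice", "confidence": 0.95},
                    {"text": "42", "confidence": 0.91},
                ]},
            ],
        }
    ]
}


def extract_text(result):
    """Join words into lines, lines into pages, and pages into one string."""
    page_texts = []
    for page in result["pages"]:
        line_texts = [
            " ".join(word["text"] for word in line["words"])
            for line in page["lines"]
        ]
        page_texts.append("\n".join(line_texts))
    return "\n\n".join(page_texts)


print(extract_text(ocr_result))
```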
