AI Creativity Flashcards

AI Creativity Case Study (74 cards)

1
Q

What are the three types of art described in the taxonomy of computer-generated art?

A

C-art, G-art, and CG-art

2
Q

What is C-art?

A

Art that uses computers as part of the art-making process

3
Q

What is G-art?

A

Art that is generated, at least in part, by some process not under the artist’s direct control

4
Q

What is CG-art?

A

Art produced by leaving a computer program to run by itself, with minimal or zero human interference

5
Q

What are passive tools in the context of generative systems?

A

Tools that make no attempt to alter the user’s input, e.g., Microsoft Paint

6
Q

What are active tools in the context of generative systems?

A

Tools that actively interpret and process user inputs, adding to them, e.g., the sketch pad mentioned in the lecture

7
Q

What is a first-order Markov model in text generation?

A

A model where the next state depends only on the current state (word)

8
Q

What is a second-order Markov model in text generation?

A

A model where the next state depends on the two previous states (words)

9
Q

What are the three parts of the example AI system for generating pop music?

A
1. Lyric generator using the GPT-2 transformer model
2. Music generator using the Music-VAE auto-encoder model
3. Singing voice synthesis using the DiffSinger model

10
Q

What is the basic process for generating text using a Markov model?

A
1. Pick a random initial state
2. Select from the possible next states
3. If there is no possible next state, go back to step 1

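The three-step Markov generation loop above can be sketched in a few lines of Python (the corpus and function names are illustrative, not from the lecture):

```python
import random
from collections import defaultdict

def build_model(words):
    # First-order model: each word maps to the words seen directly after it
    model = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        model[cur].append(nxt)
    return model

def generate(model, length):
    out = [random.choice(list(model))]          # 1. pick a random initial state
    while len(out) < length:
        options = model.get(out[-1])
        if not options:                         # 3. no possible next state: restart
            out.append(random.choice(list(model)))
        else:
            out.append(random.choice(options))  # 2. select a possible next state
    return " ".join(out)
```

For example, `generate(build_model("the cat sat on the mat".split()), 8)` produces eight words of plausible-looking (if locally repetitive) text.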
11
Q

Name three examples of more complex generative text models.

A

Variable order Markov model, Long short-term memory network (LSTM), Transformer network

12
Q

What are some features to consider when describing a generative system?

A

System architecture, Number of agents, Roles, Environment, Corpus, Input, Output, Communication, Human interactive modality, Task, Evaluation

13
Q

How does a second-order Markov model differ from a first-order model?

A

It considers the two previous states (words) instead of just the current state

14
Q

What is the main difference between more complex generative models and simple Markov models?

A

They have a more complex method for picking the next state/output, including more complex state representations

15
Q

In the context of generative systems, what is a corpus?

A

The collection of data (e.g., text, images, music) that the system uses to learn and generate new content

16
Q

What is the main advantage that transformers add to language models?

A

They add context awareness to embeddings, allowing for a combination of contextual and sequential data

17
Q

How does self-attention work in transformer networks?

A

It creates a new type of embedding that incorporates information about other words in the context, not just the current word

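A minimal NumPy sketch of scaled dot-product self-attention makes the idea concrete; here queries, keys, and values are all taken to be the raw embeddings, whereas real transformers use learned projections:

```python
import numpy as np

def self_attention(X):
    # X: (sequence_length, d) matrix of word embeddings.
    # Scores say how strongly each word should attend to every other word.
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    # Softmax over each row turns scores into attention weights
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Each output embedding is a weighted mix of all input embeddings,
    # so it carries information about the other words in the context
    return w @ X
```

Each output row is a convex combination of the input embeddings, which is exactly the "incorporates information about other words" property the card describes.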
18
Q

What are the two main ways of encoding input in recurrent neural networks?

A
1. One-hot encoded vectors
2. Embeddings

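A small illustration of the two encodings; the three-word vocabulary and the embedding dimension are made up for the example, and random values stand in for learned embedding weights:

```python
import numpy as np

vocab = ["the", "cat", "sat"]

# One-hot: one dimension per vocabulary word, exactly one 1 per vector
one_hot = np.eye(len(vocab))

# Embedding: a dense, lower-dimensional vector per word (learned in practice)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 2))

def encode(word, table):
    # Look up the row for a word in either encoding table
    return table[vocab.index(word)]
```

One-hot vectors grow with the vocabulary and treat all words as equally distant, while embeddings stay small and can place related words near each other.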
19
Q

What is the ‘bag of words’ approach good for, and what is its limitation?

A

It’s good for sentiment analysis, but not great for generating text as it ignores sequence

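The limitation is easy to demonstrate: a bag of words counts tokens and discards order, so reordered sentences become indistinguishable.

```python
from collections import Counter

def bag_of_words(text):
    # Count word occurrences; all sequence information is thrown away
    return Counter(text.lower().split())
```

The counts make useful features for sentiment analysis, but since "dog bites man" and "man bites dog" produce the same bag, the representation cannot drive text generation.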
20
Q

When was the transformer architecture first reported?

A

2017

21
Q

What are the two main components that transformers model?

A

Sequence and context via self-attention

22
Q

How many parameters does GPT-2 have?

A

1.5 billion parameters

23
Q

How many layers does GPT-2 have?

A

48 layers

24
Q

What is meant by ‘zero-shot’ concept in relation to GPT-2?

A

GPT-2 can perform tasks it wasn’t specifically trained for, outperforming some specialized models

25
Q

What is Huggingface?

A

A platform on a mission to democratize good machine learning, providing tools and models for NLP

26
Q

Why was GPT-2 initially considered 'too dangerous to release'?

A

Due to concerns about potential malicious applications of the technology

27
Q

What is the 'auto-regressive mode' of GPT-2?

A

It can generate an endless stream of words, each based on its previous output

28
Q

How much text data was GPT-2 trained on?

A

40 GB of text

29
Q

What are 'attention heads' in the context of transformers?

A

Multiple projections of attention, allowing the model to focus on different aspects of the input simultaneously

30
Q

What is the main advantage of using pre-trained models like those from Huggingface?

A

They allow for quick implementation and fine-tuning of state-of-the-art language models without training from scratch

31
Q

What is a latent space in the context of machine learning?

A

A compact way of describing a dataset, representing its learned statistical structure through a reversible dimension-reduction technique

32
Q

How does a variational autoencoder (VAE) differ from a standard autoencoder in terms of latent space representation?

A

VAEs encode to the parameters of a statistical distribution (mean and variance), while standard autoencoders encode to a vector

33
Q

What is the main advantage of using a VAE over a standard autoencoder for generative tasks?

A

VAEs force all areas of the latent space to be meaningfully decodable, making them more useful for generative exploration

34
Q

What does the 'recurrent' part in Music-VAE refer to?

A

It uses recurrent neural network components, specifically LSTM (Long Short-Term Memory) units

35
Q

How much training data was used for Music-VAE?

A

1.5 million MIDI files from the web

36
Q

What is the dimension of the latent vector in Music-VAE?

A

256 or 512 dimensions

37
Q

What are the three main components of Music-VAE's architecture?

A

Encoder (bidirectional LSTM), latent space, and hierarchical decoder

38
Q

How does Music-VAE represent musical sequences?

A

It encodes 2- or 16-bar musical sequences including pitches, durations, and timing, potentially with multiple parts (e.g., bass, melody, drums)

39
Q

What is the 'unrelated fragments problem' in music generation?

A

When sampling from the latent space, generated sequences may not have coherent long-term structure

40
Q

How can one explore the latent space of a trained Music-VAE model?

A

By sampling random vectors, permuting existing vectors, or making subtle changes to vectors in the latent space

41
Q

What is self-supervised learning in the context of Music-VAE?

A

The model is trained to reproduce its inputs via encoding and decoding, without requiring separate labeled data

42
Q

Why are VAEs particularly useful for creative applications?

A

They allow for smooth interpolation between points in the latent space, enabling the generation of new, coherent samples
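The interpolation itself is simple; what makes it creative is that a VAE decoder (assumed here, not shown) can turn every intermediate vector into a coherent sample:

```python
import numpy as np

def interpolate(z1, z2, steps):
    # Straight-line path between two latent vectors; each point along
    # the way would be passed to the decoder to produce a new sample
    return [(1 - t) * z1 + t * z2 for t in np.linspace(0.0, 1.0, steps)]
```

With Music-VAE, `z1` and `z2` would be the 256- or 512-dimensional latent vectors of two encoded melodies, and decoding the path morphs one melody into the other.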
43
Q

What is the process of 'permuting the sample' in latent space exploration?

A

Generating a latent vector, saving it, reading it back, and then making alterations to explore variations

44
Q

How does Music-VAE handle multiple instrument tracks?

A

It can encode and decode sequences with multiple parts, such as bass, melody, and drums

45
Q

What is the advantage of using a hierarchical decoder in Music-VAE?

A

It ensures better use of the latent representation, allowing for more coherent long-term structure in generated sequences

46
Q

What are the three primary challenges in singing voice synthesis?

A

1. Timing
2. Expressiveness
3. Holding long notes

47
Q

What was the basis for HAL 9000's singing voice in '2001: A Space Odyssey'?

A

A tube model of the vocal tract, arranged by Joan Miller and Max Mathews

48
Q

What is 'Pink Trombone' in the context of voice synthesis?

A

A physical model of the vocal tract for speech synthesis

49
Q

What technique was used in the 'Speak and Spell' toy from the 1970s?

A

LPC (Linear Predictive Coding)

50
Q

What was a significant development in voice synthesis during the 1990s?

A

Real-time synthesis and the use of HMM (Hidden Markov Model) systems

51
Q

Name three recent deep learning-based singing voice synthesis systems.

A

1. XiaoiceSing (2020)
2. HiFiSinger (2020)
3. DiffSinger (2021)

52
Q

What are the main components of the DiffSinger system?

A

Language Model, Speech Model, and Vocoder Model

53
Q

What is the purpose of the Language Model in DiffSinger?

A

To convert text input into phonetic representations

54
Q

What does the Speech Model in DiffSinger produce?

A

Mel (spectral) features

55
Q

What is the role of the Vocoder in DiffSinger?

A

To convert the Mel features into actual sound waves

56
Q

What dataset was used in the LJSpeech model mentioned in the notes?

A

A public-domain speech dataset of 13,100 short audio clips from a single speaker reading non-fiction books

57
Q

What are Mel features in the context of speech synthesis?

A

Spectral frames, or tiny slices of audio that have been transformed, similar to an embedding

58
Q

Name three key terms used in modern speech synthesis systems.

A

Embedding, Convolutional, Transformer

59
Q

What was a significant development in voice synthesis during the 2000s?

A

Vocaloid and concatenative spectral synthesis

60
Q

How do deep learning models in the 2010s-20s differ from earlier voice synthesis approaches?

A

They model sound at the audio level, rather than using symbolic models

61
Q

What are the main steps in putting together a complete AI music generation system according to the lecture notes?

A

1. Generate lyrics
2. Generate music
3. Extract melody
4. Sing
5. Mix the singing with the backing

62
Q

What tool is mentioned for generating lyrics in the AI music system?

A

GPT-2

63
Q

What Python library is used to extract melody from a MIDI file?

A

mido

64
Q

What is a challenge in extracting melody from a MIDI file?

A

There is no standard MIDI channel for melody, unlike percussion, which is always on channel 10

65
Q

How is timing handled when extracting notes from a MIDI file?

A

By accumulating the time values of each message and printing the cumulative time with each note
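The accumulation logic can be sketched without mido itself; in mido, each message's `time` attribute is a delta since the previous message, which the plain tuples below imitate:

```python
def note_onsets(messages):
    # messages: (type, note, delta_time) tuples, mimicking mido messages
    now = 0
    onsets = []
    for kind, note, delta in messages:
        now += delta                       # accumulate per-message time values
        if kind == "note_on":
            onsets.append((now, note))     # cumulative time paired with each note
    return onsets
```

In the real script, the tuples would come from iterating over a `mido.MidiFile` track instead.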
66
Q

What format does the singing synthesis program expect for note input?

A

A list of notes with repetitions indicating duration, e.g. 'c,g,eee'
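Given (note, duration) pairs from the melody extraction step, producing that repeated-letter format is a one-liner; the format is from the lecture, but the helper name and the pair representation are made up:

```python
def to_sing_format(notes):
    # (name, duration) pairs -> 'c,g,eee' style string: repeat each
    # note name once per unit of duration, then join with commas
    return ",".join(name * duration for name, duration in notes)
```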
67
Q

Name three tools mentioned for mixing the singing with the backing track.

A

Reaper, Audacity, and librosa

68
Q

Who created Reaper, and what other famous software did they create?

A

Justin Frankel, who also created Winamp

69
Q

What is Audacity?

A

A free and open-source audio editor

70
Q

What are the basic steps for mixing audio using librosa?

A

1. Load the files
2. Normalize/set levels
3. Pad the shorter track with zeros
4. Add the tracks together
5. Write to disk
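The middle steps map almost directly onto NumPy operations; in the librosa version, `librosa.load` would handle step 1 and an audio writer step 5, so plain arrays stand in for the loaded audio here, and halving the sum is an assumed (not lecture-specified) way to set levels:

```python
import numpy as np

def mix(vocal, backing):
    # 2. normalize each track to a peak of 1.0
    vocal = vocal / np.max(np.abs(vocal))
    backing = backing / np.max(np.abs(backing))
    # 3. pad the shorter track with zeros so both have the same length
    n = max(len(vocal), len(backing))
    vocal = np.pad(vocal, (0, n - len(vocal)))
    backing = np.pad(backing, (0, n - len(backing)))
    # 4. add the tracks together, halved so the sum cannot clip
    return 0.5 * (vocal + backing)
```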
71
Q

What file format is mentioned for the final mixed output?

A

WAV (mix.wav)

72
Q

What popular chord progression is mentioned as an example for music generation?

A

D, A, Bm, G (a popular progression, used in '4 Chords' by the Axis of Awesome)

73
Q

How is the quality of the generated melody assessed in the described system?

A

By listening to it and deciding whether to proceed or keep sampling (note: the lecture mentions this could potentially be automated)

74
Q

What Python script is mentioned for converting MIDI to a note list?

A

midi_to_note_list.py