Sarathi Flashcards

1
Q: What percent of compute time during LLM inference is spent on attention?
A: 5-10%

2
Q: What does ffn_ln1 stand for?
A: Feed-forward network layer normalization 1, i.e. the layer normalization associated with the feed-forward sub-layer of a transformer block.

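To make the ffn_ln1 card concrete, here is a minimal sketch of layer normalization and where it sits relative to the feed-forward network. This assumes a pre-norm transformer block (normalize, then FFN, then residual add); the `layer_norm` and `ffn` names here are illustrative, not taken from any specific codebase.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    # ffn_ln1 is this operation, applied to the input of the FFN sub-layer.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# In a pre-norm block the FFN path is roughly: h = x + ffn(ffn_ln1(x))
normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in normalized])  # → [-1.342, -0.447, 0.447, 1.342]
```

The output is zero-mean and (up to the `eps` stabilizer) unit-variance, which is what the normalization contributes before the feed-forward matmuls run.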