Sarathi Flashcards
(2 cards)
1. Q: What percent of compute time during LLM inference is spent on attention?
   A: 5-10%

2. Q: What does ffn_ln1 stand for?
   A: Feed-forward network layer normalization 1