Use pytorch2 optimized native attention by attesaarela · Pull Request #39 · Liuhong99/Sophia

attesaarela · 2023-07-20T19:12:54Z

Hi, here is a pull request for a small speedup: attention is computed with the PyTorch 2 function "torch.nn.functional.scaled_dot_product_attention" when it is available.

In a bit of testing I did, this makes the optimizer run about 10% faster.

This optimization was essentially copied from a recent version of nanoGPT.
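
For readers who have not seen the pattern, here is a minimal sketch of the "use native attention if available" idea described above. It is not the actual diff in this PR: the function name `causal_attention` and the `use_native` flag are made up for illustration, dropout is left out, and the tensor layout (batch, heads, sequence, head dimension) is an assumption.

```python
import math

import torch
import torch.nn.functional as F

# PyTorch >= 2.0 exposes a fused attention function; older versions do not.
HAS_NATIVE_ATTENTION = hasattr(F, "scaled_dot_product_attention")


def causal_attention(q, k, v, use_native=HAS_NATIVE_ATTENTION):
    """Causal self-attention over q, k, v of shape (batch, heads, seq_len, head_dim).

    Hypothetical helper for illustration; dropout is omitted.
    """
    if use_native:
        # Fused path: masking, softmax, and the weighted sum happen in one call.
        return F.scaled_dot_product_attention(
            q, k, v, attn_mask=None, dropout_p=0.0, is_causal=True
        )

    # Fallback path: the same computation written out step by step.
    seq_len = q.size(-2)
    att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    att = att.masked_fill(~mask, float("-inf"))  # block attention to future positions
    att = F.softmax(att, dim=-1)
    return att @ v
```

Both branches should agree up to floating-point tolerance, so the fallback keeps older PyTorch installs working while newer ones take the faster path.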


attesaarela added 3 commits July 20, 2023 22:05

Use pytorch 2 native attention if available. Usage of native attention essentially copied from recent nanoGPT version

5e2390c

Use pytorch 2 native attention if available. Usage of native attention essentially copied from recent nanoGPT version

135a505

cleanup

2e90003

attesaarela wants to merge 3 commits into Liuhong99:main from attesaarela:use-pytorch2-optimized-native-attention
