Automatic differentiation in C++17. I had to do it from scratch because they weren’t open to 3rd-party libraries, and it had to integrate with a lot of symbolic calculations that are also done in the library (so even with a 3rd-party library you’d still have to rewrite parts of it to deal with the symbolic stuff). It worked out, and the performance was basically on par with what people in the industry expect: all derivatives at once in around 3 times the evaluation time, sometimes much less if the calculation code has no dynamic parts and can be differentiated entirely at compile time.
It was pretty cool because it was a fun opportunity to really abuse template meta-programming, especially expression templates (you’re basically building expression trees at compile time), plus compile-time lazy evaluation, static polymorphism, lots of SFINAE dark magic, and custom memory allocators.
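To give a flavor (a toy sketch I’m writing from memory, not the actual library code): the expression tree is encoded in the types, CRTP gives you static polymorphism instead of virtual calls, and the product rule gets expanded by the compiler when you ask for a derivative.

```cpp
// Toy sketch, not the real code: the expression tree lives in the types,
// CRTP provides static polymorphism (no virtual calls), and deriv<J>()
// expands the product rule at compile time.
#include <iostream>

template <class Derived>
struct Expr {                      // CRTP base
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

template <int I>
struct Var : Expr<Var<I>> {        // leaf: independent variable number I
    double v;
    explicit Var(double v) : v(v) {}
    double value() const { return v; }
    template <int J>
    double deriv() const { return I == J ? 1.0 : 0.0; }   // d x_I / d x_J
};

template <class L, class R>
struct Mul : Expr<Mul<L, R>> {     // product node; operands held by value to keep the sketch simple
    L l; R r;
    Mul(L l, R r) : l(l), r(r) {}
    double value() const { return l.value() * r.value(); }
    template <int J>
    double deriv() const {         // product rule, assembled by the compiler
        return l.template deriv<J>() * r.value() + l.value() * r.template deriv<J>();
    }
};

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }

int main() {
    Var<0> x(3.0);
    Var<1> y(4.0);
    auto f = x * y * x;            // the type of f encodes the whole expression tree
    std::cout << f.value() << " " << f.deriv<0>() << " " << f.deriv<1>() << "\n"; // 36 24 9
}
```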
Then you get scolded by the CI guys because your template nightmare makes the build times 3x slower, so the project then also becomes an occasion to try out a bunch of tricks to speed up compilation.
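One of the simpler tricks, as a hypothetical single-file illustration (the names are made up): explicit instantiation plus extern template, so every translation unit that includes the header promises not to instantiate the heavy templates, and one dedicated .cpp instantiates them exactly once.

```cpp
// Hypothetical illustration of the extern template trick. In a real setup the
// declaration lives in the header and the explicit instantiation lives in one
// dedicated .cpp, so the heavy template is compiled once, not in every TU.
#include <numeric>
#include <vector>

template <class Real>
Real sumOfSquares(const std::vector<Real>& xs) {   // stand-in for an expensive template
    return std::accumulate(xs.begin(), xs.end(), Real{0},
                           [](Real acc, Real x) { return acc + x * x; });
}

// Header side: suppress implicit instantiation in every including TU.
extern template double sumOfSquares<double>(const std::vector<double>&);

// Dedicated .cpp side: instantiate it exactly once.
template double sumOfSquares<double>(const std::vector<double>&);

int main() {
    std::vector<double> xs{1.0, 2.0, 3.0};
    return sumOfSquares(xs) == 14.0 ? 0 : 1;
}
```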
Who cares about build times when execution speed is good?
That’s insane. You’re basically talking about a custom implementation of PyTorch autograd, right? How long did it take?
Roughly one year. No GPU code for that project though, since the target library is CPU-only anyway, so it’s not really comparable to PyTorch (and PyTorch is more than just the autodiff), but there was lots of SIMD vectorization. You could train a neural network on CPU with it if you wanted, and the expression template stuff I mentioned is somewhat equivalent to PyTorch’s operator fusion, but the target use is more quant finance code.
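By “somewhat equivalent” I mean something like this toy sketch (again, not the actual code): an expression like a + b + c is just a lightweight view over its operands, and the assignment evaluates the whole right-hand side in a single loop that the compiler can auto-vectorize, instead of materializing a temporary vector per operation.

```cpp
// Toy sketch of fused evaluation, not the real code: a + b + c builds a
// lightweight expression object, and operator= walks it in ONE loop over
// the data (no temporaries), which the compiler can then auto-vectorize.
#include <cstddef>
#include <iostream>
#include <vector>

template <class E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Vec : VecExpr<Vec> {
    std::vector<double> data;
    Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <class E>
    Vec& operator=(const VecExpr<E>& e) {   // single fused loop over all operands
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
        return *this;
    }
};

template <class L, class R>
struct Add : VecExpr<Add<L, R>> {           // node just forwards indexing, no allocation
    const L& l; const R& r;
    Add(const L& l, const R& r) : l(l), r(r) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
Add<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) { return {l.self(), r.self()}; }

int main() {
    Vec a(4, 2.0), b(4, 3.0), c(4, 1.0), out(4);
    out = a + b + c;                        // one pass computing a[i] + b[i] + c[i]
    std::cout << out[0] << "\n";            // 6
}
```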