Hello,
Does anybody have experience with an RTX 4090 and tyFlow?
We ran one small test, and the CUDA simulation took twice as long as the same simulation on a 1080ti.
I can't believe this...
Regards, H.Kroeber
Have you installed the very latest CUDA DLLs? They're required for 4090 cards (the older DLLs will not work with 4090s).
Perhaps CUDA was disabled due to the 4090 incompatibility with the old DLLs and you just didn't notice that the sim was running on the CPU?
You can open the tyProfiler (editor > utilities menu) and then run your sim to see where the bottleneck is.
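As a quick sanity check outside of 3ds Max, something like the following sketch can confirm that the NVIDIA driver actually sees the card and report which driver version is installed. It assumes `nvidia-smi` is on the PATH (it normally ships with the driver), and the helper names `parse_gpu_lines` and `query_gpus` are made up for this example. It does not tell you whether tyFlow itself is using CUDA; the tyProfiler is still the tool for that.

```python
import shutil
import subprocess


def parse_gpu_lines(csv_text):
    """Turn 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the name cannot break parsing.
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and driver versions; return [] on any failure."""
    if shutil.which("nvidia-smi") is None:
        return []  # assumption: nvidia-smi is normally installed with the driver
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    found = query_gpus()
    if not found:
        print("No NVIDIA GPU reported (or nvidia-smi is not available).")
    for name, driver in found:
        print(f"{name}: driver {driver}")
```

If the 4090 is not listed here, the problem is at the driver level rather than in tyFlow or its CUDA DLLs.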
03-29-2023, 01:47 PM
(This post was last modified: 03-29-2023, 01:48 PM by BeatriceBaker.)
First, you should keep in mind that the RTX 4090 is a fairly new graphics card, released only in late 2022, so there may be optimization problems with some programs and plugins. Secondly, tyFlow is quite a complex tool that can consume a lot of resources. You might need to run more thorough tests and adjust the settings to get the best performance.
03-31-2023, 01:28 PM
(This post was last modified: 03-31-2023, 01:29 PM by tyFlow.)
Performance shouldn't be slower with a newer card. For example, I experienced a nearly linear speedup (matching the increase in compute cores) in the bind solver when I moved from a 1080ti to a 3090ti. This was on a giant scene that really maxed out the 1080ti's capabilities (the 'Starry Night' clip on my Instagram page), and all the extra VRAM was a huge bonus as well. But I don't have a 4090 to test the difference between that and a 3090.
But many of these things are very scene-specific. For example, PhysX can use CUDA, but even NVIDIA states in their documentation that CUDA won't surpass CPU performance until around 10k rigid bodies are in the scene.
tyFlow is similar. The bind solver, Particle Physics, and CCCS all utilize CUDA/OpenCL, but there is a particle count threshold below which CPU calculations may still be faster, depending on hardware. If I pass the bindings of 100 particles to the GPU, that will be much slower than passing those 100 to the CPU. But if I pass a million, it will be way faster on the GPU.
You need a setup with a very large number of particles/bindings/etc. to really do a proper performance benchmark between chips.
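To make the threshold idea concrete, here is a toy cost model, not a measurement of tyFlow. It assumes the GPU pays a fixed cost per step (data transfer plus kernel launch) and then a small cost per binding, while the CPU pays no fixed cost but a larger cost per binding. All three constants are invented purely for illustration; real values depend on the hardware and the solver.

```python
# Toy cost model. Every constant here is hypothetical, NOT a measured tyFlow number.
GPU_FIXED_OVERHEAD_US = 500.0  # assumed fixed cost per step: transfers + kernel launch
GPU_PER_ITEM_US = 0.001        # assumed cost per binding on the GPU
CPU_PER_ITEM_US = 0.05         # assumed cost per binding on the CPU


def cpu_time_us(count):
    """Estimated CPU time: no fixed overhead, higher per-item cost."""
    return count * CPU_PER_ITEM_US


def gpu_time_us(count):
    """Estimated GPU time: fixed overhead plus a much lower per-item cost."""
    return GPU_FIXED_OVERHEAD_US + count * GPU_PER_ITEM_US


def break_even_count():
    """Item count at which the two estimates are equal."""
    return GPU_FIXED_OVERHEAD_US / (CPU_PER_ITEM_US - GPU_PER_ITEM_US)


if __name__ == "__main__":
    for count in (100, 10_000, 1_000_000):
        faster = "GPU" if gpu_time_us(count) < cpu_time_us(count) else "CPU"
        print(f"{count:>9} items: CPU {cpu_time_us(count):.1f} us, "
              f"GPU {gpu_time_us(count):.1f} us -> {faster} faster")
    print(f"break-even near {break_even_count():.0f} items")
```

With these made-up numbers the CPU wins at 100 items and the GPU wins at a million. A faster card mostly lowers the per-item cost, not the fixed overhead, so a small test scene can show little or no gain from it.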