Performance shouldn't be slower with a newer card...for example, I experienced a nearly linear speedup (matching the increase in compute cores) in the bind solver when I moved from a 1080 Ti to a 3090 Ti (this was on a giant scene that really maxed out the 1080 Ti's capabilities - the 'Starry Night' clip on my Instagram page), and all the extra VRAM was a huge bonus as well...but I don't have a 4090 to test the difference between that and a 3090.
But many of these things are very scene-specific. For example, PhysX can use CUDA, but even NVIDIA states in their documentation that CUDA won't surpass CPU performance until around 10k rigid bodies are in the scene.
tyFlow is similar...the bind solver, Particle Physics, and CCCS all utilize CUDA/OpenCL, but there is a particle count threshold below which CPU calculations may still be faster, depending on hardware. If I pass the bindings of 100 particles to the GPU, that will be much slower than passing 100 to the CPU...but if I pass a million, then it will be way faster on the GPU.
You need a setup with a very large number of particles/bindings/etc to really do a proper performance benchmark between chips.
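To make the threshold idea concrete, here's a toy cost model in Python. It is only a sketch: the function names and every number in it (per-item CPU cost, fixed GPU overhead, per-item GPU cost) are made-up assumptions for illustration, not measurements from tyFlow, PhysX, or any real card. The point is just the shape: the GPU pays a fixed cost up front (transfers, kernel launches) and then a much smaller cost per item, so it only wins once the item count is large enough.

```python
# Toy cost model for CPU vs GPU solve time.
# All constants are hypothetical placeholders, NOT real benchmark numbers.

def cpu_time(n, per_item=1.0e-7):
    """Estimated CPU time in seconds: cost grows linearly with item count."""
    return n * per_item


def gpu_time(n, overhead=1.0e-3, per_item=1.0e-9):
    """Estimated GPU time in seconds: fixed overhead plus a small per-item cost."""
    return overhead + n * per_item


def break_even(cpu_per_item=1.0e-7, gpu_overhead=1.0e-3, gpu_per_item=1.0e-9):
    """Item count at which the two estimates are equal.

    Solves n * cpu_per_item == gpu_overhead + n * gpu_per_item for n.
    """
    return gpu_overhead / (cpu_per_item - gpu_per_item)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        faster = "CPU" if cpu_time(n) < gpu_time(n) else "GPU"
        print(f"{n:>9} items: cpu={cpu_time(n):.6f}s gpu={gpu_time(n):.6f}s -> {faster}")
    print(f"break-even at roughly {break_even():.0f} items")
```

With these placeholder constants the crossover lands in the low tens of thousands of items, but on real hardware it will move around depending on the CPU, the GPU, and the scene. That is why a benchmark run well below the threshold tells you very little about how two cards compare.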