64-thread limit?
#1
I am in the lovely position of testing tyFlow on the new 3990X.

However, I notice that I only get between 46% and 48% CPU utilisation. Is there a 64-thread limit or something?


Edit: Hmm, unexpected (for me) behaviour while testing the sand castle demo scene.

As it opens, the scene uses around 48% CPU.

I thought, "maybe if I load it up some more..."

So I reduced the voxel size, creating many more particles.

CPU usage during the sim now varies wildly, but it hovers around 18% most of the time, with the odd peak up to 30%, so it's using much LESS CPU.

I've got 128 GB of decent RAM and a PCIe 4 SSD, so those shouldn't be a bottleneck?
#2
The sand scenes, and all bind scenes in general, rely on OpenCL by default (GPU). Disable OpenCL everywhere and you'll see your CPU usage go up. Beyond that, tyFlow relies on Windows thread scheduling, which I believe has issues past 64 cores.
#3
I have this "issue" too; it's a shame not being able to use all 64 threads. When rendering with V-Ray and other software I get full 100% usage. Is that fixable in some way?
#4
"100% CPU usage" is not a good metric for performance. tyFlow uses a dynamic thread scheduler that assigns the optimal number of threads to a task, so you'll see overall usage go up or down depending on what it's calculating. You won't see all 64 used until you start working with many millions of particles.

If you can make a flow with 10 million particles and usage does not go up to 100% during long calculations, let me know.
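To illustrate why overall usage can rise and fall with workload size under a dynamic scheduler like the one described above, here is a minimal sketch of a size-based thread-count heuristic. It is not tyFlow's actual scheduler: the function name `choose_thread_count` and the threshold of 50,000 items per thread are hypothetical values chosen only for illustration.

```python
import math


def choose_thread_count(num_items: int, max_threads: int,
                        min_items_per_thread: int = 50_000) -> int:
    """Hypothetical heuristic: only add a thread when it gets enough work
    to pay for its scheduling overhead, capped at max_threads."""
    if num_items <= 0 or max_threads <= 1:
        return 1
    wanted = math.ceil(num_items / min_items_per_thread)
    return min(max_threads, wanted)


if __name__ == "__main__":
    for particles in (10_000, 500_000, 1_000_000, 10_000_000):
        threads = choose_thread_count(particles, max_threads=128)
        print(f"{particles:>10,} particles -> {threads} threads")
```

With a rule of this kind, a small scene never asks for all 128 logical processors, so low overall usage on small flows is expected rather than a bug; only very large particle counts would saturate the machine.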
#5
Well, I thought it was strange that exactly 50% of the threads ran at 100% usage while the rest stayed at around 5%.

It's true that, depending on the simulation, it sometimes uses just 5 to 10 cores, but as you said, that could be because it doesn't need more.
But when I first tested it, it was exactly 50%, which is why I was suspicious.

I tested again as you suggested. I tried different approaches and always got the same result: from 1M to 10M particles, exactly 50% of the threads ran at full power.
I also duplicated the events to reach 20M particles to see if it would grab some of the other threads, with no luck.
I understand this might be a complicated issue, and it's not that critical for me either; I'll just enjoy my espresso between simulations :) I just wanted to report it.



-----------

I guess the problem has to do with the NUMA nodes: one of them is at 100% while the other is calmly resting.

Image attached.
#6
Ah yeah, it seems like a NUMA issue.

If you go to the Debugging rollout of a tyFlow object, there is a "Print NUMA Info" button you can press, that will print all detected CPU info to the MAXScript listener. See if that reports 64 detected CPUs or not...

If it prints out 64 CPUs detected, then it looks like it's a Windows thread scheduling issue. I create threads for each CPU on a machine and assign (up to) that many simultaneous tasks; however, Windows does the actual scheduling itself.

It's possible that if Windows is throttling CPUs like that, you'll get worse performance in tyFlow because of it (i.e., because tyFlow may create 64 threads, but they're spread across only 32 cores). Try loading up a really slow simulation that uses all 32 cores, set max threads to 32 in the tyFlow main settings, and see if it runs any faster.

Also check the CPU affinity of 3dsmax.exe in the Windows Task Manager and make sure it's assigned to all 64 cores.
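The exact 50% ceiling reported in this thread is consistent with Windows processor groups: Windows exposes logical processors in groups of at most 64, and on versions before Windows 11 / Windows Server 2022 a process's threads stay within a single group unless the application explicitly assigns them to other groups. The sketch below is a simplified model under that assumption (it fills groups of 64 in order, whereas Windows also takes the NUMA layout into account when forming groups); the function names are hypothetical.

```python
def processor_groups(logical_cpus: int, group_size: int = 64) -> list[int]:
    """Simplified model: split logical processors into groups of at most
    group_size (64 on Windows). Real Windows also aligns groups to NUMA nodes."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    full, rest = divmod(logical_cpus, group_size)
    return [group_size] * full + ([rest] if rest else [])


def single_group_usage_ceiling(logical_cpus: int, group_size: int = 64) -> float:
    """Highest overall CPU usage a process can reach if all of its threads are
    confined to one processor group (modeled here as the first group)."""
    groups = processor_groups(logical_cpus, group_size)
    return groups[0] / logical_cpus


if __name__ == "__main__":
    # Threadripper 3990X: 64 cores with SMT -> 128 logical processors.
    print(processor_groups(128))                      # [64, 64]
    print(f"{single_group_usage_ceiling(128):.0%}")   # 50%
    # A 64-thread CPU fits in a single group, so the model gives no ceiling.
    print(f"{single_group_usage_ceiling(64):.0%}")    # 100%
```

A 50% ceiling sits just above the 46 to 48% readings in the first post and matches the "exactly 50%" of threads at full load reported later, which is why applications that explicitly spread threads across groups can show full usage on the same machine.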
#7
Same issue here. When I print NUMA info, it only lists NUMA1 with all 64 threads, and NUMA2 doesn't appear at all. Also, V-Ray uses all 128 threads just fine, and Bifrost in 3ds Max also uses all 128 threads for simulation. Thank you.
#8
Yes, exactly, I see that too.
I've been asking around (not many people have a 3990X yet), and some others with Intel CPUs have the same problem: everything beyond 64 threads is problematic.
It looks like V-Ray and Bifrost use all the cores without problems.
I understand we're a minority and this isn't a priority to investigate, so I guess we'll just wait until more than 64 threads is mainstream, hehe :)

I'll check the things you suggested to get the most out of my 64 threads! Thank you.
#9
OK, I tested what Tyson suggested:

Assigning only 32 cores instead of leaving it on Auto is actually faster.
Instead of running on 64 threads it runs on only 32, but the time improved significantly.
This is the "Google Drive benchmark for tyFlow", and the simulation (OpenCL off) took:

Auto mode: 118 sec
Assigned to 32 cores only: 97 sec

Interesting!
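For reference, the two timings above work out as follows. This is plain arithmetic on the reported numbers; the helper name `compare_runs` is ours, not part of tyFlow.

```python
def compare_runs(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Return (speedup factor, fraction of wall-clock time saved)
    of the candidate run relative to the baseline run."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("timings must be positive")
    speedup = baseline_s / candidate_s
    saved = (baseline_s - candidate_s) / baseline_s
    return speedup, saved


if __name__ == "__main__":
    # Auto threads vs. max threads = 32, OpenCL off (timings from the post).
    speedup, saved = compare_runs(118.0, 97.0)
    print(f"speedup {speedup:.2f}x, time saved {saved:.1%}")  # speedup 1.22x, time saved 17.8%
```

So capping tyFlow at 32 threads made this benchmark about 1.22x faster, roughly 18% less wall-clock time. That is consistent with the oversubscription explanation suggested earlier, though a single run per setting is not enough to rule out run-to-run noise.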

By the way, it's impossible to assign all 64 cores / 128 threads to 3dsmax.exe in Task Manager; they are split into two groups, and if I activate one group, the other is switched off. :S
So the best you can do is 32 cores / 32 threads in one group.
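The Task Manager behaviour described above is consistent with how Windows represents affinity on machines with more than 64 logical processors: an affinity mask is a 64-bit value that applies within one processor group (the Win32 `GROUP_AFFINITY` structure pairs a group number with such a mask), so no single mask can cover all 128 logical processors. The sketch below models that split in plain Python; the `GroupAffinity` class and `split_affinity` function are hypothetical stand-ins, not Win32 calls, and groups are again assumed to be filled 64 processors at a time.

```python
from dataclasses import dataclass
from typing import Iterable

GROUP_SIZE = 64  # logical processors per group on 64-bit Windows (assumed full groups)


@dataclass(frozen=True)
class GroupAffinity:
    """Hypothetical stand-in for Win32 GROUP_AFFINITY: a group number plus
    a 64-bit mask with one bit per logical processor in that group."""
    group: int
    mask: int


def split_affinity(cpus: Iterable[int]) -> list[GroupAffinity]:
    """Convert global logical-processor indices into one mask per group."""
    masks: dict[int, int] = {}
    for cpu in cpus:
        group, bit = divmod(cpu, GROUP_SIZE)
        masks[group] = masks.get(group, 0) | (1 << bit)
    return [GroupAffinity(g, m) for g, m in sorted(masks.items())]


if __name__ == "__main__":
    all_cpus = split_affinity(range(128))
    print(len(all_cpus))                 # 2: two masks are needed, not one
    print(hex(all_cpus[0].mask))         # 0xffffffffffffffff
    print(split_affinity(range(32))[0])  # group 0 only, low 32 bits set
```

Because each mask lives inside one group, a per-process affinity dialog that edits a single mask can only select processors from one group at a time, which matches the "activate one group and the other goes off" behaviour.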