SMP affinity for threads and sockets

12 May 2004, 13:37 UTC

There is probably some value in assigning processor or NUMA-node affinity to nfsd server threads, and possibly to sockets as well.

While the amount of state that an nfsd thread holds is not large (a few pages), it is best if it doesn't bounce between processor caches too much. Tying a thread to a processor or group of processors is no big deal, but scheduling requests among the threads then becomes somewhat less trivial than the current last-in-first-out approach.
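
As a rough illustration of how cheap the "tying" part is, here is a user-space analogue (not nfsd code) that pins the calling thread to a contiguous range of CPUs using the Linux-specific pthread_setaffinity_np. The helper name pin_self_to_cpus is hypothetical; an in-kernel version would use the kernel's own affinity interfaces instead.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <assert.h>

/* Hypothetical helper: restrict the calling thread to CPUs
 * [first, first + count). Returns 0 on success, otherwise the
 * errno value reported by pthread_setaffinity_np. */
static int pin_self_to_cpus(int first, int count)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (int cpu = first; cpu < first + count; cpu++)
        CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

The hard part, as noted above, is not the pinning itself but deciding which pinned thread should receive each incoming request.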

There is also the issue of processor affinity for sockets.

We currently have only one socket for all UDP traffic, which is already a problem on large machines, so we need to be able to have multiple sockets. Multiple UDP sockets for receiving do not seem particularly necessary, as the socket is only locked for a very short time on receive. However, the socket is locked for a while on send, so multiple sockets for sending would be useful.

Whether we should have one socket per CPU or one per thread for sending isn't clear. Maybe a small pool per CPU would be best.
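
One way the "small pool per CPU" idea might look, sketched in user space with POSIX threads rather than kernel code: each CPU owns a slice of SOCKS_PER_CPU UDP sockets, a sender tries to grab any free socket in its local slice with a non-blocking trylock, and only blocks if every socket in the slice is busy. All names here (SOCKS_PER_CPU, struct send_pool, send_pool_init, send_pool_get, send_pool_put) are hypothetical, and sched_getcpu is Linux-specific.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>
#include <assert.h>

#define SOCKS_PER_CPU 4          /* hypothetical pool size per CPU */

struct send_sock {
    pthread_mutex_t lock;        /* held for the duration of one send */
    int fd;                      /* UDP socket used only for sending */
};

struct send_pool {
    int ncpus;
    struct send_sock *socks;     /* ncpus * SOCKS_PER_CPU entries */
};

static int send_pool_init(struct send_pool *p, int ncpus)
{
    int n = ncpus * SOCKS_PER_CPU;

    p->ncpus = ncpus;
    p->socks = calloc((size_t)n, sizeof(*p->socks));
    if (!p->socks)
        return -1;
    for (int i = 0; i < n; i++) {
        pthread_mutex_init(&p->socks[i].lock, NULL);
        p->socks[i].fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (p->socks[i].fd < 0) {
            while (i-- > 0)      /* undo the sockets already created */
                close(p->socks[i].fd);
            free(p->socks);
            return -1;
        }
    }
    return 0;
}

/* Take a free socket from the current CPU's slice; if all are busy,
 * block on the first one rather than spilling to another CPU. */
static struct send_sock *send_pool_get(struct send_pool *p)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0;
    struct send_sock *base = &p->socks[(cpu % p->ncpus) * SOCKS_PER_CPU];

    for (int i = 0; i < SOCKS_PER_CPU; i++)
        if (pthread_mutex_trylock(&base[i].lock) == 0)
            return &base[i];
    pthread_mutex_lock(&base[0].lock);
    return &base[0];
}

static void send_pool_put(struct send_sock *s)
{
    pthread_mutex_unlock(&s->lock);
}
```

A sender would call send_pool_get, transmit the reply on the returned fd (for example with sendto), then call send_pool_put. Keeping the pool per CPU means the long send-side lock is usually contended only by threads on the same processor.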

When a request arrives, it needs to be passed to a thread on a less-busy CPU. It would be good if the scheduler could make the decision as to which CPU to use: any of a number of threads will do, and we want to choose the one with a warm cache on an idle CPU.

It isn't clear that we can do better than the current last-in-first-out approach, leaving the scheduler to move threads between processors. As the scheduler tends to leave processes where they are while their cache is warm, simply keeping the one set of threads busy should be enough to keep them well spread out.
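
For reference, the last-in-first-out policy amounts to treating idle threads as a stack: the thread that most recently finished a request (and so most likely still has a warm cache) is the first one handed the next request. This is a single-threaded sketch with hypothetical names (struct idle_stack, thread_goes_idle, pick_thread_for_request); a real server would protect the stack with a lock and wake the chosen thread.

```c
#include <assert.h>

#define MAX_THREADS 64           /* hypothetical upper bound on threads */

/* Idle worker threads, identified here by small integers. */
struct idle_stack {
    int ids[MAX_THREADS];
    int top;                     /* number of threads currently idle */
};

/* A thread that has finished its request pushes itself on the stack. */
static void thread_goes_idle(struct idle_stack *s, int id)
{
    assert(s->top < MAX_THREADS);
    s->ids[s->top++] = id;
}

/* LIFO choice: return the most recently idled thread, or -1 if none. */
static int pick_thread_for_request(struct idle_stack *s)
{
    return s->top > 0 ? s->ids[--s->top] : -1;
}
```

Under light load this keeps reusing the same few threads, leaving the rest asleep; the scheduler then only has to spread that busy set across processors.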
