Diagnosis: Slow Windows logins and roaming profiles

We were recently asked to diagnose a networking problem: the customer has a group of boarding houses that are connected back to the main school network through a microwave link. They were reporting that Windows logins worked fine at the school, but if they took a laptop to the boarding houses they were left waiting upwards of a minute to log in.

The microwave link is relatively fast: we measured a throughput of around 90Mbps. The round trip time (latency), however, was around 8-10ms. This latency really isn't terrible, but it is around 100 times what you'd see on most wired networks. There are a number of possible causes of relatively high latency. Wireless links like to bundle multiple packets together to make more efficient use of the bandwidth. Connections that are expected to experience bursts of interference often use a technique called "interleaving": the payload data and error correction data are temporally separated to reduce the chance of a burst of interference knocking out the whole packet. You see much higher latency out on the internet, and most software is designed to cope with this, so you wouldn't expect 8-10ms to be a big problem.

Our testing immediately pinned the problem down to Windows' roaming profiles. Roaming profiles date back to 1993 and haven't changed much since. The concept is simple: when a user logs on, all of their files (of which there may be thousands taking up hundreds of megabytes) are copied from the file server onto the user's workstation. As you can imagine, this causes all kinds of usability problems, one of the big ones being that you have to wait for all of this data to be copied after you log on before you can actually use the computer. Microsoft suggest that waiting 3-5 minutes to log on is reasonable. I'm not sure how many people would agree with this. As a non-Windows user I generally expect the time between entering my password and firing up my applications to be a single-digit number of seconds.

Rather than copying every file over each time you log on, Windows compares the files on the workstation with those on the server and copies just the ones that have changed. Unfortunately, the Windows roaming profile system seems to have been designed in a very "lazy" or naive way. The conversation between the workstation and server is roughly as follows:

Workstation: Give me information about file "A".
Server: Here is the information about file "A".
Workstation: Give me information about file "B".
Server: Here is the information about file "B".

Each time the workstation asks the server about a file, it waits for the response before moving on to the next one - a response that will take 8-10 milliseconds to arrive. On a profile with hundreds or thousands of files, all of these 8-10ms waits add up to a significant amount of time. An increase in latency causes a proportional increase in the total amount of time the logon will take, so you could expect a logon at the boarding houses to take around 100× as long as one at the school - 60 seconds at the boarding houses is going to be very noticeable, whereas 0.6 seconds at the school is over in the blink of an eye.
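To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The figure of 6000 round trips is purely an assumption chosen to match the 60 second logon, and the 0.1ms school latency is inferred from the "100 times" estimate rather than measured. The function name is hypothetical.

```python
def sequential_logon_seconds(round_trips: int, rtt_ms: float) -> float:
    """Total waiting time when each request must complete before the next is sent."""
    return round_trips * rtt_ms / 1000.0


# Assumed number of request/response exchanges during one logon (not measured).
ROUND_TRIPS = 6000

# Round trip times in milliseconds: the LAN figure is an assumption,
# the microwave figure is the upper end of the measured 8-10ms range.
for label, rtt_ms in [("school LAN", 0.1), ("boarding house link", 10.0)]:
    seconds = sequential_logon_seconds(ROUND_TRIPS, rtt_ms)
    print(f"{label}: {seconds:.1f} seconds")
```

The total scales linearly with both the number of exchanges and the round trip time, which is why halving the number of files in the profile helps just as much as halving the latency of the link.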

Usually protocol designers try to avoid these kinds of delays by using a technique known as pipelining, whereby the client keeps sending requests to the server irrespective of whether it has yet received responses to the previous ones. A pipelined system is slightly more complex since the client has to match up each response with the appropriate request, and this is why I've characterised the roaming profile system as a "lazy" implementation - Microsoft have unfortunately chosen the simple approach rather than the fast approach.
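The difference between the two approaches can be illustrated with a small simulation. This is only a sketch: query_server is a hypothetical stand-in that sleeps for one round trip, not the real Windows file sharing protocol, and the 10ms delay is simply the upper end of the latency we measured. The pipelined version tags each request with an ID so that responses can be matched up in whatever order they arrive.

```python
import asyncio
import time

RTT_SECONDS = 0.01  # simulated round trip time (assumed: 10ms)


async def query_server(request_id: int, filename: str) -> tuple[int, str]:
    """Hypothetical stand-in for one request/response exchange with the server."""
    await asyncio.sleep(RTT_SECONDS)
    return request_id, f"information about {filename}"


async def sequential(filenames: list[str]) -> dict[str, str]:
    """Ask about one file, wait for the answer, then ask about the next."""
    results = {}
    for request_id, name in enumerate(filenames):
        _, info = await query_server(request_id, name)
        results[name] = info
    return results


async def pipelined(filenames: list[str]) -> dict[str, str]:
    """Send every request up front, then match responses to requests by ID."""
    pending = dict(enumerate(filenames))
    tasks = [asyncio.create_task(query_server(i, name)) for i, name in pending.items()]
    results = {}
    for finished in asyncio.as_completed(tasks):
        request_id, info = await finished
        results[pending[request_id]] = info
    return results


def timed(strategy, filenames: list[str]) -> tuple[dict[str, str], float]:
    """Run one strategy and return its results plus the elapsed wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(strategy(filenames))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    files = [f"file{i}" for i in range(100)]
    for strategy in (sequential, pipelined):
        _, elapsed = timed(strategy, files)
        print(f"{strategy.__name__}: {elapsed:.2f} seconds")
```

In the sequential version the waits add up, so the total grows with the number of files. In the pipelined version the waits overlap, so the total stays close to a single round trip no matter how many files are queried.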

Ironically, the link has plenty of bandwidth and a profile consisting of a few large files will be copied across very quickly (I measured around 90Mbps); it is profiles consisting of many files which are the problem.

Our recommendation was to limit the number of files in a profile as much as possible - use folder redirection where possible to ensure that files go directly to the server instead of into the roaming profile. Examining the settings of the microwave link may also prove useful - it may be possible to reduce the latency by changing the error correction settings, etc.