research.microsoft.com - MSS of 536 bytes

A friend of mine recently tried to visit research.microsoft.com and was unable to. He spent some time on it and came up with the following scenario:

  • linux router with pmtu clamping
    • linux client does not work
    • windows client works
  • linux router with conditional pmtu clamping (if the mss is within 1400:1536)
    • linux client works
    • windows client works

I could reproduce this myself: with my setup research.microsoft.com did not work either, but I felt I would not lose too much anyway.

After some time he provided some pcap dumps for all possible combinations and scenarios, so I felt guilty and finally gave it a shot.

So what is wrong with research.microsoft.com?

pcap

I decided to replace my friend's address with 1.2.3.4.

pmtu clamping

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS  --clamp-mss-to-pmtu

linux client

router
  1. IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [S], seq 634583584, win 5840, options [mss 1452,sackOK,TS val 52325345 ecr 0,nop,wscale 7], length 0
  2. IP 131.107.65.14.80 > 1.2.3.4.15475: Flags [S.], seq 650205845, ack 634583585, win 8192, length 0
  3. IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
  4. IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
  5. IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
client
  1. IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [S], seq 634583584, win 5840, options [mss 1460,sackOK,TS val 52325345 ecr 0,nop,wscale 7], length 0
  2. IP 131.107.65.14.80 > 192.168.0.126.15475: Flags [S.], seq 650205845, ack 634583585, win 8192, options [mss 1452], length 0
  3. IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
  4. IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
  5. IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807

What happens:

  1. the client sends its SYN, providing its MSS of 1460 (the router clamps it to 1452)
  2. the server answers with a SYN-ACK that carries no MSS option
    1. the router adds an MSS of 1452 when forwarding the SYN-ACK to the client
  3. the client ACKs
  4. the client tries to transfer 807 bytes
  5. the client retries to transfer 807 bytes
  6. the client retries multiple times before giving up and sending a RST (not in dump)
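
The added MSS option can be spotted mechanically by comparing the same SYN-ACK in both captures. The following is a minimal sketch, not part of the original analysis: `parse_mss` is a hypothetical helper that only looks for the `mss N` token tcpdump prints inside `options [...]`.

```python
import re
from typing import Optional

def parse_mss(tcpdump_line: str) -> Optional[int]:
    """Return the MSS option of a tcpdump line, or None if the segment carries none."""
    match = re.search(r"\bmss (\d+)", tcpdump_line)
    return int(match.group(1)) if match else None

# The same SYN-ACK (seq 650205845), taken from the router and the client capture above.
synack_router = ("IP 131.107.65.14.80 > 1.2.3.4.15475: Flags [S.], seq 650205845, "
                 "ack 634583585, win 8192, length 0")
synack_client = ("IP 131.107.65.14.80 > 192.168.0.126.15475: Flags [S.], seq 650205845, "
                 "ack 634583585, win 8192, options [mss 1452], length 0")

print(parse_mss(synack_router))  # None
print(parse_mss(synack_client))  # 1452
```

The server never announced an MSS; the 1452 seen by the client was inserted on the way through the router.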

windows client

router
  1. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [S], seq 1960348643, win 8192, options [mss 1452,nop,wscale 2,nop,nop,sackOK], length 0
  2. IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [S.], seq 4293367034, ack 1960348644, win 8192, length 0
  3. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [.], ack 1, win 65340, length 0
  4. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  5. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  6. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  7. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 65340, length 536
  8. IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [.], ack 537, win 65392, length 0
  9. IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 537:756, ack 1, win 65340, length 219
  10. IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [P.], seq 1:176, ack 756, win 65173, length 175
client
  1. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [S], seq 1960348643, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
  2. IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [S.], seq 4293367034, ack 1960348644, win 8192, options [mss 1452], length 0
  3. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [.], ack 1, win 65340, length 0
  4. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  5. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  6. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
  7. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 65340, length 536
  8. IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [.], ack 537, win 65392, length 0
  9. IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 537:756, ack 1, win 65340, length 219
  10. IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [P.], seq 1:176, ack 756, win 65173, length 175

What happens:

  1. the client sends its SYN, providing its MSS of 1460 (the router clamps it to 1452)
  2. the server answers with a SYN-ACK that carries no MSS option
    1. the router adds an MSS of 1452 when forwarding the SYN-ACK to the client
  3. the client ACKs
  4. the client tries to transfer 755 bytes
  5. the client retries to transfer 755 bytes
  6. the client retries to transfer 755 bytes
  7. the client falls back to 536 bytes and transfers the first 536 bytes
  8. the server ACKs
  9. the client transfers the remaining 219 bytes
  10. the server ACKs and sends its response (175 bytes)
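
The Windows behaviour in this capture can be imitated with a small simulation. This is a sketch under assumptions: the three full-size attempts are read off the dump above, not taken from any documented Windows constant, and the `delivered` callback is a hypothetical stand-in for a path that silently drops segments larger than 536 bytes.

```python
from typing import Callable, List

FALLBACK_MSS = 536  # default MSS from RFC 879, the size Windows falls back to in the capture

def send_with_fallback(payload_len: int, delivered: Callable[[int], bool],
                       full_size_attempts: int = 3) -> List[int]:
    """Return the lengths of all segments put on the wire, in order."""
    wire: List[int] = []
    for _ in range(full_size_attempts):
        wire.append(payload_len)
        if delivered(payload_len):
            return wire
    # The full-size segment never got through: resend in chunks of at most FALLBACK_MSS.
    offset = 0
    while offset < payload_len:
        chunk = min(FALLBACK_MSS, payload_len - offset)
        wire.append(chunk)
        offset += chunk
    return wire

# Hypothetical path that silently drops every segment above 536 bytes.
print(send_with_fallback(755, lambda length: length <= 536))
# [755, 755, 755, 536, 219]
```

The printed list matches the data segments 4 to 7 and 9 of the Windows capture.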

range pmtu clamping

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu

linux client

router
  1. IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [S], seq 1266255433, win 5840, options [mss 1460,sackOK,TS val 51818908 ecr 0,nop,wscale 7], length 0
  2. IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [S.], seq 18729607, ack 1266255434, win 8192, length 0
  3. IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
  4. IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 5840, length 536
  5. IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [P.], seq 537:834, ack 1, win 5840, length 297
  6. IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [.], ack 834, win 65392, length 0
  7. IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [P.], seq 1:186, ack 834, win 65392, length 185
client
  1. IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [S], seq 2215431791, win 5840, options [mss 1460,sackOK,TS val 52152068 ecr 0,nop,wscale 7], length 0
  2. IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [S.], seq 2060919804, ack 2215431792, win 8192, length 0
  3. IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
  4. IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 5840, length 536
  5. IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [P.], seq 537:829, ack 1, win 5840, length 292
  6. IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [.], ack 829, win 65392, length 0
  7. IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [P.], seq 1:177, ack 829, win 65392, length 176

Unfortunately the sequence numbers of these captures do not match, as they do not belong to the same session, but that's not a problem:

  1. the client sends its SYN, providing its MSS of 1460
  2. the server answers with a SYN-ACK; no MSS is sent to the client
  3. the client falls back to the default MSS of 536 as specified by RFC 879
  4. the client transfers 536 bytes
  5. the client transfers the remaining bytes (297 in the router capture, 292 in the client capture)
  6. the server ACKs
  7. the server sends its response
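
The segment sizes in all Linux captures follow from one rule: use the MSS announced by the peer, and 536 if the peer announced none. A minimal sketch, assuming the sender's own MTU is not the limiting factor (`segment_lengths` is a hypothetical helper, not kernel code):

```python
from typing import List, Optional

DEFAULT_MSS = 536  # RFC 879 default when the peer sends no MSS option

def segment_lengths(payload_len: int, peer_mss: Optional[int]) -> List[int]:
    """Split payload_len bytes into segments no larger than the peer's MSS."""
    send_mss = peer_mss if peer_mss is not None else DEFAULT_MSS
    lengths: List[int] = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(send_mss, remaining)
        lengths.append(chunk)
        remaining -= chunk
    return lengths

print(segment_lengths(833, None))  # router capture, no MSS seen: [536, 297]
print(segment_lengths(828, None))  # client capture, no MSS seen: [536, 292]
print(segment_lengths(807, 1452))  # MSS 1452 added by the router: [807]
```

With the router-added MSS of 1452 the request goes out as a single 807-byte segment, which is exactly the segment that never gets acknowledged.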

solution

Using

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS  --clamp-mss-to-pmtu

will add a pmtu clamped MSS to packets even if there was no MSS before, making communication impossible if the server relies on a lower MSS and does not send, or drops, ICMP fragmentation-needed packets. Windows (I was told >2000) will retry with the default MSS of 536 bytes, which makes it work even with badly configured hosts and firewalls; that's why it works for the Windows client with pmtu clamping.

An MSS-range-bound pmtu clamping rule like:

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu

will only set an MSS if there was an MSS before. It will not modify packets without an MSS option, so the client will default to 536 bytes and communicate without problems.
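
The difference between the two rules can be written down as a small model. This is a sketch of the behaviour described in this post, not a port of the kernel code: `clamp_mss` and its parameters are hypothetical names, and `pmtu_mss` is the value the router derives from the path MTU (1452 in the captures).

```python
from typing import Optional, Tuple

def clamp_mss(mss_option: Optional[int], pmtu_mss: int,
              match_range: Optional[Tuple[int, int]] = None) -> Optional[int]:
    """Return the MSS option of the forwarded SYN or SYN-ACK (None = no option)."""
    if match_range is not None:
        # -m tcpmss --mss LOW:HIGH only matches packets that carry an MSS in that range.
        low, high = match_range
        if mss_option is None or not (low <= mss_option <= high):
            return mss_option  # rule does not match, packet passes unchanged
    if mss_option is None:
        return pmtu_mss  # the target adds the option when it is missing
    return min(mss_option, pmtu_mss)  # an existing MSS is lowered, never raised

print(clamp_mss(1460, 1452))                # client SYN, plain rule: 1452
print(clamp_mss(None, 1452))                # server SYN-ACK, plain rule: 1452
print(clamp_mss(None, 1452, (1400, 1536)))  # server SYN-ACK, range rule: None
```

Only the second case is harmful: the router announces 1452 on behalf of a server that never offered it.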

The Linux kernel source is slightly contradictory on the topic too:

/* Never increase MSS, even when setting it, as
 * doing so results in problems for hosts that rely
 * on MSS being set correctly.
 */

net/netfilter/xt_TCPMSS.c:93

/*
 * MSS Option not found ?! add it..
 */

net/netfilter/xt_TCPMSS.c:116

So, if you have this problem, use MSS-range-bound pmtu clamping if you can; many people won't be able to fix their Linux-based hardware routers without a firmware upgrade from their vendor.

As for Microsoft, I really wonder what makes them prefer a 536-byte MSS, given that the 40 bytes of IP/TCP header overhead amount to ~7% for a 536-byte TCP payload, compared to ~2.7% for an MSS of 1452.
If that's Turning Ideas into Reality, this is a bad idea, and they should get over it.
Another thing they should get over is dropping ICMP packets (fragmentation-needed? research.microsoft.com drops all ICMP). If they allowed ICMP, the client could detect the error and retry with a lower MSS, instead of the client having to change the MTU without any indication, as current Microsoft Windows operating systems do.
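
The overhead figures quoted above can be checked with a few lines. A small sketch, assuming 20 bytes of IPv4 header plus 20 bytes of TCP header without options, and measuring the headers as a share of the whole packet:

```python
HEADER_BYTES = 40  # 20 bytes IPv4 + 20 bytes TCP, no options

def header_overhead(mss: int) -> float:
    """Fraction of each full-sized packet spent on IP and TCP headers."""
    return HEADER_BYTES / (mss + HEADER_BYTES)

for mss in (536, 1452):
    print(f"MSS {mss}: {header_overhead(mss):.1%}")
# MSS 536: 6.9%
# MSS 1452: 2.7%
```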

And, due to the obscure nature of the problem, there is even a really entertaining debian bug.

And yes, the title is misleading: research.microsoft.com does not communicate any MSS, therefore "research.microsoft.com - MSS of 536 bytes" is wrong.


2010/07/01/research.microsoft.com_-_mss_of_536_bytes.txt · Last modified: 2010/07/01 14:52 by common