A friend of mine recently tried to visit research.microsoft.com and was unable to.
He spent some time on it, and came up with the following scenario:
I could reproduce this myself: research.microsoft.com did not work for my setup either, but I felt I would not lose too much anyway.
After some time he provided some pcap dumps for all possible combinations and scenarios, so I felt guilty and finally gave it a shot.
So what is wrong with research.microsoft.com?
I decided to replace my friend's address with 1.2.3.4.
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [S], seq 634583584, win 5840, options [mss 1452,sackOK,TS val 52325345 ecr 0,nop,wscale 7], length 0
IP 131.107.65.14.80 > 1.2.3.4.15475: Flags [S.], seq 650205845, ack 634583585, win 8192, length 0
IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
IP 1.2.3.4.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [S], seq 634583584, win 5840, options [mss 1460,sackOK,TS val 52325345 ecr 0,nop,wscale 7], length 0
IP 131.107.65.14.80 > 192.168.0.126.15475: Flags [S.], seq 650205845, ack 634583585, win 8192, options [mss 1452], length 0
IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
IP 192.168.0.126.15475 > 131.107.65.14.80: Flags [P.], seq 1:808, ack 1, win 5840, length 807
What happens:
the client sends his SYN, providing his mss of 1460
the server ACKs
the router adds an MSS of 1452 when sending the data to the client
the client ACKs
the client tries to transfer 807 bytes
the client retries to transfer 807 bytes
the client retries multiple times, before giving up, sending RST (not in dump)
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [S], seq 1960348643, win 8192, options [mss 1452,nop,wscale 2,nop,nop,sackOK], length 0
IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [S.], seq 4293367034, ack 1960348644, win 8192, length 0
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [.], ack 1, win 65340, length 0
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 65340, length 536
IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [.], ack 537, win 65392, length 0
IP 1.2.3.4.49276 > 131.107.65.14.80: Flags [P.], seq 537:756, ack 1, win 65340, length 219
IP 131.107.65.14.80 > 1.2.3.4.49276: Flags [P.], seq 1:176, ack 756, win 65173, length 175
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [S], seq 1960348643, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [S.], seq 4293367034, ack 1960348644, win 8192, options [mss 1452], length 0
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [.], ack 1, win 65340, length 0
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 1:756, ack 1, win 65340, length 755
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 65340, length 536
IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [.], ack 537, win 65392, length 0
IP 192.168.0.126.49276 > 131.107.65.14.80: Flags [P.], seq 537:756, ack 1, win 65340, length 219
IP 131.107.65.14.80 > 192.168.0.126.49276: Flags [P.], seq 1:176, ack 756, win 65173, length 175
What happens:
the client sends his SYN, providing his mss of 1460
the server ACKs
the router adds an MSS of 1452 when sending the data to the client
the client ACKs
the client tries to transfer 755 bytes
the client retries to transfer 755 bytes
the client retries to transfer 755 bytes
the client tries to transfer 536 bytes
the server ACKs
the client tries to transfer 219 bytes
the server ACKs
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [S], seq 1266255433, win 5840, options [mss 1460,sackOK,TS val 51818908 ecr 0,nop,wscale 7], length 0
IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [S.], seq 18729607, ack 1266255434, win 8192, length 0
IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 5840, length 536
IP 1.2.3.4.52946 > 131.107.65.14.80: Flags [P.], seq 537:834, ack 1, win 5840, length 297
IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [.], ack 834, win 65392, length 0
IP 131.107.65.14.80 > 1.2.3.4.52946: Flags [P.], seq 1:186, ack 834, win 65392, length 185
IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [S], seq 2215431791, win 5840, options [mss 1460,sackOK,TS val 52152068 ecr 0,nop,wscale 7], length 0
IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [S.], seq 2060919804, ack 2215431792, win 8192, length 0
IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [.], ack 1, win 5840, length 0
IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [.], seq 1:537, ack 1, win 5840, length 536
IP 192.168.0.126.15470 > 131.107.65.14.80: Flags [P.], seq 537:829, ack 1, win 5840, length 292
IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [.], ack 829, win 65392, length 0
IP 131.107.65.14.80 > 192.168.0.126.15470: Flags [P.], seq 1:177, ack 829, win 65392, length 176
Unfortunately the sequence numbers of these captures do not match because they do not belong to the same session, but that's not a problem:
the client sends his SYN, providing his mss of 1460
the server ACKs - there is no MSS sent to the client
the client defaults to an MSS of 536 bytes, the default specified by RFC 879
the client tries to transfer 536 bytes
the server ACKs
the client tries to transfer the remaining 297 bytes (292 in the second capture)
the server ACKs
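The segment sizes in the dumps can be checked with a little arithmetic. The following is only a sketch: `segment_sizes` is a made-up helper, and the payload lengths (833, 828 and 755 bytes) are read off the final sequence numbers in the captures above. It splits a request into segments no larger than the RFC 879 default of 536 bytes.

```python
DEFAULT_MSS = 536  # RFC 879 default: 576-byte datagram minus 40 bytes of headers


def segment_sizes(payload_len, mss=DEFAULT_MSS):
    """Split payload_len bytes into segment lengths of at most mss bytes."""
    sizes = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(remaining, mss)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


# Payload lengths taken from the last sequence number of each request.
print(segment_sizes(833))  # [536, 297]
print(segment_sizes(828))  # [536, 292]
print(segment_sizes(755))  # [536, 219]
```

The results match the `length` fields in the dumps: 536 + 297 and 536 + 292 for the range bound rule, 536 + 219 for the Windows client after it fell back.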
Using
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
will add a PMTU-clamped MSS to packets even if there was no MSS option before, making communication impossible if the server relies on a lower MSS and does not send, or drops, ICMP fragmentation-needed packets.
Windows (I was told >2000) will retry communication with the default MSS of 536 bytes, which makes it work even with badly configured hosts and firewalls. That's why it works for the Windows client with PMTU clamping.
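A rough model of that retry behaviour, as it appears in the Windows dump, might look like the sketch below. Everything in it is an assumption made for illustration: the names `server_accepts` and `send_with_fallback` are invented, the limit of three attempts is simply what the dump shows, and the idea that segments above 536 bytes disappear silently is inferred from the captures. It is not Windows' actual algorithm.

```python
DEFAULT_MSS = 536   # RFC 879 default MSS
MAX_ATTEMPTS = 3    # assumption: the dump shows three tries at 755 bytes


def server_accepts(segment_len):
    # Assumption: segments above the default MSS are silently lost.
    return segment_len <= DEFAULT_MSS


def send_with_fallback(payload_len, announced_mss):
    """Return the (segment_len, acked) attempts a falling-back client makes."""
    mss = announced_mss
    attempts = []
    sent = 0
    failures = 0
    while sent < payload_len:
        seg = min(payload_len - sent, mss)
        acked = server_accepts(seg)
        attempts.append((seg, acked))
        if acked:
            sent += seg
            failures = 0
        else:
            failures += 1
            if failures >= MAX_ATTEMPTS:
                # Give up on the announced MSS and retry with the default.
                mss = DEFAULT_MSS
                failures = 0
    return attempts


for seg, acked in send_with_fallback(755, announced_mss=1452):
    print(seg, "ACKed" if acked else "lost")
```

Under these assumptions the model reproduces the Windows capture: three lost 755-byte segments, followed by 536 and 219 bytes that are ACKed.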
An MSS range bound PMTU clamping rule like:
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
will only set an MSS if there was an MSS before. It will not modify packets without an MSS option, so the client will default to 536 bytes and communicate without problems.
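The difference between the two rules can be sketched as a small function. This is a simplified model of the behaviour described in this post, not the kernel's code: `clamp_mss` is a made-up name, and the path MTU of 1492 is an assumption derived from the clamped MSS of 1452 plus 40 bytes of IP/TCP headers.

```python
def clamp_mss(mss_option, pmtu, mss_range=None):
    """Model the MSS option a router leaves on a forwarded SYN or SYN/ACK.

    mss_option: MSS carried by the packet, or None if the option is absent.
    pmtu:       path MTU known to the router (assumed value, see above).
    mss_range:  (low, high) as in '-m tcpmss --mss low:high', or None
                when the rule has no tcpmss match.
    """
    clamped = pmtu - 40  # 20 bytes IPv4 header + 20 bytes TCP header
    if mss_range is not None:
        low, high = mss_range
        # The tcpmss match can only match packets carrying an MSS option.
        if mss_option is None or not (low <= mss_option <= high):
            return mss_option  # rule does not match, packet is untouched
    if mss_option is None:
        return clamped  # plain clamping adds an MSS option, as seen above
    return min(mss_option, clamped)  # clamping only ever lowers the MSS


print(clamp_mss(1460, 1492))                # 1452
print(clamp_mss(None, 1492))                # 1452
print(clamp_mss(None, 1492, (1400, 1536)))  # None
```

The second call is the problematic case: the server's SYN/ACK carries no MSS option, and plain clamping hands the client one of 1452. With the range match the packet stays untouched and the client falls back to 536 bytes.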
The Linux kernel has some slightly contradictory points on the topic too:
So, if you have the problem, use MSS range bound PMTU clamping - if you can, as many people won't be able to fix their Linux-based hardware routers without a firmware upgrade from their vendor.
As for Microsoft, I really wonder what makes them prefer an MSS of 536 bytes: the 40 bytes of IP/TCP headers are an overhead of at least ~7% for a 536-byte TCP payload, compared to ~2.7% for an MSS of 1452.
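Those percentages are the share of each full-sized packet that goes to headers. A minimal sketch of the arithmetic, assuming 40 bytes of IPv4 and TCP headers without options (`header_overhead` is just an illustrative name):

```python
HEADER_BYTES = 40  # 20 bytes IPv4 + 20 bytes TCP, no options


def header_overhead(mss):
    """Fraction of a full-sized packet that is IP/TCP header."""
    return HEADER_BYTES / (mss + HEADER_BYTES)


for mss in (536, 1452):
    print(f"MSS {mss}: {header_overhead(mss):.1%} header overhead")
# MSS 536: 6.9% header overhead
# MSS 1452: 2.7% header overhead
```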
If that's Turning Ideas into Reality - this is a bad idea, and they should get over it.
Another thing they should get over is dropping ICMP packets (fragmentation-needed? research.microsoft.com drops all ICMP). If they allowed ICMP, the client could detect the error and retry with a lower MSS, instead of having to change the MTU without any indication, as current Microsoft Windows operating systems do.
And, due to the obscure nature of the problem, there is even a really entertaining Debian bug.
And yes, the title is misleading: research.microsoft.com does not communicate any MSS, so "research.microsoft.com - MSS of 536 bytes" is wrong.