The tribal knowledge seems to be that you shouldn't do TCP-based hole punching because it's harder than UDP. The author acknowledges this:
> You can do NAT traversal with TCP, but it adds another layer of complexity to an already quite complex problem, and may even require kernel customizations depending on how deep you want to go.
However, I only see marginally added complexity (given the already complex UDP flows). IMO this complexity doesn't justify discarding TCP hole punching altogether. In the article you could replace raw UDP packets to initiate a connection with TCP SYN packets plus support for "simultaneous open" [0].
This is especially true if networks block UDP traffic which is also acknowledged:
> For example, we’ve observed that the UC Berkeley guest Wi-Fi blocks all outbound UDP except for DNS traffic.
My point is that many articles gloss over TCP hole punching with the excuse of being harder than UDP while I would argue that it's almost equally feasible with marginal added complexity.
The existence of stateful firewalls, and the fact that most NAT filters are EDF rather than EIF means that simultaneous open (send) is necessary even for UDP.
Hence the added complexity of doing a simultaneous open via TCP is fairly minor. The main complication is communicating the public mapping, and coordinating the "simultaneous" punch/open. However that is generally needed for UDP anyway...
One possible added complexity with TCP is that one has to perform real connect() calls rather than fake up the TCP SYN packet. That is because some firewalls pay attention to the sequence numbers.
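A minimal sketch of the "real connect() calls" approach in Python, assuming both peers already know each other's public ip:port and start at roughly the same moment. The names `punch_socket` and `try_simultaneous_open` are made up for illustration, and `SO_REUSEPORT` is only set where the platform offers it:

```python
import socket

def punch_socket(local_ip, local_port):
    """TCP socket bound to a fixed local port, so every attempt reuses the same NAT mapping."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):  # not available on every platform
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((local_ip, local_port))
    return s

def try_simultaneous_open(local_ip, local_port, peer, attempts=5, timeout=1.0):
    """Issue real connect() calls so the kernel generates the SYN and its sequence numbers.

    Returns a connected socket, or None if every attempt failed.
    """
    for _ in range(attempts):
        s = punch_socket(local_ip, local_port)
        s.settimeout(timeout)
        try:
            s.connect(peer)  # the peer is expected to be calling connect() towards us as well
            s.settimeout(None)
            return s
        except OSError:
            # Refused or timed out: expected while the other side's hole is not open yet.
            s.close()
    return None
```

A failed connect() leaves the socket unusable, which is why each attempt creates a fresh socket bound to the same local port.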
Yeah, I've gotten somewhat annoyed by the name of 'NAT traversal' for these methods. It seems to make some people think that cutting out NAT will lead to a beautiful world of universal P2P connections. But really, these methods are needed for traversing between any two networks behind stateful firewalls, which will pose a barrier to P2P indefinitely.
Also, wouldn't it be easier for stateful firewalls to block simultaneous TCP open (intentionally or not)? With UDP, the sender's firewall must create a connection as soon as it sends off the first packet, even if that packet bounces off the other firewall: the timing doesn't have to be particularly tight. But with TCP, the firewall might plausibly wait until the handshake is complete before allowing incoming packets, and it might only allow the 3-way SYN/SYN-ACK/ACK instead of the simultaneous SYN/SYN/ACK/ACK.
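To make the UDP side concrete, here is a rough Python sketch of the "both sides send first" idea. It assumes each peer has already learned the other's public (ip, port) through some side channel; `PROBE`, `send_probe`, `wait_for_probe` and `udp_punch` are illustrative names, not anything from the article:

```python
import socket

PROBE = b"punch"  # arbitrary payload, an assumption for this sketch

def send_probe(sock, peer):
    """The outbound packet is what makes our own firewall/NAT create state for the peer."""
    sock.sendto(PROBE, peer)

def wait_for_probe(sock, timeout=1.0):
    """Return the sender's address if a probe arrives within the timeout, else None."""
    sock.settimeout(timeout)
    try:
        data, addr = sock.recvfrom(64)
    except OSError:  # covers timeouts and ICMP-triggered errors on some platforms
        return None
    return addr if data == PROBE else None

def udp_punch(sock, peer, attempts=10, interval=0.2):
    """Keep sending until a probe from the peer gets through. peer is an (ip, port) tuple."""
    for _ in range(attempts):
        send_probe(sock, peer)
        if wait_for_probe(sock, interval) == peer:
            return True
    return False
```

The first few probes may well bounce off the other firewall; the loop simply keeps the local state alive until the peer's own outbound packet has opened its side.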
> But really, these methods are needed for traversing between any two networks behind stateful firewalls, which will pose a barrier to P2P indefinitely.
That's true. The actual problem is symmetric NATs, where every peer sees a different port number. This makes traditional NAT traversal impossible and you have to resort to port guessing/scanning. See for example https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
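A small sketch of how one might check for this, assuming you have asked two or more external servers (STUN-style) what public address they see for the same local socket. The function name and the shape of `observed` are assumptions for illustration:

```python
def mapping_is_endpoint_dependent(observed):
    """observed maps each destination (ip, port) to the public (ip, port) it saw for us.

    If different destinations report different public mappings for the same
    local socket, the NAT behaves "symmetrically" and a mapping learned from
    one server cannot simply be handed to another peer.
    """
    return len(set(observed.values())) > 1
```

For example, `{("198.51.100.1", 3478): ("203.0.113.7", 40000), ("198.51.100.2", 3478): ("203.0.113.7", 40001)}` would count as endpoint-dependent because the two servers saw different ports.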
People honestly thought for a while that devices behind a NAT were secured unless ports were specifically routed to them, hence the term "nat hole punching" was coined.
You're probably right that it makes less sense from today's perspective
Doesn't make sense. NAT hole punching requires you to execute on the target inside the NAT.
If you are able to do that whatever security you got from NAT has been breached even before NAT hole punching enters the conversation.
NAT will block unsolicited incoming connections, that is a great boon for security but obviously not a silver bullet for all network related security issues nor outgoing connections. That has never been a trope.
> Doesn't make sense. NAT hole punching requires you to execute on the target inside the NAT.
Why doesn't it make sense to you? From my perspective the idea was that the NAT protects your devices - and your device is now punching a hole into this protection, making it vulnerable to the world wide web
This circumventing doesn't have to be done by a malicious actor, it just comes at the added risk of becoming "targetable" from the Internet
Because it is the same thing as opening an outgoing connection but with more steps. The only thing it allows is to connect to someone else that is also behind NAT.
By this logic, a firewall has no bearing on security either. It just drops packets and makes devices unaddressable unless a route has been allowed or a port has been opened.
I think this is a really good point. As someone who has implemented TCP hole punching myself and now has a very good implementation of it, I will say that an obvious major benefit of using TCP is that you don't have to subsequently roll a poor man's TCP on top of UDP once the hole is open. The other issue with TCP hole punching, though, is that it looks much more like a SYN flood than UDP packets do. This may mean lower success rates on some networks, though in practice I haven't seen much filtering so far.
TCP hole punching is very fun. The way I do it is to use multiple NTP readings to compute a "clock skew" -- how far off the system clock is from NTP. Then the initiator sets a future meeting time that is relative to NTP. It honestly gets quite accurate. It even works for TCP hole punching between sockets on the same interface which is crazy if you think about it.
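Roughly, the clock-skew part could look like the sketch below. It uses the standard NTP offset formula on four timestamps per reading and takes the median over several readings. The helper names are made up for illustration, and how the timestamps are obtained is left out:

```python
import statistics
import time

def ntp_offset(t0, t1, t2, t3):
    """Standard NTP clock offset for one exchange.

    t0: client send time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0

def estimate_skew(samples):
    """Median offset over several (t0, t1, t2, t3) readings, to damp network jitter."""
    return statistics.median(ntp_offset(*s) for s in samples)

def local_deadline(meeting_time_ntp, skew):
    """Convert a meeting time expressed in NTP time to the local clock.

    NTP time is approximately local time + skew, so local = NTP - skew.
    """
    return meeting_time_ntp - skew

def sleep_until(local_time, clock=time.time, sleep=time.sleep):
    """Block until the local clock reaches local_time (returns at once if already past)."""
    delay = local_time - clock()
    if delay > 0:
        sleep(delay)
```

Both peers compute their own skew, so a meeting time stated in NTP terms lands at (nearly) the same real instant on both machines.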
The reason I wanted to support this strange, local punching mode is that if the code is efficient enough to succeed at host-based punching, it will likely be fast enough to work on the LAN and the Internet, too. My code is Python, and my very first attempt at this was eye-opening, to say the least. Because TCP hole punching is so timing-sensitive, I was getting failures from using Python with old-school self-managed sockets. I was using threading and a poor man's event loop (based on my C socket experience)... which is, ah, just not the way to do it in Python.
The only way I could get that code to work was to ensure the Python process had a high priority, so other processes on the system didn't deprioritize it and introduce lag between the punching attempts. That is how time-critical the code is (with an inefficient implementation). My current implementation uses a process pool in which each process has its own event loop to manage punching. I create a list of tasks that are distributed over time. Each task simply opens a connection that is reused from the same socket. I determined this was the best approach (in Python, anyway) after testing it on every major OS.
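As a loose illustration of "tasks distributed over time" (a single-event-loop sketch, not the actual implementation, which uses a process pool), one could spread attempts around the agreed meeting time like this. `attempt_schedule`, `run_schedule` and their parameters are hypothetical:

```python
import asyncio
import time

def attempt_schedule(meeting_time, attempts=8, spacing=0.05, lead=0.1):
    """Local-clock times for each attempt, starting slightly before the meeting time."""
    start = meeting_time - lead
    return [start + i * spacing for i in range(attempts)]

async def run_schedule(times, attempt, clock=time.time):
    """Run the coroutine function `attempt` once at each scheduled time; collect the results."""
    async def one(at):
        await asyncio.sleep(max(0.0, at - clock()))
        return await attempt()
    return await asyncio.gather(*(one(t) for t in times))
```

Starting a little early and spacing the attempts tightly gives some tolerance for residual clock error between the two peers.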
You are right about TCP and UDP hole punching difficulty being similar. The main difficulty to both is the NAT prediction step. I haven't written code yet for symmetric NAT bypass but I am starting to see how I'd integrate it (or possibly write a new plugin for it.)
I did just think of another drawback for TCP vs UDP punching that I think puts a major point in UDP's favour. It may have been touched on by others already. TCP would require the router to record connection state. This is bad because the state table on routers is very small and some of these punching techniques are quite aggressive, like the algorithm that tries to bypass symmetric NATs. If you're opening hundreds of TCP connections, it's possible you might even DoS the router. For UDP, it's plausible that optimizations for state management would make it less likely that your punching would render the whole router inoperable. This is only speculation though.
> If you're opening hundreds of TCP connections its possible you might even DoS the router.
This was sometimes an issue for underpowered home/SOHO routers in the mid-2000s, but most modern routers have enough memory to support decently sized connection-tracking tables.
In any case, both TCP and UDP require connection tracking; there's no inherent advantage to UDP.
I read a bit about this space a few weeks ago after not knowing anything about it beforehand. My impression is that IPv6 solves all of this and NAT traversal isn't necessary anymore. So why isn't IPv6 more popular, and how do I get started with it for my home network and Tailscale VPN?
Fascinatingly effective, but maybe I'm the only one getting the heebie-jeebies when someone suggests implementing this in production corp networks. Sure it's super convenient, but the thought of bypassing all traditional NATs and firewalls, and instead relying solely on a software ACL, seems super risky. Maybe I just don't understand how it works, but it seems that a bad actor getting access to a stray VM with Tailscale on it in, say, your AWS testing env, essentially has a clear path all the way into your laptop on the internal corp network, through the kernel, into user space and into the Tailscale ACL code as the sole arbiter of granting or blocking access. Would I even know someone unauthorized made it that far?
That is why many of us keep repeating that NAT is not a security mechanism.
Punching through NAT, and most associated state tracking filters, is very easy.
I've implemented such in a production corp environment, as a product to be sold. There is no magic here, it is all well understood technology by the practitioners.
If you actually want to have packet filtering (a firewall) then deploy a firewall instance distinct from any NAT, and with appropriate rules. However that only really helps for traffic volume reduction, the actual security gain from a f/w per se is now minimal, as most attacks are over the top: HTTP/HTTPS, POP/IMAP etc.
> That is why many of us keep repeating that NAT is not a security mechanism.
You can say that in general, network firewalls are not a security mechanism. They are at most a means to prevent brute-force attacks from outside of the network.
To be completely fair to you, everyone misinterprets NAT as a security mechanism, because traditionally it is deployed alongside a stateful firewall.
In reality, of course, the stateful firewall is doing all of the heavy lifting that NAT is getting the credit for. Tailscale does not get rid of the firewall; in fact, it has a much more comprehensive setup based on proper ACLs.
Though I’m definitely the first to admit that their tooling around ACLs could be significantly improved.
I think they mostly interpret NAT as a security mechanism because that's what it originally was; "NAT" was a species of firewall, alongside "stateful" and "application layer". And NAT obviously does serve a security purpose; just not the inside->out access control function we're talking about here.
> I think they mostly interpret NAT as a security mechanism because that's what it originally was; "NAT" was a species of firewall
That’s simply wrong. NAT exists, and always has existed, for the sole purpose of Network Address Translation, i.e. allowing a large IP address space to hide behind a much smaller IP address space (usually a single IP address), for the purpose of mitigating IP address exhaustion.
NATs were meant to be a stop gap solution between IPv4 running out, and the rollout of IPv6. But we all know how that panned out.
The “firewall” like aspects of a NAT are purely incidental. The only reason why a NAT “blocks” unsolicited inbound traffic is because it literally has no idea where to send that traffic, and /dev/null is the only sensible place to direct what’s effectively noise from the NATs perspective.
The fact that NATs share many of the basic building blocks of a very simple stateful firewall is just a consequence of both NATs and firewalls being nothing more than stateful packet routing devices. The same way any standard network switch is (they internally keep a mapping of IP to MAC address of connected devices based on ARP packets, which incidentally blocks certain types of address spoofing, but nobody calls a network switch a firewall).
You're trying to piece this together axiomatically, but you can just read the history of the Cisco PIX firewall to see that the story is not as simple as you want it to be. One of the first and clearly the most popular NAT middlebox products of the 1990s was a firewall, and Cisco made a whole big deal about how powerful NAT was as a security feature.
You’re working backwards here from Cisco’s marketing material. Just because someone in Cisco’s marketing team was smart enough to realise they could market NAT as a security feature, doesn’t mean it was designed to be a firewall.
Apple advertises their iPads as “computer replacements”, that doesn’t mean the iPad was originally designed to be a computer replacement, and it certainly doesn’t make iPads a good computer replacement for many people.
I would also highlight that Cisco PIX had a dedicated firewall layer in addition to its NAT layer, which provided much more capabilities than the NAT layer alone. The fact that these two layers intelligently built on each other is just good implementation engineering, it doesn’t change the fundamental fact that NAT isn’t, and never has been, a proper security tool.
I'm working forwards from my experience at the time as a security engineer working with products that claimed NAT was a security feature, since it allowed for machines to access the Internet without being routable from the Internet for initiated connections, which is why the first commercial PIX product, after Cisco bought Network Translation (which named PIX), was a firewall.
People confuse the fact that NAT is not an especially powerful or coherent security feature with the idea that it isn't a security feature, which leads you to the crazy rhetorical position of having to argue that PIX, the first mainstream NAT product, was not a security product. I have friends who worked on PIX, for many years. I assure you: they were in the Security BU.
I think this position is pretty hopeless, though if you want to drag us around through the network security marketing of the mid-1990s, I'm happy to do so, just for nostalgia's sake. NAT is absolutely a security feature, and was originally deployed as one, in an era where it was still feasible to get routable allocations for individual workstations.
> NAT was a security feature, since it allowed for machines to access the Internet without being routable from the Internet for initiated connections
I'm sure you also know that any stateful firewall can achieve the same result without having to provide NAT capabilities. Sure, Cisco PIX may have been a security appliance, but that doesn't make NAT a firewall. You don't need Network Address Translation to create a firewall that allows devices to connect to the internet but makes those machines unroutable to unsolicited requests. For your claim that NATs are meant to be a firewall, you need to provide an explanation as to why we don't use NATs with IPv6.
Why would increasing the IP address space, so that it's once again possible to get routable allocations for individual workstations, result in people not deploying IPv6 NATs, when apparently they were an important security tool for IPv4 even in the days when "it was still feasible to get routable allocations for individual workstations"?
> The same way any standard network switch is (they internally keep a mapping of IP to MAC address of connected devices based on ARP packets, which incidentally blocks certain types of address spoofing, but nobody calls a network switch a firewall).
I thought standard network switches kept a mapping of MAC address to physical network ports, and didn't concern themselves with the IP layer at all (other than things like IGMP/MLD snooping)? Mapping from IP to MAC addresses is a function of hosts/gateways, not switches.
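For what it's worth, the usual layer-2 learning behaviour can be sketched in a few lines of Python. This is a toy model with made-up names, not any vendor's implementation; note that no IP address appears anywhere in it:

```python
class LearningSwitch:
    """Toy model of MAC learning: the table maps MAC address -> physical port."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, in_port, src_mac, dst_mac):
        """Learn where src_mac lives, then return the output port, or None to flood."""
        self.mac_table[src_mac] = in_port
        return self.mac_table.get(dst_mac)  # None means "unknown destination, flood"
```

A real switch would also age entries out and avoid sending a frame back out of the port it arrived on; both are omitted here.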
I mean, it really isn’t a security mechanism of any kind. Any security properties at all are completely accidental.
One need only disable stateful firewalling and use NAT alone to see how completely dire the situation would be, as all outbound connections open up your host to the internet.
Networking has long been the toxic wasteland of security and misconfiguration. Now combine that with newer host-based networking models for containers. The Windows network stack is substantially different now due to that, and more complex. Since Wireguard has been part of Linux, everyone and their brother now has a VPN, somewhere connecting to a VPS. It's probably worse than you think because you don't know what you don't know.
This is to go through NAT, which are devices designed to work around the rarefaction of IPv4 addresses.
Firewalling is a different concept, but since you raise the issue of connectivity with respect to security, I have to say that what makes /me/ sad and anxious is to see how internet security has always hinged on blocking packets based on the destination port.
Doing what's easy rather than what's correct, exemplified and labelled "professional solutions"...
That’s how all VoIP has worked since forever, and we also have a bunch of standards and public-facing infrastructure to make it easier: ICE, TURN and friends.
It still needs something on the inside to talk to outside first, so the actual firewall should whitelist both outbound and inbound connections.
Then again, if you rely on the perimeter, it’s only a matter of time until someone figures out what your equivalent of a hi-vis jacket is.
It's no different from traditional VPNs. The tailnet admin has control over the routes that are exposed to clients and ACLs are available to further limit access. It's an overlay network, it doesn't magically give you access to user space on people's laptops.
Given how tailscale works and many of the features (the SSH features especially) it's not terribly hard to imagine a critical flaw or misconfigured setup giving access to userspace
Everything beyond Tailscale's core VPN features is opt-in. The risk of misconfiguring Tailscale is the same as (arguably much smaller than) the risk of just misconfiguring SSH on a machine.
At the end of the day, Tailscale works just like any other VPN, from the perspective of the type of data that can traverse between machines connected to the same virtual network. Tailscale's use of a P2P WireGuard mesh is just an implementation detail; it’s no more or less risky than having every machine connect to a central VPN gateway and bouncing all their traffic off that. Either way, all the machines get access to a virtual network shared by every other machine, and misconfigured ACLs could result in stuff getting exposed between those machines which shouldn’t be exposed.
If anything the Tailscale mesh model is much more secure, because it forces every device to use a true zero trust model. Rather than the outdated, “oh it managed to connect to the VPN, it must be safe then” model that traditional VPNs often end up implementing.
I'm not sure how to compare the risk and attack surface of traditional NATs and firewalls vs Tailscale's ACL code, but I'm not sure Tailscale is obviously the riskier choice there. I think more traditional network devices are more familiar and more of a known quantity, but there's a lot of janky, unpatched, legacy network devices out there without any of the security protections of modern operating systems and code.
It's also worth considering that exploitability of ACL code is just one factor in comparing the risk and Tailscale or similar solutions allow security conscious setups that are not possible (or at least much more difficult) otherwise. For example, the NAT and firewall traversal means you don't have to open any ports anywhere to offer a service within your Tailscale network. Done correctly, this means very little attack surface for a bad actor to gain access to that stray VM in the first place. You can also implement fairly complex ACL behavior that's effectively done on each endpoint without having to trust your network infrastructure at all, behavior that stays the same even if your laptop or other devices roam from network to network.
Not to say I believe Tailscale is bulletproof or anything, but it does offer some interesting tradeoffs and it's not immediately obvious to me that the risk is worse than with legacy networks (arguably it's better), and you gain a lot of interesting features and convenience.
Network security is a myth. NATs, firewalls, ACLs, etc. don't keep you safe. Even on your WiFi LAN right now, you aren't safe from local network attacks originating from outside attackers.
Because hackers can contort themselves into amazing shapes in order to fit through tiny holes in the oddest places. Once they position themselves correctly, and are able to reach the network address and port of a given service, and it has no authentication, it's open season. It may seem difficult, nigh impossible, for a hacker to reach all the way into your WiFi LAN. But there are always twists and turns to take.
From the public internet: tens of thousands of internet routers have publicly known exploits right now, which the router vendors refuse to fix. Just scan the internet for the routers, use your exploit, and you're inside.
From the opposite direction: malware in a website can redirect your browser to the management interface of a router on your local LAN, where it can reconfigure your router. If there is a password but you have logged in from your browser, the active session token lets it right in, and CSRF protection is often disabled or incorrectly set up. And even if it has a password, many such routers have exploits that will work despite a password. Many people also fall for phishing attacks that can drop payloads on your machine directly.
In some cases, the ISP itself has shipped a firmware update to routers that included malware.
All of these things have happened in the past 2 years, to millions of internet users, that we know of. Many large attacks go unnoticed for years. Once the router is compromised, it can be configured to forward ports or enable UPnP, or simply persist malware inside the router itself. The network is wide open and at the attacker's fingertips.
And this is just one class of attack. There are many more that can attack private networks. So there is no place safe from network attacks. Not in a corporate network, not on your local LAN, nowhere. There is no network security. The only network services that can be somewhat trusted are ones which require strong authentication, authorization, and encryption.
A better question is, “why do you think your local network is safe?”.
Have you taken steps to validate the integrity of every single device connected to the network?
If a single device is compromised, how will you detect that it's been compromised?
If a device is compromised, what prevents it from being used to launch an attack on other devices in your network, especially if your security model assumes that all devices on your local network are “safe”?
For a more boring everyday equivalent, just search around for one of the many botnets that are assembled from compromised SoHo routers, or IoT devices, around the world.
Assuming a local network is safe and secure is foolish. There’s nothing inherently secure about a local network; the only reason it offers any level of security is that a local network is many, many orders of magnitude smaller than the entire internet. So the probability of a hostile device (whether intentionally installed as hostile, or turned hostile after a remote attack) being connected is smaller. But at the end of the day, that is security via “being luckier than the next dude”.
As far as I understand, Tailscale won't even let you initiate a connection (or give you WireGuard keys for a node) unless there's an ACL rule that allows it.
By using an unpatched RCE in any network-exposed code. The whole point of a firewall is to prevent bad hackers from the bad internet from exploiting your unpatched RCEs, abusing your default passwords and the host-based security you shouldn't have had in the first place, and accessing stuff using compromised credentials you didn't revoke or didn't know you should have revoked. Because consistently doing all of that all of the time is hard for creative professionals. It's a chore. It's a tax.
How exactly is Tailscale different to literally any other piece of network capable software in that regard?
NAT traversal requires careful coordination between two devices to create a connection, it’s not like any random device on the internet can perform NAT traversal against a machine just because it’s running Tailscale (not to mention every modern browser has NAT traversal built in for WebRTC connections).
And if the issue doesn’t arise from using NAT traversal, then how does Tailscale expose anything more significant than what a traditional VPN will expose? After all the only difference between a P2P VPN and a traditional VPN, is that a traditional VPN bounces all your traffic off a common server, rather than attempting P2P connections.
I think the point is not that there are necessarily exploits, but by compromising one node in the tailnet they now have the ability to hit code in these locations, or services running on your tun0 interface on your laptop etc.
How is compromising a single node in a tailnet more dangerous than compromising a single node in a traditional VPN?
Traditional VPNs don’t usually put firewalls between machines on the network, because traditionally the whole point of a VPN is to avoid the need for firewalls to provide security between nodes on the virtual network, by assuming that only safe machines can connect to the VPN.
You would typically remove the default any to any ACL rule, and allow the connections that you need. The compromised node normally would not have access to anything interesting. Normally it’s jailed, or would not be able to make outgoing connections.
The ACL logic happens in tailscaled on the destination though, doesn't it? So even if you block the access via the ACL, the packet has still gone through the network stack, the Go runtime, etc. before the traffic is dropped, which is a significantly bigger surface than a (traditional) external network firewall.
I see your point. You are talking about a vulnerability.
You are right. Tailscale nodes can send packets that are processed on any other node, irrespective of ACLs. Essentially each node gets to “run code” on other nodes, even though the packets are normally dropped. I don’t know how deep the Tailscale packets go before being dropped (perhaps the coordination server distributes firewall rules).
But you have to compare with another access method, like, the hub and spoke VPN. The compromised and uncompromised nodes connect to a VPN access server. A compromised node sends packets that are processed in the VPN server, but can also connect to the uncompromised node, meaning, the latter has to process and drop the packets of the former. You have to trust the OS IP stack. To some extent, the same is true if the trusted node VPNs directly to the untrusted node. During an established connection, the networking stack of the trusted node has to block the other side.
Maybe someone familiar with the implementation of ACLs in Tailscale could chime in.
Update: The ACL rules are applied to incoming packets on the Tailscale interface. The filtering is then done by tailscaled. The packet has gone past the interface and been processed by tailscaled. So an unauthenticated packet indeed travels through kernel space all the way to userspace.
How is this any different to any other piece of network capable software that’s listening to a port on your machine?
An external network firewall can only offer protection if you can somehow guarantee that every packet that hits a specific node is first routed via that firewall. Traditionally nobody has set up networks like that, because it requires routing every single packet via a single common bottleneck, causing huge latency and throughput problems.
As for packets going via the network stack and then the Go runtime: do you honestly believe there’s a set of vulnerabilities out there which would allow random external packets to be sent to a random machine and cause an RCE by virtue of simply being processed by the OS kernel, which somehow can only be exploited if you’re running Tailscale? Better still, if such a vulnerability exists, what on earth makes you think your firewall isn’t also vulnerable to the same issue, given that pretty much every firewall out there is built on the Linux kernel these days?
I wish there was a tailscale-like equivalent without connectivity encryption, for devices which encrypt at the application layer (like almost the entire internet does). We don't always need the lower layers to be encrypted, this is especially computationally expensive for low power devices (think IoT stuff running a tailscale like tunnel).
GRE tunnels exist and I actually use them extensively, but UDP hole punching is not handled so hub-and-spoke architecture is needed for them, no peer to peer meshes with GRE (ip fou).
Are there equivalent libraries out there which do UDP hole punching and unencrypted GRE tunnels following an encrypted handshake to confirm identity?
Yes, the established standard here is known collectively as Interactive Connectivity Establishment (ICE) [1] which WebRTC relies on -- there are a few good libraries out there that implement it and/or various elements of it [2] [3].
libp2p [4] may be what you're after if you want something geared more towards general purpose connectivity.
FWIW, libp2p also enforces transport encryption, quote:
> Encryption is an important part of communicating on the libp2p network. Every connection must be encrypted to help ensure security for everyone. As such, Connection Encryption (Crypto) is a required component of libp2p.
It's written in Python, though it's not based on using the default interface like most networking code. I wanted to be able to run services across whatever interfaces you like, allowing for much more diverse and useful things to be built. It's mostly based on standard library modules. I hate C extension crap as it always breaks packages cross-platform.
> So, to traverse these multiple stateful firewalls, we need to share some information to get underway: the peers have to know in advance the ip:port their counterpart is using.
> [...] To move beyond that, we built a coordination server to keep the ip:port information synchronized
This is where I wish SIP lived up to its name (Session Initiation Protocol, i.e. any session, such as a VPN one...) and wasn't such a complicated mess making it not worth the hassle. I mean it was made to be the communication side-channel used for establishing p2p rtp streams.
(p.s. your links weren't clickable because lines that are indented with 2 or more spaces get formatted as code - see https://news.ycombinator.com/formatdoc)
Interesting blast from the past. We built an oblivious p2p mesh network that did this in 2010. Back then, nobody cared about security as much as we thought they should. Since then, still nobody cares about security as much as they should. Devices have multiplied and their value has increased, and still, they are quite insecure. Truly secure endpoints with hardware root-of-trust and secure chains of trust for authn/authz and minimal temporary privileges are still hard, and network perimeter security theater is still ongoing in home networks, corp networks and even large production datacenter networks. The only reason we don't find these to be the primary root cause of security breaches is that easier attack chains are still readily available!
IMO: this is arguably one of the most detailed articles on NAT traversal on the entire Internet. But it is missing information on delta behaviors (it's not that complex -- just that some NATs have observable patterns in how they choose to assign successive external ports. The most common one is simply preserving the source port. But there can also be others, e.g. an increment of the former mapping.)
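A hedged sketch of what testing for delta behaviour could look like, assuming you have collected (local_port, external_port) pairs for a few successive mappings. The function names and the three labels are my own, and port wraparound is ignored:

```python
def classify_delta(pairs):
    """pairs: [(local_port, external_port), ...] for successive new mappings.

    Returns ("preserving", 0) if the NAT keeps the source port,
    ("delta", d) if successive external ports differ by a constant non-zero d,
    and ("unknown", None) otherwise.
    """
    if pairs and all(local == ext for local, ext in pairs):
        return ("preserving", 0)
    ext_ports = [ext for _, ext in pairs]
    deltas = {b - a for a, b in zip(ext_ports, ext_ports[1:])}
    if len(deltas) == 1:
        d = deltas.pop()
        if d != 0:
            return ("delta", d)
    return ("unknown", None)

def predict_next(pairs, next_local_port):
    """Guess the external port of the next mapping, or None if no pattern was seen."""
    kind, d = classify_delta(pairs)
    if kind == "preserving":
        return next_local_port
    if kind == "delta":
        return pairs[-1][1] + d
    return None
```

For example, observations of `[(5000, 40000), (6000, 40001), (7000, 40002)]` classify as `("delta", 1)`, so the next mapping would be guessed at external port 40003.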
It's a very good theoretical article. I wonder to what extent a software engineer could use this though. Because although it does describe many things, I'm not sure there's enough detail to write algorithms from it. Like, could an engineer write an algorithm to test for different types of NATs on the basis of this article? Could they adapt their own hole punching code? I've personally read papers where simple tables were more useful than entire articles like this (as extensive as it is). Maybe still a good starting point though.
Also, the last section in the article is extremely relevant. It has the potential to bypass symmetric NATs which are used in the mobile system. The latest research on NAT traversal uses similar techniques and claims near 100% success rates.
Another year, another repost of an article about NAT traversal, another couple replies about how this is insecure followed by people explaining that NAT is not a security feature.
This is an excellent article!
The tribal knowledge seems to be that you shouldn't do TCP-based hole punching because it's harder than UDP. The author acknowledges this:
> You can do NAT traversal with TCP, but it adds another layer of complexity to an already quite complex problem, and may even require kernel customizations depending on how deep you want to go.
However, I only see marginally added complexity (given the already complex UDP flows). IMO this complexity doesn't justify discarding TCP hole punching altogether. In the article you could replace raw UDP packets to initiate a connection with TCP SYN packets plus support for "simultaneous open" [0].
This is especially true if networks block UDP traffic which is also acknowledged:
> For example, we’ve observed that the UC Berkeley guest Wi-Fi blocks all outbound UDP except for DNS traffic.
My point is that many articles gloss over TCP hole punching with the excuse of being harder than UDP while I would argue that it's almost equally feasible with marginal added complexity.
[0] https://ttcplinux.sourceforge.net/documents/one/tcpstate/tcp...
The existence of stateful firewalls, and the fact that most NAT filters are EDF rather than EIF means that simultaneous open (send) is necessary even for UDP.
Hence the added complexity of doing a simultaneous open via TCP is fairly minor. The main complication is communicating the public mapping, and coordinating the "simultaneous" punch/open. However that is generally needed for UDP anyway...
One possible added complexity with TCP is that one has to perform real connect() calls, rather than fake up the TCP SYN packet. That is because some firewalls pay attention to the sequence numbers.
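For what it's worth, the "real connect() calls" approach can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a production implementation: the helper names, retry count and timings are made up, and it assumes both peers already know each other's public ip:port and start at roughly the same time. Each attempt uses a fresh socket bound to the same local port so the NAT mapping stays the same.

```python
import socket
import time


def punch_socket(local_port):
    """Create a TCP socket bound to a fixed local port (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding the same local port across repeated attempts.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):  # not available on every platform
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("0.0.0.0", local_port))
    return s


def tcp_punch(local_port, peer_addr, attempts=20, interval=0.05, timeout=0.5):
    """Repeatedly connect() to the peer's public address from one local port.

    Both peers are expected to run this at the same time. The kernel sends
    real SYNs with real sequence numbers; if the SYNs cross, the TCP stack
    completes a simultaneous open. Returns a connected socket or None.
    """
    for _ in range(attempts):
        s = punch_socket(local_port)
        s.settimeout(timeout)
        try:
            s.connect(peer_addr)
            s.settimeout(None)
            return s
        except OSError:
            # Refused, timed out or unreachable: close and retry shortly.
            s.close()
            time.sleep(interval)
    return None
```

Whether a given firewall accepts the SYN/SYN/ACK/ACK exchange is a separate question that this sketch does not answer.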
Yeah, I've gotten somewhat annoyed by the name of 'NAT traversal' for these methods. It seems to make some people think that cutting out NAT will lead to a beautiful world of universal P2P connections. But really, these methods are needed for traversing between any two networks behind stateful firewalls, which will pose a barrier to P2P indefinitely.
Also, wouldn't it be easier for stateful firewalls to block simultaneous TCP open (intentionally or not)? With UDP, the sender's firewall must create a connection as soon as it sends off the first packet, even if that packet bounces off the other firewall: the timing doesn't have to be particularly tight. But with TCP, the firewall might plausibly wait until the handshake is complete before allowing incoming packets, and it might only allow the 3-way SYN/SYN-ACK/ACK instead of the simultaneous SYN/SYN/ACK/ACK.
> But really, these methods are needed for traversing between any two networks behind stateful firewalls, which will pose a barrier to P2P indefinitely.
That's true. The actual problem is symmetric NATs, where every peer sees a different port number. This makes traditional NAT traversal impossible and you have to resort to port guessing/scanning. See for example https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
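To get a feel for how expensive port guessing is, here is a small back-of-the-envelope calculation in Python. It is only a sketch under simplifying assumptions that are not taken from the linked paper: the symmetric NAT is assumed to pick external ports uniformly at random, one side holds a number of mappings open, and the other side probes distinct random ports.

```python
def hit_probability(open_ports, probes, port_space=65535):
    """Chance that at least one of `probes` distinct random guesses lands on
    one of `open_ports` mapped ports, assuming uniformly random mappings."""
    if probes > port_space - open_ports:
        # More probes than unmapped ports: a hit is guaranteed.
        return 1.0
    miss = 1.0
    for i in range(probes):
        # Probability that the i-th guess also misses, given earlier misses.
        miss *= (port_space - open_ports - i) / (port_space - i)
    return 1.0 - miss


# Example: 256 mappings held open on one side, a varying number of probes.
for probes in (64, 174, 1024):
    print(probes, round(hit_probability(256, probes), 3))
```

The point of the exercise is that opening many mappings on one side makes a modest number of probes on the other side surprisingly likely to hit, as long as the uniform-randomness assumption roughly holds.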
People honestly thought for a while that devices behind a NAT were secured unless ports were specifically routed to them, hence the term "nat hole punching" was coined.
You're probably right that it makes less sense from today's perspective
Doesn't make sense. NAT hole punching requires you to execute on the target inside the NAT.
If you are able to do that whatever security you got from NAT has been breached even before NAT hole punching enters the conversation.
NAT will block unsolicited incoming connections, that is a great boon for security but obviously not a silver bullet for all network related security issues nor outgoing connections. That has never been a trope.
> Doesn't make sense. NAT hole punching requires you to execute on the target inside the NAT.
Why doesn't it make sense to you? From my perspective the idea was that the NAT protects your devices - and your device is now punching a hole into this protection, making it vulnerable to the world wide web
This circumventing doesn't have to be done by a malicious actor, it just comes at the added risk of becoming "targetable" from the Internet
Because it is the same thing as opening an outgoing connection but with more steps. The only thing it allows is to connect to someone else that is also behind NAT.
It has no bearing on security.
By this logic, a firewall has no bearing on security either. It just drops packets / makes devices unaddressable unless a route has been allowed / a port has been opened
I think this is a really good point. As someone who has implemented TCP hole punching myself and now has a very good implementation for it I will say that obviously a major benefit of using TCP is you don't have to subsequently roll a poor man's TCP on top of UDP once the hole is open. The other issue with TCP hole punching though is that it looks much more like a SYN flood than UDP packets do. This may mean lower success rates for some networks. Though in practice I haven't seen much filtering so far.
TCP hole punching is very fun. The way I do it is to use multiple NTP readings to compute a "clock skew" -- how far off the system clock is from NTP. Then the initiator sets a future meeting time that is relative to NTP. It honestly gets quite accurate. It even works for TCP hole punching between sockets on the same interface which is crazy if you think about it.
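A rough sketch of the clock-skew step, for illustration only. The function names are made up, the four timestamps per reading follow the standard NTP offset calculation, and taking the median over several readings is an assumption about how to combine them, not necessarily what the implementation described above does.

```python
import statistics
import time


def ntp_offset(t0, t1, t2, t3):
    """Standard NTP clock offset (server clock minus local clock).

    t0: local time the request was sent
    t1: server time the request was received
    t2: server time the reply was sent
    t3: local time the reply was received
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def estimate_skew(samples):
    """Combine several (t0, t1, t2, t3) readings; the median damps outliers."""
    return statistics.median(ntp_offset(*s) for s in samples)


def seconds_until(meeting_ntp_time, skew, now_local=None):
    """How long to sleep locally so both peers fire at the same NTP time."""
    if now_local is None:
        now_local = time.time()
    return max(0.0, meeting_ntp_time - (now_local + skew))
```

The initiator would pick `meeting_ntp_time` a little in the future and send it to the other peer; each side then sleeps for its own `seconds_until(...)` value before starting to punch.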
The reason I wanted to support this strange, local-based punching mode is that if it works efficiently enough to succeed in host-based punching then it will likely be fast enough to work on the LAN and Internet, too. My code is Python and my very first attempt at this was eye opening to say the least. Due to how timing-sensitive TCP hole punching is I was having failures from using Python with old-school self-managed sockets. I was using threading and a poor man's event loop (based on my C socket experience)... which is ah... just not the way to do it in Python.
The only way I could get that code to work was to ensure the Python process had a high priority so other processes on the system didn't deprioritize it and introduce lag between the punching attempts. That is how time-critical the code is (with an inefficient implementation.) My current implementation now uses a process pool that each has its own event loop to manage punching. I create a list of tasks that are distributed over time. Each task simply opens a connection that is reused from the same socket. I determined this code was the best approach (in Python anyway) after testing it on every major OS.
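The "list of tasks distributed over time" part might look roughly like the following asyncio sketch. It is a simplified illustration with made-up names, using a single event loop rather than the process pool described above; `attempt` stands in for whatever coroutine performs one connection attempt and returns a connection on success or None on failure.

```python
import asyncio
import time


async def run_staggered(attempt, start_at, count=10, spacing=0.01):
    """Run `attempt(i)` count times, spaced `spacing` seconds apart.

    start_at is a time.monotonic() timestamp for the first attempt.
    Returns the first truthy result, or None if every attempt fails.
    """
    async def one(i):
        # Sleep until this attempt's slot, then try once.
        delay = start_at + i * spacing - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        return await attempt(i)

    tasks = [asyncio.create_task(one(i)) for i in range(count)]
    try:
        for fut in asyncio.as_completed(tasks):
            result = await fut
            if result:
                return result
        return None
    finally:
        # Stop any attempts still waiting for their slot.
        for t in tasks:
            t.cancel()
```

Scheduling each attempt against an absolute start time, instead of sleeping a fixed amount between attempts, keeps the slots from drifting when one attempt takes longer than expected.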
You are right about TCP and UDP hole punching difficulty being similar. The main difficulty to both is the NAT prediction step. I haven't written code yet for symmetric NAT bypass but I am starting to see how I'd integrate it (or possibly write a new plugin for it.)
I did just think of another drawback for TCP vs UDP punching that I think puts a major point in UDP's favour. It may have been touched on by others already. But TCP would require the router to record connection state. This is bad because the table for routers is very small and some of these punching techniques are quite aggressive. Like the algorithm that tries to bypass symmetric NATs. If you're opening hundreds of TCP connections it's possible you might even DoS the router. For UDP it's plausible optimizations for state management would make it less likely that your punching would render the whole router inoperable. This is only speculation though.
> If you're opening hundreds of TCP connections it's possible you might even DoS the router.
This was sometimes an issue for underpowered home/SOHO routers in the mid-2000s, but most modern routers have enough memory to support decently sized connection-tracking tables.
In any case, both TCP and UDP require connection tracking; there's no inherent advantage to UDP.
A bit OT:
I read a bit about this space a few weeks ago after not knowing anything about it beforehand. My impression is that ip6 solves all of this and NAT traversal isn't necessary anymore. So why isn't ip6 more popular and how do I get started with it for my home network and tailscale VPN?
Fascinatingly effective, but maybe I'm the only one getting the heebie-jeebies when someone suggests implementing this in production corp networks. Sure it's super convenient, but the thought of bypassing all traditional NATs and firewalls, and instead relying solely on a software ACL, seems super risky. Maybe I just don't understand how it works, but it seems that a bad actor getting access to a stray VM with Tailscale on it in, say, your AWS testing env, essentially has a clear path all the way into your laptop on the internal corp network, through the kernel, into user space and into the Tailscale ACL code as the sole arbiter of granting or blocking access. Would I even know someone unauthorized made it that far?
That is why many of us keep repeating that NAT is not a security mechanism.
Punching through NAT, and most associated state tracking filters, is very easy.
I've implemented such in a production corp environment, as a product to be sold. There is no magic here, it is all well understood technology by the practitioners.
If you actually want to have packet filtering (a firewall) then deploy a firewall instance distinct from any NAT, and with appropriate rules. However that only really helps for traffic volume reduction, the actual security gain from a f/w per se is now minimal, as most attacks are over the top: HTTP/HTTPS, POP/IMAP etc.
> That is why many of us keep repeating that NAT is not a security mechanism.
You can say that in general, network firewalls are not a security mechanism. They are at most a means to prevent brute-force attacks from outside of the network.
To be completely fair with you, everyone misinterprets NAT as a security mechanism, because traditionally it is deployed alongside a stateful firewall.
In reality, of course, the stateful firewall is doing all of the heavy lifting that NAT is getting the credit for. Tailscale does not get rid of the firewall; in fact it has a much more comprehensive setup based on proper ACLs.
Though I’m definitely the first to admit that their tooling around ACLs could be significantly improved
I think they mostly interpret NAT as a security mechanism because that's what it originally was; "NAT" was a species of firewall, alongside "stateful" and "application layer". And NAT obviously does serve a security purpose; just not the inside->out access control function we're talking about here.
> think they mostly interpret NAT as a security mechanism because that's what it originally was; "NAT" was a species of firewall
That’s simply wrong. NAT is, and always has been, for the sole purpose of Network Address Translation, i.e. allowing a large IP address space to hide behind a much smaller IP address space (usually a single IP address), for the purpose of mitigating IP address exhaustion.
NATs were meant to be a stop gap solution between IPv4 running out, and the rollout of IPv6. But we all know how that panned out.
The “firewall” like aspects of a NAT are purely incidental. The only reason why a NAT “blocks” unsolicited inbound traffic is because it literally has no idea where to send that traffic, and /dev/null is the only sensible place to direct what’s effectively noise from the NATs perspective.
The fact that NATs share many of the basic building blocks of a very simple stateful firewall is just a consequence of both NATs and firewalls being nothing more than stateful packet routing devices. The same way any standard network switch is (they internally keep a mapping of IP to MAC address of connected devices based off ARP packets, which incidentally blocks certain types of address spoofing, but nobody calls a network switch a firewall).
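The "it literally has no idea where to send that traffic" point can be shown with a toy translation table. This is a deliberately simplified Python model, not how any real NAT is implemented: the class and its fields are invented for illustration, and the optional endpoint-dependent filtering flag models the EDF vs EIF distinction mentioned elsewhere in the thread.

```python
class ToyNat:
    """Toy NAT: one public IP, endpoint-independent mapping, optional EDF."""

    def __init__(self, public_ip, endpoint_dependent_filtering=True):
        self.public_ip = public_ip
        self.edf = endpoint_dependent_filtering
        self.next_port = 40000   # arbitrary starting port for this model
        self.by_internal = {}    # (ip, port) -> public port
        self.by_public = {}      # public port -> (internal addr, remotes contacted)

    def outbound(self, internal, remote):
        """An inside host sends a packet: create or reuse its mapping."""
        port = self.by_internal.get(internal)
        if port is None:
            port = self.next_port
            self.next_port += 1
            self.by_internal[internal] = port
            self.by_public[port] = (internal, set())
        self.by_public[port][1].add(remote)
        return (self.public_ip, port)

    def inbound(self, remote, public_port):
        """A packet arrives from outside: return the inside host or None."""
        entry = self.by_public.get(public_port)
        if entry is None:
            return None  # no mapping, so there is nowhere to deliver it
        internal, contacted = entry
        if self.edf and remote not in contacted:
            return None  # mapping exists, but this remote was never contacted
        return internal
```

Dropping unsolicited traffic falls out of the lookup failing; the filtering branch is the part that behaves like a (very small) stateful firewall, and it is also why both peers have to send first when hole punching.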
You're trying to piece this together axiomatically, but you can just read the history of the Cisco PIX firewall to see that the story is not as simple as you want it to be. One of the first and clearly the most popular NAT middlebox products of the 1990s was a firewall, and Cisco made a whole big deal about how powerful NAT was as a security feature.
You’re working backwards here from Cisco’s marketing material. Just because someone in Cisco’s marketing team was smart enough to realise they could market NAT as a security feature, doesn’t mean it was designed to be a firewall.
Apple advertises their iPads as “computer replacements”, that doesn’t mean the iPad was originally designed to be a computer replacement, and it certainly doesn’t make iPads a good computer replacement for many people.
I would also highlight that Cisco PIX had a dedicated firewall layer in addition to its NAT layer, which provided much more capabilities than the NAT layer alone. The fact that these two layers intelligently built on each other is just good implementation engineering, it doesn’t change the fundamental fact that NAT isn’t, and never has been, a proper security tool.
I'm working forwards from my experience at the time as a security engineer working with products that claimed NAT was a security feature, since it allowed for machines to access the Internet without being routable from the Internet for initiated connections, which is why the first commercial PIX product, after Cisco bought Network Translation (which named PIX), was a firewall.
People confuse the fact that NAT is not an especially powerful or coherent security feature with the idea that it isn't a security feature, which leads you to the crazy rhetorical position of having to argue that PIX, the first mainstream NAT product, was not a security product. I have friends who worked on PIX, for many years. I assure you: they were in the Security BU.
I think this position is pretty hopeless, though if you want to drag us around through the network security marketing of the mid-1990s, I'm happy to do so, just for nostalgia's sake. NAT is absolutely a security feature, and was originally deployed as one, in an era where it was still feasible to get routable allocations for individual workstations.
> NAT was a security feature, since it allowed for machines to access the Internet without being routable from the Internet for initiated connections
I'm sure you also know that any stateful firewall can achieve the same result without having to provide NAT capabilities. Sure, Cisco PIX may have been a security appliance, but that doesn't make NAT a firewall. You don't need Network Address Translation to create a firewall that allows devices to connect to the internet, but makes those machines unroutable for unsolicited requests. For your claim that NATs are meant to be a firewall, you need to provide an explanation as to why we don't use NATs with IPv6.
Why would increasing the IP address space so that it's once again possible to get routable allocations for individual workstations result in people not deploying IPv6 NATs, when apparently they're an important security tool for IPv4, even in the days when "it was still feasible to get routable allocations for individual workstations"?
Now you're arguing that NAT isn't a good security feature. We agree. There's no reason for us to drill for things to disagree about.
> The same way any standard network switch is (they internally keep a mapping of IP to MAC address of connected devices based off ARP packets, which incidentally blocks certain types of address spoofing, but nobody calls a network switch a firewall).
I thought standard network switches kept a mapping of MAC address to physical network ports, and didn't concern themselves with the IP layer at all (other than things like IGMP/MLD snooping)? Mapping from IP to MAC addresses is a function of hosts/gateways, not switches.
Lots of switches filter out ARP responses that would change the destination of traffic to preexisting clients.
For example: https://www.arubanetworks.com/techdocs/AOS-S/16.10/ASG/YAYB/...
I mean, it really isn’t a security mechanism of any kind. Any security properties at all are completely accidental.
One need only disable stateful firewalling and use that to see how completely dire the situation would be. As all outbound connections open up your host to the internet.
> production corp networks.
Networking has long been the toxic wasteland of security and misconfiguration. Now combine that with newer host-based networking models for containers. The Windows network stack is substantially different now due to that, and more complex. Since Wireguard has been part of Linux, everyone and their brother now has a VPN, somewhere connecting to a VPS. It's probably worse than you think because you don't know what you don't know.
This is to go through NAT, which are devices designed to work around the rarefaction of IPv4 addresses.
Firewalling is a different concept, but since you raise that issue of connectivity wrt. security, I have to say that what makes /me/ sad and anxious is to see how internet security has always been hinging on bloquing paquets based on the destination port.
Doing what's easy rather than what's correct, exemplified and labelled "professional solutions"...
I'm rather more curious as to why you stylized "bloquing paquets"?
They are emphasizing the queuing implementation.
Maybe the OP is French? :)
Haha indeed.
I also frequently stumble upon "connexion" and "trafic", in the same department.
That’s how all VoIP has worked since forever, and we also have a bunch of standards and public facing infrastructure to make it easier. All the ICE, TURN and friends.
It still needs something on the inside to talk to outside first, so the actual firewall should whitelist both outbound and inbound connections.
Then again, if you rely on the perimeter, it’s a matter of time until someone figures out what your equivalent of a hi-vis jacket is.
It's no different from traditional VPNs. The tailnet admin has control over the routes that are exposed to clients and ACLs are available to further limit access. It's an overlay network, it doesn't magically give you access to user space on people's laptops.
Given how tailscale works and many of the features (the SSH features especially) it's not terribly hard to imagine a critical flaw or misconfigured setup giving access to userspace
Everything beyond Tailscale's core VPN features is opt-in. The risk of misconfiguring Tailscale is the same as (arguably it’s much smaller than) the risk of just misconfiguring SSH on a machine.
At the end of the day, Tailscale works just like any other VPN, from the perspective of the type of data that can traverse between machines connected to the same virtual network. Tailscale's use of a P2P WireGuard mesh is just an implementation detail; it’s no more or less risky than having every machine connect to a central VPN gateway, and bouncing all their traffic off that. Either way, all the machines get access to a virtual network shared by every other machine, and misconfigured ACLs could result in stuff getting exposed between those machines which shouldn’t be exposed.
If anything the Tailscale mesh model is much more secure, because it forces every device to use a true zero trust model. Rather than the outdated, “oh it managed to connect to the VPN, it must be safe then” model that traditional VPNs often end up implementing.
I'm not sure how to compare the risk and attack surface of traditional NATs and firewalls vs Tailscale's ACL code, but I'm not sure Tailscale is obviously the riskier choice there. I think more traditional network devices are more familiar and more of a known quantity, but there's a lot of janky, unpatched, legacy network devices out there without any of the security protections of modern operating systems and code.
It's also worth considering that exploitability of ACL code is just one factor in comparing the risk and Tailscale or similar solutions allow security conscious setups that are not possible (or at least much more difficult) otherwise. For example, the NAT and firewall traversal means you don't have to open any ports anywhere to offer a service within your Tailscale network. Done correctly, this means very little attack surface for a bad actor to gain access to that stray VM in the first place. You can also implement fairly complex ACL behavior that's effectively done on each endpoint without having to trust your network infrastructure at all, behavior that stays the same even if your laptop or other devices roam from network to network.
Not to say I believe Tailscale is bulletproof or anything, but it does offer some interesting tradeoffs and it's not immediately obvious to me the risk is worse than legacy networks (arguably better), and you gain a lot of interesting features and convenience.
And for whatever it's worth, Tailscale is written in a language that makes buffer overflow and memory corruption vulnerabilities extremely unlikely.
You don’t want to be hard on the outside, soft on the inside. Especially because you probably aren’t that hard on the outside!
Defense in depth.
Network security is a myth. NATs, firewalls, ACLs, etc don't keep you safe. Even on your Wifi LAN right now, you aren't safe from local network attacks originating from outside attackers.
Why?
Because hackers can contort themselves into amazing shapes in order to fit through tiny holes in the oddest places. Once they position themselves correctly, and are able to reach the network address and port of a given service, and it has no authentication, it's open season. It may seem difficult, nigh impossible, for a hacker to reach all the way into your WiFi LAN. But there are always twists and turns to take.
From the public internet: tens of thousands of internet routers have publicly known exploits right now, which the router vendors refuse to fix. Just scan the internet for the routers, use your exploit, and you're inside.
From the opposite direction: malware in a website can redirect your browser to the management interface of a router on your local LAN, where it can reconfigure your router. If there is a password but you have logged in from your browser, the active session token lets it right in, and CSRF protection is often disabled or incorrectly set up. And even if it has a password, many such routers have exploits that will work despite a password. Many people also fall for phishing attacks that can drop payloads on your machine directly.
In some cases, the ISP itself has shipped a firmware update to routers that included malware.
All of these things have happened in the past 2 years, to millions of internet users, that we know of. Many large attacks go unnoticed for years. Once the router is compromised, it can be configured to forward ports or enable UPnP, or simply persist malware inside the router itself. The network is wide open and at the attacker's fingertips.
And this is just one class of attack. There are many more that can attack private networks. So there is no place safe from network attacks. Not in a corporate network, not on your local LAN, nowhere. There is no network security. The only network services that can be somewhat trusted are ones which require strong authentication, authorization, and encryption.
A better question is, “why do you think your local network is safe?”.
Have you taken steps to validate the integrity of every single device connected to the network?
If a single device is compromised, how will you detect it's been compromised?
If a device is compromised, what prevents it from being used to launch an attack on other devices in your network, especially if your security model assumes that all devices on your local network are “safe”?
For a practical example of this happening, in a very impressive manner: https://arstechnica.com/security/2024/11/spies-hack-wi-fi-ne...
For a more boring everyday equivalent, just search around for one of the many botnets that are assembled from compromised SoHo routers, or IoT devices, around the world.
https://arstechnica.com/security/2024/01/chinese-malware-rem...
Assuming a local network is safe and secure is foolish. There’s nothing inherently secure about a local network; the only reason it offers any level of security is that a local network is many, many orders of magnitude smaller than the entire internet. So the probability of a hostile device (whether intentionally installed as hostile, or turned hostile after a remote attack) being connected is smaller. But at the end of the day, it's security via “being luckier than the next dude”.
As far as I understand, Tailscale won't even let you initiate a connection (or give you WireGuard keys for a node) unless there's an ACL rule that allows it.
Isn’t this essentially what a VPN does? I mean, that’s what TailScale is built on: Wireguard.
That is a whole lot of different levels of exploits that would have to be chained together that you just trivialized there.
How do you suppose they gained access to the kernel and userspace just by having a network connection to the laptop?
By using an unpatched RCE in any network exposed code. The whole point of a firewall is to prevent bad hackers from the bad internet from exploiting your unpatched RCEs, abusing your default passwords and host based security you shouldn't have had in the first place, and accessing stuff using compromised credentials you didn't revoke or didn't know you should have revoked. Because consistently doing all of that all of the time is hard for creative professionals. It's a chore. It's a tax.
How exactly is Tailscale different to literally any other piece of network capable software in that regard?
NAT traversal requires careful coordination between two devices to create a connection, it’s not like any random device on the internet can perform NAT traversal against a machine just because it’s running Tailscale (not to mention every modern browser has NAT traversal built in for WebRTC connections).
And if the issue doesn’t arise from using NAT traversal, then how does Tailscale expose anything more significant than what a traditional VPN will expose? After all the only difference between a P2P VPN and a traditional VPN, is that a traditional VPN bounces all your traffic off a common server, rather than attempting P2P connections.
I think the point is not that there are necessarily exploits, but by compromising one node in the tailnet they now have the ability to hit code in these locations, or services running on your tun0 interface on your laptop etc.
How is compromising a single node in a tailnet more dangerous than compromising a single node in a traditional VPN?
Traditional VPNs don’t usually put firewalls between machines on the network, because traditionally the whole point of a VPN is to avoid the need for firewalls to provide security between nodes on the virtual network, by assuming that only safe machines can connect to the VPN.
You would typically remove the default any to any ACL rule, and allow the connections that you need. The compromised node normally would not have access to anything interesting. Normally it’s jailed, or would not be able to make outgoing connections.
Am I missing something?
The ACL logic happens in the tailscaled on the destination though, doesn't it? So even if you block the access via the ACL, the packet has still gone through the network stack and Go runtime etc before the traffic is dropped, which is a significantly bigger surface than a (traditional) external network firewall.
I see your point. You are talking about a vulnerability.
You are right. Tailscale nodes can send packets that get processed in any other node, irrespective of ACLs. Essentially each node gets to “run code” in other nodes, even though the packet is normally dropped. I don’t know how deep the Tailscale packets go before being dropped (perhaps the coordination server distributes firewall rules).
But you have to compare with another access method, like, the hub and spoke VPN. The compromised and uncompromised nodes connect to a VPN access server. A compromised node sends packets that are processed in the VPN server, but can also connect to the uncompromised node, meaning, the latter has to process and drop the packets of the former. You have to trust the OS IP stack. To some extent, the same is true if the trusted node VPNs directly to the untrusted node. During an established connection, the networking stack of the trusted node has to block the other side.
Maybe someone familiar with the implementation of ACLs in Tailscale could chime in.
Update: The ACL rules are applied to the incoming packets over the tailscale interface. The filtering is then done by tailscaled. The packet has gone past the interface and is processed by tailscaled. So an unauthenticated packet indeed travels through the kernel space all the way to the userspace.
How is this any different to any other piece of network capable software that’s listening to a port on your machine?
An external network firewall can only offer protection if you can somehow guarantee that every packet that hits a specific node is first routed via that firewall. Traditionally nobody has setup networks like that, because it requires routing every single packet via a single common bottleneck, causing huge latency and throughput problems.
As for packets going via the network stack, and then the Go runtime. Do you honestly believe there’s a set of vulnerabilities out there which would allow random external packets to be sent to a random machine, and cause an RCE by virtue of simply being processed by the OS kernel, which somehow can only be exploited if you’re running Tailscale? Better still, if such a vulnerability exists, what on earth makes you think your firewall isn’t also vulnerable to the same issue, given that pretty much every firewall out there is built on the Linux kernel these days.
> Am I missing something?
Yes
> You would typically remove the default any to any ACL rule
This part doesn’t happen.
Defaults are rarely changed.
Security in depth.
I wish there was a tailscale-like equivalent without connectivity encryption, for devices which encrypt at the application layer (like almost the entire internet does). We don't always need the lower layers to be encrypted, this is especially computationally expensive for low power devices (think IoT stuff running a tailscale like tunnel).
GRE tunnels exist and I actually use them extensively, but UDP hole punching is not handled so hub-and-spoke architecture is needed for them, no peer to peer meshes with GRE (ip fou).
Are there equivalent libraries out there which do UDP hole punching and unencrypted GRE tunnels following an encrypted handshake to confirm identity?
Yes, the established standard here is known collectively as Interactive Connectivity Establishment (ICE) [1] which WebRTC relies on -- there are a few good libraries out there that implement it and/or various elements of it [2] [3].
libp2p [4] may be what you're after if you want something geared more towards general purpose connectivity.
[1] https://datatracker.ietf.org/doc/html/rfc8445
[2] https://github.com/pion/webrtc
[3] https://github.com/algesten/str0m
[4] https://libp2p.io
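As a small taste of what ICE standardizes, here is the candidate priority formula from RFC 8445 (section 5.1.2.1) written out in Python. The formula and the recommended type preferences come from the RFC; the function and dictionary names are just for this sketch, and real libraries expose this differently.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2.
TYPE_PREFERENCE = {
    "host": 126,    # address taken directly from a local interface
    "prflx": 110,   # peer reflexive, learned during connectivity checks
    "srflx": 100,   # server reflexive, i.e. the NAT mapping seen by STUN
    "relay": 0,     # relayed through a TURN server
}


def ice_priority(candidate_type, local_preference, component_id):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    type_pref = TYPE_PREFERENCE[candidate_type]
    return (type_pref << 24) + (local_preference << 8) + (256 - component_id)
```

The ordering is the interesting part: direct host candidates are tried before NAT-mapped ones, and relays come last, which matches the "try the cheap path first, fall back to a relay" approach discussed in the thread.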
Thank you for the resources! I will study them.
FWIW, libp2p also enforces transport encryption, quote:
> Encryption is an important part of communicating on the libp2p network. Every connection must be encrypted to help ensure security for everyone. As such, Connection Encryption (Crypto) is a required component of libp2p.
TURN, STUN and ICE are what do hole punching for VoIP, so you can reuse libraries from VoIP for that
You could try to bring back Teredo.
It's not UDP but I do TCP hole punching here: https://github.com/robertsdotpm/p2pd and every other major method of NAT traversal.
It's written in Python, though it's not based on using the default interface like most networking code. I wanted to be able to run services across whatever interfaces you like, which allows for much more diverse and useful things to be built. It's mostly based on standard library modules. I hate C extension crap as it always breaks packages cross-platform.
> So, to traverse these multiple stateful firewalls, we need to share some information to get underway: the peers have to know in advance the ip:port their counterpart is using.
> [...] To move beyond that, we built a coordination server to keep the ip:port information synchronized
This is where I wish SIP lived up to its name (Session Initiation Protocol, i.e. any session, such as a VPN one...) and wasn't such a complicated mess making it not worth the hassle. I mean it was made to be the communication side-channel used for establishing p2p rtp streams.
Yeah, SIP is doing so many things that it's scary to load them all in your head at the same time.
It's like HTTP, but it's also stateful, bidirectional, federated, and works over UDP too.
Just look at the amount of stuff (TLS over UDP included) baresip implements to barely SIP. And it isn't even bloated; the stuff has to be there.
> It's like HTTP
and for the same reason: both were initially designed to be simple...
(2020)
Previous discussion:
Thanks. Links aren't clickable. Maybe these will be:
https://news.ycombinator.com/item?id=30707711
https://news.ycombinator.com/item?id=24241105
Thanks! Macroexpanded:
How NAT traversal works (2020) - https://news.ycombinator.com/item?id=36969018 - Aug 2023 (106 comments)
How NAT traversal works (2020) - https://news.ycombinator.com/item?id=30707711 - March 2022 (37 comments)
How NAT Traversal Works - https://news.ycombinator.com/item?id=24241105 - Aug 2020 (28 comments)
(p.s. your links weren't clickable because lines that are indented with 2 or more spaces get formatted as code - see https://news.ycombinator.com/formatdoc)
This is the article I sent people to for NAT traversal
This may be the only way we ever have to build p2p apps. IPv6 doesn't have enough steam since NAT and SNI routing solve most problems for most people.
And ISPs are very much not incentivized for that to change.
Interesting blast from the past. We built an oblivious p2p mesh network that did this in 2010. Back then, nobody cared about security as much as we thought they should. Since then, still nobody cares about security as much as they should. The number of devices has increased and their value has increased, and still, they are quite insecure. Truly secure endpoints with hardware root-of-trust, secure chains of trust for authn/authz, and minimal temporary privileges are still hard, and network perimeter security theater is still ongoing in home networks, corp networks, and even large production datacenter networks. The only reason we don't find these to be the primary root cause of security breaches is that easier attack chains are still readily available!
The fact that this emerged instead of IPv6 is a true testament to the power of "good enough hackery"
Really clear and clean exposition on what can be a hairy and badly-discussed subject, thanks for posting!
Such a great explanation. Wish I would have had something like this back in my gamedev days.
It really shows how much we need IPv6 GUA addresses everywhere.
IMO: this is arguably one of the most detailed articles on NAT traversal on the entire Internet. But it is missing information on delta behaviors. It's not that complex: some NATs have observable patterns in how they choose to assign successive external ports. The most common one is simply preserving the source port, but there can also be others, e.g. an increment of the former mapping.
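To make the delta idea concrete, here is a rough sketch of how one might classify it. It assumes you have already collected (local_port, external_port) pairs for successive mappings, e.g. from STUN replies, in the order the mappings were created. The function names and the three category labels are my own, and port wraparound and noise from other hosts behind the same NAT are ignored.

```python
def classify_port_allocation(observations):
    """Guess how a NAT assigns external ports.

    observations: list of (local_port, external_port) tuples, in the
    order the mappings were created.
    Returns (kind, delta), where kind is "preserving", "delta",
    "unpredictable", or "unknown" (no data).
    """
    if not observations:
        return ("unknown", None)
    # Source-port preservation: the external port always equals the local one.
    if all(local == external for local, external in observations):
        return ("preserving", 0)
    externals = [external for _local, external in observations]
    deltas = {b - a for a, b in zip(externals, externals[1:])}
    # A single constant, non-zero step between successive mappings.
    if len(deltas) == 1:
        delta = deltas.pop()
        if delta != 0:
            return ("delta", delta)
    return ("unpredictable", None)


def predict_next_port(observations, next_local_port):
    """Predict the external port of the next mapping, or None if we can't."""
    kind, delta = classify_port_allocation(observations)
    if kind == "preserving":
        return next_local_port
    if kind == "delta":
        # Ignores wraparound past 65535 for simplicity.
        return observations[-1][1] + delta
    return None
```

A peer behind a "delta" NAT can then advertise the predicted port rather than the last observed one, which is the whole point of measuring this.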
It's a very good theoretical article. I wonder to what extent a software engineer could use this though. Although it does describe many things, I'm not sure there's enough detail to write algorithms for it. Like, could an engineer write an algorithm to test for different types of NATs on the basis of this article? Could they adapt their own hole punching code? I've personally read papers where simple tables were more useful than entire articles like this (as extensive as it is). Maybe still a good starting point though.
Also, the last section in the article is extremely relevant. It has the potential to bypass symmetric NATs, which are common in mobile networks. The latest research on NAT traversal uses similar techniques and claims near 100% success rates.
Another year, another repost of an article about NAT traversal, another couple replies about how this is insecure followed by people explaining that NAT is not a security feature.
It's like the Aragorn broken toe of networking.
It will be year of P2P networking for realsies this time tho!
P2P is all over the place. It’s just largely invisible, part of the internals of games and VoIP and video chat apps among many other things.