Over the Hedge

The dark web 101: what it is, how it works, and why people use it


Interesting fun fact: as of 2024, the internet connects only about 5.5 billion people. Not surprisingly, the share of people who are online is highest in developed countries, as you can see in the animation below from the United Nations.

Percentage of Individuals Using the Internet

There are countless ways to classify the internet, but I think the most mysterious one is the breakdown into clearnet, deep web, and dark web. Most people who are aware of the distinction between those three terms don’t get too hot and bothered about the dark web or the deep web, but to the layman, the dark web sounds like a scary place where you can only find spooky and unsavory things. I’ve always been fascinated by tech, so I thought I’d do some research myself to understand things a little better.

Unfortunately, to truly understand the dark web you also kind of need to understand the basics of what makes the regular old internet tick. If you’re already familiar with the basics, you can skip ahead to Some Definitions.

The Modern Internet

A Flash of Light as the Birth of the Web

Before the mushroom clouds in the Oppenheimer movie were a twinkle in Christopher Nolan’s eye, there was the real Oppenheimer. He’s famous for his quotes, one of my favorites being: “We knew the world would not be the same. A few people laughed, a few people cried.” No one at the time could have imagined that the nuclear bomb would spark a mad race to prevent further destruction, a race that eventually produced the internet.

In the 1960s, the United States Air Force was trying to make sure it could maintain mutually assured destruction (MAD). My seventh-grade history teacher loved that term for all the right reasons. MAD was the big idea to prevent future nuclear conflict. It all boiled down to the most basic of human fears: if you nuke us, we nuke you. Simple game theory says that the stable equilibrium is no nuclear launches, as long as each side can still communicate and coordinate a retaliatory strike after being attacked.

To guarantee that communication, the military needed systems that could survive jamming, attacks, and partial blackouts. These systems needed to send messages over long distances without relying on a direct connection.

Enter packet switching: the foundation of the modern internet.

Packet Switching, ARPANET, and TCP/IP

The core issue with point-to-point communication systems is that we want to send arbitrarily long messages over an unreliable messaging medium (all of them are, to some extent). For the sake of argument, let’s say that you’re trying to send a message cross-country. It’s critically important that the message arrive, and you don’t want to take any chances that your recipient misses even a piece.

You could send messages over radio in their entirety, or send encoded/encrypted messages over Morse code or telegraph, but intermittent connections mean data loss: if the line drops halfway through, the whole message has to be sent again.

Packet switching is the idea that you can chunk messages into small packets of information that are transmitted over an arbitrary network medium. Each packet need only contain a marker for where the packet starts, the destination address, the sender’s address, an id (sequence number) for putting the pieces back in order, the piece of the message itself, and lastly a marker for where the packet ends. This way, instead of needing to send the whole message from point A to point B in one go, you can reliably transport most of it in pieces and just have the recipient re-request any missing pieces. You can see a sample of a data packet here:

Sample Packet
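To make the packet idea concrete, here’s a minimal Python sketch (my own toy, not how any real protocol lays out its bytes). The `Packet` fields mirror the description above: sender, recipient, a sequence number for ordering, and the chunk of the message. The `total` field is a hypothetical addition so the recipient knows when it has everything, and the start/end markers are left out because Python objects already know where they begin and end.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # sender's address
    dst: str        # recipient's address
    seq: int        # position of this chunk within the message
    total: int      # how many chunks the message was split into
    payload: bytes  # the piece of the message itself


def packetize(message: bytes, src: str, dst: str, size: int = 8) -> list[Packet]:
    """Split a message into packets carrying at most `size` payload bytes each."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(src, dst, seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message no matter what order the packets arrived in."""
    if packets and len({p.seq for p in packets}) != packets[0].total:
        raise ValueError("some packets are still missing")
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


packets = packetize(b"Meet me at the usual place at noon.", "alice", "bob")
random.shuffle(packets)  # the network may deliver packets in any order
print(reassemble(packets).decode())
```

Because every packet carries its own sequence number, the order they show up in doesn’t matter; only a missing sequence number does.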

If you break a message into 100 packets and your network drops 10% of them, you just resend the missing packets until the recipient has all of them. All you need is a network of computers willing to relay packets. The first such network was ARPANET, created by ARPA (now DARPA). ARPANET didn’t initially support lost packet recovery: if the network failed to deliver, the sender and receiver simply got stuck waiting.
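Here’s a rough simulation of that resend loop, assuming a network that drops each packet independently with probability 10% and, as a simplification, never loses the recipient’s “I’m still missing these” reports. The function name and parameters are made up for illustration.

```python
import random


def send_until_complete(chunks: list[str], drop_rate: float = 0.10, seed: int = 0):
    """Keep resending whatever the recipient is missing until every chunk arrives.

    Toy assumption: only data packets get dropped; the recipient's
    re-requests always get through.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1) or the loop never ends")
    rng = random.Random(seed)
    received: dict[int, str] = {}        # seq -> chunk, from the recipient's side
    missing = list(range(len(chunks)))   # at first, nothing has arrived
    rounds = 0
    while missing:
        rounds += 1
        for seq in missing:
            if rng.random() >= drop_rate:  # this packet survived the trip
                received[seq] = chunks[seq]
        # The recipient re-requests every sequence number it still lacks.
        missing = [seq for seq in range(len(chunks)) if seq not in received]
    return [received[seq] for seq in range(len(chunks))], rounds


message = [f"chunk-{i}" for i in range(100)]  # a message broken into 100 packets
delivered, rounds = send_until_complete(message)
print(f"all {len(delivered)} packets arrived after {rounds} rounds")
```

Real TCP does this with acknowledgments and retransmission timers rather than neat rounds, but the principle is the same: a lossy link costs you time, not data.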

To fix this, early developers created a protocol that sits on top of the network, called Transmission Control Protocol/Internet Protocol (TCP/IP). This protocol had roughly four ground rules:

  1. Each distinct sub-network in the broader network could/should stand on its own. No overarching network could impose requirements for other networks to connect.
  2. Communications (i.e. packets) would be forwarded on a best efforts basis. If a packet didn’t make it to the final destination, it would be up to the packet source to retransmit.
  3. Intermediate connection points would store no information about the individual flows of packets, to keep them simple and low latency.
  4. No global authority would/could own the network.
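Rules 2 and 3 are a big part of why routers can stay simple: a router looks at each packet’s destination address, picks a neighbor to hand it to, and forgets about it. Below is a minimal sketch of that decision using Python’s standard `ipaddress` module and longest-prefix matching (the most specific matching route wins), which is how IP routers choose among routes. The routing table and neighbor names are hypothetical.

```python
import ipaddress

# Hypothetical routing table for one router: destination prefix -> neighbor.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-isp",     # default route
    ipaddress.ip_network("10.0.0.0/8"): "campus-core",
    ipaddress.ip_network("10.42.0.0/16"): "physics-dept",
}


def next_hop(destination: str) -> str:
    """Longest-prefix match on the destination address alone.

    Nothing about earlier packets is remembered (rule 3), and nothing here
    guarantees delivery; if the packet is lost later, the sender retries (rule 2).
    """
    addr = ipaddress.ip_address(destination)
    matching = [net for net in ROUTES if addr in net]  # default route always matches
    return ROUTES[max(matching, key=lambda net: net.prefixlen)]


for dst in ["10.42.7.1", "10.9.9.9", "8.8.8.8"]:
    print(dst, "->", next_hop(dst))
```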

They also needed to handle the following issues:

HTTP/HTTPS

There’s a fundamental issue with sending TCP/IP packets: packet contents are cleartext. Anyone along the route can read them, like passing a note to your crush across a classroom where every kid opens it on the way.

What we need is some way to hide messages from prying eyes while still letting the correct recipient read them. Enter encryption. Cryptography is a field that is far too deep to even start a 101 on in this post, but maybe I’ll get into it another time. For now, it suffices to say that the Hypertext Transfer Protocol (HTTP), which browsers and servers use to exchange data, can be run over Transport Layer Security (TLS), which encrypts that exchange. This encrypted variant of HTTP is called HTTPS (HTTP Secure).
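To see what “cleartext” means in practice, here’s a small Python sketch. `raw_http_request` builds the exact bytes a plain HTTP client would send, so any router on the path could read the host and path. `https_get` sends the same bytes through a TLS-wrapped socket using the standard `ssl` module. Both function names are mine, and `https_get` needs a live network connection, so it’s defined but not called here.

```python
import socket
import ssl


def raw_http_request(host: str, path: str = "/") -> bytes:
    """The exact bytes a plain-HTTP client writes onto its TCP connection."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def https_get(host: str, path: str = "/") -> bytes:
    """Send the same request through a TLS-encrypted socket, i.e. HTTPS."""
    context = ssl.create_default_context()  # verifies the server's certificate and hostname
    with socket.create_connection((host, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(raw_http_request(host, path))
            response = b""
            while chunk := tls.recv(4096):
                response += chunk
    return response


# Over plain HTTP, every relay on the route can read this verbatim:
print(raw_http_request("example.com", "/my-secret-diary").decode("ascii"))
```

With HTTPS, the request line and headers are encrypted, but the IP addresses at either end of the connection are not.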

Summary

The modern internet is usually described as a stack of protocol layers (this seven-layer version is the OSI model):

  1. Physical - the actual wires connecting computers
  2. Data link - allows sending “frames” of data
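  3. Network (IP) - the layer that coordinates sending packets across a network, including addressing, routing, and traffic control
  4. Transport (TCP, UDP) - delivery between programs on different machines; TCP adds ordering and retransmission, UDP does not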
  5. Session - managing continuous streams of data between two clients
  6. Presentation - encoding, compression, encryption/decryption
  7. Application (HTTP, HTTPS, FTP, FTPS, SFTP, SMTP) - high level protocols that coordinate how to actually exchange data
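One way to picture the stack is encapsulation: on the way down, each layer wraps whatever it receives from the layer above in its own header, and the receiver strips those headers off in reverse order. The sketch below uses made-up, human-readable headers (real ones are compact binary formats) and only models three of the layers.

```python
def make_header(layer: str, **fields: str) -> bytes:
    """A human-readable stand-in for a real binary header (hypothetical format)."""
    body = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"[{layer}:{body}]".encode("ascii")


def encapsulate(data: bytes, headers: list[bytes]) -> bytes:
    """Wrap data on the way down; `headers` are listed from transport to data link."""
    for header in headers:
        data = header + data  # each lower layer prepends its own header
    return data


def decapsulate(frame: bytes, headers: list[bytes]) -> bytes:
    """Unwrap on the way up: the outermost (data link) header comes off first."""
    for header in reversed(headers):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


headers = [
    make_header("transport", proto="TCP", dport="80"),
    make_header("network", src="198.51.100.7", dst="203.0.113.80"),
    make_header("data link", dst_mac="aa:bb:cc:dd:ee:ff"),
]
frame = encapsulate(b"GET / HTTP/1.1", headers)
print(frame.decode("ascii"))
```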

Together, these layers enable encrypted communication over unreliable networks. Fleshing out this summary in more detail would take a literal textbook, so I won’t go into more detail on ARPANET, TCP/IP, HTTP, SSL, or TLS. If you’re interested, you can read A Brief History of the Internet and the many online resources, like the OSI Model overview on Wikipedia, that describe the modern web stack. Those articles have much better technical writers than I and include more detail than I’ll ever be able to address.

With the basics covered, we can define the terms from the introduction.

Some Definitions

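  1. Clearnet - the publicly accessible web that search engines index and anyone can reach with a normal browser
  2. Deep web - anything online that search engines don’t index, like pages behind logins or paywalls, private databases, and your email inbox
  3. Dark web - the slice of the deep web that is deliberately hidden and only reachable through special software such as TOR (think .onion sites)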

Who Needs the Darkweb?

Even though the modern internet stack using HTTPS hides message contents, it does not hide WHO is communicating: every packet still carries the sender’s and recipient’s addresses in the clear. To most people, this is no big deal, but it could literally be life or death for:

  1. Political dissidents: try being a Russian critic of Putin or a Chinese critic of the CCP. These folks could use a way to communicate and share information without anybody knowing that they’re communicating
  2. Journalists: need to be able to communicate with sources securely (provided by HTTPS) AND ANONYMOUSLY
  3. Whistleblowers: need to contact tip lines anonymously
  4. Criminals: probably obvious why they would want to communicate anonymously
  5. People looking to prevent digital advertisers from tracking them
  6. People that want to support better privacy for all internet users
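To see why HTTPS alone doesn’t help these people, here’s a toy sketch of what a passive observer on the path can read. It builds a bare 20-byte IPv4 header with Python’s `struct` module and puts random bytes after it to stand in for TLS-encrypted content. The addresses come from the ranges reserved for documentation, the function names are hypothetical, and the checksum is left at zero to keep things short.

```python
import os
import socket
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # the 20-byte IPv4 header, no options


def build_packet(src: str, dst: str, encrypted_payload: bytes) -> bytes:
    header = IPV4_HEADER.pack(
        (4 << 4) | 5,                                # version 4, header length 5 x 32-bit words
        0,                                           # type of service
        IPV4_HEADER.size + len(encrypted_payload),   # total length in bytes
        0,                                           # identification
        0,                                           # flags / fragment offset
        64,                                          # time to live
        socket.IPPROTO_TCP,                          # protocol carried inside (6 = TCP)
        0,                                           # checksum: left at 0 in this toy
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return header + encrypted_payload


def what_an_observer_sees(packet: bytes) -> dict:
    """The header is cleartext; the payload is just opaque bytes."""
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    return {
        "from": socket.inet_ntoa(fields[8]),
        "to": socket.inet_ntoa(fields[9]),
        "payload": packet[IPV4_HEADER.size:].hex()[:16] + "...",
    }


# Random bytes stand in for TLS-encrypted content the observer cannot read.
packet = build_packet("198.51.100.7", "203.0.113.80", os.urandom(64))
print(what_an_observer_sees(packet))
```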

The dark web tackles this problem with a statistically anonymous communication protocol: it doesn’t make linking sender and recipient impossible, just very unlikely for anyone watching.

The Onion Router

If you fall into any of the above categories, you might want to use onion routing.

The protocol is pretty simple. Instead of sending packets directly from sender to recipient, the sender routes them through a chain of intermediaries: from the sender to intermediary 1, then to intermediary 2, then to intermediary 3, and finally to the recipient.

To protect the anonymity of the sender and recipient, the original message is wrapped in multiple layers of encrypted envelopes, one per intermediary. If the intermediaries pass enough messages, there is a statistically low probability that somebody watching messages pass through the network could work out the original sender and intended recipient. That snoop would need to be watching a large portion of the network simultaneously.
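How low is “statistically low”? Here’s a back-of-the-envelope sketch, under the simplifying assumption that relays are picked uniformly at random: if an attacker runs a fraction c of all relays, the chance they end up running every hop of a three-hop circuit is about c^3. The relay counts below are hypothetical, and real TOR doesn’t pick uniformly (it weights relays by bandwidth and sticks with a small set of entry guards), so treat the numbers as illustrative only.

```python
from math import comb


def all_hops_compromised_approx(fraction_bad: float, hops: int = 3) -> float:
    """Chance every hop is an attacker relay if each hop is an independent pick."""
    return fraction_bad ** hops


def all_hops_compromised_exact(total_relays: int, bad_relays: int, hops: int = 3) -> float:
    """Same question when a circuit uses `hops` distinct relays chosen uniformly."""
    return comb(bad_relays, hops) / comb(total_relays, hops)


# Hypothetical network: 1,000 relays, 100 of them run by one attacker.
print(all_hops_compromised_approx(0.10))      # independent picks
print(all_hops_compromised_exact(1000, 100))  # three distinct relays, no repeats
```

Both come out to roughly one circuit in a thousand; the exact version is slightly lower because a circuit can’t reuse the same attacker relay twice.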

Palo Alto Networks has a pretty good overview of TOR. The procedure is roughly as follows:

  1. A user builds a TOR circuit by selecting three relay nodes, which we’ll call N1, N2, and N3, and agreeing on a separate secret encryption key with each of them through a key-exchange handshake.
  2. The user encrypts the private message M that she wants to send in three layers, each of which also tells a relay where to forward next. Let’s denote encryption of a message M with node i’s key by fi(M). The user sends a message M1 that looks like f1(f2(f3(M))) to relay node N1.
  3. N1 decrypts M1 into the inner message M2 = f2(f3(M)). It reads that the next node in sequence is N2 and forwards M2 to N2.
  4. N2 decrypts M2 into the inner message M3 = f3(M). It reads that the next node in sequence is N3 and forwards M3 to N3.
  5. N3 decrypts M3 into the original message M. It reads that the next node in sequence is the intended recipient (R) and forwards M to R. Because N3 handles M itself, M should also be encrypted end to end, for example with HTTPS.
  6. The recipient receives the message and decrypts it. The recipient can then reply: each relay adds its own layer of encryption on the way back, and the user peels off all three.
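The six steps above can be sketched in Python. This is a teaching toy, not TOR: the keys are simply assumed to be shared already (real TOR negotiates them with a handshake), and `_keystream` is a made-up SHA-256 counter-mode stream cipher standing in for real, authenticated encryption. `seal` plays the role of fi, and `peel` is what each relay does; all names are hypothetical.

```python
import hashlib
import json
import secrets

NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY stream cipher: SHA-256 in counter mode. Unauthenticated; never use for real."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def seal(key: bytes, next_hop: str, inner: bytes) -> bytes:
    """One onion layer: 'forward `inner` to `next_hop`', encrypted under one relay's key."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    plaintext = json.dumps({"next": next_hop, "inner": inner.hex()}).encode()
    return nonce + _xor(key, nonce, plaintext)


def peel(key: bytes, cell: bytes) -> tuple[str, bytes]:
    """What a relay does: strip its own layer, learning only the next hop."""
    nonce, body = cell[:NONCE_SIZE], cell[NONCE_SIZE:]
    layer = json.loads(_xor(key, nonce, body))
    return layer["next"], bytes.fromhex(layer["inner"])


def build_onion(message: bytes, circuit: list[str], keys: dict[str, bytes], recipient: str) -> bytes:
    """Produce M1 = f1(f2(f3(M))) by wrapping the innermost (N3) layer first."""
    next_hops = circuit[1:] + [recipient]
    onion = message
    for relay, next_hop in reversed(list(zip(circuit, next_hops))):
        onion = seal(keys[relay], next_hop, onion)
    return onion


circuit = ["N1", "N2", "N3"]
keys = {relay: secrets.token_bytes(32) for relay in circuit}  # assumed pre-shared
message = b"hello from a dissident"

cell = build_onion(message, circuit, keys, "R")
hop, trace = "N1", []
while hop in keys:                   # keep going until the message leaves the relays
    next_hop, cell = peel(keys[hop], cell)
    trace.append((hop, next_hop))    # everything this relay learns about the route
    hop = next_hop

print(trace)      # each relay saw only its next hop
print(hop, cell)  # R receives the original message
```

The `trace` shows each relay learning only where to send the message next. One thing this toy leaks that TOR avoids: every layer makes the message bigger, so a relay could guess its position from the size. Real TOR pads traffic into fixed-size cells to prevent that.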

What’s great about this is that at each step, the sender only knows the alias for the service, and each relay only knows its immediate neighbors in the circuit. Even better, the receiver knows nothing about the sender except the contents of the message, and the sender knows nothing about the receiver except the alias of the service and the responses it sends.

who knows what in a TOR relay

Let’s assume now that there is some globally omniscient observer that can view every connection and every message in and out of every network node.

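If the communicators use TOR, the observer can no longer simply read where a message is headed, because every hop looks like encrypted traffic between two relays. What the observer can still try is traffic correlation: matching the timing and volume of traffic entering the network with traffic leaving it. TOR’s designers are upfront that it does not protect against an observer this powerful, and an attacker who controls or watches both the first and last relay of a circuit can attempt the same trick. Against anyone with a narrower view of the network, though, correlating the first wrapped message with the final message sent to the receiver is very hard.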

global observers

Operational Security (OpSec) Considerations

Before we start, you might ask: why do I need this? Presumably you want to avoid letting bad actors (read: Facebook? OpenAI? Hostile governments?) surveil you. You also want to avoid exposing yourself to attacks and theft.

There are people who have written multiple textbooks on opsec. I can’t possibly cover enough to save you from yourself, and frankly I bet I don’t know enough to protect myself 100% (does anybody really?). With that caveat in mind, read the posts below to maintain your safety when browsing the dark web. As it turns out, these tips also apply to the regular web.

You should start with reading the basics on this GitHub gist and this Reddit post. Obviously, there are levels to this, and the only people that truly need to go to the furthest extremes are probably sending messages via carrier pigeon.

Common mistakes that put you at risk

Frequently Asked Questions

I told my friends that I’m writing something about this and heard a suite of questions… some better than others.

1. Is the dark web illegal?

I cannot emphasize this enough: this is not legal advice and is merely my personal, non-professional opinion. That said, the legality of the dark web depends on jurisdiction. If you live in the United States, the law as of the time of this posting suggests that simply using the dark web is not a crime, as long as you aren’t otherwise breaking the law.

In other countries, this is not necessarily the case, and using TOR could put you at significant personal risk. For example, I can imagine certain East Asian countries might not be too thrilled.

2. Can the police track TOR usage?

I cannot emphasize this enough: this is not legal advice and is merely my personal, non-professional opinion. In theory, it is absolutely possible for police to track TOR usage. In practice, tracking what you do over TOR should be exceedingly difficult, because anonymous and uncompromised relay hosts probably aren’t actively sharing your messages with law enforcement. Spotting that you use TOR at all is easier, since the list of public relays is published and your ISP can see you connecting to one. If you’re worried that the police are spending their limited resources on your TOR usage, it’s more likely that they’d use higher-quality, lower-tech methods (ever heard of wiretap warrants?).

3. What’s the difference between Tor and VPNs?

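A Virtual Private Network (VPN) creates an encrypted tunnel from your device to a server run by the VPN provider, which then passes your traffic on to its destination. The VPN generally has a known set of relays, or at least a set of relays that are all controlled by the same provider, so that provider can see both who you are and where your traffic goes. As such, compromising the provider means compromising your privacy.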

TOR is a protocol that passes messages between relay circuit members in a big game of telephone. The relay strategy allows roughly anonymous communication between parties without requiring a dedicated, trusted third party to act as the relay.
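The trust difference can be boiled down to a toy check: for each intermediary, which previous and next hops can it see, and does any single one of them see both you and your destination? The function names are hypothetical, and the model ignores timing analysis and anything the traffic itself reveals; it only captures who sits next to whom.

```python
def hop_views(user: str, intermediaries: list[str], destination: str) -> dict[str, tuple[str, str]]:
    """For each intermediary, the (previous hop, next hop) pair it can see."""
    chain = [user, *intermediaries, destination]
    return {node: (chain[i], chain[i + 2]) for i, node in enumerate(intermediaries)}


def who_can_link(user: str, intermediaries: list[str], destination: str) -> list[str]:
    """Intermediaries that, on their own, see both who you are and where you're going."""
    views = hop_views(user, intermediaries, destination)
    return [node for node, (prev, nxt) in views.items() if prev == user and nxt == destination]


print("VPN:", who_can_link("you", ["vpn-provider"], "example.com"))
print("TOR:", who_can_link("you", ["N1", "N2", "N3"], "example.com"))
```

With a VPN, the single provider sits on both sides of your traffic; with a three-hop TOR circuit, no single relay does.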

Conclusion

One of my favorite authors, Douglas Adams, wrote: “Space is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space.”

Even if I’m not as entertaining as The Hitchhiker’s Guide to the Galaxy, I can at least say that the internet is pretty big too. Most of us only experience the internet through public services, but that’s just the tip of the iceberg. There is so much data below the surface, just out of reach and hidden so deep that nobody can find it without knowing it’s there.

It might seem dark and mysterious, but at the end of the day these deep and hidden services are just another evolution of the data privacy craze that started millennia ago with basic ciphers and has extended into the data age. I’m 100% confident that encryption will continue to evolve in ways that we could never have imagined.

In the meantime, let me know if there are any topics around the dark web that you’d like me to dig into in further detail.

