[4504] in linux-net channel archive


Linux TCP Changes for protection against the SYN attack

daemon@ATHENA.MIT.EDU (Alan Cox)
Sat Sep 21 23:12:03 1996

Date: 	Sat, 21 Sep 96 23:59 BST
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
To: linux-net@vger.rutgers.edu

[Bcc the network development list and a few relevant people]

Various people have asked about this so here is a sort of status report:

I've tried several experimental ideas put forward on the end2end and
some other lists, and a couple of my own. As I write this my machine is
sustaining a continuous 64Kbit/second of spoofed SYN frames, and web access
is only slightly impaired (a few 3-second pauses), but not bad. Sustaining
the equivalent of a modem-based attack, the machine is not noticeably
harmed at all.

There are, however, some side effects. To get this level of protection
at 200 spoofed frames/second typically needs a backlog of up to 512 sockets.
In real terms that means a server that's capable of surviving this kind
of attack is probably going to use up to 150Kbytes of extra non-swappable
memory per protected service. It also means modifying inetd and sendmail
to request large queues.

In practical terms I think turning this on, patching the binaries that
need it and adding 4Mb of RAM will give a box that performs as well
under attack as it did before, unprotected and unattacked.

A big thanks goes to Vern Schryver who implemented this algorithm on the
SGI machines and posted it to end2end. It seems to work rather well.

Oh, the other gotcha: when you apply the patch (if you do ;)) you will need
to rebuild your modules (and yes, if you have the infamous binary-only AFS
I've probably broken it again...)

Also included is a fix to the problem of talking to netblazers and other
KA9Q stacks running with syndata enabled. 

Could people who can do a bit of testing give these patches a good hammering;
folks being attacked may as well try them too... If they seem OK I'll
submit them for the next 2.0.x kernel proper.

Alan Cox
Linux Networking Project

--patch--

diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/Documentation/Configure.help linux/Documentation/Configure.help
--- linux.vanilla/Documentation/Configure.help	Sat Sep 21 23:23:38 1996
+++ linux/Documentation/Configure.help	Sat Sep 21 16:36:44 1996
@@ -940,6 +940,16 @@
   to your network or a transparent proxy. Never ever say Y to this for
   a normal router or host.
 
+IP: TCP syn bomb filter
+CONFIG_IP_TCPSF
+  This option enables filters designed to protect your machine from attack
+  by programs such as the one published in 2600 magazine which tie your 
+  machine up answering fake network requests. For it to be effective you
+  should make sure your daemons are using fairly long listen queues (I
+  would suggest 256 or 512). That has memory impacts. A machine set up in
+  this configuration and under heavy use (or attack) is likely to need 1-4Mb
+  more memory depending on the number of network services.
+
 IP: aliasing support
 CONFIG_IP_ALIAS
   Sometimes it is useful to give several addresses to a single network
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/include/linux/if_arp.h linux/include/linux/if_arp.h
--- linux.vanilla/include/linux/if_arp.h	Sat Sep 21 23:20:55 1996
+++ linux/include/linux/if_arp.h	Sat Sep 21 18:31:49 1996
@@ -53,6 +53,8 @@
 #define ARPHRD_LOOPBACK	772		/* Loopback device		*/
 #define ARPHRD_LOCALTLK 773		/* Localtalk device		*/
 
+#define ARPHRD_SCSI	777
+
 /* ARP protocol opcodes. */
 #define	ARPOP_REQUEST	1		/* ARP request			*/
 #define	ARPOP_REPLY	2		/* ARP reply			*/
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/include/linux/socket.h linux/include/linux/socket.h
--- linux.vanilla/include/linux/socket.h	Sat Sep 21 23:21:28 1996
+++ linux/include/linux/socket.h	Sat Sep 21 16:30:20 1996
@@ -79,7 +79,7 @@
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	512
 
 /* Flags we can use with send/ and recv. */
 #define MSG_OOB		1
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/include/net/sock.h linux/include/net/sock.h
--- linux.vanilla/include/net/sock.h	Sat Sep 21 23:20:29 1996
+++ linux/include/net/sock.h	Sat Sep 21 21:38:48 1996
@@ -167,6 +167,7 @@
 	unsigned short		rcv_ack_cnt;		/* count of same ack */
 	__u32			window_seq;
 	__u32			fin_seq;
+	__u32			syn_seq;
 	__u32			urg_seq;
 	__u32			urg_data;
 	int			users;			/* user count */
@@ -245,10 +246,10 @@
 						   cause failure but are the cause
 						   of a persistent failure not just
 						   'timed out' */
+	unsigned short		ack_backlog;
+	unsigned short		max_ack_backlog;
 	unsigned char		protocol;
 	volatile unsigned char	state;
-	unsigned char		ack_backlog;
-	unsigned char		max_ack_backlog;
 	unsigned char		priority;
 	unsigned char		debug;
 	unsigned short		rcvbuf;
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/include/net/tcp.h linux/include/net/tcp.h
--- linux.vanilla/include/net/tcp.h	Sat Sep 21 23:23:21 1996
+++ linux/include/net/tcp.h	Sat Sep 21 21:39:02 1996
@@ -145,6 +145,7 @@
 extern void tcp_do_retransmit(struct sock *, int);
 extern void tcp_send_check(struct tcphdr *th, unsigned long saddr, 
 		unsigned long daddr, int len, struct sk_buff *skb);
+extern void tcp_close(struct sock *sk, unsigned long timeout);
 
 /* tcp_output.c */
 
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/net/ipv4/Config.in linux/net/ipv4/Config.in
--- linux.vanilla/net/ipv4/Config.in	Sat Sep 21 23:21:50 1996
+++ linux/net/ipv4/Config.in	Sat Sep 21 16:31:49 1996
@@ -27,6 +27,7 @@
       fi		
   fi
 fi
+bool 'IP: TCP syn bomb filter' CONFIG_IP_TCPSF
 if [ "$CONFIG_NET_ALIAS" = "y" ]; then
 	tristate 'IP: aliasing support' CONFIG_IP_ALIAS
 fi
@@ -36,7 +37,7 @@
   fi
 fi
 comment '(it is safe to leave these untouched)'
-bool 'IP: PC/TCP compatibility mode' CONFIG_INET_PCTCP
+#bool 'IP: PC/TCP compatibility mode' CONFIG_INET_PCTCP
 tristate 'IP: Reverse ARP' CONFIG_INET_RARP
 bool 'IP: Disable Path MTU Discovery (normally enabled)' CONFIG_NO_PATH_MTU_DISCOVERY
 #bool 'IP: Disable NAGLE algorithm (normally enabled)' CONFIG_TCP_NAGLE_OFF
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/net/ipv4/af_inet.c linux/net/ipv4/af_inet.c
--- linux.vanilla/net/ipv4/af_inet.c	Sat Sep 21 23:22:15 1996
+++ linux/net/ipv4/af_inet.c	Sat Sep 21 22:03:28 1996
@@ -537,7 +537,7 @@
 	 * note that the backlog is "unsigned char", so truncate it
 	 * somewhere. We might as well truncate it to what everybody
 	 * else does..
-	 * Now truncate to 128 not 5. 
+	 * Now truncate to 512 not 128
 	 */
 	if ((unsigned) backlog == 0)	/* BSDism */
 		backlog = 1;
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/net/ipv4/tcp.c linux/net/ipv4/tcp.c
--- linux.vanilla/net/ipv4/tcp.c	Sat Sep 21 23:23:27 1996
+++ linux/net/ipv4/tcp.c	Sat Sep 21 22:08:22 1996
@@ -438,7 +438,7 @@
 unsigned long seq_offset;
 struct tcp_mib	tcp_statistics;
 
-static void tcp_close(struct sock *sk, unsigned long timeout);
+extern void tcp_close(struct sock *sk, unsigned long timeout);
 
 /*
  *	Find someone to 'accept'. Must be called with
@@ -1734,7 +1734,7 @@
 }
 
 
-static void tcp_close(struct sock *sk, unsigned long timeout)
+void tcp_close(struct sock *sk, unsigned long timeout)
 {
 	struct sk_buff *skb;
 
@@ -1837,7 +1837,14 @@
 {
 	struct wait_queue wait = { current, NULL };
 	struct sk_buff * skb = NULL;
-
+#if 0	
+	/*
+	 *	If we want to do some accepts and the queue is full
+	 *	we do a random drop
+	 */
+	if(sk->max_ack_backlog == sk->ack_backlog)
+		tcp_random_drop(sk);
+#endif
 	add_wait_queue(sk->sleep, &wait);
 	for (;;) {
 		current->state = TASK_INTERRUPTIBLE;
diff --unified --recursive --new-file --exclude-from exclude linux.vanilla/net/ipv4/tcp_input.c linux/net/ipv4/tcp_input.c
--- linux.vanilla/net/ipv4/tcp_input.c	Sat Sep 21 23:23:21 1996
+++ linux/net/ipv4/tcp_input.c	Sat Sep 21 22:52:25 1996
@@ -342,6 +342,158 @@
 }
 
 
+#ifdef CONFIG_IP_TCPSF
+
+/*
+ *	Simple randomish number generator. We perturb it by time,
+ *	and also to make it hard to create an attack that exploits
+ *	knowing this algorithm, by kernel pointers.
+ */
+ 
+extern inline u16 random16(struct sock *sk)
+{
+	static s32 seed = 152;
+	seed = seed * 69069L +1;
+	return (u16)(seed^(u32)sk^jiffies);
+}
+
+static void tcp_random_drop(struct sock *sk)
+{
+	int q=skb_queue_len(&sk->receive_queue);
+	struct sk_buff *skb;
+	static int c=0;
+	unsigned long flags;
+	
+	
+	if(!q)
+		return;
+
+	save_flags(flags);
+	cli();		
+	
+	q=random16(sk)%q;	/* 0 to q-1 */
+	
+	skb=skb_peek(&sk->receive_queue);
+	while(q)
+	{
+		skb=skb->next;
+		q--;
+	}
+	
+	/*
+	 *	Not unconnected, just ready for accept. We don't drop
+	 *	these. Note by not dropping these we distort the random
+	 *	drop so that the more real complete sockets queued the
+ *	drop so that the more real complete sockets are queued, the
+ *	less random drop we do. It's a sort of accidental flow
+ *	control. I don't think it's necessary to fix this effect.
+	 
+	if(skb->sk->state!=TCP_SYN_RECV)
+	{
+		restore_flags(flags);
+		return;
+	}
+		
+	/*
+	 *	Close the socket. We can drop directly to
+	 *	closed, as the client side, if real, will
+	 *	retransmit or eventually time out. In those
+	 *	cases the remote end will take the 2 minute
+	 *	timeout and protect the sequence space.
+	 *
+	 *	We MUST never send a reset. If we do that
+	 *	we may assassinate a real TIME_WAIT by the 
+	 *	client or a one way connection (see 
+	 *	draft-heavens).
+	 */
+
+	tcp_set_state(skb->sk, TCP_CLOSE);
+	skb->sk->state_change(skb->sk);
+	sk->write_space(sk);
+	tcp_close(skb->sk,0);
+}
+
+/*
+ *	TCP filter for SYN frames. 
+ *
+ *	[Disabled scheme is]  We disallow more than 30% queue
+ *	occupancy by the same class C network range. This is designed
+ *	to stop runaway hosts and attacks being made via a competent
+ *	provider who has proper address filter rules. It also protects
+ *	against the demon9 and zakath SYN attack programs which use a
+ *	constant bogus source address.
+ *
+ *	[Enabled scheme is]
+ *
+ *	If we are faced with a continual sweep of bogus addresses (eg the
+ *	2600 attack program) then we start using random drop. Vern
+ *	Schryver postulates that:
+ *
+ *	"As long as the length of the queue is longer than RTT of the real
+ *	 clients times the rate of bogus SYNs/sec, the real clients have an
+ *	 excellent probability of getting through on their first attempt"
+ *
+ *	This random drop will not defeat a heavy enough attack, but with
+ *	clients using a 500 frame queue it should take of the order
+ *	of 2000 packets/second to annoy and 3-4000 to cripple. At that
+ *	point the attacker has to generate around 80,000-160,000 bytes/second
+ *	for each port it kills. When we hit that rate a backbone trace
+ *	by the big ISPs becomes much, much easier. It's also over the important
+ *	4000 bytes/second threshold of the average modem luser.
+ *		
+ */
+ 
+static int tcp_syn_filter(struct sock *sk, struct sk_buff *skb, __u32 saddr)
+{
+	extern void tcp_close(struct sock *sk, unsigned long timeout);
+#if 0
+	int ct=0;
+	struct sk_buff *tmp;
+	unsigned long flags;
+#endif	
+	/* If we have < 33% queue occupancy we don't care */
+	if(3*sk->ack_backlog<=sk->max_ack_backlog)
+		return 0;
+#if 0		
+	/*
+	 *	Count across the subnet.
+	 */
+	 	
+	saddr&=htonl(0xFFFFFF00);
+	
+	save_flags(flags);
+	cli();
+	
+	tmp=skb_peek(&sk->receive_queue);
+	while(tmp && tmp!=(struct sk_buff *)&sk->receive_queue)
+	{
+		/*
+		 *	Fits mask ?
+		 */
+		 
+		if((tmp->sk->saddr&htonl(0xFFFFFF00))==saddr)
+			ct++;
+			
+		tmp=tmp->next;
+		
+	}
+	
+	restore_flags(flags);
+	
+	/*
+	 *	Report whether to accept
+	 */
+	 
+	if(3*ct>=sk->max_ack_backlog)
+	{
+		return 1;
+	}
+#endif	
+	return 0;
+}
+
+#endif
+
 /*
  *	This routine handles a connection request.
  *	It should make sure we haven't already responded.
@@ -382,13 +534,21 @@
 	 *	set backlog as a fudge factor. That's just too gross.
 	 */
 
-	if (sk->ack_backlog >= sk->max_ack_backlog) 
-	{
+
+	if (sk->ack_backlog >= sk->max_ack_backlog 
+#ifdef CONFIG_IP_TCPSF	
+		|| tcp_syn_filter(sk,skb,saddr)
+#endif		
+	)
+	{
+#ifdef CONFIG_IP_TCPSF
+		tcp_random_drop(sk);
+#endif			
 		tcp_statistics.TcpAttemptFails++;
 		kfree_skb(skb, FREE_READ);
 		return;
 	}
-
+	
 	/*
 	 * We need to build a new sock struct.
 	 * It is sort of bad to have a socket without an inode attached
@@ -428,6 +588,8 @@
 			return;
 		}
 	}
+	
+	skb->when = jiffies;	/* For timeout */
 	skb_queue_head_init(&newsk->write_queue);
 	skb_queue_head_init(&newsk->receive_queue);
 	newsk->send_head = NULL;
@@ -438,6 +600,7 @@
 	newsk->rto = TCP_TIMEOUT_INIT;
 	newsk->mdev = TCP_TIMEOUT_INIT;
 	newsk->max_window = 0;
+	newsk->sleep = sk->sleep;	/* Wake our parent for now */
 	/*
 	 * See draft-stevens-tcpca-spec-01 for discussion of the
 	 * initialization of these values.
@@ -470,6 +633,7 @@
 	newsk->delay_acks = 1;
 	newsk->copied_seq = skb->seq+1;
 	newsk->fin_seq = skb->seq;
+	newsk->syn_seq = skb->seq;
 	newsk->state = TCP_SYN_RECV;
 	newsk->timeout = 0;
 	newsk->ip_xmit_timeout = 0;
@@ -2074,10 +2238,21 @@
 		return tcp_reset(sk,skb);
 	
 	/*
-	 *	!syn_ok is effectively the state test in RFC793.
+	 *	Check for a SYN, and ensure it matches the SYN we were
+	 *	first sent. We have to handle the rather unusual (but valid)
+	 *	sequence that KA9Q derived products may generate of
+	 *
+	 *	SYN
+	 *				SYN|ACK Data
+	 *	ACK	(lost)
+	 *				SYN|ACK Data + More Data
+	 *	.. we must ACK not RST...
+	 *
+	 *	We keep syn_seq as the sequence space occupied by the 
+	 *	original syn. 
 	 */
 	 
-	if(th->syn && !syn_ok)
+	if(th->syn && skb->seq!=sk->syn_seq)
 	{
 		tcp_send_reset(daddr,saddr,th, &tcp_prot, opt, dev, skb->ip_hdr->tos, 255);
 		return tcp_reset(sk,skb);	
