One Million Robbery Queries

Today I had the very unpleasant surprise to find out that over the last month we had nearly one million requests from ChatGPT scraping bots. That was especially the case on our photos gallery website where they made a request every single second to check if there were any new pictures to steal so that our cats and dogs could be featured in an AI generated image.

First step was to ban them, but that might not be sufficient as this is just a random IP block within a Azure DC. Ironically, I asked ChatGPT to generate me a robots.txt file that bans the ChatGPT scraping bots. Here it is:

User-agent: ChatGPT-User
Disallow: /
User-agent: ChatGPT
Disallow: /
User-agent: GPTBot
Disallow: /

IPs ban on Linux

Ban Hammer

Who needs a quick ban?

Today we had a bruteforce attack on our nginx server. Well cannot say he was anywhere near successful though, the guy did POST /wp-login.php several times per second and all he got as an answer was 404. Fat chance…

But still, he had our access logs growing far larger than they usually do. So I tried to ban him. Unfortunately nginx does not use TCP wrappers by default (you can use ngx_tcpwrappers although it will have to be rebuilt from source).

So I made a little script, called ban-hammer to temporarily ban IPs using IPTables. There is also a cron.daily script to unban IPs each day. The script requires rpnc, but it is easy to adapt without it.

These scripts add and remove the IPs into a special IPT chain (which you can configure in the script). So you also have to configure your firewall to jump to the two chains and load banned IPs on boot:

echo "Bans"

load_bans() {
  ban_table=$1
  ban_chain=$2
  iptables=$3

  $iptables -N $ban_chain

  while read ban
  do
    ip=$(echo "$ban" | cut -d'=' -f 1)
    $iptables -A $ban_chain -s "$ip" -j DROP
  done < "$ban_table"

  $iptables -A INPUT -j $ban_chain
}

load_bans /etc/firewall/ip4.ban IP4BAN iptables
load_bans /etc/firewall/ip6.ban IP6BAN ip6tables