Around 2004, Erowid.org hit the human-robot inflection point: the number of robot / bot / script “hits” to our websites exceeded the number of human visitor hits. By 2014, it was more like 10x robots to humans. In 2025, it’s 1000x robots to humans.
There are a lot of complications in describing this. Many of the “good actors” are search engines. Google scrapes our entire site every day, as does Bing; Yahoo’s bot comes every couple of days. They are not actually good robot actors anymore: they steal content from sites they scrape, such as Erowid, then publish it on their own websites without giving us credit or a link back. But those are the “good” robots. They are horrible mega-corp thieves, but at least they don’t disable our servers.
There is also the ridiculous joke that the current LLMs can be trusted to serve up medical information to the public that they’ve “synthesized” from the web. So even the “best” LLM robot scrapers are doing the public a disservice, damaging our services, and making original-content sites like erowid (and many others) less relevant.
There are other good bot spiders who limit their rates and don’t create insane parameter abuse. Parameters are things added to a URL after a question mark or a slash that can send additional data to the server or request extra actions or responses. On erowid.org, our Experience Vaults are mostly parameter driven. The difference between https://erowid.org/experience/exp.cgi and https://erowid.org/experience/exp.cgi?Y1=2025 is that the second URL adds a name-value pair called a parameter.
Many web servers, such as our Apache (praise Holey Apache), also accept parameters through slashes. Erowid handles slash params in some of our software, but not most. It gets pretty technical and I’m not trying to bore those who don’t write server code (or those who do).
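For those who don’t write server code, here’s a minimal sketch (in Python, not our actual CGI code) of how those two styles of parameter look from the server’s side: a name-value pair after the question mark, versus extra data riding in the path itself.

```python
from urllib.parse import urlparse, parse_qs

# The two example URLs from above: one plain, one with a query-string parameter.
plain = "https://erowid.org/experience/exp.cgi"
with_param = "https://erowid.org/experience/exp.cgi?Y1=2025"

for url in (plain, with_param):
    parts = urlparse(url)
    print(parts.path, parse_qs(parts.query))
    # -> /experience/exp.cgi {}
    # -> /experience/exp.cgi {'Y1': ['2025']}

# A slash-style parameter travels in the path instead of the query string,
# e.g. a hypothetical /experience/exp.cgi/2025 -- the server code has to
# split the path and decide what the trailing piece means.
script, _, slash_param = "/experience/exp.cgi/2025".rpartition("/")
print(script, slash_param)  # -> /experience/exp.cgi 2025
```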
For instance, humans and googlebot hit donations.php. Bad bots try donations.php with ten thousand different question-mark parameters to see if the page is a WordPress entry or some other known piece of software that might have a published bug. Or they insert “XSS” (cross-site scripting) payloads to see if they can create a URL they could spam out that would appear to be a valid Erowid URL, but would do something malicious. They are trying to find security holes to exploit to break into our site. Or they are just trying to overwhelm our server with useless hits. We have spent decades hardening Erowid against such attacks, but Real Security Is Real Hard. There is no end to fighting against attack robots.
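To give a flavor of what our shields have to watch for, here’s a small hypothetical sketch (not our actual defenses; the parameter list and patterns are made up for illustration) that flags requests whose query strings look like probe traffic rather than anything a page like donations.php actually accepts.

```python
import re
from urllib.parse import urlparse, parse_qs

# Parameters the page legitimately accepts (hypothetical list).
KNOWN_PARAMS = {"amount", "lang"}

# Crude signatures that show up constantly in probe traffic.
PROBE_PATTERNS = [
    re.compile(r"<script", re.I),                    # classic XSS probe
    re.compile(r"union\s+select", re.I),             # SQL-injection probe
    re.compile(r"\.\./"),                            # path-traversal probe
    re.compile(r"wp-(admin|login|content)", re.I),   # "is this WordPress?" probe
]

def looks_like_probe(url):
    parts = urlparse(url)
    qs = parse_qs(parts.query, keep_blank_values=True)
    # Unknown parameter names on a known page are already suspicious.
    if any(name not in KNOWN_PARAMS for name in qs):
        return True
    # Attack signatures anywhere in the raw query string.
    return any(p.search(parts.query) for p in PROBE_PATTERNS)

print(looks_like_probe("/donations.php?amount=25"))                     # False
print(looks_like_probe("/donations.php?id=<script>alert(1)</script>"))  # True
```

The real thing is messier, of course, but the shape is the same: a short list of what’s legitimate, and everything else gets treated as hostile.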
In the era of “cloud computing”, where many people can rent virtual server space that goes almost unmonitored by those who rent it out, people who used to badly run a single archiving spider can now unleash an army of bad spiders at the same time.
I spend hours every day on our “shields”, but the attacks are absurd.
Good robot-spiders behave nicely. They honor robots.txt exclusion files and they report who they are and what they’re doing accurately. Bad robots (now 99.9% of robots) lie about their identity and their purpose. And Amazon and Google and other virtual-server hosts now allow evil robots, which falsely claim to be amazonbot, googlebot, bingbot, etc., to run on their platforms.
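For the curious, a claimed googlebot or bingbot can at least be sanity-checked, because the real crawlers come from hosts that reverse-resolve to their owners’ domains. A rough sketch using only Python’s standard library (the domain list is abbreviated, and real checks cache and rate-limit these lookups):

```python
import socket

# Domains the genuine search-engine crawlers resolve to (abbreviated list).
LEGIT_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

def crawler_claim_checks_out(ip):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        host = socket.gethostbyaddr(ip)[0]   # e.g. crawl-XX-XXX-XX-X.googlebot.com
        if not host.endswith(LEGIT_CRAWLER_DOMAINS):
            return False
        # Forward-confirm so a faked reverse-DNS (PTR) record doesn't fool us.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False   # no reverse DNS at all: not a real search-engine bot

# A bot sending "User-Agent: Googlebot" from a random cloud IP fails this check.
print(crawler_claim_checks_out("192.0.2.1"))  # documentation test address; fails
```

A lying bot fails it instantly; the problem is that we now have to run that kind of check against tens of thousands of IPs.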
Another common type is home human users who want to make a copy of Erowid.org but don’t understand how to run their robots properly, and don’t grasp the sheer number of humans wanting to make home copies of our site(s). Thousands of humans per day set up an offline downloader and just assume their software will work properly. Instead, those badly-coded bots usually run out of control and hit servers 5-50 times a second. Multiply that by a thousand.
And then there are researchers who want to analyze the data on our site and disregard our Terms of Use and copyrights. They are not tech experts; they often run robot spiders they don’t understand, hitting our site ten times too fast and ten times too much. Erowid has a little over 100 thousand unique pages, but these “research” bots will often download a million or five million pages because they make errors with parameters that generate “unique” URLs.
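For what it’s worth, behaving is not hard. Here is a minimal sketch (hypothetical code, Python standard library only) of what a polite spider is supposed to do: check robots.txt before fetching, identify itself honestly, and pause between requests instead of hammering away at 5-50 hits a second.

```python
import time
import urllib.robotparser
import urllib.request

# A hypothetical but honest user agent with working contact info.
USER_AGENT = "example-research-bot/0.1 (contact: you@example.org)"
DELAY_SECONDS = 5   # a request every few seconds, not dozens per second

robots = urllib.robotparser.RobotFileParser("https://erowid.org/robots.txt")
robots.read()

def polite_fetch(url):
    """Fetch one page only if robots.txt allows it, then wait before the next."""
    if not robots.can_fetch(USER_AGENT, url):
        return None                      # honor the exclusion rules
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    time.sleep(DELAY_SECONDS)            # rate-limit yourself
    return body
```

None of this is exotic; it has been standard crawler etiquette since the 1990s. The bad actors simply skip every step of it.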
Le Sigh.
New Server Robot Attack Threat: Garbage LLM-AI
But there’s another “new”-ish group of robots that are part of the “Artificial Intelligence” (AI) swarm that is so popular in public media, and part of a self-recursive funding crime machinery that seems to get more adopters every day. I [earth] don’t understand the absurd focus on new large language model (LLM) AI. We used to just call this shit “bad software”. WTF. I can’t listen to the radio for five minutes without hearing the term “AI”, as if the whole house of cards isn’t built on theft and absurd energy use being used against the general population and hurting every content producer who has less than a billion dollars. It’s a giant grift built on a stupid scam.
I grew up in the 1970s and our family were very early adopters of computers. I learned to type before I learned to write in cursive. I learned to solder green board, caps, and transistors before I knew how to break into online servers. But I was childishly black-hat hacking by age 8. Minor stuff: getting accounts on restricted systems so we could play games (“Adventure”, par exemple). I stopped that type of thing when one of my older compatriots got raided by the FBI and spent time in jail. Our parents got some calls telling them we had broken into a few systems, and we stopped. But I digress.
We had crappy computers but we had great terminals (“Glass Terminals”, “Glass Boxes”): machines with no computing ability except that they had modems that could connect to remote servers. In our neighborhoods and schools, there were various computers here and there, but very few computers had network connections and almost none had standardized data media. What I’m saying is that each computer was alone, isolated, and you couldn’t reliably add software to it. There were several companies selling home computers in the late 1970s that had tape drives, and a few that had 5.25-inch floppy disk drives, but most home computers had no standard read-write media at all. It was insane. That changed around 1979 to 1982, to digress further.
So, as a computer kid who understood parlor tricks, the pillars of magic, typing, and computers, I memorized many programs in BASIC, which was an almost ubiquitous programming language across most home or school platforms. My favorites were Animals and Twenty Questions, but one of the 60-ish line programs I memorized was ELIZA. It was a trivially simple prompt that pretended to be a human therapist on the other end of an invisible network connection. I would usually type the program in and fiddle with people’s computers and then say, “Oh, I got it connected to the net and logged in to a psychotherapist, sit down and ask it questions.”
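For anyone who never saw it, the whole trick was a short list of pattern-and-response rules plus a few canned fallbacks. This is not the BASIC I memorized, but a few lines of Python capture roughly the same depth of “intelligence”:

```python
import random
import re

# A tiny, ELIZA-style rule set: match a pattern, echo part of it back as a question.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I),     "Tell me more about your {0}."),
]
FALLBACKS = ["Please go on.", "How does that make you feel?", "I see. Continue."]

def therapist(line):
    for pattern, template in RULES:
        match = pattern.search(line)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return random.choice(FALLBACKS)

print(therapist("I feel like no one listens to me"))
# -> Why do you feel like no one listens to me?
```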
It was ridiculously simple. If I was stuck in the house for a long time with nothing else to do, I could customize it and add a few hundred lines to make it way more complex. If I knew the target rube, I would add in a few extra lines aimed at them. After a few dozen trials, I knew the most common questions people would ask. It was easy to add extra “smarts” to ELIZA. But even using the non-customized BASIC code, well over 50% of the people who sat down and chatted with ELIZA initially thought it was a human on the other end, despite their computers not having any networking capabilities and the very thin depth of ELIZA. Most people would figure out it was crappy software after 10+ questions, but the initial effect was exactly what I wanted. Many owners and schools with early home computers were totally fooled by the trivial set of ELIZA rules and a text prompt that would simulate a caring, interested therapist. So now, what does my tricking people with ELIZA in 1977 have to do with today?
The grifter shadow-box magic trick that is currently called Artificial Intelligence. Why is AI relevant to the blizzard of robot wars? Because the companies making these “AI” systems rent or sell the software to almost anyone, and it can be and is used to “ingest” data from websites.
2025 LLM “Artificial Intelligence” Monsters
So here we are in September 2025, and the majority of bad-actor spider robots attacking Erowid appear to be LLM monsters. Why did I tell the story about memorizing ELIZA? Because the dumb, grifter, magic-trick bullshit that was ELIZA is exactly the same as the current “AI” garbage. The AI bots have stolen all of the content online; they ignored terms of use, copyrights, and our technical robot-to-robot rule settings. And they are so badly written that their authors can’t keep them from encouraging people to commit suicide. And that’s when the AI is run by the legitimate companies.
To reiterate: the crappy AIs and their theft machines described above are the “responsible” LLM AI bots, the ones where some major company’s name is attached to the horror show. But now the major companies give or license out their horrible software to anyone, for any purpose. These bots do not get better when they are directly controlled by anonymous people with no possible repercussions other than maybe having one of their virtual servers disabled. And getting even that to happen is very, very difficult for someone being violated by their malicious bots.
And they aren’t just making worse ELIZAs; one “new” purpose is to attack websites like ours. Hilarious.
For the last six weeks, we have had tens of thousands of IP addresses attacking our sites with fake “User Agent” info, pulling info into LLM “AI” models. We can tell whether they are probing for security holes by what URLs they try and what URL parameters they add on.
So this “new” addition to the blizzard of robot wars just outside people’s perception is LLM-type “learning” Artificial Intelligence robots that are programmed to attack and penetrate sites while also stealing the data and incorporating the good and bad data into their many, many databases controlled by nobody-knows-who.
I have not seen this specific issue covered in any other media, so I thought it was worth writing about. If you’re a unix nerd (which you likely are not), these robots have been forcing the “load” on our main public-facing servers up over 50, with a near-constant 5-20 load. We try to keep our load under 1 (one) as a rule. A load of 50 is very, very bad. In UNIX server terminology, the load is roughly the number of processes running or waiting to run. Our main machine has 16 CPUs, each with a large amount of memory; each CPU can handle a process, and most processes complete in a hundredth of a second or so.
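If you want to see the number I’m talking about, it’s the same one the “uptime” command prints. A trivial sketch of the kind of check we keep an eye on (the threshold is just our house rule, not a universal constant):

```python
import os

# 1-, 5-, and 15-minute load averages -- the same numbers `uptime` reports.
one_min, five_min, fifteen_min = os.getloadavg()

HEALTHY_LOAD = 1.0   # our rule of thumb; 50 means requests are piling up fast

if one_min > HEALTHY_LOAD:
    print(f"load {one_min:.2f} -- shields up")
else:
    print(f"load {one_min:.2f} -- all quiet")
```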
So, a new front in the robot war is LLM “AI” software that’s been re-coded to run penetration testing, denial-of-service attacks, and just out-of-control machinery no one is paying any attention to, except on our end, where it has made it impossible, for instance, to serve our book review section because it’s been targeted for penetration.
Their hosting companies love these bad actors because they pay for their energy and network use. Amazon AWS appears not to care at all that its servers are being used as the largest digital weapon platform ever seen. Great job, Jeff Bezos.
If I enable book reviews, we get 500 hits per second, all with improper penetration parameters. It triggers red flags with our server farms and causes a crazy, weird set of problems that are probably better left for a separate whining post. I will whine further in a separate article.
With love and respect for our human users: I apologize that I’ve had to block tens of thousands of IPs in the last month and shut down parts of the site, and stop and restart our web server multiple times per day for weeks.
Our main server had literally been online (“uptime”) for ten years before we had to move it in April 2024 to a new server farm (at the same ISP we’ve been using for 25 years). We try for 99.9% service uptime, but we can’t currently meet that target because of the:
Robot Wars Just Outside Most People’s Perception
My best to y’all and those you care about,
earth
Technical Director and Co-founder, Erowid Center
Erowid.org | DrugsData.org
