{"id":1665,"date":"2025-09-29T08:00:00","date_gmt":"2025-09-29T08:00:00","guid":{"rendered":"https:\/\/www.erowid.org\/columns\/crew\/?p=1665"},"modified":"2025-10-13T12:21:51","modified_gmt":"2025-10-13T12:21:51","slug":"robot-wars-just-outside-most-peoples-perception-bad-software-llm-edition","status":"publish","type":"post","link":"https:\/\/www.erowid.org\/columns\/crew\/2025\/09\/robot-wars-just-outside-most-peoples-perception-bad-software-llm-edition\/","title":{"rendered":"Robot Wars Just Outside Most People&#8217;s Perception: Bad Software LLM Edition"},"content":{"rendered":"<p>Around 2004, Erowid.org hit the human-robot inflection point: the number of robot \/ bot \/ script &#8220;hits&#8221; to our websites exceeded the number of human visitor hits. By 2014, the numbers were more like 10x robot to human. In 2025, it\u2019s 1000x robots to humans.<\/p>\n<p>There are a lot of complications in describing this. Many of the \u201cgood actors\u201d are search engines. Google scrapes our entire site every day, as does Bing. Yahoo bot every couple days. They are not actually good robot actors anymore, they steal content from sites they scrape, such as erowid, then publish it on their own websites without giving us credit or a link back. But those are the \u201cgood\u201d robots. They are horrible mega corp thieves, but at least they don\u2019t disable our servers.<\/p>\n<p>There is also the ridiculous joke that the current LLMs can be trusted to serve up medical information to the public that they&#8217;ve \u201csynthesized\u201d from the web. So even the \u201cbest\u201d LLM robot scrapers are doing the public a disservice, damaging our services, and making original-content sites like erowid (and many others) less relevant.<\/p>\n<p>There are other good bot spiders who limit their rates and don\u2019t create insane parameter abuse. 
Parameters are things added to a URL after a question mark or a slash that can send additional data to the server or request extra actions or responses. On erowid.org, our Experience Vaults are mostly parameter driven. The difference between <em>https:\/\/erowid.org\/experience\/exp.cgi<\/em> and <em>https:\/\/erowid.org\/experience\/exp.cgi?Y1=2025<\/em> is that the second URL adds a name-value pair called a parameter.<\/p>\n<p>Many web servers, such as our Apache (praise Holey Apache), also accept parameters through slashes. Erowid handles slash params in some of our software, but not most. It gets pretty technical and I\u2019m not trying to bore those who don\u2019t write server code or those who do.<\/p>\n<p>For instance, humans and googlebot hit donations.php. Bad bots try donations.php with ten thousand different question mark parameters to try to see if the page is a WordPress entry or some other known piece of software that might have a published bug. Or they insert \u201cXSS\u201d (cross site scripting) to see if they can create a URL they could spam out where it would appear to be a valid erowid URL, but would do something malicious. They are trying to find security holes to exploit to break into our site. Or they are just trying to overwhelm our server with useless hits. We have spent decades hardening erowid against such attacks, but <strong>Real Security Is Real Hard<\/strong>. There is no end to fighting against attack robots.<\/p>\n<p>In the era of \u201ccloud computing\u201d, where many people can rent virtual server space that goes almost unmonitored by those who rent it out, people who used to run a single archiving spider badly can now unleash an army of bad spiders all at the same time.<\/p>\n<p>I spend hours every day on our \u201cshields\u201d, but the attacks are absurd.<\/p>\n<p>Good robot-spiders behave nicely. They honor robots.txt exclusion files and they report who they are and what they\u2019re doing accurately. 
Bad robots (now 99.9% of robots) lie about their identity and their purpose. And Amazon and Google and other virtual-server hosts now allow evil robots to run on their platforms that lie that they are amazonbot, googlebot, bingbot, etc etc etc.<\/p>\n<p>Another common type is home human users who want to make a copy of Erowid.org but don\u2019t understand how to run their robots properly and don\u2019t understand the sheer number of humans wanting to make home copies of our site(s). Thousands of humans per day set up an offline downloader and then just assume their software will work properly. Instead, those badly-coded bots usually run out of control and hit servers 5-50 times a second. Multiply that times a thousand.<\/p>\n<p>And then there are researchers who want to analyze the data on our site and disregard our Terms of Use and copyrights. They are not tech experts and they often run robot spiders that they don\u2019t understand and often hit our site ten times too fast and ten times too much. Erowid has a little over 100 thousand unique pages but often these \u201cresearch\u201d bots will download a million or five million pages because they make errors with parameters that generate \u201cunique\u201d URLs.<\/p>\n<p style=\"text-align: left;\">Le Sigh.<\/p>\n<h2 class=\"wp-block-heading has-medium-font-size\"><strong>New Server Robot Attack Threat: Garbage LLM-AI<\/strong><\/h2>\n<p>But there\u2019s another \u201cnew\u201d-ish group of robots that are part of the \u201cArtificial Intelligence\u201d (AI) swarm that is so popular in public media and part of a self-recursive funding crime machinery that seems to get more adopters every day. I [earth] don\u2019t understand the absurd focus on new large language model (LLM) AI. We used to just call this shit \u201cbad software\u201d. 
WTF.\u00a0 I can\u2019t listen to the radio for five minutes without hearing the term \u201cAI\u201d as if the whole house of cards isn\u2019t built on theft and absurd energy use being used against the general population and hurting every content producer who has less than a billion dollars. It\u2019s a giant grift built on a stupid scam.<\/p>\n<p>I grew up in the 1970s and our family were very early adopters of computers. I learned to type before I learned to write in cursive. I learned to solder green board, caps, and transistors before I knew how to break into online servers. But I was childishly black-hat hacking by age 8. Minor stuff: getting accounts on restricted systems so we could play games (&#8220;Adventure&#8221; par example). I stopped that type of thing when one of my older compatriots got raided by the FBI and spent time in jail. We got some calls to our parents telling them we had broken into a few systems and we stopped. But I digress.<\/p>\n<p>We had crappy computers but we had great terminals (\u201cGlass Terminals\u201d, \u201cGlass Boxes\u201d) : machines with no computing ability except they had modems that could connect to remote servers. In our neighborhoods and schools, there were various computers here and there, but very few computers had network connections and almost none had standardized data media. What I\u2019m saying is that each computer was alone, isolated, and you couldn\u2019t reliably add software to it. There were several companies that had home computers in the late 1970s that had tape drives and a few that had 5-inch floppy disk drives, but most home computers had no standard read-write media at all. It was insane. That changed around 1979 to 1982, to digress further.<\/p>\n<p>So, as a computer kid who understood parlor tricks, the pillars of magic, typing, and computers, I memorized many programs in BASIC, which was an almost ubiquitous programming language across most home or school platforms. 
My favorites were Animals and Twenty Questions, but one of the 60-ish line programs I memorized was ELIZA. It was a trivially simple prompt that pretended to be a human therapist on the other end of an invisible network connection. I would usually type the program in and fiddle with people\u2019s computers and then say, \u201cOh, I got it connected to the net and logged in to a psychotherapist, sit down and ask it questions.\u201d<\/p>\n<p>It was ridiculously simple. If I was stuck in the house for a long time with nothing else to do, I could customize it and add a few hundred lines to make it way more complex. If I knew the target rube, I would add in a few extra lines to target them. After a few dozen trials, I knew the most common questions people would ask. It was easy to add extra \u201csmarts\u201d to ELIZA. But even using the non-customized BASIC code, far over 50% of the people who sat down and chatted with ELIZA initially thought it was a human on the other end, despite their computers not having any networking capabilities and the very thin depth of ELIZA. Most people would figure out it was crappy software after 10+ questions, but the initial effect was exactly what I wanted. Many owners and schools with early home computers were totally fooled by the trivial set of ELIZA rules for software and a text prompt that would simulate a caring, interested therapist.\u00a0 So now, what does my tricking people with ELIZA in 1977 have to do with today?<\/p>\n<p>The grifter shadow box magic trick that is currently called Artificial Intelligence. Why is AI relevant to the blizzard of robot wars? Because the companies making these \u201cAI\u201d systems rent or sell the software to almost anyone and they can and are used to \u201cingest\u201d data from websites.<\/p>\n<h3>2025 LLM \u201cArtificial Intelligence\u201d Monsters<\/h3>\n<p>So here we are in September 2025 and the majority of bad actor spider robots attacking erowid appear to be LLM monsters. 
Why did I tell the story about memorizing ELIZA? Because the dumb, grifter, magic trick bullshit that was ELIZA is exactly the same as the current \u201cAI\u201d garbage. The AI bots have stolen all of the content online; they ignored rules of use, copyrights, and our technical robot-to-robot rule settings. And they are so badly written that their authors can\u2019t keep them from <a href=\"https:\/\/www.cbsnews.com\/news\/ai-chatbots-teens-suicide-parents-testify-congress\/\">encouraging people to commit suicide<\/a>. And that\u2019s when the AI is run by the legitimate companies.<\/p>\n<p>To re-iterate: the crappy AI and their theft machines are the responsible LLM AI bots, where some major company\u2019s name is attached to their horror show.\u00a0 But now major companies give or license out their horrible software to anyone for any purpose. They do not get better when they are directly controlled by anonymous people with no repercussions possible other than maybe having one of their virtual servers disabled. But getting that to happen is very, very difficult for someone being violated by their malicious bots.<\/p>\n<p>And they aren\u2019t just making worse ELIZAs: one \u201cnew\u201d purpose is to attack websites like ours.\u00a0 Hilarious.<\/p>\n<p>For the last six weeks, we have had tens of thousands of IP addresses attacking our sites with fake \u201cUser Agent\u201d info pulling info into LLM \u201cAI\u201d models. 
We can tell by what URLs they try and what URL parameters they add on whether they are probing for security holes.<\/p>\n<p>So this \u201cnew\u201d addition to the blizzard of robot wars just outside people\u2019s perception is LLM-type \u201clearning\u201d Artificial Intelligence robots that are programmed to attack and penetrate sites while also stealing the data and incorporating the good and bad data into their many, many databases controlled by nobody-knows-who.<\/p>\n<p>I have not seen this specific issue covered in any other media, so I thought it was worth writing about. If you\u2019re a UNIX nerd (which you likely are not), these robots have been forcing the \u201cload\u201d on our main public facing servers up over 50 with a near constant 5-20 load. We try to keep our load under 1 (one) as a rule. A load of 50 is very <strong>very<\/strong> bad. In UNIX server terminology, the load is the average number of processes running or waiting to run. Our main machine has 16 CPUs, each with a large amount of memory; each CPU can handle a process, and most processes are completed in a hundredth of a second or so.<\/p>\n<p>So, a new front in the robot war is LLM \u201cAI\u201d software that\u2019s been re-coded to run as penetration testing, denial of service attacks, and just out of control machinery no one is paying any attention to except on our end where they have made it impossible, for instance, to serve our book review section because it\u2019s been targeted for penetration.<\/p>\n<p>Their hosting companies love these bad actors because they pay for their energy and network use. Amazon AWS appears to not care at all that their servers are being used as the largest digital weapon platform ever seen. Great job, Jeff Bezos.<\/p>\n<p>If I enable book reviews, we get 500 hits per second, all with improper penetration parameters. It triggers red flags with our server farms and it causes a crazy weird set of problems that are probably better in a separate whining post. 
I will whine further in a separate article.<\/p>\n<p>With love and respect for our human users: I apologize that I\u2019ve had to block tens of thousands of IPs in the last month and shut down parts of the site, and stop and restart our web server multiple times per day for weeks.<\/p>\n<p>Our main server had literally been online (\u201cuptime\u201d) for ten years before we had to move it in April 2024 to a new server farm (in the same ISP we\u2019ve been using for 25 years). We try for 99.9% service uptime, but we can\u2019t currently meet that target because of the:<\/p>\n<p><strong>Robot Wars Just Outside Most People&#8217;s Perception<\/strong><\/p>\n<p>My best to y\u2019all and those you care about,<\/p>\n<p>earth<\/p>\n<p>Technical Director and Co-founder, Erowid Center<br \/>\nErowid.org | DrugsData.org<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Around 2004, Erowid.org hit the human-robot inflection point: the number of robot \/ bot \/ script &#8220;hits&#8221; to our websites exceeded the number of human visitor hits. By 2014, the numbers were more like 10x robot to human. In 2025, it\u2019s 1000x robots to humans. There are a lot of complications in describing this. 
Many &hellip; <a href=\"https:\/\/www.erowid.org\/columns\/crew\/2025\/09\/robot-wars-just-outside-most-peoples-perception-bad-software-llm-edition\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Robot Wars Just Outside Most People&#8217;s Perception: Bad Software LLM Edition<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1681,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84,13],"tags":[],"class_list":["post-1665","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-server","category-sysadmin"],"_links":{"self":[{"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/posts\/1665"}],"collection":[{"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/comments?post=1665"}],"version-history":[{"count":16,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/posts\/1665\/revisions"}],"predecessor-version":[{"id":1822,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/posts\/1665\/revisions\/1822"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/media\/1681"}],"wp:attachment":[{"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/media?parent=1665"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/categories?post=1665"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.erowid.org\/columns\/crew\/wp-json\/wp\/v2\/tags?post=1665"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}