Servers, Challenges, and Privacy
December 2013
Originally published in Erowid Extracts #25
Citation: Erowid. "Servers, Challenges, and Privacy". Erowid Extracts. December 2013;25:24. Online edition: Erowid.org/general/about/about_article15.shtml
One of the important ongoing tasks at Erowid is to maintain the servers that make our online resources available to almost 20 million unique visitors each year.
Erowid currently operates three dedicated physical servers and three virtual servers, which monitor or proxy the primary servers. Our two staff and three volunteer sysadmins keep the server software and operating systems up to date, implement security fixes, replace hardware as necessary, protect the privacy of visitors, and make sure our websites are responsive.
At 6 am on November 3rd, the Erowid.org web server began responding slowly. Our normal 150 millisecond response time degraded to an average of 6.5 seconds, more than 40 times slower than it should be.
After a few days, we realized that the problem was persisting. Beginning at 6 am and ending at 8 pm every day (west coast time), the main server was responding slowly. Our best guess was that a new site indexer (such as Google or Bing) or site archiver was causing the slowdown by overloading our server. But, after extensive review of our logs and watching site traffic, we couldn't find an obvious culprit. The number of visitors wasn't significantly higher than usual, the number of file hits hadn't jumped, and the types of traffic hadn't changed.
JL, our lead sysadmin, was eventually able to solve the problem and bring the server response time back down to the 150 millisecond average by reducing the number and duration of "keep alive" requests (active connections) that we would accept. For reasons that remain a mystery, on November 3rd, some browsers began holding open connections for much longer than they had previously.
Individual browsers were holding open request tunnels that eventually filled up the maximum number of concurrent connections the server could handle. Additional requests then had to wait for an open connection to close.
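To make the mechanism concrete, here is a rough back-of-the-envelope sketch in Python. It relies on Little's law (average open connections = arrival rate of new connections × time each one is held). The connection limit, arrival rate, and hold times below are hypothetical illustration values, not figures from Erowid's servers, and the function names are invented for this example.

```python
def steady_state_connections(arrivals_per_sec: float, hold_seconds: float) -> float:
    """Little's law: average number of open connections at steady state."""
    return arrivals_per_sec * hold_seconds


def is_saturated(arrivals_per_sec: float, hold_seconds: float, max_connections: int) -> bool:
    """True when the expected number of open connections reaches the server's limit."""
    return steady_state_connections(arrivals_per_sec, hold_seconds) >= max_connections


# Hypothetical values chosen only for illustration.
MAX_CONNECTIONS = 256     # assumed cap on concurrent connections
ARRIVALS_PER_SEC = 20.0   # assumed rate of new connections

# Same traffic, different lengths of time each connection is held open.
for hold in (2.0, 5.0, 15.0, 60.0):
    open_conns = steady_state_connections(ARRIVALS_PER_SEC, hold)
    status = "saturated" if is_saturated(ARRIVALS_PER_SEC, hold, MAX_CONNECTIONS) else "ok"
    print(f"hold {hold:>5.1f}s -> about {open_conns:6.0f} open connections ({status})")
```

Under these assumed numbers, the visitor count never changes, yet the server moves from comfortable to saturated purely because each connection is held longer. That matches the symptom described above: no jump in visitors or file hits, but new requests waiting for a free slot. Shortening the keep-alive duration lowers the hold time and brings the expected connection count back under the limit.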
Though systems administration is usually done behind the scenes, only drawing attention when things stop working, it is an extremely important task that underlies the success of an online project like Erowid.org.
As we enter 2014, we again have to reassess our priorities: how we strike a balance between privacy, security, and systems costs. BurningMan.com, with which we have shared a server rack for 13 years, might be moving their main operations into "the cloud". This could result in the dissolution of the group of organizations with which we share bandwidth. Moving services into cloud systems like Amazon's AWS can reduce maintenance costs substantially, but it also means less privacy for visitors and less control over our data, including email and visitor submissions. We will continue weighing our options and the costs of continuing to own and operate dedicated Erowid server hardware.