Yesterday, a sysadmin noticed that a server that runs Tomcat for us was filling up with log files, so I started digging around in the Tomcat docs to figure out how to set up log file rotation. I eventually gave up and decided to just delete old log files after a week, because this is a development server we’re talking about. Anyway, that’s not the story.

The story is that, while skimming the Tomcat documentation, I spied the section on the Crawler Session Manager Valve. For years, we’ve had random out-of-memory errors when someone tries to run a big job on our application, and we’ve always traced the cause to the server being hit by a web crawler at the same time. Crawlers can spawn many sessions during a crawl, because they typically don’t send a session token back, so every request can end up creating a brand-new session. Amazingly, the Tomcat developers have figured out a clever solution to this problem: the valve recognizes crawlers by their User-Agent header and ties each one to a single session, whether or not it sends a session token with its requests. The deal is, one session per bot. It’s a great idea. And the Valve is just a single line that gets dropped into your server.xml file, most likely in the Engine block, but it can go elsewhere if you want. Or, you know, two lines if you want to explain it with a comment, like so:
<!-- Crawler Session Manager Valve helps mitigate damage done by web crawlers -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve" />
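If the defaults don’t suit you, the valve also takes a couple of attributes. Here’s a rough sketch of what a tuned version might look like. The two attributes shown, crawlerUserAgents (a regular expression matched against the User-Agent header) and sessionInactiveInterval (seconds before a crawler’s session times out), are the ones I remember from the docs, and the values are what I believe the documented defaults to be, so treat them as illustrative and check the docs for your Tomcat version before copying anything:

```xml
<!-- Sketch only: values below are assumed defaults, shown for illustration -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"
       sessionInactiveInterval="60" />
```

If a crawler that bothers you doesn’t match that pattern, adding another alternative to the regular expression should be enough to catch it.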
I don’t really understand why I had to blunder across this amazing thing in a skim of the documentation. People should be shouting this from the rooftops! You don’t have to wait for some random web crawler to kick over your website. You don’t.