The other night I shared a Google Doc with a friend over Facebook chat. Seconds after sending the link, I noticed an anonymous viewer had entered the document.
Surprised my friend had opened the doc so quickly, I navigated back to Facebook. To my bemusement, not only was my friend away, but I hadn't even sent the link: I had pasted it into the chat window but forgotten to hit Enter.
There was no way my friend had opened that document, but someone was there. My mind immediately began to conjure images of a dark surveillance room at Facebook HQ, where government agents intercepted and spied on all communication over the network. Finally, proof that Facebook was a front for the CIA!
As the conspiratorial side of me settled down, I realized the anonymous visitor to my document was not someone but something: a Facebook bot! The mystery was solved, but I was intrigued and wanted to dig a bit deeper, so I threw together a tiny 12-line Flask app to investigate.
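The original Flask code isn't reproduced here. As a rough stand-in, here is a minimal sketch of the same idea using Python's standard-library `http.server` instead of Flask (swapped in only so the example has no dependencies): it answers every GET request and prints the caller's `User-Agent` header. The class name `UserAgentLogger`, the `serve()` helper, the response text, and port 8080 are all assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserAgentLogger(BaseHTTPRequestHandler):
    """Answer every GET with a tiny page and print the caller's User-Agent."""

    def do_GET(self):
        # Header lookup is case-insensitive; crawlers usually identify themselves here
        user_agent = self.headers.get("User-Agent", "(no User-Agent sent)")
        body = b"Hello, whoever you are.\n"
        self.send_response(200)  # also writes the access-log line to stderr
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        # Print the visitor's user agent to stdout, right after the log line
        print(user_agent, flush=True)


def serve(port=8080):
    # Hypothetical helper; port 8080 is an arbitrary choice.
    # Binding to 0.0.0.0 makes the server reachable from outside the machine.
    HTTPServer(("0.0.0.0", port), UserAgentLogger).serve_forever()
```

Calling `serve()` on a host with a publicly reachable port should produce, for each request, an access-log line on stderr followed by the visitor's user agent on stdout.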
The app prints the user agent of every visitor to the terminal. I ran it on one of my publicly accessible ports, pasted its URL into a Facebook chat, and anxiously watched my logs:
127.0.0.1 - - [01/Jan/2013 15:00:43] "GET / HTTP/1.1" 200 -
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
The notice linked in that user agent made it all clear: Facebook crawls pasted links to retrieve and display any available preview images for the linked website. Not as exciting a realization as uncovering a covert surveillance operation would have been, but still interesting.
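If you wanted an app like this to single out crawler visits, one hedged option is to look for the `facebookexternalhit` token from the user agent logged above. The helper name `is_facebook_crawler` is hypothetical, and a substring check like this is only a heuristic: user agents are self-reported and trivially spoofed.

```python
def is_facebook_crawler(user_agent):
    """Heuristically decide whether a User-Agent string belongs to Facebook's link fetcher."""
    # Hypothetical helper: looks for the token seen in the log, case-insensitively.
    # User agents are self-reported, so anyone copying the string will also match.
    if not user_agent:
        return False
    return "facebookexternalhit" in user_agent.lower()


print(is_facebook_crawler("facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"))  # True
print(is_facebook_crawler("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```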
So the takeaway is: if you're ever feeling lonely and would like a visit from a web crawler, just paste a link into a Facebook text box. Use a Google Doc to feel the crawler's cold, robotic presence :)