I Was the Spam All Along
This one is just going to be a short follow-up to A Quick Business Discussion. You might be wondering if I ended up running the model in production. I did.
Initially, I started running the filter every 15 minutes, for any message sent in the last 7 days. If you’re familiar with low-bandwidth, low-cost IMAP servers, you might already see where this is about to go wrong.
We got an e-mail that was immediately classified as spam…
Immediate Action Required: Suspicious Download Activity Detected on Your Titan Account
I am writing to address a matter concerning an unusual surge in downloads amounting to 58.3 GB in last 7 days originating from ****@j****e.com Titan account setup on a third-party email client. Unfortunately, this activity is categorized as suspicious behavior and signifies an unfair utilization of our account resources.
I followed up and gave the tech team a little more information. It wasn’t a scam or a botnet. It was just my naive filter re-downloading what must have been a very large file sent in the last week, every 15 minutes. They replied with the following:
Yes, continuously scanning emails from the last week every 15 minutes would result in a high volume of downloads, which aligns with the activity we observed.
Limiting the scope (e.g., only unread emails from the last 24 hours) would be a more reasonable approach.
First off, the Titan opsec teams are so nice. I started my career in a place where the default response to something this dumb would not be so nice. So kudos to Titan.
Second off, the fix is so darn simple that all you had to do was check the log file (that was 100s of MB long when printing every e-mail subject) to understand what was going on.
Actually, this reminds me of a similar story involving Tibco Rendezvous multicasts.
A brief detour on DOS-ing systems
I might be butchering the story, but I heard it almost 10 years ago so cut me some slack.
A friend needed to ack messages received over a Tibco Rendezvous (TibRV) multicast because something wasn’t working. The ack was very simple: if receive message, send “yo yo yo, this is ______”. The modern-day equivalent of this is testing a full-duplex websocket by sending keep-alive messages. Should work. The problem is that said acking code was also listening to the same channel, so every received ack created another one in an infinite loop of UDP packets. UDP on the corporate intranet sure is speedy. So this causes a massive bottleneck on the multicast channels and inevitably brings down multiple hosts. When the production support engineers go to check the logs to figure out what happened, they find “yo yo yo, this is ______”.[1]
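To make the feedback loop concrete, here is a toy sketch in Python. It is not TibRV code: the in-memory queue, the `ACK_PREFIX` marker, and all function names are hypothetical stand-ins, and the only point is that a listener which hears its own acks needs some way to recognise them.

```python
from collections import deque

ACK_PREFIX = "ack:"  # hypothetical marker used to recognise acks


def run_channel(handler, first_message, max_messages=1000):
    """Toy broadcast channel: the handler hears every message, including its own replies.

    Returns how many messages were delivered before the channel went quiet
    or the safety cap was reached.
    """
    queue = deque([first_message])
    delivered = 0
    while queue and delivered < max_messages:
        message = queue.popleft()
        delivered += 1
        reply = handler(message)
        if reply is not None:
            queue.append(reply)
    return delivered


def naive_acker(message):
    # Acks everything it hears, so each ack triggers another ack.
    return f"{ACK_PREFIX} yo yo yo"


def guarded_acker(message):
    # Ignores acks, so the loop stops after a single reply.
    if message.startswith(ACK_PREFIX):
        return None
    return f"{ACK_PREFIX} yo yo yo"
```

With the naive handler the channel only stops because of the `max_messages` cap; with the guarded one it goes quiet after the original message and a single ack.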
Back to E-mail
As I said, the fix was so darn simple. The following IMAP query:
since = dt.date.today() - rdate(days=7)
"SEARCH", f'(SENTSINCE {since.strftime("%d-%b-%Y")})'
Needed to become a slightly more restrictive one:
since = dt.date.today() - rdate(days=7)
"SEARCH", f'(UNSEEN SENTSINCE {since.strftime("%d-%b-%Y")})'
And voila, running every 15 minutes has stopped causing an issue! Until the next time we get an insanely large e-mail, anyway.
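For a fuller picture, here is a minimal sketch of the narrower search using Python's standard `imaplib`. The helper names `build_search_criteria` and `search_unseen` are hypothetical, and the sketch assumes a connection that is already logged in with a mailbox selected. It shows one way the query could be wired up, not necessarily how the filter itself does it.

```python
import datetime as dt
import imaplib

# IMAP dates use English month abbreviations, so build the date by hand
# instead of relying on the locale-dependent %b in strftime.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(today, days=7, unseen_only=True):
    """Return an IMAP SEARCH key such as '(UNSEEN SENTSINCE 08-Mar-2024)'."""
    since = today - dt.timedelta(days=days)
    date_text = f"{since.day:02d}-{MONTHS[since.month - 1]}-{since.year}"
    flags = "UNSEEN " if unseen_only else ""
    return f"({flags}SENTSINCE {date_text})"


def search_unseen(conn: imaplib.IMAP4, days=7):
    """Return the message numbers matching the narrowed search.

    Assumes conn is logged in and a mailbox has been selected.
    """
    status, data = conn.search(None, build_search_criteria(dt.date.today(), days))
    if status != "OK" or not data or not data[0]:
        return []
    return data[0].split()
```

One caveat on the design: `UNSEEN` only cuts the download volume if fetching a message marks it as seen. A fetch that uses `BODY.PEEK` leaves the flag alone, so the same messages would keep matching on every run.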
footnotes
[1] - I promise this really wasn’t me, but it was a mentor that I respect a lot. Real engineering starts and ends with print statements so often, and it’s fun to know that your mentors are human too.