The confusion of the different behaviours of spam filters
October 10th, 2024 at 02:31am
In recent months I have received occasional correspondence informing me that someone has registered on this blog but not received the automated email confirming the registration and allowing them to set a password. I had thought it was a bug which had crept into the WordPress installation through numerous upgrades over the course of nearly two decades, but it seemed strange that I was still receiving emails from the blog without issue informing me of new comments and user registrations. I wasn’t quite able to put my finger on what was going on.
The other day I received another one of those messages and was pleased to receive it within a few hours of the person registering as it meant all of the relevant logs would still be fresh, so I set about investigating. The WordPress installation doesn’t keep logs of the emails it sends (although if it encountered an error while trying to send an email, that error might be logged by the webserver) but the server itself does keep logs of email sending activity, so I had a look there.
I could see from this log that as far as the webserver’s internal email server was concerned, the email to me advising of a new registration as well as the email to the new user were both generated and sent. This immediately ruled out WordPress as the problem as it had clearly generated and sent both emails. The problem had to be further down the line.
Now, I should explain that email for samuelgordonstewart.com is not hosted on the same server as the website, however the server for the website has an email server so that it can send emails. This server can also be used to receive emails but, for me at least, it is not used for receiving. Like many websites, this site is hosted on a shared server containing many completely unrelated websites. Each of those websites could generate and send emails, and for the hosting company there is always a risk that an insecure script on someone’s website could be exploited and used to send out spam. That would put an unnecessary strain on the server’s resources and could get the server blacklisted by a bunch of spam filtering services, affecting all of the websites on the server, not just the website generating the spam. To mitigate this risk, my webhost, VentraIP, employs an outbound spam filter. Emails from this server and many other servers in their fleet are funnelled through the outbound spam filtering before being sent on to wherever they’re intended to go. This outbound filtering isn’t particularly rigorous, but it is just enough to avoid having one of their servers send out copious amounts of obvious spam.
Unfortunately this means the server log’s indication that the email was accepted by the receiving server doesn’t mean much, as all it is really saying is that the outbound spam filtering server accepted it. Beyond that, what happened to the email can’t be determined from this log.
At this point I could have asked my webhost to check the spam filter logs to see what happened and see if Gmail’s servers accepted the email, and while that might have provided some information, it probably wouldn’t have told me much, and there was more I could investigate first. There were two clues in the logs I already had. Firstly, the receiving mail server “out.smarthost.mxs.au” was not one I was familiar with, and secondly the ultimate destination was supposed to be Gmail which has some fairly strict sender verification checking as part of its spam filtering.
One of the first lines of defence against spam is a domain name’s SPF record. The main purpose of this record is to determine which servers are allowed to send email on behalf of the domain. A few months back I made a change to one character in the SPF record of samuelgordonstewart.com. At the end of the record I changed
~all
to
-all
This had the effect of changing the policy of the SPF record from “servers which aren’t explicitly allowed to send email for this domain might still be OK to send such emails” to “only email from servers which are explicitly allowed to send emails for this domain should be accepted, everything else should be rejected”.
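As a rough illustration of that one-character difference, here is a minimal Python sketch which reads the qualifier in front of the "all" term and reports the result that SPF defines for it. The function name all_policy is made up for this example, and the two short records at the bottom are cut-down illustrations rather than my actual record.

```python
# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def all_policy(record: str) -> str:
    """Return the result produced by the record's catch-all 'all' term.

    all_policy is a hypothetical helper for illustration only.
    """
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        if term[0] in QUALIFIER_RESULTS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # no qualifier means "+"
        if mechanism.lower() == "all":
            return QUALIFIER_RESULTS[qualifier]
    return "neutral"  # a record with no matching term evaluates to neutral


print(all_policy("v=spf1 +a +mx ~all"))
print(all_policy("v=spf1 +a +mx -all"))
```

A "softfail" is a hint that the receiving server is free to weigh up however it likes, whereas "fail" is an explicit statement that the sending server is not authorised.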
I changed this rule at the time because (1) I should have done so a long time ago, and (2) I had noticed I was receiving spam allegedly from my domain but which had clearly come from servers with no connection to my domain whatsoever and I wanted to stop this from happening.
Going back to the logs, the fact I didn’t recognise “out.smarthost.mxs.au” as a server which had been doing filtering for my webhost made me wonder if it was not present in my domain’s SPF record and emails going through it might have been getting rejected by Gmail.
To cut a long story short on this, the answer was yes. At some stage my webhost had changed how they organised their outbound filtering, and my SPF record had become outdated as a result. The DNS records which host the SPF record are in fact hosted by my webhost, so in theory they could have updated this automatically and for many of their clients they almost certainly did, however as I had made a number of custom changes to my DNS records including my SPF record over the years, it was probably beyond the scope of their automated systems to make this change for me. In fact, the way my SPF record was configured, their automated system could have drawn the inference that I didn’t want their outbound filtering to be allowed to handle mail from my domain and thus adding such a record would have been inappropriate.
My SPF record was
v=spf1 ip4:103.42.110.11 +a +mx +include:spf.hostedmail.net.au +include:spf.messagingengine.com -all
Effectively what this said was that I was permitting mail to be sent by the server at 103.42.110.11 (the IP address of the server hosting the website), any server listed in the domain’s A records (this rule basically duplicates the first rule but allows the IP address of the server to be changed without me having to manually add the new IP address in), any server listed in the MX records (the servers which receive email for the domain) plus any servers specified by the records of spf.hostedmail.net.au and spf.messagingengine.com.
spf.hostedmail.net.au had previously included the outbound filtering of my webhost. This record belongs to my webhost’s separate email hosting service which I used to use. I believe it shared outbound filtering with their webservers, but apparently doesn’t any more.
spf.messagingengine.com belongs to Fastmail which is my current email host.
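For anyone who wants to pull a record like this apart programmatically, the following is a small Python sketch which splits the record above into its individual terms. It is deliberately simplified: parse_spf and Term are names invented for this example, it does no DNS lookups, and it makes no attempt to handle modifiers such as redirect.

```python
from typing import NamedTuple


class Term(NamedTuple):
    qualifier: str   # one of + - ~ ?
    mechanism: str   # e.g. ip4, a, mx, include, all
    value: str       # text after the colon, or "" if there is none


def parse_spf(record: str) -> list[Term]:
    """Split an SPF record into terms (hypothetical helper, mechanisms only)."""
    parts = record.split()
    if not parts or parts[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    terms = []
    for part in parts[1:]:
        qualifier = "+"  # the default when no qualifier is written
        if part[0] in "+-~?":
            qualifier, part = part[0], part[1:]
        mechanism, _, value = part.partition(":")
        terms.append(Term(qualifier, mechanism.lower(), value))
    return terms


RECORD = (
    "v=spf1 ip4:103.42.110.11 +a +mx +include:spf.hostedmail.net.au "
    "+include:spf.messagingengine.com -all"
)

terms = parse_spf(RECORD)
includes = [t.value for t in terms if t.mechanism == "include"]
for term in terms:
    print(term)
```

Listing the include terms this way makes it easy to see at a glance which third-party records the domain is delegating to.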
When I checked the SPF record of another domain I have hosted by VentraIP I noticed it contained a different server: spf.hostingplatform.net.au, which is indeed the record for my webhost’s outbound spam filtering.
So I adjusted my SPF record to include this:
(I can probably remove the spf.hostedmail.net.au as it is no longer needed, but one change at a time…)
Then I registered a new account on this blog using the email address of a Gmail account I have access to. I don’t have a personal account at Gmail and haven’t for a very long time…in fact I probably wouldn’t have any account with Google at all if it wasn’t for the fact I have to have an account with them for YouTube. Email contains an awful lot of sensitive information about a person and I’d rather pay to have my email hosted somewhere where I can be confident it’s not getting scanned for advertising targeting or profiling purposes. Anyway, the registration email went through…it landed in the spam folder and Gmail noted the email looked very similar to emails it had previously rejected, but at least it got delivered and wasn’t silently blocked. I was then able to mark it as “not spam” to help train their filters and hopefully with time Google will start to recognise that emails from my blog are legitimate again.
What’s interesting about all of this is that various email services and spam filters have differing ways of handling spam and interpreting things. In this instance, I was receiving emails from my blog at Fastmail without any issue but Gmail was blocking them completely. So it seems that Fastmail and Gmail have different ways of deciding which server is the sender of the email, and although I pay Fastmail for my email service and am quite happy with them, frankly I think Gmail has the correct interpretation here.
Every email you send or receive is basically just a big heap of text. There’s a lot of text you don’t normally see in the “headers” with information about where the email is from and where it has been, and attachments are encoded as text which looks like pages and pages and pages of gibberish.
A portion of the headers of an email sent to me by this blog advising me of a new user registration
The headers contain information about every server which handles the email along the way, including the time the server received the message and where it received it from. Email servers often add other information as well such as any spam filtering checks they did, or in the case of an email server on a webserver, which account on the webserver generated the email. Ultimately this is just text and there’s no way for a mail server further down the chain to verify any of the information added at an earlier stage. The only information which a mail server can be sure of is the address of the server or device which it received the message from, and any information the server adds itself.
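To make the idea of that chain a bit more concrete, here is a short Python sketch using the standard library’s email module to list the "Received" headers of a message. The message itself is entirely invented for the example: the hostnames use reserved example domains and the IP addresses come from a range set aside for documentation, so none of it reflects my real servers or logs.

```python
from email import message_from_string

# An invented message. Each server adds its Received header at the top,
# so the newest hop is listed first and the oldest hop is listed last.
RAW = (
    "Received: from filter.example.net (filter.example.net [192.0.2.20])\n"
    "\tby mx.recipient.example with ESMTPS; Thu, 10 Oct 2024 02:00:05 +1100\n"
    "Received: from web.example.org (web.example.org [192.0.2.10])\n"
    "\tby filter.example.net with ESMTP; Thu, 10 Oct 2024 02:00:03 +1100\n"
    "From: blog@example.org\n"
    "To: reader@recipient.example\n"
    "Subject: New user registration\n"
    "\n"
    "Body text.\n"
)

msg = message_from_string(RAW)

# Collapse the folded (multi-line) headers into single lines.
hops = [" ".join(h.split()) for h in msg.get_all("Received", [])]

trusted_hop = hops[0]      # written by the receiving server itself
claimed_origin = hops[-1]  # written earlier in the chain, so unverifiable

for number, hop in enumerate(hops, start=1):
    print(number, hop)
```

Only the topmost header was written by the server doing the receiving. Everything below it arrived as part of the message and has to be taken on trust.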
Fastmail seems to be accepting that email might get routed via another server but as long as the headers list an authorised server as the originating source of the message, the email should be let through. Whereas Gmail is much more strict and will reject an email if the server it receives the message from isn’t an authorised server for that domain, regardless of what is listed in the headers.
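The stricter approach can be sketched in a few lines of Python: take the IP address of the server which actually connected, and test it against the record. This is a heavily cut-down illustration and not how any real mail provider implements it. check_ip is a made-up name, the sketch only understands the ip4 and all terms (a, mx and include all need DNS lookups, so they are skipped), and 192.0.2.20 is a documentation address standing in for a filtering server which isn’t listed in the record.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_ip(record: str, connecting_ip: str) -> str:
    """Evaluate ip4 and all terms against the connecting IP (illustrative only)."""
    ip = ipaddress.ip_address(connecting_ip)
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        qualifier = "+"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        if name == "ip4" and ip in ipaddress.ip_network(value, strict=False):
            return RESULTS[qualifier]
        if name == "all":
            return RESULTS[qualifier]  # the catch-all matches everything
        # Assumption: any other mechanism is ignored because it needs DNS.
    return "neutral"


# The website's own server is listed, so it passes.
print(check_ip("v=spf1 ip4:103.42.110.11 -all", "103.42.110.11"))
# A relay which is not listed fails outright under -all...
print(check_ip("v=spf1 ip4:103.42.110.11 -all", "192.0.2.20"))
# ...but only soft-fails under ~all.
print(check_ip("v=spf1 ip4:103.42.110.11 ~all", "192.0.2.20"))
```

Note that nothing in this check looks at the headers at all. The only input is the address of the connecting server, which is the one thing the receiving server can be sure of.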
Given it is impossible to verify details listed in the headers by previous servers in the chain, it is possible to fake a portion of the headers of an email, and a sufficiently sophisticated spam operation would be wise to do just that in order to make it look like the ultimate source of the email is authorised. In fact I have no doubt some spam operations do just that.
SPF isn’t the be-all-and-end-all of spam filtering by a long way, but it’s an important first step. While I know Fastmail is used to receiving email from my webserver and knows it’s not spam, the fact that it seems to let perceived reputation and unverifiable header text cloud the judgment of its spam filtering is a concern. I can see merit in sending such emails to the spam folder rather than following Gmail’s policy of flat out silent rejection and deletion. If Fastmail had been doing that, I would have picked up on the issue with the SPF record not listing the correct outbound filtering servers sooner, as the headers inserted by Fastmail’s spam filters would have provided that information. Ultimately, though, I think Gmail’s policy of treating the server which sent it the message as the sender to be checked against SPF is the correct methodology, even if I think some of those emails could be put in the spam folder rather than being silently deleted.
Fastmail’s spam filtering is not proprietary to them. Some aspects of it might be but it is built on systems widely used elsewhere for spam filtering, so one has to wonder how many of the spam filters in use by email servers right around the world have an overly permissive approach to SPF records and are willing to take the word of header text which may be completely illegitimate with no way of being checked. Too many, I fear.
Samuel