python simplicity

OK, I needed to pull just the digits out of a string like ‘h00009492a802.ne.client2.attbi.com’. Pretty simple to do in Python:

def onlyNum(s):
    # Keep only the characters of s that are digits, in their original order.
    num = [n for n in s if n.isdigit()]
    return ''.join(num)

Voila. And the result?

>>> s = 'h00009492a802.ne.client2.attbi.com'
>>> onlyNum(s)
'000094928022'

The idea was to count the digits in the sending machine’s domain name, as found in a mail header, to help determine whether the message is spam. Since most spam originates from addresses like the one above, it seemed reasonable to me to say:

nums_from_header = onlyNum(header_hostname)
if len(nums_from_header) > 4:
    SPAM = True
else:
    SPAM = False

It’s not perfect (and I wound up not using it in favor of a DNS blacklist), but Python makes it pretty easy to conceptualize.
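
For what it’s worth, here is a minimal sketch of how the two pieces above could be folded into one function. The names looks_like_spam_host and max_digits are hypothetical (I made them up for this example), and the default of 4 is the same untested guess as above, not a tuned value.

```python
def onlyNum(s):
    # Keep only the digit characters of s, in their original order.
    return ''.join(n for n in s if n.isdigit())


def looks_like_spam_host(hostname, max_digits=4):
    # Hypothetical helper: flag a hostname that contains more than
    # max_digits digits. The default of 4 is a guess, not a tuned value.
    return len(onlyNum(hostname)) > max_digits


if __name__ == '__main__':
    for host in ('h00009492a802.ne.client2.attbi.com', 'mail.example.com'):
        print(host, looks_like_spam_host(host))
```

The attbi.com host above has twelve digits and gets flagged, while something like mail.example.com has none and passes. A hostname with exactly four digits also passes, since the test is strictly "more than".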

(I’m glad that I can evangelize Python to all of you who come here to read stuff about Curtis. I can just see you all shaking your heads at this sort of thing. Makes me smile :-)

cw

3 Responses to “python simplicity”

  1. ashby Says:

    Curtis, Python, Indians. Whatever.

    But while we’re here:
    “The idea was to check the amount of numerical digits in a sending machines domain name in a mail header to help determine if it is SPAM”
    How does that help? Is it just an extension of the idea that a spam server is named in such a manner?

    Also, what is “DNS blacklist?”

  2. christian Says:

    “Is it just an extension of the idea that a spam server is named in such a manner?”

    You’ve got it. Most legit email does not originate from hosts with lots of digits in the hostname. Conversely, most DSL/cable/dialup hosts have a lot of digits in their hostnames. My thought was that filtering out email from hosts with lots of digits in their hostnames would curb some spam.

    A DNS blacklist is a list that contains the host names of known spammers, mail servers that have been compromised and used as spam relays, and (on some lists) dynamically assigned addresses. You pick a list like this that is maintained by a third party and configure your mail server to reject mail from any host on it.

  3. Jeffery C Says:

    Nerd! Nerd! Nerd! Christian, this might be the all time best blog I’ve ever read. The only thing more nerdish would be if I started raving about the brown thrasher I saw today.
