SpiderFoot Documentation

About

SpiderFoot is an open source intelligence automation tool. Its goal is to automate the process of gathering intelligence about a given target, which may be an IP address, domain name, hostname or network subnet.

SpiderFoot can be used offensively, i.e. as part of a black-box penetration test to gather information about the target, or defensively, to identify what information your organisation is freely providing for attackers to use against you.

Pre-Requisites

Linux/BSD/Solaris

SpiderFoot is written in Python (2.7), so to run on Linux/Solaris/FreeBSD/etc. you need Python 2.7 installed, in addition to the lxml, netaddr, M2Crypto, CherryPy, bs4, requests and Mako modules.

To install the dependencies using PIP, run the following:

~$ pip install lxml netaddr M2Crypto cherrypy mako requests bs4

On some distributions, M2Crypto must be installed using APT instead of PIP:

~$ apt-get install python-m2crypto

Other modules such as PyPDF2, SOCKS and more are included in the SpiderFoot package, so you don’t need to install them separately.
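You can confirm that the modules above are importable before starting SpiderFoot with a quick check like the one below. This is a minimal sketch and is not part of SpiderFoot. It uses each package's import name, which is not always spelled the same way as in the list above: CherryPy is imported as cherrypy and Mako as mako.

```python
import importlib

# Import names of the dependencies listed above (not necessarily the
# spelling used in prose, e.g. CherryPy -> cherrypy, Mako -> mako).
REQUIRED_MODULES = ["lxml", "netaddr", "M2Crypto", "cherrypy", "bs4", "requests", "mako"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All dependencies appear to be installed.")
```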

Windows

SpiderFoot for Windows is a compiled executable file, and so all dependencies are packaged with it.

No third party tools/libraries need to be installed, not even Python.


Installing

Installing SpiderFoot is literally as simple as unpacking the distribution tar.gz/zip file.

Linux/BSD/Solaris

To install SpiderFoot on Linux/Solaris/FreeBSD/etc. you only need to un-targz the package, as follows:

~$ tar zxvf spiderfoot-X.X.X-src.tar.gz
~$ cd spiderfoot-X.X.X
~/spiderfoot-X.X.X$

Windows

Unzip the distribution ZIP file to a folder of your choice… yep that’s it.


Starting SpiderFoot

Linux/BSD/Solaris

To run SpiderFoot, simply execute sf.py from the directory you extracted SpiderFoot into:

~/spiderfoot-X.X.X$ python ./sf.py

Once executed, a web server will be started, which by default listens on 127.0.0.1:5001. You can then use the web browser of your choice by browsing to http://127.0.0.1:5001. Alternatively, since version 2.10 you can use the CLI, which by default connects to the server locally on 127.0.0.1:5001:

~/spiderfoot-X.X.X$ python ./sfcli.py

If you wish to make SpiderFoot accessible from another system, for example running it on a server and controlling it remotely, then you can specify an external IP for SpiderFoot to bind to, or use 0.0.0.0 so that it binds to all addresses, including 127.0.0.1:

~/spiderfoot-X.X.X$ python ./sf.py 0.0.0.0:5001

To use the CLI in that case, copy sfcli.py to the remote system and run it from there:

~$ python ./sfcli.py -u http://<remote ip>:5001

Run python ./sfcli.py --help to better understand how to use the CLI.

If port 5001 is used by another application on your system, you can change the port:

~/spiderfoot-X.X.X$ python ./sf.py 127.0.0.1:9999
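The IP:port argument and the "port already in use" case can be sketched in Python. This is not SpiderFoot's own argument parsing. parse_listen_address and port_is_free are hypothetical helpers. They show the default (127.0.0.1:5001), the IP:port form used above, and one way to check whether a port is free before starting sf.py. The check covers IPv4 addresses only.

```python
import socket

# Default bind address described above.
DEFAULT_LISTEN = ("127.0.0.1", 5001)


def parse_listen_address(arg=None):
    """Split an 'IP:port' argument into (ip, port), or return the default."""
    if not arg:
        return DEFAULT_LISTEN
    host, _, port = arg.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected IP:port, got %r" % (arg,))
    return host, int(port)


def port_is_free(host, port):
    """Return True if an IPv4 TCP socket can bind to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except socket.error:
        return False
    finally:
        sock.close()
    return True


if __name__ == "__main__":
    host, port = parse_listen_address()
    if port_is_free(host, port):
        print("%s:%d is free" % (host, port))
    else:
        print("%s:%d is in use; pick another port, e.g. 127.0.0.1:9999" % (host, port))
```

If port_is_free returns False for 5001, pass a different port to sf.py, as in the 127.0.0.1:9999 example.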

Windows

SpiderFoot for Windows comes as a pre-packaged executable, with no need to install any dependencies.

For now, there is no installer wizard, so all that’s needed is to unzip the package into a directory (e.g. C:\SpiderFoot) and run sf.exe:

C:\SpiderFoot>sf.exe

As with Linux, you can also specify the IP and port to bind to:

C:\SpiderFoot>sf.exe 0.0.0.0:9999

Caution!

By default, SpiderFoot does not authenticate users connecting to its user interface, nor does it serve over HTTPS. Avoid running it on a server/workstation that can be accessed from untrusted devices, because anyone using those devices will be able to control SpiderFoot remotely and initiate scans from your devices. Since SpiderFoot 2.7, authentication and HTTPS are supported; see the Security section below.


Security

With version 2.7, SpiderFoot introduced authentication as well as TLS/SSL support. Both are enabled automatically, based on the presence of specific files.

Authentication

SpiderFoot will require HTTP digest authentication if a file named passwd exists in the SpiderFoot root directory. The file format is simple: create one entry per account, in the format:

username:password

For example:

admin:supersecretpassword

Once the file is created, restart SpiderFoot.
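To illustrate the passwd format, this sketch reads the file into a username-to-password dictionary. It is not SpiderFoot's parser. Two choices here are assumptions: it splits on the first colon only, so passwords may contain ':', and it silently skips malformed lines. A dictionary of this shape is what CherryPy's digest-auth helper cherrypy.lib.auth_digest.get_ha1_dict_plain() accepts.

```python
def load_passwd(path):
    """Read 'username:password' lines from `path` into a dict.

    Blank lines and lines without a colon are skipped (an assumption).
    Only the first colon separates the fields, so passwords may contain ':'.
    """
    accounts = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or ":" not in line:
                continue
            username, password = line.split(":", 1)
            accounts[username] = password
    return accounts
```

For the example entry above, load_passwd("passwd") would return {"admin": "supersecretpassword"}.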

TLS/SSL

SpiderFoot will serve HTTPS (and only HTTPS) if it detects both a public certificate and a key file in SpiderFoot’s root directory. This means TLS/SSL is used on whatever port you set SpiderFoot to listen on. SpiderFoot cannot serve HTTP and HTTPS simultaneously on different ports. If you need to do that, a reverse proxy such as nginx in front of SpiderFoot is a better solution.

Simply place two files in the SpiderFoot directory: spiderfoot.crt (the RSA public certificate, in PEM format) and spiderfoot.key (the RSA private key, in PEM format). Restart SpiderFoot and it will serve HTTPS only.

For instructions on generating a self-signed certificate, check out this StackOverflow article.
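SpiderFoot's web UI runs on CherryPy, one of the dependencies listed earlier. The helper below sketches the "HTTPS only if both files exist" rule. When spiderfoot.crt and spiderfoot.key are both present, it returns CherryPy's standard SSL configuration keys; otherwise it returns nothing. The function name is hypothetical, and SpiderFoot's own implementation may differ.

```python
import os


def https_settings(root_dir):
    """Return CherryPy SSL settings if both PEM files exist, else {}.

    Mirrors the rule described above: HTTPS is used only when both
    spiderfoot.crt and spiderfoot.key are present in the root directory.
    """
    cert = os.path.join(root_dir, "spiderfoot.crt")
    key = os.path.join(root_dir, "spiderfoot.key")
    if not (os.path.isfile(cert) and os.path.isfile(key)):
        return {}  # no TLS: plain HTTP
    return {
        "server.ssl_module": "builtin",
        "server.ssl_certificate": cert,
        "server.ssl_private_key": key,
    }
```

You could pass the result to cherrypy.config.update() before starting the server. An empty dict leaves CherryPy serving plain HTTP.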


API Keys

A few SpiderFoot modules require or perform better when API keys are supplied.

Honeypot Checker

  1. Go to http://www.projecthoneypot.org
  2. Sign up (free) and log in
  3. Click Services -> HTTP Blacklist
  4. An API key should be listed
  5. Copy and paste that key into the Settings -> Honeypot Checker section in SpiderFoot

SHODAN

  1. Go to http://www.shodanhq.com
  2. Sign up (free) and log in
  3. Click ‘Developer Center’
  4. On the far right your API key should appear in a box
  5. Copy and paste that key into the Settings -> SHODAN section in SpiderFoot

VirusTotal

  1. Go to http://www.virustotal.com
  2. Sign up (free) and log in
  3. Click your username in the far right and select ‘My API Key’
  4. Copy and paste the key in the grey box into the Settings -> VirusTotal section in SpiderFoot

IBM X-Force Exchange

  1. Go to https://exchange.xforce.ibmcloud.com/new
  2. Create an IBM ID (free) and log in
  3. Go to your account settings
  4. Click API Access
  5. Generate the API key and password (you need both)
  6. Copy and paste the key and password into the Settings -> X-Force section in SpiderFoot

MalwarePatrol

  1. Go to http://www.malwarepatrol.net
  2. Create an account (free) and log in
  3. Click “Open Source” and scroll down to the bottom
  4. Click the “Free” link in the subscription pricing table
  5. Click the free block lists link
  6. You will receive a receipt ID
  7. Copy and paste the receipt ID into the Settings -> MalwarePatrol section in SpiderFoot

BotScout

  1. Go to http://www.botscout.com
  2. Create an account (free) and log in
  3. Under Account Info, your API key will be there
  4. Copy and paste the API key into the Settings -> BotScout section in SpiderFoot

Cymon.io

  1. Go to http://www.cymon.io
  2. Create an account (free) and log in
  3. Under “My API Dashboard”, your API key will be there
  4. Copy and paste the API key into the Settings -> Cymon section in SpiderFoot

Censys.io

  1. Go to http://www.censys.io
  2. Create an account (free) and log in
  3. Click “My Account” (bottom right)
  4. Copy and paste the API Credentials values into the Settings -> Censys section in SpiderFoot

Hunter.io

  1. Go to http://www.hunter.io
  2. Create an account (free) and log in
  3. Click “API” in the top menu bar
  4. Copy and paste the API key into the Settings -> Hunter.io section in SpiderFoot

AlienVault OTX

  1. Go to https://otx.alienvault.com/ and sign up
  2. Log in and click your account on the top right, go to Settings
  3. Scroll down and copy and paste the OTX Key value into the Settings -> AlienVault OTX section in SpiderFoot

Clearbit

  1. Go to https://dashboard.clearbit.com/login and sign up
  2. Log in and click the API link on the left
  3. Copy and paste the “secret” API key into the Settings -> Clearbit section in SpiderFoot

BuiltWith

  1. Go to https://www.builtwith.com and sign up. You get 50 queries for free before having to pay (it’s totally worth it though)
  2. Log in and click on the “Domain API” tab. No other API key type will work with SpiderFoot!
  3. Your API key will appear on the right
  4. Copy and paste it into the Settings -> BuiltWith section in SpiderFoot

FraudGuard

  1. Go to https://fraudguard.io
  2. Register for the plan of your choice (a free plan is available)
  3. Click ‘Create’ to generate an API key, which takes the form of a username and password
  4. Copy and paste both into the Settings -> Fraudguard section in SpiderFoot

IPinfo.io

  1. Go to https://ipinfo.io
  2. Click on Pricing and select the plan of your choice. They offer a very generous free plan with 1,000 queries per day
  3. Click Subscribe, enter your details and follow the registration process
  4. Copy and paste the ‘Access token’ in your Profile to the Settings -> ipinfo.io section in SpiderFoot

Using SpiderFoot

Running a Scan

When you run SpiderFoot for the first time, there is no historical data, so you should be presented with a screen like the following:

To initiate a scan, click on the ‘New Scan’ button in the top menu bar. You will then need to define a name for your scan (these are non-unique) and a target (also non-unique):

You can then define how you would like to run the scan - either by use case (the tab selected by default), by data required or by module.

Module-based scanning is for more advanced users who are familiar with the behavior and data provided by different modules, and want more control over the scan:

Beware, though: dependency checking is only performed when scanning by required data, not when scanning by module. This means that if you select a module that depends on event types only provided by other modules, but those modules are not selected, you will get no results from it.
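The check that module-based scanning skips can be sketched as a fixed-point computation. Start from the event types available when the scan begins. Then repeatedly activate any selected module that watches an available event type, and add that module's produced types to the available set. Any module that never becomes active would produce no results. This is an illustration only, not SpiderFoot's code. The seed event types are an assumption, and the "fetcher" module in the test is hypothetical; the sfp_names events come from the Modules section.

```python
def unsatisfied_modules(selected, seed_events):
    """Return the selected modules that would never receive an event.

    `selected` maps a module name to (watched_events, produced_events);
    `seed_events` are the event types available when the scan starts.
    A module watching "*" is always considered satisfied.
    """
    available = set(seed_events)
    active = set()
    changed = True
    while changed:  # repeat until no further module becomes active
        changed = False
        for name, (watched, produced) in selected.items():
            if name in active:
                continue
            watched = watched or []
            if "*" in watched or available.intersection(watched):
                active.add(name)
                available.update(produced or [])
                changed = True
    return sorted(set(selected) - active)
```

For instance, selecting only sfp_names (which watches TARGET_WEB_CONTENT and EMAILADDR) with a seed of INTERNET_NAME reports sfp_names as unsatisfied. Nothing selected produces web content or e-mail addresses, so sfp_names never receives an event.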

Scan Results

From the moment you click ‘Run Scan’, you will be taken to a screen for monitoring your scan in near real time:

That screen is made up of a graph showing a breakdown of the data obtained so far, plus log messages generated by SpiderFoot and its modules.

The bars of the graph are clickable, taking you to the result table for that particular data type.

Browsing Results

By clicking on the ‘Browse’ button for a scan, you can browse the data by type:

This data is exportable and searchable. Click the Search box to get a pop-up explaining how to perform searches.

By clicking on one of the data types, you will be presented with the actual data:

The fields displayed are explained as follows:

  • Checkbox field: Use this to set/unset records as false positive. Once at least one is checked, click the orange False Positive button above to set/unset the selected records.
  • Data Element: The data the module was able to obtain about your target.
  • Source Data Element: The data the module received as the basis for its data collection. In the example above, the sfp_portscan_tcp module received an event about an open port, and used that to obtain the banner on that port.
  • Source Module: The module that identified this data.
  • Identified: When the data was identified by the module.

You can click the black icons to modify how this data is represented. For instance you can get a unique data representation by clicking the Unique Data View icon:

Setting False Positives

Version 2.6.0 introduced the ability to set data records as false positive. As indicated in the previous section, use the checkbox and the orange button to set/unset records as false positive:

Once you have set records as false positive, you will see an indicator next to those records, and have the ability to filter them from view, as shown below:

NOTE: Records can only be set to false positive once a scan has finished running. This is because setting a record to false positive also sets all of its child data elements to false positive. Doing this while the scan is still running could leave the back-end in an inconsistent state, so the UI will prevent it.

The result of a record being set to false positive, aside from the indicator in the data table view and exports, is that such data will not be shown in the node graphs.
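The effect on child data elements can be sketched as a walk over parent/child links. Each result records the data element it was derived from (the Source Data Element shown when browsing). Marking a record therefore also reaches everything derived from it, directly or indirectly. The dictionary of record IDs used below is an assumption for illustration, not SpiderFoot's database schema.

```python
def mark_false_positive(parents, record):
    """Return `record` plus every record derived from it.

    `parents` maps each record ID to the ID of the record it was derived
    from (its source data element).
    """
    # Invert the child -> parent mapping into parent -> [children].
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)

    flagged = set()
    stack = [record]
    while stack:  # iterative depth-first walk over the derivation tree
        current = stack.pop()
        if current in flagged:
            continue
        flagged.add(current)
        stack.extend(children.get(current, []))
    return flagged
```

A result produced after the walk has finished would escape the flag. This shows why the operation is only allowed once a scan has completed.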

Searching Results

Results can be searched either at the whole scan level, or within individual data types. The scope of the search is determined by the screen you are on at the time.

As indicated by the pop-up box when selecting the search field, you can search as follows:

  • Exact value: Non-wildcard searching for a specific value. For example, search for 404 within the HTTP Status Code section to see all pages that were not found.
  • Pattern matching: Search for simple wildcards to find patterns. For example, search for *:22 within the Open TCP Port section to see all instances of port 22 open.
  • Regular expression searches: Encapsulate your string in ‘/’ to search by regular expression. For example, search for ‘/\d+\.\d+\.\d+\.\d+/’ to find anything looking like an IP address in your scan results.
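The three search styles can be approximated in Python as follows. This sketches the matching rules as described above; it is not SpiderFoot's search code. fnmatch stands in for SpiderFoot's wildcard handling, and unlike SpiderFoot it also treats ? and [...] as special. Here re.search matches anywhere in the value; whether SpiderFoot's regular expressions must match the whole value is an assumption. Note the escaped dots in the IP pattern. An unescaped . matches any character, so \d+.\d+.\d+.\d+ would also match strings such as 1-2-3-4.

```python
import fnmatch
import re


def matches(query, value):
    """Apply the three search styles described above to a single value.

    /.../       -> regular expression, matched anywhere in the value
    contains *  -> shell-style wildcard pattern over the whole value
    otherwise   -> exact match
    """
    if len(query) > 1 and query.startswith("/") and query.endswith("/"):
        return re.search(query[1:-1], value) is not None
    if "*" in query:
        return fnmatch.fnmatchcase(value, query)
    return value == query
```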

Managing Scans

When you have some historical scan data accumulated, you can use the list available on the ‘Scans’ section to manage them:

You can filter the scans shown by altering the Filter drop-down selection. Except for the green refresh icon, all icons on the right apply to whichever scans you have checked.

Tor Integration

Refer to this post for more information.


Modules

Overview

SpiderFoot has all data collection modularised. When a module discovers a piece of data, that data is transmitted to all other modules that are ‘interested’ in that data type for processing. Those modules will then act on that piece of data to identify new data, and in turn generate new events for other modules which may be interested, and so on.

For example, sfp_dnsresolve may identify an IP address associated with your target, notifying all interested modules. One of those interested modules would be the sfp_ripe module, which will take that IP address and identify the netblock it is a part of, the BGP ASN and so on.

This might be best illustrated by looking at module code. For example, the sfp_names module watches for TARGET_WEB_CONTENT and EMAILADDR events in order to identify human names:

    from sflib import SpiderFootPlugin

    class sfp_names(SpiderFootPlugin):

        # What events is this module interested in for input
        # * = be notified about all events.
        def watchedEvents(self):
            return ["TARGET_WEB_CONTENT", "EMAILADDR"]

        # What events this module produces
        # This is to support the end user in selecting modules based on events
        # produced.
        def producedEvents(self):
            return ["HUMAN_NAME"]

        # (setup(), handleEvent() and other module methods omitted)

Meanwhile, as each event is passed to a module, it is also recorded in the SpiderFoot database for reporting and viewing in the UI.
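The flow described above can be modelled as a simple event queue. This is a toy illustration, not SpiderFoot's scanner, and three parts of it are stand-ins. Module is a hypothetical class whose handler returns new (event_type, data) pairs instead of calling notifyListeners(). The module names dns and ripe in the test are illustrative. The recorded list plays the role of the database. Duplicate events are dropped so that modules cannot trigger each other forever.

```python
from collections import deque


class Module(object):
    """Hypothetical stand-in for a SpiderFoot module (not sflib's class)."""

    def __init__(self, name, watched, handler):
        self.name = name
        self.watched = watched  # like watchedEvents()
        self.handler = handler  # (event_type, data) -> list of (type, data)

    def wants(self, event_type):
        return "*" in self.watched or event_type in self.watched


def run_scan(modules, seed_type, seed_data):
    """Deliver each event to every interested module and record all events."""
    recorded = []  # stands in for the SpiderFoot database
    seen = set()
    queue = deque([(seed_type, seed_data, None)])
    while queue:
        event_type, data, source = queue.popleft()
        if (event_type, data) in seen:
            continue  # already processed; prevents endless loops
        seen.add((event_type, data))
        recorded.append((event_type, data, source))
        for module in modules:
            if module.wants(event_type):
                for new_type, new_data in module.handler(event_type, data):
                    queue.append((new_type, new_data, module.name))
    return recorded
```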

Module List

The table below is an up-to-date list of all SpiderFoot modules and a short summary of their capabilities.

Module | Name | Description
sfp_abusech.py | abuse.ch | Check if a host/domain, IP or netblock is malicious according to abuse.ch.
sfp_accounts.py | Accounts | Look for possible associated accounts on nearly 200 websites like eBay, Slashdot, Reddit, etc.
sfp_adblock.py | AdBlock Check | Check if linked pages would be blocked by AdBlock Plus.
sfp_ahmia.py | Ahmia | Search Tor ‘Ahmia’ search engine for mentions of the target domain.
sfp_alienvault.py | AlienVault OTX | Obtain information from AlienVault Open Threat Exchange (OTX).
sfp_alienvaultiprep.py | AlienVault IP Reputation | Check if an IP or netblock is malicious according to the AlienVault IP Reputation database.
sfp_archiveorg.py | Archive.org | Identifies historic versions of interesting files/pages from the Wayback Machine.
sfp_badipscom.py | badips.com | Check if a domain or IP is malicious according to badips.com.
sfp_base64.py | Base64 | Identify Base64-encoded strings in any content and URLs, often revealing interesting hidden information.
sfp_bingsearch.py | Bing | Some light Bing scraping to identify sub-domains and links.
sfp_bingsharedip.py | Bing (Shared IPs) | Search Bing for hosts sharing the same IP.
sfp_binstring.py | Binary String Extractor | Attempt to identify strings in binary content.
sfp_bitcash.py | Bitcash.cz Malicious IPs | Check if an IP is malicious according to Bitcash.cz Malicious IPs.
sfp_bitcoin.py | Bitcoin Finder | Identify bitcoin addresses in scraped webpages.
sfp_blockchain.py | Blockchain | Queries blockchain.info to find the balance of identified bitcoin wallet addresses.
sfp_blocklistde.py | blocklist.de | Check if a netblock or IP is malicious according to blocklist.de.
sfp_botscout.py | BotScout | Searches botscout.com’s database of spam-bot IPs and e-mail addresses.
sfp_builtwith.py | BuiltWith | Query BuiltWith.com’s Domain API for information about your target’s web technology stack, e-mail addresses and more.
sfp_censys.py | Censys | Obtain information from Censys.io.
sfp_clearbit.py | Clearbit | Check for names, addresses, domains and more based on lookups of e-mail addresses on clearbit.com.
sfp_cookie.py | Cookies | Extract cookies from HTTP headers.
sfp_crossref.py | Cross-Reference | Identify whether other domains are associated with (‘Affiliates’ of) the target.
sfp_crt.py | Certificate Transparency | Gather hostnames from historical certificates in crt.sh.
sfp_cybercrimetracker.py | cybercrime-tracker.net | Check if a host/domain or IP is malicious according to cybercrime-tracker.net.
sfp_cymon.py | Cymon | Obtain information from Cymon.io.
sfp_dnsbrute.py | DNS Brute-force | Attempts to identify hostnames through brute-forcing common names.
sfp_dnsneighbor.py | DNS Look-aside | Attempt to reverse-resolve the IP addresses next to your target to see if they are related.
sfp_dnsraw.py | DNS Raw Records | Retrieves raw DNS records such as MX, TXT and others.
sfp_dnsresolve.py | DNS Resolver | Resolves hosts and IP addresses identified, including those extracted from raw content.
sfp_dronebl.py | DroneBL | Query the DroneBL database for open relays, open proxies, vulnerable servers, etc.
sfp_duckduckgo.py | DuckDuckGo | Query DuckDuckGo’s API for descriptive information about your target.
sfp_email.py | E-Mail | Identify e-mail addresses in any obtained data.
sfp_errors.py | Errors | Identify common error messages in content like SQL errors, etc.
sfp_filemeta.py | File Metadata | Extracts metadata from documents and images.
sfp_fortinet.py | Fortiguard.com | Check if an IP is malicious according to Fortiguard.com.
sfp_fraudguard.py | Fraudguard | Obtain threat information from Fraudguard.io.
sfp_freegeoip.py | FreeGeoIP | Identifies the physical location of IP addresses identified using freegeoip.net.
sfp_github.py | Github | Identify associated public code repositories on GitHub.
sfp_googlemaps.py | Google Maps | Identifies potential physical addresses and latitude/longitude coordinates.
sfp_googlesearch.py | Google Search | Some light Google scraping to identify sub-domains and links.
sfp_googlesearchdomain.py | Google Search, by domain | Some light Google scraping to identify sub-domains and links within site.
sfp_hackertarget.py | HackerTarget.com | Search HackerTarget.com for hosts sharing the same IP.
sfp_honeypot.py | Honeypot Checker | Query the projecthoneypot.org database for entries.
sfp_hosting.py | Hosting Providers | Find out if any IP addresses identified fall within known 3rd party hosting ranges, e.g. Amazon, Azure, etc.
sfp_hostsfilenet.py | hosts-file.net Malicious Hosts | Check if a host/domain is malicious according to hosts-file.net Malicious Hosts.
sfp_hunter.py | Hunter.io | Check for e-mail addresses and names on hunter.io.
sfp_intfiles.py | Interesting Files | Identifies potential files of interest, e.g. office documents, zip files.
sfp_ipinfo.py | IPInfo.io | Identifies the physical location of IP addresses identified using ipinfo.io.
sfp_isc.py | Internet Storm Center | Check if an IP is malicious according to SANS ISC.
sfp_junkfiles.py | Junk Files | Looks for old/temporary and other similar files.
sfp_malc0de.py | malc0de.com | Check if a netblock or IP is malicious according to malc0de.com.
sfp_malwaredomainlist.py | malwaredomainlist.com | Check if a host/domain, IP or netblock is malicious according to malwaredomainlist.com.
sfp_malwaredomains.py | malwaredomains.com | Check if a host/domain is malicious according to malwaredomains.com.
sfp_malwarepatrol.py | MalwarePatrol | Searches malwarepatrol.net’s database of malicious URLs/IPs.
sfp_mcafee.py | McAfee SiteAdvisor | Check if a host/domain is malicious according to McAfee SiteAdvisor.
sfp_multiproxy.py | multiproxy.org Open Proxies | Check if an IP is an open proxy according to multiproxy.org’s open proxy list.
sfp_names.py | Name Extractor | Attempt to identify human names in fetched content.
sfp_nothink.py | Nothink.org | Check if a host/domain, netblock or IP is malicious according to Nothink.org.
sfp_onioncity.py | Onion.city | Search Tor ‘Onion City’ search engine for mentions of the target domain.
sfp_openbugbounty.py | Open Bug Bounty | Check external vulnerability scanning/reporting service openbugbounty.org to see if the target is listed.
sfp_pageinfo.py | Page Info | Obtain information about web pages (do they take passwords, do they contain forms, etc.).
sfp_pastebin.py | PasteBin | PasteBin scraping (via Google) to identify related content.
sfp_pastie.py | Pastie.org | Pastie.org scraping (via Google) to identify related content.
sfp_pgp.py | PGP Key Look-up | Look up e-mail addresses in PGP public key servers.
sfp_phishtank.py | PhishTank | Check if a host/domain is malicious according to PhishTank.
sfp_phone.py | Phone Numbers | Identify phone numbers in scraped webpages.
sfp_portscan_tcp.py | Port Scanner - TCP | Scans for commonly open TCP ports on Internet-facing systems.
sfp_psbdmp.py | Psbdmp.com | Check psbdmp.com (PasteBin Dump) for potentially hacked e-mails and domains.
sfp_pwned.py | Pwned Password | Check Have I Been Pwned? for hacked e-mail addresses identified.
sfp_ripe.py | RIPE Internet Registry | Queries the RIPE registry (includes ARIN data) to identify netblocks and other info.
sfp_robtex.py | Robtex | Search Robtex.com for hosts sharing the same IP.
sfp_s3bucket.py | S3 Bucket Finder | Search for potential S3 buckets associated with the target.
sfp_shodan.py | SHODAN | Obtain information from SHODAN about identified IP addresses.
sfp_similar.py | Similar Domains | Search various sources to identify similar looking domain names, for instance squatted domains.
sfp_social.py | Social Networks | Identify presence on social media networks such as LinkedIn, Twitter and others.
sfp_socialprofiles.py | Social Media Profiles | Identify the social media profiles for human names identified.
sfp_sorbs.py | SORBS | Query the SORBS database for open relays, open proxies, vulnerable servers, etc.
sfp_spamcop.py | SpamCop | Query various spamcop databases for open relays, open proxies, vulnerable servers, etc.
sfp_spamhaus.py | Spamhaus | Query the Spamhaus databases for open relays, open proxies, vulnerable servers, etc.
sfp_spider.py | Spider | Spidering of web-pages to extract content for searching.
sfp_sslcert.py | SSL | Gather information about SSL certificates used by the target’s HTTPS sites.
sfp_strangeheaders.py | Strange Headers | Obtain non-standard HTTP headers returned by web servers.
sfp_threatcrowd.py | ThreatCrowd | Obtain information from ThreatCrowd about identified IP addresses, domains and e-mail addresses.
sfp_threatexpert.py | ThreatExpert.com | Check if a host/domain or IP is malicious according to ThreatExpert.com.
sfp_tldsearch.py | TLD Search | Search all Internet TLDs for domains with the same name as the target (this can be very slow).
sfp_torch.py | TORCH | Search Tor ‘TORCH’ search engine for mentions of the target domain.
sfp_torexits.py | TOR Exit Nodes | Check if an IP or netblock appears on the torproject.org exit node list.
sfp_torserver.py | TOR Servers | Check if an IP or netblock appears on the blutmagie.de TOR server list.
sfp_totalhash.py | TotalHash.com | Check if a host/domain or IP is malicious according to TotalHash.com.
sfp_uceprotect.py | UCEPROTECT | Query the UCEPROTECT databases for open relays, open proxies, vulnerable servers, etc.
sfp_virustotal.py | VirusTotal | Obtain information from VirusTotal about identified IP addresses.
sfp_voipbl.py | VoIPBL OpenPBX IPs | Check if an IP or netblock is an open PBX according to VoIPBL OpenPBX IPs.
sfp_vxvault.py | VXVault.net | Check if a domain or IP is malicious according to VXVault.net.
sfp_watchguard.py | Watchguard | Check if an IP is malicious according to Watchguard’s reputationauthority.org.
sfp_webframework.py | Web Framework | Identify the usage of popular web frameworks like jQuery, YUI and others.
sfp_websvr.py | Web Server | Obtain web server banners to identify versions of web servers being used.
sfp_whois.py | Whois | Perform a WHOIS look-up on domain names and owned netblocks.
sfp_wikileaks.py | Wikileaks | Search Wikileaks for mentions of domain names and e-mail addresses.
sfp_xforce.py | XForce Exchange | Obtain information from IBM X-Force Exchange.
sfp_yahoosearch.py | Yahoo | Some light Yahoo scraping to identify sub-domains and links.
sfp_zoneh.py | Zone-H Defacement Check | Check if a hostname/domain appears on the zone-h.org ‘special defacements’ RSS feed.

Data Elements

As mentioned above, SpiderFoot works on an “event-driven” model, whereby each module generates events about data elements which other modules listen to and consume.

The data elements are one of the following types:

  • entities (ENTITY) like IP addresses and Internet names (hostnames, sub-domains, domains),
  • sub-entities (SUBENTITY) like port numbers, URLs and software installed,
  • descriptors (DESCRIPTOR) of those entities (malicious, physical location information, …) or
  • data (DATA), which is mostly unstructured (web page content, port banners, raw DNS records, …)

Here are all the available data elements built into SpiderFoot:

Element ID | Element Name | Element Data Type
ACCOUNT_EXTERNAL_OWNED | Account on External Site | ENTITY
ACCOUNT_EXTERNAL_OWNED_COMPROMISED | Hacked Account on External Site | DESCRIPTOR
ACCOUNT_EXTERNAL_USER_SHARED | User Account on External Site | ENTITY
ACCOUNT_EXTERNAL_USER_SHARED_COMPROMISED | Hacked User Account on External Site | DESCRIPTOR
AFFILIATE_INTERNET_NAME | Affiliate - Internet Name | ENTITY
AFFILIATE_IPADDR | Affiliate - IP Address | ENTITY
AFFILIATE_WEB_CONTENT | Affiliate - Web Content | DATA
AFFILIATE_DESCRIPTION_CATEGORY | Affiliate Description - Category | DESCRIPTOR
AFFILIATE_DESCRIPTION_ABSTRACT | Affiliate Description - Abstract | DESCRIPTOR
APPSTORE_ENTRY | App Store Entry | ENTITY
AMAZON_S3_BUCKET | Amazon S3 Bucket | ENTITY
BASE64_DATA | Base64-encoded Data | DATA
BITCOIN_ADDRESS | Bitcoin Address | ENTITY
BITCOIN_BALANCE | Bitcoin Balance | DESCRIPTOR
BGP_AS_OWNER | BGP AS Ownership | ENTITY
BGP_AS_MEMBER | BGP AS Membership | ENTITY
BGP_AS_PEER | BGP AS Peer | ENTITY
BLACKLISTED_IPADDR | Blacklisted IP Address | DESCRIPTOR
BLACKLISTED_AFFILIATE_IPADDR | Blacklisted Affiliate IP Address | DESCRIPTOR
BLACKLISTED_SUBNET | Blacklisted IP on Same Subnet | DESCRIPTOR
BLACKLISTED_NETBLOCK | Blacklisted IP on Owned Netblock | DESCRIPTOR
CO_HOSTED_SITE | Co-Hosted Site | ENTITY
DARKNET_MENTION_URL | Darknet Mention URL | DESCRIPTOR
DARKNET_MENTION_CONTENT | Darknet Mention Web Content | DATA
DEFACED_INTERNET_NAME | Defaced | DESCRIPTOR
DEFACED_IPADDR | Defaced IP Address | DESCRIPTOR
DEFACED_AFFILIATE_INTERNET_NAME | Defaced Affiliate | DESCRIPTOR
DEFACED_COHOST | Defaced Co-Hosted Site | DESCRIPTOR
DEFACED_AFFILIATE_IPADDR | Defaced Affiliate IP Address | DESCRIPTOR
DESCRIPTION_CATEGORY | Description - Category | DESCRIPTOR
DESCRIPTION_ABSTRACT | Description - Abstract | DESCRIPTOR
DEVICE_TYPE | Device Type | DESCRIPTOR
DNS_TEXT | DNS TXT Record | DATA
DNS_SPF | DNS SPF Record | DATA
DOMAIN_NAME | Domain Name | ENTITY
DOMAIN_NAME_PARENT | Domain Name (Parent) | ENTITY
DOMAIN_REGISTRAR | Domain Registrar | ENTITY
DOMAIN_WHOIS | Domain Whois | DATA
EMAILADDR | Email Address | ENTITY
EMAILADDR_COMPROMISED | Hacked Email Address | DESCRIPTOR
ERROR_MESSAGE | Error Message | DATA
GEOINFO | Physical Location | DESCRIPTOR
HTTP_CODE | HTTP Status Code | DATA
HUMAN_NAME | Human Name | ENTITY
INTERESTING_FILE | Interesting File | DESCRIPTOR
INTERESTING_FILE_HISTORIC | Historic Interesting File | DESCRIPTOR
JUNK_FILE | Junk File | DESCRIPTOR
INTERNET_NAME | Internet Name | ENTITY
IP_ADDRESS | IP Address | ENTITY
IPV6_ADDRESS | IPv6 Address | ENTITY
LINKED_URL_INTERNAL | Linked URL - Internal | SUBENTITY
LINKED_URL_EXTERNAL | Linked URL - External | SUBENTITY
MALICIOUS_ASN | Malicious AS | DESCRIPTOR
MALICIOUS_IPADDR | Malicious IP Address | DESCRIPTOR
MALICIOUS_COHOST | Malicious Co-Hosted Site | DESCRIPTOR
MALICIOUS_EMAILADDR | Malicious E-mail Address | DESCRIPTOR
MALICIOUS_INTERNET_NAME | Malicious Internet Name | DESCRIPTOR
MALICIOUS_AFFILIATE_INTERNET_NAME | Malicious Affiliate | DESCRIPTOR
MALICIOUS_AFFILIATE_IPADDR | Malicious Affiliate IP Address | DESCRIPTOR
MALICIOUS_NETBLOCK | Malicious IP on Owned Netblock | DESCRIPTOR
MALICIOUS_SUBNET | Malicious IP on Same Subnet | DESCRIPTOR
NETBLOCK_OWNER | Netblock Ownership | ENTITY
NETBLOCK_MEMBER | Netblock Membership | ENTITY
NETBLOCK_WHOIS | Netblock Whois | DATA
OPERATING_SYSTEM | Operating System | DESCRIPTOR
LEAKSITE_URL | Leak Site URL | ENTITY
LEAKSITE_CONTENT | Leak Site Content | DATA
PHONE_NUMBER | Phone Number | ENTITY
PHYSICAL_ADDRESS | Physical Address | ENTITY
PHYSICAL_COORDINATES | Physical Coordinates | ENTITY
PGP_KEY | PGP Public Key | DATA
PROVIDER_DNS | Name Server (DNS NS Records) | ENTITY
PROVIDER_JAVASCRIPT | Externally Hosted Javascript | ENTITY
PROVIDER_MAIL | Email Gateway (DNS MX Records) | ENTITY
PROVIDER_HOSTING | Hosting Provider | ENTITY
PUBLIC_CODE_REPO | Public Code Repository | ENTITY
RAW_RIR_DATA | Raw Data from RIRs | DATA
RAW_DNS_RECORDS | Raw DNS Records | DATA
RAW_FILE_META_DATA | Raw File Meta Data | DATA
SEARCH_ENGINE_WEB_CONTENT | Search Engines Web Content | DATA
SOCIAL_MEDIA | Social Media Presence | ENTITY
SIMILARDOMAIN | Similar Domain | ENTITY
SOFTWARE_USED | Software Used | SUBENTITY
SSL_CERTIFICATE_RAW | SSL Certificate - Raw Data | DATA
SSL_CERTIFICATE_ISSUED | SSL Certificate - Issued to | ENTITY
SSL_CERTIFICATE_ISSUER | SSL Certificate - Issued by | ENTITY
SSL_CERTIFICATE_MISMATCH | SSL Certificate Host Mismatch | DESCRIPTOR
SSL_CERTIFICATE_EXPIRED | SSL Certificate Expired | DESCRIPTOR
SSL_CERTIFICATE_EXPIRING | SSL Certificate Expiring | DESCRIPTOR
TARGET_WEB_CONTENT | Web Content | DATA
TARGET_WEB_COOKIE | Cookies | DATA
TCP_PORT_OPEN | Open TCP Port | SUBENTITY
TCP_PORT_OPEN_BANNER | Open TCP Port Banner | DATA
UDP_PORT_OPEN | Open UDP Port | SUBENTITY
UDP_PORT_OPEN_INFO | Open UDP Port Information | DATA
URL_ADBLOCKED_EXTERNAL | URL (AdBlocked External) | DESCRIPTOR
URL_ADBLOCKED_INTERNAL | URL (AdBlocked Internal) | DESCRIPTOR
URL_FORM | URL (Form) | DESCRIPTOR
URL_FLASH | URL (Uses Flash) | DESCRIPTOR
URL_JAVASCRIPT | URL (Uses Javascript) | DESCRIPTOR
URL_WEB_FRAMEWORK | URL (Uses a Web Framework) | DESCRIPTOR
URL_JAVA_APPLET | URL (Uses Java Applet) | DESCRIPTOR
URL_STATIC | URL (Purely Static) | DESCRIPTOR
URL_PASSWORD | URL (Accepts Passwords) | DESCRIPTOR
URL_UPLOAD | URL (Accepts Uploads) | DESCRIPTOR
URL_FORM_HISTORIC | Historic URL (Form) | DESCRIPTOR
URL_FLASH_HISTORIC | Historic URL (Uses Flash) | DESCRIPTOR
URL_JAVASCRIPT_HISTORIC | Historic URL (Uses Javascript) | DESCRIPTOR
URL_WEB_FRAMEWORK_HISTORIC | Historic URL (Uses a Web Framework) | DESCRIPTOR
URL_JAVA_APPLET_HISTORIC | Historic URL (Uses Java Applet) | DESCRIPTOR
URL_STATIC_HISTORIC | Historic URL (Purely Static) | DESCRIPTOR
URL_PASSWORD_HISTORIC | Historic URL (Accepts Passwords) | DESCRIPTOR
URL_UPLOAD_HISTORIC | Historic URL (Accepts Uploads) | DESCRIPTOR
USERNAME | Username | ENTITY
VULNERABILITY | Vulnerability in Public Domain | DESCRIPTOR
WEBSERVER_BANNER | Web Server | DATA
WEBSERVER_HTTPHEADERS | HTTP Headers | DATA
WEBSERVER_STRANGEHEADER | Non-Standard HTTP Header | DATA
WEBSERVER_TECHNOLOGY | Web Technology | DESCRIPTOR

Writing a Module

To write a SpiderFoot module, start by looking at the sfp_template.py file, which is a skeleton module that does nothing. Use the following steps as your guide:

  1. Copy sfp_template.py to a file named after your module. Try to make the name descriptive: not something like sfp_mymodule.py, but something like sfp_imageanalyser.py if you were creating a module to analyse image content.
  2. Replace XXX in the new module with the name of your module and update the descriptive information in the header and comment within the module.
  3. The comment for the class (check in sfp_template.py) is used by SpiderFoot in the UI to correctly categorise modules, so make it something meaningful. Look at other modules for examples.
  4. Set the events in watchedEvents() and producedEvents() accordingly, based on the data element table in the previous section. If you are producing a new data element that does not already exist in SpiderFoot, you must create it in the database:
    • ~/spiderfoot-X.X.X$ sqlite3 spiderfoot.db
      sqlite> INSERT INTO tbl_event_types (event, event_descr, event_raw, event_type) VALUES ('NEW_DATA_ELEMENT_TYPE_NAME_HERE', 'Description of your New Data Element Here', 0, 'DESCRIPTOR or DATA or ENTITY or SUBENTITY');
  5. Put the logic for the module in handleEvent(). Each call to handleEvent() is passed a SpiderFootEvent object. The most important values within this object are:
    • eventType: The data element ID (IP_ADDRESS, WEBSERVER_BANNER, etc.)
    • data: The actual data, e.g. the IP address or web server banner, etc.
    • module: The name of the module that produced the event (sfp_dnsresolve, etc.)
  6. When it is time to generate your event, create an instance of SpiderFootEvent:
    • e = SpiderFootEvent("IP_ADDRESS", ipaddr, self.__name__, event)
    • Note: the event passed as the last argument is the event that your module received. This is what builds the relationship between data elements in the SpiderFoot database.
  7. Notify all modules that may be interested in the event:
    • self.notifyListeners(e)
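The steps above can be pulled together into a small, self-contained sketch. It defines minimal stand-ins for SpiderFootEvent and SpiderFootPlugin so that it runs outside SpiderFoot. These stand-ins only mimic the attributes used in steps 5 to 7; a real module would start from sfp_template.py and use SpiderFoot's own classes. The module name sfp_emailsketch and its e-mail regular expression are hypothetical.

```python
import re


# Minimal stand-ins so this sketch runs outside SpiderFoot. They only mimic
# the attributes and methods used below; a real module obtains the real
# classes the same way sfp_template.py does.
class SpiderFootEvent(object):
    def __init__(self, eventType, data, module, sourceEvent):
        self.eventType = eventType      # data element ID, e.g. "EMAILADDR"
        self.data = data                # the actual data
        self.module = module            # name of the producing module
        self.sourceEvent = sourceEvent  # event this one was derived from


class SpiderFootPlugin(object):
    def __init__(self):
        self.__name__ = self.__class__.__name__
        self.notified = []  # stand-in: collect events instead of dispatching

    def notifyListeners(self, event):
        self.notified.append(event)


# Hypothetical module; in a real module the class comment must follow the
# format used in sfp_template.py (step 3).
class sfp_emailsketch(SpiderFootPlugin):
    """Hypothetical example: extract e-mail addresses from web content."""

    opts = {}
    optdescs = {}

    def setup(self, sfc, userOpts=None):
        self.sf = sfc
        self.results = dict()            # reset per scan so data does not persist
        self.opts = dict(self.opts)      # per-instance copy of the defaults
        self.opts.update(userOpts or {})

    def watchedEvents(self):
        return ["TARGET_WEB_CONTENT"]

    def producedEvents(self):
        return ["EMAILADDR"]

    def handleEvent(self, event):
        # Step 5: inspect the event's type and data.
        if event.eventType not in self.watchedEvents():
            return None
        for addr in re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", event.data):
            if addr in self.results:
                continue                 # already reported this address
            self.results[addr] = True
            # Step 6: the received event is passed as the source event.
            e = SpiderFootEvent("EMAILADDR", addr, self.__name__, event)
            # Step 7: notify interested modules.
            self.notifyListeners(e)
        return None
```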


Database

All SpiderFoot data is stored in a SQLite database (spiderfoot.db in your SpiderFoot installation folder) which can be used outside of SpiderFoot for analysis of your data.

The schema is quite simple and can be viewed in the GitHub repo.

The queries below may provide some further clues:

# Total number of scans in the SpiderFoot database
sqlite> select count(*) from tbl_scan_instance;
10
# Obtain the ID for a particular scan
sqlite> select guid from tbl_scan_instance where seed_target = 'binarypool.com';
b459e339523b8d06235bd06087ae6c6017aaf4ed68dccea0b65a1999a17e460a
# Number of results per data type
sqlite> select count(*), type from tbl_scan_results where scan_instance_id = 'b459e339523b8d06235bd06087ae6c6017aaf4ed68dccea0b65a1999a17e460a' group by type;
5|AFFILIATE_INTERNET_NAME
2|AFFILIATE_IPADDR
1|CO_HOSTED_SITE
1|DOMAIN_NAME
1|DOMAIN_REGISTRAR
1|DOMAIN_WHOIS
1|GEOINFO
28|HTTP_CODE
48|HUMAN_NAME
49|INTERNET_NAME
2|IP_ADDRESS
49|LINKED_URL_EXTERNAL
144|LINKED_URL_INTERNAL
2|PROVIDER_DNS
1|PROVIDER_MAIL
4|RAW_DNS_RECORDS
1|RAW_FILE_META_DATA
1|ROOT
14|SEARCH_ENGINE_WEB_CONTENT
1|SOFTWARE_USED
16|TARGET_WEB_CONTENT
2|TCP_PORT_OPEN
1|TCP_PORT_OPEN_BANNER
1|URL_FORM
10|URL_JAVASCRIPT
6|URL_STATIC
21|URL_WEB_FRAMEWORK
28|WEBSERVER_BANNER
28|WEBSERVER_HTTPHEADERS
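To analyse the data from Python instead, you can run the same queries with the standard sqlite3 module. This is a minimal sketch that uses only the tables and columns appearing in the queries above. The helper names are hypothetical.

```python
import sqlite3


def scan_ids(conn, seed_target):
    """Return the GUIDs of all scans run against `seed_target`."""
    rows = conn.execute(
        "SELECT guid FROM tbl_scan_instance WHERE seed_target = ?",
        (seed_target,),
    )
    return [row[0] for row in rows]


def counts_by_type(conn, scan_id):
    """Return {data type: number of results} for one scan."""
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM tbl_scan_results "
        "WHERE scan_instance_id = ? GROUP BY type",
        (scan_id,),
    )
    return dict(rows)
```

For example, open the database with conn = sqlite3.connect("spiderfoot.db") from your SpiderFoot installation folder, then call counts_by_type(conn, scan_ids(conn, "binarypool.com")[0]). If the path is wrong, sqlite3.connect silently creates a new, empty database file, and the queries will then fail.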
