About

What is SpiderFoot?

SpiderFoot is a reconnaissance tool that automatically queries over 100 public data sources (OSINT) to gather intelligence on IP addresses, domain names, e-mail addresses, names and more. You simply specify the target you want to investigate, pick which modules to enable and then SpiderFoot will collect data to build up an understanding of all the entities and how they relate to each other.

What is OSINT?

OSINT (Open Source Intelligence) is data available in the public domain which might reveal interesting information about your target. This includes DNS, Whois, web pages, passive DNS, spam blacklists, file metadata and threat intelligence lists, as well as services like SHODAN, HaveIBeenPwned? and more. Click here to see the full list of data sources SpiderFoot utilises.

What can I do with SpiderFoot?

The data returned from a SpiderFoot scan will reveal a lot of information about your target, providing insight into possible data leaks, vulnerabilities or other sensitive information that can be leveraged during a penetration test, red team exercise or for threat intelligence. Try it out against your own network to see what you might have exposed!


SpiderFoot HX

SpiderFoot HX builds upon the open source version’s module base to offer enhanced functionality across all aspects of SpiderFoot, including performance, usability, data visualisation, security and more.

Additional Capabilities

In addition to the data collection capabilities of the open source version, SpiderFoot HX takes things a step further with the following features:

  • No installation or setup needed at all. Once you register, everything is ready to go. No Python dependencies to install, no virtual machines to spin up, and no need to ensure you have enough compute/memory/disk to run a large scan.
  • Investigations. Sometimes, you don’t want full automation of your scan and want to step through the data collection process step-by-step, module-by-module. Investigations provide you with a visual way to take full control of the scanning process.
  • Multi-target scanning. In cases where you have multiple entities (domains, e-mail addresses, etc.) related to the same target, you can supply them all as targets of the one scan. This enables SpiderFoot to better identify relationships and find relevant information.
  • Scans are faster. Thanks to the completely overhauled backend architecture of SpiderFoot HX, scans run up to 10x faster than the open source version. This means you get the data you need, faster.
  • OSINT monitoring. Run scans automatically on a daily, weekly or monthly basis at a time of your choice and have all changes between scans automatically tracked and alerted on.
  • Email notifications. Receive email notifications when SpiderFoot scans finish, or when scheduled scans identify changes between scan runs.
  • Slack integration. Prefer your notifications over Slack? No problem; input your Slack hook URL and you’ll see notifications in Slack for scan completions and/or change notifications from scheduled scans.
  • Import scan targets. When scanning many targets, it might be easier to load them in via CSV, or as exported from Hunchly.
  • More modules. SpiderFoot HX adds additional modules for UDP port scanning, identification of languages used in content and screenshotting of certain content like social media profiles, dark web sites and security-sensitive webpages such as those that accept credentials.
  • Correlations. During a scan and after it completes, the SpiderFoot HX correlation engine looks for certain conditions that should be immediately investigated. This includes anomalies but also open cloud storage buckets, hostnames only found from certain sources and more.
  • Reporting & Visualisations. Slice and dice your scan results by data type, data family, module, module category and data source. Look at each data point in-depth to see how it was discovered, its relationships and more.
  • Team collaboration. Got a team working on OSINT and threat intelligence? With SpiderFoot HX, you can have multiple users with role-based access control, collaborating on scans and investigations.
  • Annotations. Add notes to scan results and pull them out with the API for rich integrations with internal SIEM tools, investigative platforms and ticketing systems.
  • Security. Two-factor authentication (2FA), role-based access control and a fully locked down cloud infrastructure mean you don’t need to deal with the security of your OSINT platform and investigations.
  • Anonymous. SpiderFoot HX has TOR integration out of the box and provides no way for a scanned entity to know that it’s you doing the scanning.
  • Custom Scan Profiles. Got a particular combination of modules you like to use for your scans but don’t like having to define them each time? With SpiderFoot HX, you can define scan profiles and re-use them for future scans.
  • SpiderFoot HX API. The SpiderFoot HX API is a fully documented RESTful API that supports virtually all UI functions so you can orchestrate the platform and extract data programmatically.


Seeking Help

Aside from this document, you’ll be able to get help with SpiderFoot from a number of places:


Pre-Requisites

Using Docker

If you would like to side-step having to install anything to get SpiderFoot running on Linux, follow the instructions here to run SpiderFoot in a Docker container.

Linux/BSD/Solaris

SpiderFoot is written in Python (2.7, with support for 3.x in development), so to run on Linux/Solaris/FreeBSD/etc. you need Python 2.7 installed, in addition to the various module dependencies. To install the dependencies using PIP, run the following:

$ pip install -r requirements.txt

On some Linux distributions, you might get an error about M2Crypto, so you must install it using APT instead and re-try with pip:


$ apt-get install python-m2crypto
$ pip install -r requirements.txt

Windows

If you’re using SpiderFoot 2.12 for Windows, you’ll have a compiled executable (.EXE) file and so all dependencies are packaged with it. No third party tools/libraries need to be installed, not even Python.

After version 2.12 however, SpiderFoot no longer ships with a .EXE file for running on Windows due to the stale nature of py2exe and inability to build some dependencies properly anymore on Windows.

Fortunately, with Python for Windows you can follow the below instructions to get SpiderFoot dependencies installed on Windows easily:

  1. Install Python for Windows
  2. Install PIP by downloading this file and running it with Python simply by doing: python get-pip.py
  3. Run pip as you would have for Linux by doing: pip install -r requirements.txt
  4. (Optional if you want to run from the repository and not a packaged release) Install git

MacOS X

Installing on MacOS X is facilitated by using the Homebrew package manager to install Python 2.7, pip and then installing SpiderFoot dependencies as you would on Linux:

  1. First, make sure you have Homebrew installed. Try running brew and if that doesn’t work, install it.
  2. Install Python 2.7 with brew install python@2 and this will also install pip
  3. With pip you can now install the SpiderFoot dependencies as you would on Linux with pip install -r requirements.txt
  4. (Optional if you want to run from the repository and not a packaged release) Install git with brew install git


Installing

SpiderFoot can be installed using git (this is the recommended approach as you’ll always have the latest version by simply doing a git pull), or by downloading a tarball of a release. The approach is the same regardless of platform:

From git


$ git clone https://github.com/smicallef/spiderfoot.git
$ cd spiderfoot
~/spiderfoot$

As a package


$ wget https://github.com/smicallef/spiderfoot/archive/v2.12.0-final.tar.gz
$ tar zxvf v2.12.0-final.tar.gz
$ cd spiderfoot
~/spiderfoot$

A note about older versions and dependencies

If you’re using SpiderFoot 2.12 on Linux or an older cloned version from the Github repository, some pre-requisites need to be installed:


$ pip install -r requirements.txt

On some distros, M2Crypto cannot be installed through pip, so you must install it using APT instead:


$ apt-get install python-m2crypto

Other modules such as PyPDF2, SOCKS and more are included in the 2.12 package, so you don’t need to install them separately.


Starting SpiderFoot

To run SpiderFoot, simply execute sf.py from the directory you extracted/pulled SpiderFoot into:

~/spiderfoot$ python sf.py

Once executed, a web server will be started, which by default will listen on 127.0.0.1:5001. You can then use the web browser of your choice by browsing to http://127.0.0.1:5001 (or https://127.0.0.1:5001 if you have enabled TLS/SSL as described in the Security section). Alternatively, since version 2.10 you can use the CLI, which by default will connect to the server locally, on 127.0.0.1:5001:

~/spiderfoot$ python sfcli.py

If you wish to make SpiderFoot accessible from another system, for example running it on a server and controlling it remotely, then you can specify an external IP for SpiderFoot to bind to, or use 0.0.0.0 so that it binds to all addresses, including 127.0.0.1:

~/spiderfoot$ python sf.py 0.0.0.0:5001

Then, to use the CLI in such a case from a remote system that sfcli.py has been copied to, you would run:

$ python sfcli.py -u https://<remote ip>:5001

Run python ./sfcli.py --help to better understand how to use the CLI.

If port 5001 is used by another application on your system, you can change the port:

~/spiderfoot$ python sf.py 127.0.0.1:9999
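
If you are not sure whether port 5001 is already taken, a quick check before starting SpiderFoot can save a failed launch. The following is a minimal sketch and not part of SpiderFoot: can_bind and pick_port are hypothetical helper names, and the candidate ports are simply the two used in the examples above.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP listener could be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


def pick_port(host: str, candidates):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if can_bind(host, port):
            return port
    return None


if __name__ == "__main__":
    # 5001 is the default from this document; 9999 is the alternative shown above.
    port = pick_port("127.0.0.1", [5001, 9999])
    if port is None:
        print("Neither port is free; choose another one.")
    else:
        print(f"python sf.py 127.0.0.1:{port}")
```

The probe socket is closed again before SpiderFoot starts, so there is a small window in which another program could still grab the port.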

Once started, you will see something similar to this, which means you are ready to go. If you instead see an error message about missing modules, please go back and ensure you’ve installed all the pre-requisites.

~/spiderfoot$ python ./sf.py 0.0.0.0:5001
Attempting to verify database and update if necessary...
Starting web server at https://0.0.0.0:5001 ...


*************************************************************
 Use SpiderFoot by starting your web browser of choice and
 browse to https://<IP of this host>:5001
*************************************************************


[08/Jul/2019:14:40:53] ENGINE Listening for SIGHUP.
[08/Jul/2019:14:40:53] ENGINE Listening for SIGTERM.
[08/Jul/2019:14:40:53] ENGINE Listening for SIGUSR1.
[08/Jul/2019:14:40:53] ENGINE Bus STARTING
[08/Jul/2019:14:40:53] ENGINE Serving on https://0.0.0.0:5001
[08/Jul/2019:14:40:53] ENGINE Bus STARTED

Caution!

By default, SpiderFoot does not authenticate users connecting to its user-interface or serve over HTTPS, so avoid running it on a server/workstation that can be accessed from untrusted devices, as they will be able to control SpiderFoot remotely and initiate scans from your devices. As of SpiderFoot 2.7, to use authentication and HTTPS, see the Security section below.


Security

With version 2.7, SpiderFoot introduced authentication as well as TLS/SSL support. These are automatic based on the presence of specific files.

Authentication

SpiderFoot will require HTTP digest authentication if a file named passwd exists in the SpiderFoot root directory. The format of the file is simple: create one entry per account, in the format of:

username:password

For example:

admin:supersecretpassword

Once the file is created, restart SpiderFoot.
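
If you manage several accounts, it may be easier to generate the passwd file than to edit it by hand. The sketch below only assumes the username:password line format described above; write_passwd and read_passwd are hypothetical helpers, and splitting on the first colon when reading is an assumption about how the file is parsed, not something this document confirms.

```python
from pathlib import Path


def write_passwd(directory, accounts):
    """Write one 'username:password' line per account to <directory>/passwd."""
    lines = []
    for username, password in accounts.items():
        # A colon in the username or a newline anywhere would corrupt the format.
        if ":" in username or "\n" in username or "\n" in password:
            raise ValueError(f"unsupported characters in account {username!r}")
        lines.append(f"{username}:{password}")
    path = Path(directory) / "passwd"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_passwd(path):
    """Parse the file back into a dict (assumes a split on the first colon)."""
    accounts = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        username, _, password = line.partition(":")
        accounts[username] = password
    return accounts


if __name__ == "__main__":
    # Run from the SpiderFoot root directory, then restart SpiderFoot.
    created = write_passwd(".", {"admin": "supersecretpassword"})
    print(f"wrote {created}")
```

Because the passwords are stored in plain text, you may want to restrict read access to the file on shared systems.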

TLS/SSL

SpiderFoot will serve HTTPS (and only that) if it detects the existence of a public certificate and key file in SpiderFoot’s root directory. This means whatever port you set SpiderFoot to listen on is the port on which TLS/SSL will be used. It is not possible for SpiderFoot to serve both HTTP and HTTPS simultaneously on different ports. If you need to do that, an nginx proxy in front of SpiderFoot would be a better solution.

Simply place two files in the SpiderFoot directory: spiderfoot.crt (the public certificate in PEM format) and spiderfoot.key (the RSA private key in PEM format). Restart SpiderFoot and you will now be serving HTTPS only.

For instructions on generating a self-signed certificate, check out this StackOverflow article.
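
Since HTTPS is switched on purely by the presence of the two files, it can be handy to check which scheme to expect before pointing a browser or the CLI at the server. This is a rough sketch based only on the file names given above; expected_scheme is a hypothetical helper and does not validate the certificate or key contents.

```python
from pathlib import Path


def expected_scheme(root) -> str:
    """Guess whether SpiderFoot started from `root` would serve http or https."""
    root = Path(root)
    has_cert = (root / "spiderfoot.crt").is_file()
    has_key = (root / "spiderfoot.key").is_file()
    # Both files must be present for HTTPS, per the description above.
    return "https" if has_cert and has_key else "http"


if __name__ == "__main__":
    scheme = expected_scheme(".")
    print(f"Expect SpiderFoot at {scheme}://127.0.0.1:5001")
```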


API Keys

Many SpiderFoot modules require API keys to function to their fullest extent (or at all), so you will need to go to each service and obtain an API key where you feel that having such a key would add value to your scans. How to obtain those keys goes beyond the scope of this documentation, but generally the pattern looks like:

  1. Google the name of the service
  2. Go to their website
  3. Sign up
  4. Under your account settings or similar, there’s an API key
  5. Enter the API key into the SpiderFoot UI under the Settings section for the respective module (and ensure you have enough credits with that service)

The below instructions are for historical purposes only and are no longer maintained:

Honeypot Checker

  1. Go to https://www.projecthoneypot.org
  2. Sign up (free) and log in
  3. Click Services -> HTTP Blacklist
  4. An API key should be listed
  5. Copy and paste that key into the Settings -> Honeypot Checker section in SpiderFoot

SHODAN

  1. Go to https://www.shodanhq.com
  2. Sign up (free) and log in
  3. Click ‘Developer Center’
  4. On the far right your API key should appear in a box
  5. Copy and paste that key into the Settings -> SHODAN section in SpiderFoot

VirusTotal

  1. Go to https://www.virustotal.com
  2. Sign up (free) and log in
  3. Click your username in the far right and select ‘My API Key’
  4. Copy and paste the key in the grey box into the Settings -> VirusTotal section in SpiderFoot

IBM X-Force Exchange

  1. Go to https://exchange.xforce.ibmcloud.com/new
  2. Create an IBM ID (free) and log in
  3. Go to your account settings
  4. Click API Access
  5. Generate the API key and password (you need both)
  6. Copy and paste the key and password into the Settings -> X-Force section in SpiderFoot

MalwarePatrol

  1. Go to https://www.malwarepatrol.net
  2. Create an account (free) and log in
  3. Click “Open Source” and scroll down to the bottom
  4. Click the “Free” link in the subscription pricing table
  5. Click the free block lists link
  6. You will receive a receipt ID
  7. Copy and paste the receipt ID into the Settings -> MalwarePatrol section in SpiderFoot

BotScout

  1. Go to https://www.botscout.com
  2. Create an account (free) and log in
  3. Under Account Info, your API key will be there
  4. Copy and paste the API key into the Settings -> BotScout section in SpiderFoot

Cymon.io

  1. Go to https://www.cymon.io
  2. Create an account (free) and log in
  3. Under “My API Dashboard”, your API key will be there
  4. Copy and paste the API key into the Settings -> Cymon section in SpiderFoot

Censys.io

  1. Go to https://www.censys.io
  2. Create an account (free) and log in
  3. Click “My Account” (bottom right)
  4. Copy and paste the API Credentials values into the Settings -> Censys section in SpiderFoot

Hunter.io

  1. Go to https://www.hunter.io
  2. Create an account (free) and log in
  3. Click “API” in the top menu bar
  4. Copy and paste the API key into the Settings -> Hunter.io section in SpiderFoot

AlienVault OTX

  1. Go to https://otx.alienvault.com/ and sign up
  2. Log in and click your account on the top right, go to Settings
  3. Scroll down and copy and paste the OTX Key value into the Settings -> AlienVault OTX section in SpiderFoot

Clearbit

  1. Go to https://dashboard.clearbit.com/login and sign up
  2. Log in and click the API link on the left
  3. Copy and paste the “secret” API key into the Settings -> Clearbit section in SpiderFoot

BuiltWith

  1. Go to https://www.builtwith.com and sign up. You get 50 queries for free before having to pay (it’s totally worth it though)
  2. Log in and click on the “Domain API” tab. No other API key type will work with SpiderFoot!
  3. Your API key will appear on the right
  4. Copy and paste it into the Settings -> BuiltWith section in SpiderFoot

FraudGuard

  1. Go to https://fraudguard.io
  2. Register with the plan you choose. The free plan is also available
  3. Click to ‘Create’ an API key, in the form of a username and password
  4. Copy and paste both into the Settings -> Fraudguard section in SpiderFoot

IPinfo.io

  1. Go to https://ipinfo.io
  2. Click on Pricing and select the plan you choose. They offer a very generous free plan with 1,000 queries per day
  3. Click Subscribe, enter your details and follow the registration process
  4. Copy and paste the ‘Access token’ in your Profile to the Settings -> ipinfo.io section in SpiderFoot

CIRCL.LU

  1. Contact CIRCL.LU and ask for Passive DNS and Passive SSL. They are very responsive and will provide you credentials
  2. Enter the credentials into the Settings -> CIRCL.LU section in SpiderFoot

SecurityTrails

  1. Go to the SecurityTrails pricing page
  2. Select the plan you want and click Sign-up, complete the sign-up process
  3. Enter the provided API key into the Settings -> SecurityTrails section in SpiderFoot

FullContact.com

  1. Go to https://fullcontact.com and follow the sign-up process
  2. Log in to the dashboard and create an API key
  3. Copy and paste the API key into the Settings -> FullContact.com section in SpiderFoot

RiskIQ

  1. Go to https://riskiq.com and click the “Sign up for the Free Edition” link up top
  2. Click Register for the Free Edition
  3. Fill out your details and complete the registration process
  4. Log in
  5. Click your account icon in the top right and go to Account Settings
  6. Go to the “API Access” section and click the “Show” link next to User
  7. Copy the key and secret into the Settings -> RiskIQ section in SpiderFoot

Citadel.pw

A free API key has been provided and will be used if you do not have your own. To obtain your own key, you will need to follow the instructions on the citadel.pw website.


Configuring SpiderFoot

One of the main principles behind SpiderFoot is that it’s highly configurable. Every setting is available in the user interface within the Settings section and should be adequately explained there. Just a few key points to note:

  • API keys can be imported and exported between SpiderFoot and SpiderFoot HX using the “Import API Keys” and “Export API Keys” functions. The format is a simple CSV, so the keys can also be edited outside of SpiderFoot and loaded back in, if you prefer.
  • When Debugging is enabled, a lot of logs are generated and can sometimes result in error messages about database locking. This appears to be harmless towards the scan but can mean that logs get dropped.
  • It is worth going through the modules you intend to rely upon heavily to ensure they are configured appropriately for your needs, most importantly the DNS-related modules as they tend to have a knock-on impact to many other modules.


Using SpiderFoot

Running a Scan

When you run SpiderFoot for the first time, there is no historical data, so you should be presented with a screen like the following:

To initiate a scan, click on the ‘New Scan’ button in the top menu bar. You will then need to define a name for your scan (these are non-unique) and a target (also non-unique):

You can then define how you would like to run the scan – either by use case (the tab selected by default), by data required or by module.

Module-based scanning is for more advanced users who are familiar with the behavior and data provided by different modules, and want more control over the scan:

Beware though, there is no dependency checking when scanning by module, only for scanning by required data. This means that if you select a module that depends on event types only provided by other modules, but those modules are not selected, you will get no results.
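
To make the dependency problem concrete, the sketch below checks a module selection for modules that can never receive an event. Everything in it is illustrative: the three module names and the SEED_DOMAIN event are hypothetical, and only TARGET_WEB_CONTENT, EMAILADDR and HUMAN_NAME are event types mentioned elsewhere in this document. SpiderFoot itself performs no such check when scanning by module.

```python
# Hypothetical catalogue: which events each module watches and produces.
CATALOG = {
    "hypo_spider": {"watches": ["SEED_DOMAIN"], "produces": ["TARGET_WEB_CONTENT"]},
    "hypo_email": {"watches": ["TARGET_WEB_CONTENT"], "produces": ["EMAILADDR"]},
    "hypo_names": {"watches": ["TARGET_WEB_CONTENT", "EMAILADDR"],
                   "produces": ["HUMAN_NAME"]},
}


def starved_modules(selected, catalog, seed_events):
    """Return the selected modules that can never be triggered."""
    reachable = set(seed_events)  # event types that can occur in this scan
    active = set()                # modules that will receive at least one event
    changed = True
    while changed:
        changed = False
        for name in selected:
            if name in active:
                continue
            watches = catalog[name]["watches"]
            # "*" means the module is notified about all events.
            if "*" in watches or reachable.intersection(watches):
                active.add(name)
                reachable.update(catalog[name]["produces"])
                changed = True
    return sorted(set(selected) - active)


if __name__ == "__main__":
    # Selecting only the name extractor leaves it with nothing to work on.
    print(starved_modules(["hypo_names"], CATALOG, ["SEED_DOMAIN"]))
    # Adding the module that produces its input resolves the problem.
    print(starved_modules(["hypo_spider", "hypo_names"], CATALOG, ["SEED_DOMAIN"]))
```

A selection that returns an empty list here would at least have every module reachable from the seed event.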

Scan Results

From the moment you click ‘Run Scan’, you will be taken to a screen for monitoring your scan in near real time:

That screen is made up of a graph showing a break down of the data obtained so far plus log messages generated by SpiderFoot and its modules.

The bars of the graph are clickable, taking you to the result table for that particular data type.

Browsing Results

By clicking on the ‘Browse’ button for a scan, you can browse the data by type:

This data is exportable and searchable. Click the Search box to get a pop-up explaining how to perform searches.

By clicking on one of the data types, you will be presented with the actual data:

The fields displayed are explained as follows:

  • Checkbox field: Use this to set/unset fields as false positive. Once at least one is checked, click the orange False Positive button above to set/unset the record.
  • Data Element: The data the module was able to obtain about your target.
  • Source Data Element: The data the module received as the basis for its data collection. In the example above, the sfp_portscan_tcp module received an event about an open port, and used that to obtain the banner on that port.
  • Source Module: The module that identified this data.
  • Identified: When the data was identified by the module.

You can click the black icons to modify how this data is represented. For instance you can get a unique data representation by clicking the Unique Data View icon:

Setting False Positives

Version 2.6.0 introduced the ability to set data records as false positive. As indicated in the previous section, use the checkbox and the orange button to set/unset records as false positive:

Once you have set records as false positive, you will see an indicator next to those records, and have the ability to filter them from view, as shown below:

NOTE: Records can only be set to false positive once a scan has finished running. This is because setting a record to false positive also results in all child data elements being set to false positive. Doing this while the scan is still running, and child elements are still being added, could lead to an inconsistent state in the back-end, so the UI will prevent you from doing so.

The result of a record being set to false positive, aside from the indicator in the data table view and exports, is that such data will not be shown in the node graphs.
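
The way a false positive flag cascades to child data elements can be sketched as a simple tree walk. The data model here (a parent_of mapping and a flags dictionary) is hypothetical and chosen only for illustration; it does not reflect SpiderFoot’s actual database schema.

```python
from collections import defaultdict


def mark_false_positive(record_id, parent_of, flags, scan_finished, value=True):
    """Set or unset the flag on a record and on everything derived from it."""
    if not scan_finished:
        # Mirrors the rule above: not allowed while the scan is running.
        raise RuntimeError("scan still running; cannot change false positives")

    # Invert the child -> parent mapping so descendants can be walked.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    changed, seen, stack = [], set(), [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        flags[current] = value
        changed.append(current)
        stack.extend(children.get(current, []))
    return changed


if __name__ == "__main__":
    # Hypothetical result tree: host -> ip -> port -> banner, host -> email
    parent_of = {"ip": "host", "port": "ip", "banner": "port", "email": "host"}
    flags = {}
    print(sorted(mark_false_positive("ip", parent_of, flags, scan_finished=True)))
```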

Searching Results

Results can be searched either at the whole scan level, or within individual data types. The scope of the search is determined by the screen you are on at the time.

As indicated by the pop-up box when selecting the search field, you can search as follows:

  • Exact value: Non-wildcard searching for a specific value. For example, search for 404 within the HTTP Status Code section to see all pages that were not found.
  • Pattern matching: Search for simple wildcards to find patterns. For example, search for *:22 within the Open TCP Port section to see all instances of port 22 open.
  • Regular expression searches: Encapsulate your string in ‘/’ to search by regular expression. For example, search for ‘/\d+\.\d+\.\d+\.\d+/’ to find anything looking like an IP address in your scan results.
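
The three search modes can be approximated in a few lines of Python, which may help when post-processing exported results. This is a sketch of the behaviour described above, not SpiderFoot’s own search code: the search function is hypothetical, wildcard matching is delegated to fnmatch (which also treats ? and [ ] as special), and the regular expression is written with the dots escaped so that they match literal dots.

```python
import fnmatch
import re


def search(values, query):
    """Filter values by exact match, * wildcard, or /regular expression/."""
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        pattern = re.compile(query[1:-1])
        return [v for v in values if pattern.search(v)]
    if "*" in query:
        # The whole value must match the wildcard pattern.
        return [v for v in values if fnmatch.fnmatchcase(v, query)]
    return [v for v in values if v == query]


if __name__ == "__main__":
    status_codes = ["200", "404", "4040"]
    ports = ["192.0.2.10:22", "192.0.2.10:2222", "192.0.2.11:80"]
    content = ["host at 198.51.100.7", "no address here"]

    print(search(status_codes, "404"))
    print(search(ports, "*:22"))
    print(search(content, r"/\d+\.\d+\.\d+\.\d+/"))
```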

Managing Scans

When you have some historical scan data accumulated, you can use the list available on the ‘Scans’ section to manage them:

You can filter the scans shown by altering the Filter drop-down selection. Except for the green refresh icon, all icons on the right apply to whichever scans you have checked.

Tor Integration

Refer to this post for more information.


Modules

Overview

SpiderFoot has all data collection modularised. When a module discovers a piece of data, that data is transmitted to all other modules that are ‘interested’ in that data type for processing. Those modules will then act on that piece of data to identify new data, and in turn generate new events for other modules which may be interested, and so on.

For example, sfp_dnsresolve may identify an IP address associated with your target, notifying all interested modules. One of those interested modules would be the sfp_ripe module, which will take that IP address and identify the netblock it is a part of, the BGP ASN and so on.

This might be best illustrated by looking at module code. For example, the sfp_names module looks for TARGET_WEB_CONTENT and EMAILADDR events for identifying human names:

    # What events is this module interested in for input
    # * = be notified about all events.
    def watchedEvents(self):
        return ["TARGET_WEB_CONTENT", "EMAILADDR"]

    # What events this module produces
    # This is to support the end user in selecting modules based on events
    # produced.
    def producedEvents(self):
        return ["HUMAN_NAME"]

Meanwhile, as each event is delivered to a module, it is also recorded in the SpiderFoot database for reporting and viewing in the UI.
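
The flow of events between modules can be pictured as a small publish/subscribe loop. The sketch below is a simplified model under stated assumptions, not SpiderFoot’s engine: only watchedEvents mirrors the method shown above, while the two module classes, the handle method, the run_scan function, the event type names and the hard-coded results are all hypothetical.

```python
from collections import deque


class HypotheticalResolver:
    """Stands in for a DNS-resolving module."""
    def watchedEvents(self):
        return ["INTERNET_NAME"]

    def handle(self, event_type, data):
        # A real module would resolve the name; this result is made up.
        return [("IP_ADDRESS", "192.0.2.1")]


class HypotheticalNetblockLookup:
    """Stands in for a module that maps an IP address to its netblock."""
    def watchedEvents(self):
        return ["IP_ADDRESS"]

    def handle(self, event_type, data):
        return [("NETBLOCK", "192.0.2.0/24")]


def run_scan(modules, seed_event):
    """Deliver each event to every interested module until nothing is new."""
    queue = deque([seed_event])
    seen = set()
    recorded = []  # stands in for the SpiderFoot database
    while queue:
        event = queue.popleft()
        if event in seen:
            continue
        seen.add(event)
        recorded.append(event)
        event_type, data = event
        for module in modules:
            watched = module.watchedEvents()
            if "*" in watched or event_type in watched:
                queue.extend(module.handle(event_type, data))
    return recorded


if __name__ == "__main__":
    modules = [HypotheticalResolver(), HypotheticalNetblockLookup()]
    for event in run_scan(modules, ("INTERNET_NAME", "example.com")):
        print(event)
```

Each module only declares what it watches and returns what it found; the loop takes care of routing, which is why modules can be enabled or disabled independently.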

Module List

The below table is an up-to-date list of all SpiderFoot modules and a short summary of their capabilities.

ModuleNameDescription
sfp_abusech.pyabuse.chCheck if a host/domain, IP or netblock is malicious according to abuse.ch.
sfp_abuseipdb.pyAbuseIPDBCheck if a netblock or IP is malicious according to AbuseIPDB.com.
sfp_accounts.pyAccountsLook for possible associated accounts on nearly 200 websites like Ebay, Slashdot, reddit, etc.
sfp_adblock.pyAdBlock CheckCheck if linked pages would be blocked by AdBlock Plus.
sfp_ahmia.pyAhmiaSearch Tor ‘Ahmia’ search engine for mentions of the target domain.
sfp_alienvault.pyAlienVault OTXObtain information from AlienVault Open Threat Exchange (OTX)
sfp_alienvaultiprep.pyAlienVault IP ReputationCheck if an IP or netblock is malicious according to the AlienVault IP Reputation database.
sfp_archiveorg.pyArchive.orgIdentifies historic versions of interesting files/pages from the Wayback Machine.
sfp_arin.pyARINQueries ARIN registry for contact information.
sfp_azureblobstorage.pyAzure Blob FinderSearch for potential Azure blobs associated with the target and attempt to list their contents.
sfp_badipscom.pybadips.comCheck if a domain or IP is malicious according to badips.com.
sfp_bambenek.pyBambenek C&C ListCheck if a host/domain or IP appears on Bambenek Consulting’s C&C tracker lists.
sfp_base64.pyBase64Identify Base64-encoded strings in any content and URLs, often revealing interesting hidden information.
sfp_binaryedge.pyBinaryEdgeObtain information from BinaryEdge.io’s Internet scanning systems about breaches, vulerabilities, torrents and passive DNS.
sfp_bingsearch.pyBingSome light Bing scraping to identify sub-domains and links.
sfp_bingsharedip.pyBing (Shared IPs)Search Bing for hosts sharing the same IP.
sfp_binstring.pyBinary String ExtractorAttempt to identify strings in binary content.
sfp_bitcoin.pyBitcoin FinderIdentify bitcoin addresses in scraped webpages.
sfp_blockchain.pyBlockchainQueries blockchain.info to find the balance of identified bitcoin wallet addresses.
sfp_blocklistde.pyblocklist.deCheck if a netblock or IP is malicious according to blocklist.de.
sfp_botscout.pyBotScoutSearches botscout.com’s database of spam-bot IPs and e-mail addresses.
sfp_builtwith.pyBuiltWithQuery BuiltWith.com’s Domain API for information about your target’s web technology stack, e-mail addresses and more.
sfp_callername.pyCallerNameLookup US phone number location and reputation information.
sfp_censys.pyCensysObtain information from Censys.io
sfp_cinsscore.pyCINS Army ListCheck if a netblock or IP is malicious according to cinsscore.com’s Army List.
sfp_circllu.pyCIRCL.LUObtain information from CIRCL.LU’s Passive DNS and Passive SSL databases.
sfp_citadel.pyCitadel EngineSearches Leak-Lookup.com’s database of breaches.
sfp_cleanbrowsing.pyCleanbrowsing.orgCheck if a host would be blocked by Cleanbrowsing.org DNS
sfp_cleantalk.pyCleanTalk Spam ListCheck if an IP is on CleanTalk.org’s spam IP list.
sfp_clearbit.pyClearbitCheck for names, addresses, domains and more based on lookups of e-mail addresses on clearbit.com.
sfp_coinblocker.pyCoinBlocker ListsCheck if a host/domain or IP appears on CoinBlocker lists.
sfp_commoncrawl.pyCommonCrawlSearches for URLs found through CommonCrawl.org.
sfp_comodo.pyComodoCheck if a host would be blocked by Comodo DNS
sfp_company.pyCompany NamesIdentify company names in any obtained data.
sfp_cookie.pyCookiesExtract Cookies from HTTP headers.
sfp_crossref.pyCross-ReferenceIdentify whether other domains are associated (‘Affiliates’) of the target.
sfp_crt.pyCertificate TransparencyGather hostnames from historical certificates in crt.sh.
sfp_cryptoioc.pyCryptoIOC.chCheck if an IP is participating in malicious cryptocurrency mining.
sfp_customfeed.pyCustom Threat FeedCheck if a host/domain, netblock, ASN or IP is malicious according to your custom feed.
sfp_cybercrimetracker.pycybercrime-tracker.netCheck if a host/domain or IP is malicious according to cybercrime-tracker.net.
sfp_darksearch.pyDarksearchSearch the Darksearch.io Tor search engine for mentions of the target domain.
sfp_digitaloceanspace.pyDigital Ocean Space FinderSearch for potential Digital Ocean Spaces associated with the target and attempt to list their contents.
sfp_dnsbrute.pyDNS Brute-forceAttempts to identify hostnames through brute-forcing common names and iterations.
sfp_dnscommonsrv.pyDNS Common SRVAttempts to identify hostnames through common SRV.
sfp_dnsneighbor.pyDNS Look-asideAttempt to reverse-resolve the IP addresses next to your target to see if they are related.
sfp_dnsraw.pyDNS Raw RecordsRetrieves raw DNS records such as MX, TXT and others.
sfp_dnsresolve.pyDNS ResolverResolves Hosts and IP Addresses identified, also extracted from raw content.
sfp_dnszonexfer.pyDNS Zone TransferAttempts to perform a full DNS zone transfer.
sfp_dronebl.pyDroneBLQuery the DroneBL database for open relays, open proxies, vulnerable servers, etc.
sfp_duckduckgo.pyDuckDuckGoQuery DuckDuckGo’s API for descriptive information about your target.
sfp_email.pyE-MailIdentify e-mail addresses in any obtained data.
sfp_emailformat.pyEmailFormatLook up e-mail addresses on email-format.com.
sfp_errors.pyErrorsIdentify common error messages in content like SQL errors, etc.
sfp_ethereum.pyEthereum FinderIdentify ethereum addresses in scraped webpages.
sfp_filemeta.pyFile MetadataExtracts meta data from documents and images.
sfp_flickr.pyFlickrLook up e-mail addresses on Flickr.
sfp_fortinet.pyFortiguard.comCheck if an IP is malicious according to Fortiguard.com.
sfp_fraudguard.pyFraudguardObtain threat information from Fraudguard.io
sfp_fullcontact.pyFullContactGather domain and e-mail information from fullcontact.com.
sfp_github.pyGithubIdentify associated public code repositories on Github.
sfp_googlemaps.pyGoogle MapsIdentifies potential physical addresses and latitude/longitude coordinates.
sfp_googlesearch.pyGoogleSome light Google scraping to identify sub-domains and links.
sfp_googlesearchdomain.pyGoogle Search, by domainSome light Google scraping to identify sub-domains and links within site
sfp_gravatar.pyGravatarRetrieve user information from Gravatar API.
sfp_greynoise.pyGreynoiseObtain information from Greynoise.io’s Enterprise API.
sfp_h1nobbdde.pyHackerOne (Unofficial)Check external vulnerability scanning/reporting service h1.nobbd.de to see if the target is listed.
sfp_hackertarget.pyHackerTarget.comSearch HackerTarget.com for hosts sharing the same IP.
sfp_haveibeenpwned.pyHaveIBeenPwnedCheck Have I Been Pwned? for hacked e-mail addresses identified.
sfp_honeypot.pyHoneypot CheckerQuery the projecthoneypot.org database for entries.
sfp_hosting.pyHosting ProvidersFind out if any IP addresses identified fall within known 3rd party hosting ranges, e.g. Amazon, Azure, etc.
sfp_hostsfilenet.pyhosts-file.net Malicious HostsCheck if a host/domain is malicious according to hosts-file.net Malicious Hosts.
sfp_hunter.pyHunter.ioCheck for e-mail addresses and names on hunter.io.
sfp_iknowwhatyoudownload.pyIknowwhatyoudownload.comCheck iknowwhatyoudownload.com for IP addresses that have been using BitTorrent.
sfp_intelx.pyIntelligenceXObtain information from IntelligenceX about identified IP addresses, domains, e-mail addresses and phone numbers.
sfp_intfiles.pyInteresting FilesIdentifies potential files of interest, e.g. office documents, zip files.
sfp_ipinfo.pyIPInfo.ioIdentifies the physical location of IP addresses identified using ipinfo.io.
sfp_ipstack.pyipstackIdentifies the physical location of IP addresses identified using ipstack.com.
sfp_isc.pyInternet Storm CenterCheck if an IP is malicious according to SANS ISC.
sfp_junkfiles.pyJunk FilesLooks for old/temporary and other similar files.
sfp_malc0de.pymalc0de.comCheck if a netblock or IP is malicious according to malc0de.com.
sfp_malwaredomainlist.pymalwaredomainlist.comCheck if a host/domain, IP or netblock is malicious according to malwaredomainlist.com.
sfp_malwaredomains.pymalwaredomains.comCheck if a host/domain is malicious according to malwaredomains.com.
sfp_malwarepatrol.pyMalwarePatrolSearches malwarepatrol.net’s database of malicious URLs/IPs.
sfp_mnemonic.pyMnemonic PassiveDNSObtain Passive DNS information from PassiveDNS.mnemonic.no.
sfp_multiproxy.pymultiproxy.org Open ProxiesCheck if an IP is an open proxy according to multiproxy.org’s open proxy list.
sfp_myspace.pyMySpaceGather username and location from MySpace.com profiles.
sfp_names.pyName ExtractorAttempt to identify human names in fetched content.
sfp_neutrinoapi.pyNeutrinoAPISearch NeutrinoAPI for IP address info and check IP reputation.
sfp_norton.pyNorton ConnectSafeCheck if a host would be blocked by Norton ConnectSafe DNS
sfp_nothink.pyNothink.orgCheck if a host/domain, netblock or IP is malicious according to Nothink.org.
sfp_numinfo.pynuminfoLookup phone number information from numinfo.net.
sfp_numpi.pynumpiLookup USA/Canada phone number location and carrier information from numpi.com.
sfp_numverify.pynumverifyLookup phone number location and carrier information from numverify.com.
sfp_onioncity.pyOnion.linkSearch Tor ‘Onion City’ search engine for mentions of the target domain.
sfp_onionsearchengine.pyOnionsearchengine.comSearch Tor onionsearchengine.com for mentions of the target domain.
sfp_openbugbounty.pyOpen Bug BountyCheck external vulnerability scanning/reporting service openbugbounty.org to see if the target is listed.
sfp_opencorporates.pyOpenCorporatesLook up company information from OpenCorporates.
sfp_opendns.pyOpenDNSCheck if a host would be blocked by OpenDNS DNS
sfp_openphish.pyOpenPhishCheck if a host/domain is malicious according to OpenPhish.com.
sfp_openstreetmap.pyOpenStreetMapRetrieves latitude/longitude coordinates for physical addresses from OpenStreetMap API.
sfp_pageinfo.pyPage InfoObtain information about web pages (do they take passwords, do they contain forms, etc.)
sfp_pastebin.pyPasteBinPasteBin scraping (via Google) to identify related content.
sfp_peegeepee.pyPeeGeePeeLook up e-mail addresses and domains on PeeGeePee.com.
sfp_pgp.pyPGP Key Look-upLook up e-mail addresses in PGP public key servers.
sfp_phishtank.pyPhishTankCheck if a host/domain is malicious according to PhishTank.
sfp_phone.pyPhone NumbersIdentify phone numbers in scraped webpages.
sfp_portscan_tcp.pyPort Scanner – TCPScans for commonly open TCP ports on Internet-facing systems.
sfp_psbdmp.pyPsbdmp.comCheck psbdmp.cc (PasteBin Dump) for potentially hacked e-mails and domains.
sfp_pulsedive.pyPulsediveObtain information from Pulsedive’s API.
sfp_quad9.pyQuad9Check if a host would be blocked by Quad9
sfp_ripe.pyRIPEQueries the RIPE registry (includes ARIN data) to identify netblocks and other info.
sfp_riskiq.pyRiskIQObtain information from RiskIQ’s (formerly PassiveTotal) Passive DNS and Passive SSL databases.
sfp_robtex.pyRobtexSearch Robtex.com for hosts sharing the same IP.
sfp_s3bucket.pyAmazon S3 Bucket FinderSearch for potential Amazon S3 buckets associated with the target and attempt to list their contents.
sfp_securitytrails.pySecurityTrailsObtain Passive DNS and other information from SecurityTrails
sfp_shodan.pySHODANObtain information from SHODAN about identified IP addresses.
sfp_similar.pySimilar DomainsSearch various sources to identify similar looking domain names, for instance squatted domains.
sfp_skymem.pySkymemLook up e-mail addresses on Skymem.
sfp_slideshare.pySlideShareGather name and location from SlideShare profiles.
sfp_social.pySocial NetworksIdentify presence on social media networks such as LinkedIn, Twitter and others.
sfp_socialprofiles.pySocial Media ProfilesTries to discover the social media profiles for human names identified.
sfp_sorbs.pySORBSQuery the SORBS database for open relays, open proxies, vulnerable servers, etc.
sfp_spamcop.pySpamCopQuery various spamcop databases for open relays, open proxies, vulnerable servers, etc.
sfp_spamhaus.pySpamhausQuery the Spamhaus databases for open relays, open proxies, vulnerable servers, etc.
sfp_spider.pySpiderSpidering of web-pages to extract content for searching.
sfp_spyonweb.pySpyOnWebSearch SpyOnWeb for hosts sharing the same IP address, Google Analytics code, or Google Adsense code.
sfp_sslcert.pySSL CertificatesGather information about SSL certificates used by the target’s HTTPS sites.
sfp_ssltools.pySSL ToolsGather information about SSL certificates from SSLTools.com.
sfp_strangeheaders.pyStrange HeadersObtain non-standard HTTP headers returned by web servers.
sfp_sublist3r.pySublist3rObtain information from Sublist3r’s database of hostnames.
sfp_talosintel.pyTalos IntelligenceCheck if a netblock or IP is malicious according to talosintelligence.com.
sfp_threatcrowd.pyThreatCrowdObtain information from ThreatCrowd about identified IP addresses, domains and e-mail addresses.
sfp_threatexpert.pyThreatExpert.comCheck if a host/domain or IP is malicious according to ThreatExpert.com.
sfp_threatminer.pyThreatMinerObtain information from ThreatMiner’s database for passive DNS and threat intelligence.
sfp_tldsearch.pyTLD SearchSearch all Internet TLDs for domains with the same name as the target (this can be very slow.)
sfp_tool_cmseek.pyTool – CMSeeKIdentify what Content Management System (CMS) might be used.
sfp_tool_dnstwist.pyTool – DNSTwistIdentify bit-squatting, typo and other similar domains to the target using a local DNSTwist installation.
sfp_torch.pyTORCHSearch Tor ‘TORCH’ search engine for mentions of the target domain.
sfp_torexits.pyTOR Exit NodesCheck if an IP or netblock appears on the torproject.org exit node list.
sfp_torserver.pyTOR ServersCheck if an IP or netblock appears on the blutmagie.de TOR server list.
sfp_totalhash.pyTotalHash.comCheck if a host/domain or IP is malicious according to TotalHash.com.
sfp_twitter.pyTwitterGather name and location from Twitter profiles.
sfp_uceprotect.pyUCEPROTECTQuery the UCEPROTECT databases for open relays, open proxies, vulnerable servers, etc.
sfp_viewdns.pyViewDNS.infoReverse Whois lookups using ViewDNS.info.
sfp_virustotal.pyVirusTotalObtain information from VirusTotal about identified IP addresses.
sfp_voipbl.pyVoIPBL OpenPBX IPsCheck if an IP or netblock is an open PBX according to VoIPBL OpenPBX IPs.
sfp_vxvault.pyVXVault.netCheck if a domain or IP is malicious according to VXVault.net.
sfp_watchguard.pyWatchguardCheck if an IP is malicious according to Watchguard’s reputationauthority.org.
sfp_webanalytics.pyWeb AnalyticsIdentify web analytics IDs in scraped webpages.
sfp_webframework.pyWeb FrameworkIdentify the usage of popular web frameworks like jQuery, YUI and others.
sfp_webserver.pyWeb ServerObtain web server banners to identify versions of web servers being used.
sfp_whatcms.pyWhatCMSCheck web technology using WhatCMS.org API.
sfp_whois.pyWhoisPerform a WHOIS look-up on domain names and owned netblocks.
sfp_whoisology.pyWhoisologyReverse Whois lookups using Whoisology.com.
sfp_whoxy.pyWhoxyReverse Whois lookups using Whoxy.com.
sfp_wigle.pyWigle.netQuery wigle.net to identify nearby WiFi access points.
sfp_wikileaks.pyWikileaksSearch Wikileaks for mentions of domain names and e-mail addresses.
sfp_wikipediaedits.pyWikipedia EditsIdentify edits to Wikipedia articles made from a given IP address or username.
sfp_xforce.pyXForce ExchangeObtain information from IBM X-Force Exchange
sfp_yandexdns.pyYandex DNSCheck if a host would be blocked by Yandex DNS
sfp_zoneh.pyZone-H Defacement CheckCheck if a hostname/domain appears on the zone-h.org ‘special defacements’ RSS feed.

Data Elements

As mentioned above, SpiderFoot works on an “event-driven” model, whereby each module generates events about data elements, which other modules listen for and consume.

The data elements are one of the following types:

  • entities like IP addresses, Internet names (hostnames, sub-domains, domains),
  • sub-entities like port numbers, URLs and software installed,
  • descriptors of those entities (malicious, physical location information, …) or
  • data which is mostly unstructured data (web page content, port banners, raw DNS records, …)
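The event-driven flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpiderFoot's actual implementation: the Event and Bus classes, the handler functions and the hard-coded values are all hypothetical stand-ins. Only the data element IDs (INTERNET_NAME, IP_ADDRESS, GEOINFO) come from SpiderFoot itself.

```python
from collections import defaultdict


class Event:
    """Hypothetical stand-in for an event carrying one data element."""

    def __init__(self, event_type, data, module, source=None):
        self.event_type = event_type  # data element ID, e.g. "IP_ADDRESS"
        self.data = data              # the data itself
        self.module = module          # name of the producing module
        self.source = source          # the event that led to this one


class Bus:
    """Hypothetical dispatcher: routes each event to the handlers watching its type."""

    def __init__(self):
        self.listeners = defaultdict(list)
        self.log = []

    def subscribe(self, event_type, handler):
        self.listeners[event_type].append(handler)

    def publish(self, event):
        self.log.append((event.event_type, event.data))
        for handler in self.listeners[event.event_type]:
            handler(event)


bus = Bus()


def resolver(event):
    # Consumes an entity (INTERNET_NAME) and produces another entity (IP_ADDRESS).
    # The address is hard-coded; a real module would perform a DNS lookup.
    bus.publish(Event("IP_ADDRESS", "192.0.2.10", "resolver", event))


def geolocator(event):
    # Consumes IP_ADDRESS and produces a descriptor (GEOINFO) about it.
    bus.publish(Event("GEOINFO", "Example City", "geolocator", event))


bus.subscribe("INTERNET_NAME", resolver)
bus.subscribe("IP_ADDRESS", geolocator)

# Seeding a single event triggers the whole chain.
bus.publish(Event("INTERNET_NAME", "www.example.com", "seed"))
for event_type, data in bus.log:
    print(event_type, data)
```

Each handler only knows which element types it watches and which it produces, so new handlers can be added without changing existing ones.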

Here are all the available data elements built into SpiderFoot:

Element ID | Element Name | Element Data Type
ROOT | Internal SpiderFoot Root event | INTERNAL
ACCOUNT_EXTERNAL_OWNED | Account on External Site | ENTITY
ACCOUNT_EXTERNAL_OWNED_COMPROMISED | Hacked Account on External Site | DESCRIPTOR
ACCOUNT_EXTERNAL_USER_SHARED_COMPROMISED | Hacked User Account on External Site | DESCRIPTOR
AFFILIATE_EMAILADDR | Affiliate – Email Address | ENTITY
AFFILIATE_INTERNET_NAME | Affiliate – Internet Name | ENTITY
AFFILIATE_IPADDR | Affiliate – IP Address | ENTITY
AFFILIATE_WEB_CONTENT | Affiliate – Web Content | DATA
AFFILIATE_DOMAIN | Affiliate – Domain Name | ENTITY
AFFILIATE_COMPANY_NAME | Affiliate – Company Name | ENTITY
AFFILIATE_DOMAIN_WHOIS | Affiliate – Domain Whois | DATA
AFFILIATE_DESCRIPTION_CATEGORY | Affiliate Description – Category | DESCRIPTOR
AFFILIATE_DESCRIPTION_ABSTRACT | Affiliate Description – Abstract | DESCRIPTOR
APPSTORE_ENTRY | App Store Entry | ENTITY
CLOUD_STORAGE_BUCKET | Cloud Storage Bucket | ENTITY
CLOUD_STORAGE_BUCKET_OPEN | Cloud Storage Bucket Open | DESCRIPTOR
COMPANY_NAME | Company Name | ENTITY
BASE64_DATA | Base64-encoded Data | DATA
BITCOIN_ADDRESS | Bitcoin Address | ENTITY
BITCOIN_BALANCE | Bitcoin Balance | DESCRIPTOR
BGP_AS_OWNER | BGP AS Ownership | ENTITY
BGP_AS_MEMBER | BGP AS Membership | ENTITY
BGP_AS_PEER | BGP AS Peer | ENTITY
BLACKLISTED_IPADDR | Blacklisted IP Address | DESCRIPTOR
BLACKLISTED_AFFILIATE_IPADDR | Blacklisted Affiliate IP Address | DESCRIPTOR
BLACKLISTED_SUBNET | Blacklisted IP on Same Subnet | DESCRIPTOR
BLACKLISTED_NETBLOCK | Blacklisted IP on Owned Netblock | DESCRIPTOR
CO_HOSTED_SITE | Co-Hosted Site | ENTITY
CO_HOSTED_SITE_DOMAIN | Co-Hosted Site – Domain Name | ENTITY
CO_HOSTED_SITE_DOMAIN_WHOIS | Co-Hosted Site – Domain Whois | DATA
DARKNET_MENTION_URL | Darknet Mention URL | DESCRIPTOR
DARKNET_MENTION_CONTENT | Darknet Mention Web Content | DATA
DATE_HUMAN_DOB | Date of Birth | ENTITY
DEFACED_INTERNET_NAME | Defaced | DESCRIPTOR
DEFACED_IPADDR | Defaced IP Address | DESCRIPTOR
DEFACED_AFFILIATE_INTERNET_NAME | Defaced Affiliate | DESCRIPTOR
DEFACED_COHOST | Defaced Co-Hosted Site | DESCRIPTOR
DEFACED_AFFILIATE_IPADDR | Defaced Affiliate IP Address | DESCRIPTOR
DESCRIPTION_CATEGORY | Description – Category | DESCRIPTOR
DESCRIPTION_ABSTRACT | Description – Abstract | DESCRIPTOR
DEVICE_TYPE | Device Type | DESCRIPTOR
DNS_TEXT | DNS TXT Record | DATA
DNS_SRV | DNS SRV Record | DATA
DNS_SPF | DNS SPF Record | DATA
DOMAIN_NAME | Domain Name | ENTITY
DOMAIN_NAME_PARENT | Domain Name (Parent) | ENTITY
DOMAIN_REGISTRAR | Domain Registrar | ENTITY
DOMAIN_WHOIS | Domain Whois | DATA
EMAILADDR | Email Address | ENTITY
EMAILADDR_COMPROMISED | Hacked Email Address | DESCRIPTOR
ERROR_MESSAGE | Error Message | DATA
ETHEREUM_ADDRESS | Ethereum Address | ENTITY
GEOINFO | Physical Location | DESCRIPTOR
HTTP_CODE | HTTP Status Code | DATA
HUMAN_NAME | Human Name | ENTITY
INTERESTING_FILE | Interesting File | DESCRIPTOR
INTERESTING_FILE_HISTORIC | Historic Interesting File | DESCRIPTOR
JUNK_FILE | Junk File | DESCRIPTOR
INTERNET_NAME | Internet Name | ENTITY
INTERNET_NAME_UNRESOLVED | Internet Name – Unresolved | ENTITY
IP_ADDRESS | IP Address | ENTITY
IPV6_ADDRESS | IPv6 Address | ENTITY
LINKED_URL_INTERNAL | Linked URL – Internal | SUBENTITY
LINKED_URL_EXTERNAL | Linked URL – External | SUBENTITY
MALICIOUS_ASN | Malicious AS | DESCRIPTOR
MALICIOUS_IPADDR | Malicious IP Address | DESCRIPTOR
MALICIOUS_COHOST | Malicious Co-Hosted Site | DESCRIPTOR
MALICIOUS_EMAILADDR | Malicious E-mail Address | DESCRIPTOR
MALICIOUS_INTERNET_NAME | Malicious Internet Name | DESCRIPTOR
MALICIOUS_AFFILIATE_INTERNET_NAME | Malicious Affiliate | DESCRIPTOR
MALICIOUS_AFFILIATE_IPADDR | Malicious Affiliate IP Address | DESCRIPTOR
MALICIOUS_NETBLOCK | Malicious IP on Owned Netblock | DESCRIPTOR
MALICIOUS_PHONE_NUMBER | Malicious Phone Number | DESCRIPTOR
MALICIOUS_SUBNET | Malicious IP on Same Subnet | DESCRIPTOR
NETBLOCK_OWNER | Netblock Ownership | ENTITY
NETBLOCK_MEMBER | Netblock Membership | ENTITY
NETBLOCK_WHOIS | Netblock Whois | DATA
OPERATING_SYSTEM | Operating System | DESCRIPTOR
LEAKSITE_URL | Leak Site URL | ENTITY
LEAKSITE_CONTENT | Leak Site Content | DATA
PHONE_NUMBER | Phone Number | ENTITY
PHYSICAL_ADDRESS | Physical Address | ENTITY
PHYSICAL_COORDINATES | Physical Coordinates | ENTITY
PGP_KEY | PGP Public Key | DATA
PROVIDER_DNS | Name Server (DNS NS Records) | ENTITY
PROVIDER_JAVASCRIPT | Externally Hosted Javascript | ENTITY
PROVIDER_MAIL | Email Gateway (DNS MX Records) | ENTITY
PROVIDER_HOSTING | Hosting Provider | ENTITY
PROVIDER_TELCO | Telecommunications Provider | ENTITY
PUBLIC_CODE_REPO | Public Code Repository | ENTITY
RAW_RIR_DATA | Raw Data from RIRs/APIs | DATA
RAW_DNS_RECORDS | Raw DNS Records | DATA
RAW_FILE_META_DATA | Raw File Meta Data | DATA
SEARCH_ENGINE_WEB_CONTENT | Search Engines Web Content | DATA
SOCIAL_MEDIA | Social Media Presence | ENTITY
SIMILARDOMAIN | Similar Domain | ENTITY
SIMILARDOMAIN_WHOIS | Similar Domain – Whois | DATA
SOFTWARE_USED | Software Used | SUBENTITY
SSL_CERTIFICATE_RAW | SSL Certificate – Raw Data | DATA
SSL_CERTIFICATE_ISSUED | SSL Certificate – Issued to | ENTITY
SSL_CERTIFICATE_ISSUER | SSL Certificate – Issued by | ENTITY
SSL_CERTIFICATE_MISMATCH | SSL Certificate Host Mismatch | DESCRIPTOR
SSL_CERTIFICATE_EXPIRED | SSL Certificate Expired | DESCRIPTOR
SSL_CERTIFICATE_EXPIRING | SSL Certificate Expiring | DESCRIPTOR
TARGET_WEB_CONTENT | Web Content | DATA
TARGET_WEB_CONTENT_TYPE | Web Content Type | DESCRIPTOR
TARGET_WEB_COOKIE | Cookies | DATA
TCP_PORT_OPEN | Open TCP Port | SUBENTITY
TCP_PORT_OPEN_BANNER | Open TCP Port Banner | DATA
UDP_PORT_OPEN | Open UDP Port | SUBENTITY
UDP_PORT_OPEN_INFO | Open UDP Port Information | DATA
URL_ADBLOCKED_EXTERNAL | URL (AdBlocked External) | DESCRIPTOR
URL_ADBLOCKED_INTERNAL | URL (AdBlocked Internal) | DESCRIPTOR
URL_FORM | URL (Form) | DESCRIPTOR
URL_FLASH | URL (Uses Flash) | DESCRIPTOR
URL_JAVASCRIPT | URL (Uses Javascript) | DESCRIPTOR
URL_WEB_FRAMEWORK | URL (Uses a Web Framework) | DESCRIPTOR
URL_JAVA_APPLET | URL (Uses Java Applet) | DESCRIPTOR
URL_STATIC | URL (Purely Static) | DESCRIPTOR
URL_PASSWORD | URL (Accepts Passwords) | DESCRIPTOR
URL_UPLOAD | URL (Accepts Uploads) | DESCRIPTOR
URL_FORM_HISTORIC | Historic URL (Form) | DESCRIPTOR
URL_FLASH_HISTORIC | Historic URL (Uses Flash) | DESCRIPTOR
URL_JAVASCRIPT_HISTORIC | Historic URL (Uses Javascript) | DESCRIPTOR
URL_WEB_FRAMEWORK_HISTORIC | Historic URL (Uses a Web Framework) | DESCRIPTOR
URL_JAVA_APPLET_HISTORIC | Historic URL (Uses Java Applet) | DESCRIPTOR
URL_STATIC_HISTORIC | Historic URL (Purely Static) | DESCRIPTOR
URL_PASSWORD_HISTORIC | Historic URL (Accepts Passwords) | DESCRIPTOR
URL_UPLOAD_HISTORIC | Historic URL (Accepts Uploads) | DESCRIPTOR
USERNAME | Username | ENTITY
VULNERABILITY | Vulnerability in Public Domain | DESCRIPTOR
WEB_ANALYTICS_ID | Web Analytics | ENTITY
WEBSERVER_BANNER | Web Server | DATA
WEBSERVER_HTTPHEADERS | HTTP Headers | DATA
WEBSERVER_STRANGEHEADER | Non-Standard HTTP Header | DATA
WEBSERVER_TECHNOLOGY | Web Technology | DESCRIPTOR
WIFI_ACCESS_POINT | WiFi Access Point Nearby | ENTITY
WIKIPEDIA_PAGE_EDIT | Wikipedia Page Edit | DESCRIPTOR

Writing a Module

To write a SpiderFoot module, start by looking at the sfp_template.py file, a skeleton module that does nothing. Use the following steps as your guide:

  1. Copy sfp_template.py to a new file named after your module. Try to make the name descriptive, e.g. not something like sfp_mymodule.py but instead something like sfp_imageanalyser.py if you were creating a module to analyse image content.
  2. Replace XXX in the new module with the name of your module and update the descriptive information in the header and comment within the module.
  3. The comment for the class (check in sfp_template.py) is used by SpiderFoot in the UI to correctly categorise modules, so make it something meaningful. Look at other modules for examples.
  4. Set the events in watchedEvents() and producedEvents() accordingly, based on the data element table in the previous section. If your module produces a data element that does not already exist in SpiderFoot, you must first add it to the database:
    • ~/spiderfoot$ sqlite3 spiderfoot.db
      sqlite> INSERT INTO tbl_event_types (event, event_descr, event_raw, event_type) VALUES ('NEW_DATA_ELEMENT_TYPE_NAME_HERE', 'Description of your New Data Element Here', 0, 'DESCRIPTOR or DATA or ENTITY or SUBENTITY');
  5. Put the logic for the module in handleEvent(). Each call to handleEvent() is provided a SpiderFootEvent object. The most important values within this object are:
    • eventType: The data element ID (IP_ADDRESS, WEBSERVER_BANNER, etc.)
    • data: The actual data, e.g. the IP address or web server banner, etc.
    • module: The name of the module that produced the event (sfp_dnsresolve, etc.)
  6. When it is time to generate your event, create an instance of SpiderFootEvent:
    • e = SpiderFootEvent("IP_ADDRESS", ipaddr, self.__name__, event)
    • Note: the event passed as the last argument is the event that your module received. This is what builds a relationship between data elements in the SpiderFoot database.
  7. Notify all modules that may be interested in the event:
    • self.notifyListeners(e)
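Put together, the steps above might look something like the sketch below. It is a minimal illustration under stated assumptions rather than a drop-in module: the SpiderFootEvent and SpiderFootPlugin classes here are simplified stand-ins so that the example runs without SpiderFoot installed, and sfp_bannerproduct is a hypothetical module name. A real module should start from sfp_template.py and use the classes SpiderFoot provides.

```python
# Simplified stand-ins (assumptions for illustration only); a real module
# gets these from SpiderFoot itself, as shown in sfp_template.py.
class SpiderFootEvent:
    def __init__(self, eventType, data, module, sourceEvent):
        self.eventType = eventType      # data element ID
        self.data = data                # the actual data
        self.module = module            # name of the producing module
        self.sourceEvent = sourceEvent  # event that led to this one


class SpiderFootPlugin:
    def __init__(self):
        self.__name__ = type(self).__name__
        self.emitted = []  # stand-in: collect events instead of dispatching

    def notifyListeners(self, event):
        self.emitted.append(event)


class sfp_bannerproduct(SpiderFootPlugin):
    """Hypothetical module: extracts the product name from web server banners."""

    def watchedEvents(self):
        # Data elements this module wants to receive.
        return ["WEBSERVER_BANNER"]

    def producedEvents(self):
        # Data elements this module may generate.
        return ["SOFTWARE_USED"]

    def handleEvent(self, event):
        if event.eventType not in self.watchedEvents():
            return
        # "Apache/2.4.41 (Ubuntu)" -> "Apache"
        product = event.data.split("/", 1)[0].strip()
        if not product:
            return
        # Passing the received event as the last argument links the new
        # data element to the one it was derived from.
        e = SpiderFootEvent("SOFTWARE_USED", product, self.__name__, event)
        self.notifyListeners(e)


# Usage: feed the module one banner event and inspect what it produced.
root = SpiderFootEvent("ROOT", "example.com", "", None)
banner = SpiderFootEvent("WEBSERVER_BANNER", "Apache/2.4.41 (Ubuntu)",
                         "sfp_webserver", root)
mod = sfp_bannerproduct()
mod.handleEvent(banner)
print(mod.emitted[0].eventType, mod.emitted[0].data)
```

The stand-in notifyListeners() only records events so the result can be inspected; in SpiderFoot it hands the event to every module watching that element type.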

Database

All SpiderFoot data is stored in a SQLite database (spiderfoot.db in your SpiderFoot installation folder) which can be used outside of SpiderFoot for analysis of your data.

The schema is quite simple and can be viewed in the GitHub repo.

The queries below show how the data is organised:

# Total number of scans in the SpiderFoot database
sqlite> select count(*) from tbl_scan_instance;
10
# Obtain the ID for a particular scan
sqlite> select guid from tbl_scan_instance where seed_target = 'binarypool.com';
b459e339523b8d06235bd06087ae6c6017aaf4ed68dccea0b65a1999a17e460a
# Number of results per data type
sqlite> select count(*), type from tbl_scan_results where scan_instance_id = 'b459e339523b8d06235bd06087ae6c6017aaf4ed68dccea0b65a1999a17e460a' group by type;
5|AFFILIATE_INTERNET_NAME
2|AFFILIATE_IPADDR
1|CO_HOSTED_SITE
1|DOMAIN_NAME
1|DOMAIN_REGISTRAR
1|DOMAIN_WHOIS
1|GEOINFO
28|HTTP_CODE
48|HUMAN_NAME
49|INTERNET_NAME
2|IP_ADDRESS
49|LINKED_URL_EXTERNAL
144|LINKED_URL_INTERNAL
2|PROVIDER_DNS
1|PROVIDER_MAIL
4|RAW_DNS_RECORDS
1|RAW_FILE_META_DATA
1|ROOT
14|SEARCH_ENGINE_WEB_CONTENT
1|SOFTWARE_USED
16|TARGET_WEB_CONTENT
2|TCP_PORT_OPEN
1|TCP_PORT_OPEN_BANNER
1|URL_FORM
10|URL_JAVASCRIPT
6|URL_STATIC
21|URL_WEB_FRAMEWORK
28|WEBSERVER_BANNER
28|WEBSERVER_HTTPHEADERS
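
The same kind of analysis can be scripted with Python's built-in sqlite3 module. The sketch below is a self-contained illustration, assuming only the table and column names that appear in the queries above (tbl_scan_instance.guid, tbl_scan_instance.seed_target, tbl_scan_results.scan_instance_id and tbl_scan_results.type). It builds a tiny in-memory database with a simplified schema and made-up rows; the real tables have more columns, and the data column and the results_per_type() helper are hypothetical. To run it against real data, connect to spiderfoot.db instead of ":memory:" and skip the setup statements.

```python
import sqlite3

# In-memory database with a simplified, assumed subset of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_scan_instance (guid TEXT PRIMARY KEY, seed_target TEXT);
CREATE TABLE tbl_scan_results (scan_instance_id TEXT, type TEXT, data TEXT);
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO tbl_scan_instance VALUES (?, ?)",
             ("scan-1", "example.com"))
conn.executemany(
    "INSERT INTO tbl_scan_results VALUES (?, ?, ?)",
    [
        ("scan-1", "INTERNET_NAME", "www.example.com"),
        ("scan-1", "INTERNET_NAME", "mail.example.com"),
        ("scan-1", "IP_ADDRESS", "192.0.2.10"),
    ],
)


def results_per_type(conn, seed_target):
    """Return {data element ID: count} for the scan of the given target."""
    row = conn.execute(
        "SELECT guid FROM tbl_scan_instance WHERE seed_target = ?",
        (seed_target,),
    ).fetchone()
    if row is None:
        return {}
    rows = conn.execute(
        "SELECT type, COUNT(*) FROM tbl_scan_results "
        "WHERE scan_instance_id = ? GROUP BY type ORDER BY type",
        (row[0],),
    )
    return dict(rows.fetchall())


for element_id, count in results_per_type(conn, "example.com").items():
    print(f"{count}|{element_id}")
```

The lookups use parameterised queries (the ? placeholders) rather than string formatting, which avoids quoting problems when the target name comes from user input.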