Discovery
Turn a handful of in-scope roots into a complete, validated inventory of domains and live hosts.
Discovery is where you expand a short scope list into the real attack surface. A
client hands you example.com and three CIDRs; by the end of discovery you want
every subdomain, every resolving host, and every live IP that belongs to them.
The pipeline is a funnel — each stage produces input for the next, and every
candidate gets filtered against your scope blacklist before it moves forward:
seed roots ─▶ subdomain enumeration ─▶ DNS resolution ─▶ host discovery ─▶ live hosts
     │          (passive+active)       (which resolve)   (which are up)
     └──◀── certificate transparency & reverse DNS feed new roots back in ◀──┘
The stages
- Root domain recon — WHOIS, DNS records, org footprint.
- Subdomain enumeration — passive + active + brute.
- DNS resolution — which of those names actually resolve, and to what.
- Certificate & SSL harvesting — pull more names out of TLS certs.
- Host discovery — which IPs are actually alive.
Keep looping: cert harvesting and reverse DNS routinely turn up new root domains.
Add the in-scope ones back to scope-domains.txt and re-run enumeration.
Discovery is “done” when a full loop produces nothing new.
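One way to make "nothing new" checkable is to snapshot the scope file before a loop and compare it afterwards. This is a minimal sketch; the helper name new_since and the snapshot file name are hypothetical, not part of any tool mentioned here, and it assumes one entry per line.

```shell
# new_since OLD NEW: print lines that appear in NEW but not in OLD.
# (Hypothetical helper; assumes one entry per line.)
new_since() {
  grep -Fvxf "$1" "$2" | sort -u
}

# Usage sketch (scope-domains.before is a made-up snapshot name):
#   cp scope-domains.txt scope-domains.before
#   ...run one full discovery loop...
#   new_since scope-domains.before scope-domains.txt
# Empty output suggests the loop has converged.
```

The same comparison works for scope-ips.txt if you track IP scope the same way.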
Root domain recon
Before enumerating subdomains, fingerprint each root. It’s cheap, passive, and
orients everything that follows.
while read -r domain; do
mkdir -p "recon/domains/$domain"
whois "$domain" | tee "recon/domains/$domain/whois.txt"
for rec in A MX NS TXT SOA; do
dig +noall +answer "$domain" "$rec"
done | tee "recon/domains/$domain/dig.txt"
# DMARC / SPF often leak infra and partner domains
dig +short TXT "_dmarc.$domain" | tee "recon/domains/$domain/dmarc.txt"
done < scope-domains.txt
What I’m looking for in the output:
- Registrant org / email in WHOIS — pivot to find sibling domains.
- NS / MX — who hosts DNS and mail; hints at cloud vs. on-prem.
- SPF / DMARC TXT records — they frequently list partner and infra domains
worth investigating (and adding to scope if they belong to the client).
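To pull those pivot domains out of the saved records without reading them by eye, something like the following can help. It is a sketch under assumptions: the helper name spf_pivots is hypothetical, and it only looks at SPF include:/redirect= mechanisms and DMARC mailto: report addresses, so other leaks (ip4: ranges, a:/mx: hosts) still need a manual look.

```shell
# spf_pivots: read TXT records on stdin and print the domains referenced by
# SPF include:/redirect= and by DMARC rua=/ruf= mailto addresses.
# (Hypothetical helper name; lowercases everything it prints.)
spf_pivots() {
  tr 'A-Z' 'a-z' \
    | grep -oE '(include:|redirect=)[a-z0-9._-]+|mailto:[^@[:space:]",;]+@[a-z0-9._-]+' \
    | sed -E 's/^(include:|redirect=)//; s/^mailto:[^@]*@//; s/\.$//' \
    | sort -u
}

# Usage sketch, using the files written by the loop above:
#   cat recon/domains/example.com/dig.txt recon/domains/example.com/dmarc.txt | spf_pivots
```

Everything it prints is a lead, not scope: confirm ownership with the client before adding anything.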
If the client owns IP space, look up their ASN (whois -h whois.radb.net <ip>
or bgp.he.net) and pull the announced prefixes.
amass intel -asn <ASN> works from the same ASN and reports the domains it finds
in that address space. The CIDRs you recover become new entries in scope-ips.txt.
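For the prefix lookup itself, RADB answers inverse queries by origin AS. The sketch below only parses that output; the helper name radb_routes is hypothetical and AS64500 is a documentation placeholder, not a real target. Route objects in a registry are not always current, so treat the result as candidates to confirm with the client.

```shell
# radb_routes: read RADB whois output on stdin and print the unique
# route:/route6: prefixes it contains. (Hypothetical helper name.)
radb_routes() {
  awk 'tolower($1) ~ /^route6?:$/ { print $2 }' | sort -u
}

# Usage sketch (AS64500 is a placeholder ASN):
#   whois -h whois.radb.net -- '-i origin AS64500' | radb_routes
```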
Continue to Subdomain Enumeration.
1 - Subdomain Enumeration
Passive, active, and brute-force discovery of subdomains for each in-scope root.
For each in-scope root domain you want every subdomain you can find. There are
three techniques and they don’t fully overlap — I run all three, because each
one finds names the others miss.
| Technique | Source | Finds |
|---|---|---|
| Passive | OSINT APIs, search engines, cert logs | Known/indexed names, zero target traffic |
| Active | DNS queries against the target’s resolvers | Names that resolve but aren’t indexed |
| Brute force | Wordlist against a resolver | Predictable names (dev, vpn, staging) |
Passive: subfinder + amass
I usually start with subfinder (ProjectDiscovery) for passive enum — it’s
fast. amass pulls from a different (overlapping) set of sources, so I run both
and merge the results.
# subfinder against every root at once
subfinder -dL scope-domains.txt -all -silent \
| tee recon/domains/subfinder.txt
# amass passive enum, per root
while read -r domain; do
amass enum -passive -d "$domain" -o "recon/domains/$domain/amass-passive.txt"
done < scope-domains.txt
You’ll get a lot more out of passive sources by adding API keys (Censys,
SecurityTrails, Shodan, VirusTotal, GitHub, etc.) to
~/.config/subfinder/provider-config.yaml and ~/.config/amass/config.ini
(amass v4 reads keys from datasources.yaml instead).
In my experience keyed sources roughly double the yield.
Active: amass
Active enumeration resolves and validates names against the target’s own DNS,
catching wildcards and names that exist but aren’t in any OSINT feed.
while read -r domain; do
amass enum -active -d "$domain" -o "recon/domains/$domain/amass-active.txt"
done < scope-domains.txt
Brute force: predictable names
Brute forcing throws a wordlist of common labels at the domain. I brute with a
dedicated resolver tool (puredns or
dnsx, covered on the DNS Resolution page) rather than
amass’s built-in brute, because you control the rate and the resolver quality:
# Generate candidates from a wordlist, then resolve them
dnsx -d example.com -w /opt/SecLists/Discovery/DNS/subdomains-top1million-110000.txt \
-silent -o recon/domains/example.com/brute.txt
amass enum -brute -d example.com does the same thing in one shot if you’d
rather keep it simple.
Certificate transparency (crt.sh)
Public CT logs are one of the best passive sources: publicly trusted CAs log the
certificates they issue, along with every name on them. Query directly:
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
| jq -r '.[].name_value' \
| sed 's/^\*\.//' | tr 'A-Z' 'a-z' | sort -u \
| tee recon/domains/example.com/crtsh.txt
Merge, validate, and scope
Combine every source, strip the noise, and filter against your blacklist. This
is the step that keeps you in scope:
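# Name the source files explicitly: a bare *.txt glob would also read whois.txt,
# dig.txt, and subdomains.txt itself while tee is overwriting it
cat recon/domains/example.com/{amass-passive,amass-active,brute,crtsh}.txt \
recon/domains/subfinder.txt \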
| sed 's/^\*\.//;s/\.$//' | tr 'A-Z' 'a-z' \
| grep -E '^[a-z0-9_.-]+$' \
| grep -E '\.example\.com$' \
| grep -vEf blacklist.txt \
| sort -u \
| tee recon/domains/example.com/subdomains.txt
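Because blacklist.txt is read with grep -E -f, each line is an extended regex, and an unanchored entry matches anywhere in the name. The sketch below wraps the scope and blacklist steps so the behaviour is easy to test on a few names first. The helper name scope_filter and the blacklist entries are made up for illustration; unlike the pipeline above, it also keeps the apex domain itself.

```shell
# scope_filter ROOT BLACKLIST: keep stdin names at or under ROOT, then drop
# anything matching a regex in BLACKLIST. (Hypothetical helper.)
scope_filter() {
  local root_re
  root_re=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots in the root
  grep -E "(^|\.)${root_re}\$" | grep -vEf "$2" | sort -u
}

# Example blacklist.txt (made-up entries), one extended regex per line:
#   ^mail\.example\.com$        exactly one host
#   \.corp\.example\.com$       a whole sub-zone
#
# Usage sketch:
#   scope_filter example.com blacklist.txt < candidates.txt
```

Anchoring entries with ^ and $ avoids accidentally excluding names that merely contain a blacklisted string.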
anew is handy here — it appends only
new lines to a file and prints them, so you can see what each run adds:
subfinder -d example.com -silent | anew recon/domains/example.com/subdomains.txt
The output subdomains.txt is the input to DNS Resolution,
where you find out which of these names are actually live.
2 - DNS Resolution
Resolve enumerated names to IPs at scale, separate live from dead, and turn resolved addresses into IP scope.
Enumeration gives you a list of candidate names. Most engagements need to know
which ones actually resolve, what they resolve to, and which IPs that adds to
scope. It’s a mass-resolution problem — you can easily end up with tens of
thousands of candidate names.
Mass resolution with dnsx
Arsenic originally used fast-resolv for this. These days I use
dnsx (ProjectDiscovery) — it
resolves huge lists quickly against a pool of resolvers and it’s actively
maintained.
# Resolve every discovered subdomain; keep only those that answer, with their A records
dnsx -l recon/domains/example.com/subdomains.txt \
-a -resp \
-silent \
-o recon/domains/example.com/resolved.txt
Use a curated resolver list to avoid poisoned or rate-limited public resolvers —
dnsvalidator builds one:
dnsvalidator -tL https://public-dns.info/nameservers.txt -threads 100 -o resolvers.txt
dnsx -l subdomains.txt -r resolvers.txt -a -resp -silent -o resolved.txt
Watch out for wildcard DNS. Some domains resolve everything to one IP
(*.example.com → 203.0.113.9). dnsx has -wd example.com for wildcard
filtering; puredns handles it automatically. Without it, your “resolved” list
is mostly garbage.
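A quick way to tell whether a root needs wildcard filtering at all is to resolve a label that should not exist. This is a rough sketch, assuming dig is installed; the helper name has_wildcard is hypothetical, and a single probe is only a hint (wildcards can exist at deeper levels or rotate answers).

```shell
# has_wildcard DOMAIN: resolve a random label that should not exist.
# Prints "domain<TAB>ip" and returns 0 if it resolves anyway (wildcard likely).
# (Hypothetical helper; assumes dig is installed. One probe is only a hint.)
has_wildcard() {
  local domain=$1 label answer
  label="wc-probe-${RANDOM:-0}-$$"
  answer=$(dig +short A "$label.$domain" \
    | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' | head -n 1)
  [ -n "$answer" ] && printf '%s\t%s\n' "$domain" "$answer"
}

# Usage sketch:
#   while read -r domain; do has_wildcard "$domain"; done < scope-domains.txt
```

Roots that show up in the output are the ones to pass to -wd, or to hand to puredns.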
Every resolved address that falls inside your authorized ranges becomes part of
the IP scope for the recon phase:
# Pull the unique IPs out of the resolved output
grep -ohE '\[([0-9]{1,3}\.){3}[0-9]{1,3}\]' recon/domains/*/resolved.txt \
| tr -d '[]' | sort -u \
| tee recon/ips/from-domains.txt
# Merge with seed IP scope. Note: cat does no range filtering, so drop out-of-range addresses from from-domains.txt before this step
cat scope-ips.txt recon/ips/from-domains.txt | sort -u > recon/ips/scope-combined.txt
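One way to enforce the authorized-range check before merging is a small awk filter. This is a sketch under assumptions: IPv4 only, the scope file holds one address or CIDR per line, and the helper name filter_to_cidrs and the output file name are hypothetical.

```shell
# filter_to_cidrs SCOPE_FILE: print the IPv4 addresses from stdin that fall
# inside any address or CIDR listed in SCOPE_FILE. (Hypothetical helper;
# IPv4 only, non-IPv4 lines are ignored.)
filter_to_cidrs() {
  awk '
    function ip2int(ip,  o) {
      split(ip, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    NR == FNR {                                  # first input: the scope list
      if ($0 ~ /^[0-9.]+(\/[0-9]+)?$/) {
        n = split($0, p, "/")
        bits = (n == 2 ? p[2] : 32)              # bare address counts as /32
        size[++c] = 2 ^ (32 - bits)
        base[c] = int(ip2int(p[1]) / size[c]) * size[c]
      }
      next
    }
    $0 !~ /^[0-9.]+$/ { next }                   # skip anything not IPv4
    {
      v = ip2int($0)
      for (i = 1; i <= c; i++)
        if (v >= base[i] && v < base[i] + size[i]) { print; next }
    }
  ' "$1" -
}

# Usage sketch (the output name is made up):
#   filter_to_cidrs scope-ips.txt < recon/ips/from-domains.txt \
#     > recon/ips/from-domains.in-range.txt
```

Addresses that fall outside the ranges are not necessarily uninteresting (shared hosting, CDNs, third parties), but they need explicit authorization before they join the scan list.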
Reverse DNS (the other direction)
You also want to resolve IPs back to names — PTR records often reveal hostnames
(and therefore new domains) you’d never have guessed:
dnsx -l recon/ips/scope-combined.txt -ptr -resp-only -silent \
| tr 'A-Z' 'a-z' | sort -u \
| grep -vEf blacklist.txt \
| tee recon/ips/ptr-names.txt
Any in-scope root domains that show up here go back into scope-domains.txt,
and you re-run enumeration. That’s the discovery loop
closing on itself.
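To see which roots in the PTR output are not yet in scope, a naive reduction to the last two labels gives a review list. This is a rough sketch: the helper name candidate_roots is hypothetical, and the two-label rule is an assumption that breaks on suffixes like co.uk, so check the output by hand rather than appending it blindly.

```shell
# candidate_roots SCOPE_FILE: reduce stdin hostnames to their last two labels
# and print the ones not already listed in SCOPE_FILE.
# (Hypothetical helper; the two-label rule is naive and wrong for co.uk etc.)
candidate_roots() {
  sed 's/\.$//' \
    | awk -F. 'NF >= 2 { print $(NF-1) "." $NF }' \
    | sort -u \
    | grep -Fvxf "$1"
}

# Usage sketch:
#   candidate_roots scope-domains.txt < recon/ips/ptr-names.txt
```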
Next: Certificate & SSL Harvesting for one more rich
source of hostnames, then Host Discovery to find which IPs are
alive.
3 - Certificate & SSL Harvesting
Pull hostnames out of live TLS certificates to find assets nothing else surfaces.
TLS certificates are full of hostnames. A cert’s Common Name (CN) and
Subject Alternative Names (SANs) list every name the operator put on it —
including internal names, dev hosts, and sibling domains that never show up in
DNS enumeration or OSINT.
There are two angles: passively reading certificate transparency logs
(covered on the Subdomain Enumeration
page) and actively grabbing certs off live hosts. This page is the active
side. It’s worth doing because it catches certs that were never logged to CT and
certs served directly on IPs with no DNS name at all.
Harvest certs from hosts and IPs
Run an nmap service scan against the TLS ports and let the ssl-cert script dump
the certificate details, then parse out the names. This is what Arsenic’s
as-domains-from-*-ssl-certs scripts do:
# Scan TLS ports on your resolved hosts (and on bare IPs)
nmap -p 443,8443,993,995,8080,8843 -sV -sC --open \
-iL recon/ips/scope-combined.txt \
-oA recon/ips/nmap-tls-check
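# Extract subject CN + SAN entries from the nmap output
# (anchoring on "Subject:" skips the issuer CN; [^/]+ stops before /organizationName=...)
{
grep -ohP 'Subject: commonName=\K[^/]+' recon/ips/nmap-tls-check.nmap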
grep -ohP 'Subject Alternative Name: DNS:\K.+' recon/ips/nmap-tls-check.nmap \
| sed 's/ DNS://g; s/,/\n/g'
} \
| sed 's/^\*\.//' | tr 'A-Z' 'a-z' \
| grep '\.' \
| grep -vEf blacklist.txt \
| sort -u \
| tee recon/ips/ssl-cert-domains.txt
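For spot-checking a single host, for example a bare IP that serves a cert with no DNS name pointing at it, openssl can fetch the certificate and a small parser can list its SAN entries. This is a sketch: the helper name san_names is hypothetical, and 192.0.2.10 and www.example.com are placeholders.

```shell
# san_names: read "openssl x509 -text" output on stdin and print the DNS names
# from the Subject Alternative Name extension, lowercased, wildcards stripped.
# (Hypothetical helper; IP-address SAN entries are ignored.)
san_names() {
  grep -oE 'DNS:[^,[:space:]]+' \
    | sed 's/^DNS://; s/^\*\.//' \
    | tr 'A-Z' 'a-z' | sort -u
}

# Usage sketch (address and server name are placeholders):
#   openssl s_client -connect 192.0.2.10:443 -servername www.example.com \
#     </dev/null 2>/dev/null | openssl x509 -noout -text | san_names
```

Trying the same IP with and without -servername can be worthwhile, since some servers return a different default certificate when no SNI is sent.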
One-liner with httpx
httpx can grab and parse certs in
one pass — faster than nmap when you only care about the names:
httpx -l recon/ips/scope-combined.txt \
-p 443,8443,8080,8843 \
-tls-grab -json -silent \
| jq -r '.tls.subject_an[]?, (.tls.subject_cn // empty)' \
| sed 's/^\*\.//' | tr 'A-Z' 'a-z' | sort -u \
| grep -vEf blacklist.txt \
| tee recon/ips/ssl-cert-domains.txt
Feed it back into scope
In-scope names that came out of certs are new subdomains/roots:
grep -E '\.(example\.com|example\.net)$' recon/ips/ssl-cert-domains.txt \
| anew scope-domains-generated.txt
Then loop back to DNS Resolution to resolve the new names.
Once a full discovery loop yields nothing new, move on to
Host Discovery.
4 - Host Discovery
Find which IPs in scope are actually alive before you spend time on full port scans.
You may have hundreds or thousands of IPs in scope, especially after expanding
CIDRs. Full port scanning all of them is wasteful — most won’t be up. Host
discovery is a fast first pass to find the live ones, so the expensive
recon phase only targets hosts that exist.
Expand CIDRs to addresses
First, turn any CIDR ranges into individual addresses so you can scan and track
them per-host. nmap -sL (“list scan”) expands ranges without sending a single
packet:
# IPv4
nmap -sL -n -iL recon/ips/scope-combined.txt \
| awk '/report for/{print $NF}' \
| sort -u > recon/ips/expanded-ipv4.txt
# IPv6 (if in scope)
nmap -6 -sL -n -iL recon/ips/scope-combined.txt \
| awk '/report for/{print $NF}' \
| sort -u > recon/ips/expanded-ipv6.txt
Smart ping sweep with nmap
A plain ICMP ping sweep misses hosts that block ICMP, which is most hardened
hosts. The trick Arsenic uses is to probe the most popular ports for liveness
on top of ICMP, so a host that drops ping but answers on tcp/443 still shows up.
Build the popular-port lists straight from nmap’s own frequency data:
TOP=30 # top-N most common ports
TCP=$(sort -r -k3 /usr/share/nmap/nmap-services | awk '/\/tcp/{print $2}' \
| cut -d/ -f1 | head -n $TOP | paste -sd,)
UDP=$(sort -r -k3 /usr/share/nmap/nmap-services | awk '/\/udp/{print $2}' \
| cut -d/ -f1 | head -n $TOP | paste -sd,)
Then sweep with multiple probe types — ICMP echo + timestamp, TCP ACK/SYN to the
popular ports, and UDP to its popular ports:
sudo nmap -sn -n \
-PE -PP \
-PA"$TCP" -PS"$TCP" -PU"$UDP" \
--randomize-hosts --scan-delay 50ms \
-T4 \
-iL recon/ips/expanded-ipv4.txt \
-oA recon/ips/host-discovery-ipv4
# Extract the live hosts
awk '/Up$/{print $2}' recon/ips/host-discovery-ipv4.gnmap \
| sort -u > recon/ips/alive.txt
What the flags do:
- -sn: host discovery only, no port scan.
- -PE -PP: ICMP echo + timestamp requests.
- -PA<ports> / -PS<ports>: TCP ACK / SYN probes to popular ports. These reach hosts behind firewalls that drop ICMP; sending both covers stateless filters (which tend to pass ACK) and stateful ones (which tend to pass SYN to allowed ports).
- -PU<ports>: UDP probes.
- --randomize-hosts / --scan-delay: a little quieter and gentler.
- -T4: timing; drop to -T3 or lower for fragile/monitored networks.
Faster alternative: naabu
naabu can do liveness + a fast
port pass in one step, and it’s nice for large ranges:
naabu -l recon/ips/expanded-ipv4.txt -top-ports 100 -silent \
| cut -d: -f1 | sort -u | tee recon/ips/alive.txt
Resolve names ↔ live IPs
Cross-reference your resolved domains with the live IP list so you know which
hostnames sit on which live host. One IP often serves many vhosts — you want to
scan the IP once but remember every name pointing at it (it matters for HTTP
vhost routing in recon).
The output of this phase — recon/ips/alive.txt plus the per-host name mapping —
is the target list for Recon.