Content Discovery

Fuzz web roots for hidden directories, files, and endpoints the app doesn’t link to.

Apps expose far more than their navigation shows: /admin, /.git/, /backup.zip, /api/v1, /.env, old /test.php files. Content discovery brute-forces paths from a wordlist to find them. Run it against every live web service you found during HTTP probing.

Pick a fuzzer

Arsenic supports gobuster, dirb, and ffuf, defaulting to ffuf. The two I actually use:

  • ffuf — fast, flexible, good filtering; my default.
  • feroxbuster — recursive by default, nice for deep trees.

Wordlists

SecLists is where I pull wordlists from. A solid general-purpose stack (this mirrors Arsenic’s default web-content set):

Discovery/Web-Content/common.txt
Discovery/Web-Content/raft-medium-words.txt
Discovery/Web-Content/raft-large-directories.txt
Discovery/Web-Content/quickhits.txt
Discovery/Web-Content/RobotsDisallowed-Top1000.txt

Build a combined, de-duplicated list once (this example merges three of the five):

mkdir -p recon
cat /opt/SecLists/Discovery/Web-Content/{common,raft-medium-words,quickhits}.txt \
  | sort -u > recon/wordlist-web-content.txt
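To pull in all five lists from the stack above rather than three, a small helper keeps the build repeatable. This is a sketch, not Arsenic's code: build_wordlist is a hypothetical name, the SecLists root is assumed to be /opt/SecLists unless a SECLISTS variable (also an assumption) points elsewhere, and blank or '#' comment lines are dropped in case a list carries a header.

```shell
# build_wordlist OUT FILE...   (hypothetical helper, not part of Arsenic)
# Concatenates the named files from SecLists' Discovery/Web-Content directory,
# drops blank and '#' comment lines, and writes a sorted, de-duplicated list to OUT.
# Assumes SecLists lives at /opt/SecLists unless SECLISTS is set.
build_wordlist() {
  local out=$1; shift
  local base=${SECLISTS:-/opt/SecLists}/Discovery/Web-Content
  local name
  local -a files=()
  for name in "$@"; do
    if [[ -f "$base/$name" ]]; then
      files+=("$base/$name")
    else
      echo "warning: skipping missing $base/$name" >&2
    fi
  done
  if (( ${#files[@]} == 0 )); then
    echo "error: no wordlists found under $base" >&2
    return 1
  fi
  mkdir -p "$(dirname "$out")"
  # tr strips Windows line endings so 'admin' and 'admin\r' don't both survive sort -u
  cat "${files[@]}" | tr -d '\r' | grep -v -e '^[[:space:]]*$' -e '^#' | sort -u > "$out"
}
```

For example: build_wordlist recon/wordlist-web-content.txt common.txt raft-medium-words.txt raft-large-directories.txt quickhits.txt RobotsDisallowed-Top1000.txt. A misspelled file name only produces a warning on stderr, so read it.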

Tailor it to the tech you fingerprinted: a Tomcat box gets tomcat.txt, a Jenkins box gets Jenkins-Hudson.txt, and so on.
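One way to make that tailoring mechanical is a lookup from fingerprinted technology to extra files. tech_wordlists is a hypothetical helper; only the Tomcat and Jenkins entries come from this page, and the /opt/SecLists default path is an assumption.

```shell
# tech_wordlists TECH...   (hypothetical helper)
# Prints the extra SecLists file for each fingerprinted technology, one path per line.
# Only the Tomcat and Jenkins entries come from this page; add your own cases.
# Assumes SecLists lives at /opt/SecLists unless SECLISTS is set.
tech_wordlists() {
  local base=${SECLISTS:-/opt/SecLists}/Discovery/Web-Content
  local tech
  for tech in "$@"; do
    # lowercase via tr so this also works on bash 3.x
    case $(tr '[:upper:]' '[:lower:]' <<<"$tech") in
      tomcat)         echo "$base/tomcat.txt" ;;
      jenkins|hudson) echo "$base/Jenkins-Hudson.txt" ;;
      *)              echo "no tailored list for '$tech'" >&2 ;;
    esac
  done
}
```

For a Tomcat box: cat recon/wordlist-web-content.txt $(tech_wordlists tomcat) | sort -u > recon/wordlist-tomcat.txt (the output name is just an example).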

Run ffuf

url="https://app.example.com"
host=app.example.com
mkdir -p "hosts/$host/recon"

ffuf -u "$url/FUZZ" \
     -w recon/wordlist-web-content.txt \
     -ac \
     -mc all -fc 404 \
     -recursion -recursion-depth 2 \
     -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" \
     -of json -o "hosts/$host/recon/ffuf.json"
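Running that by hand for every service gets old. The loop below is a sketch that wraps the same ffuf flags; it assumes your HTTP probing left a plain file of base URLs, one per line, and fuzz_all, DRY_RUN, and the ffuf-<scheme>[-<port>].json naming are all illustrative choices rather than anything Arsenic defines. It does not handle IPv6 literal URLs.

```shell
# fuzz_all URLS_FILE WORDLIST   (hypothetical helper, not an Arsenic command)
# Runs the ffuf invocation above once per base URL in URLS_FILE (one per line;
# blank lines and '#' comments are skipped). Results go to
# hosts/<host>/recon/ffuf-<scheme>[-<port>].json so two services on one host don't collide.
# With DRY_RUN=1 it prints each command instead of running it.
fuzz_all() {
  local urls=$1 wordlist=$2 url hostport host label out
  local ua="Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
  local -a cmd
  while IFS= read -r url || [[ -n $url ]]; do
    url=${url%$'\r'}
    [[ -z $url || $url == \#* ]] && continue
    url=${url%/}                               # so "$url/FUZZ" has a single slash
    hostport=${url#*://}; hostport=${hostport%%/*}
    host=${hostport%%:*}
    label=${url%%://*}                         # scheme, e.g. https
    [[ $hostport == *:* ]] && label+="-${hostport##*:}"
    out="hosts/$host/recon/ffuf-$label.json"
    cmd=(ffuf -u "$url/FUZZ" -w "$wordlist" -ac -mc all -fc 404
         -recursion -recursion-depth 2 -H "User-Agent: $ua"
         -of json -o "$out")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf '%q ' "${cmd[@]}"; echo
    else
      mkdir -p "hosts/$host/recon"
      "${cmd[@]}" < /dev/null                  # ffuf can't consume the URL list on stdin
    fi
  done < "$urls"
}
```

With a hypothetical recon/live-urls.txt, DRY_RUN=1 fuzz_all recon/live-urls.txt recon/wordlist-web-content.txt prints the commands for a sanity check; drop DRY_RUN to run them one after another.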

What the flags do (these match the Arsenic as-ffuf defaults):

  • -ac: auto-calibration. ffuf learns the “not found” response shape and filters it automatically. You need this, or you drown in false positives.
  • -mc all -fc 404: match every status code, then filter out 404s. That lets you see 401/403 (exists but protected) and 500 (something broke, which is interesting).
  • -recursion -recursion-depth 2: dig into discovered directories, up to a recursion depth of 2.
  • -e .php,.bak,.zip,.txt: not in the command above; add it for extension fuzzing when you know the stack.
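When you do add -e, the extension set should follow the fingerprint. The helper below is illustrative: the php entry is the list from this page, while the aspnet, java, and fallback lists are my assumptions to adjust per target.

```shell
# ffuf_exts STACK   (hypothetical helper)
# Prints a comma-separated extension list for ffuf's -e flag.
# Only the php entry comes from this page; the others are illustrative guesses.
ffuf_exts() {
  case $1 in
    php)    echo ".php,.bak,.zip,.txt" ;;
    aspnet) echo ".aspx,.asp,.config,.bak,.zip" ;;
    java)   echo ".jsp,.do,.action,.bak,.zip" ;;
    *)      echo ".bak,.zip,.txt" ;;
  esac
}
```

Append it to the ffuf command as -e "$(ffuf_exts php)". Each extension adds another full pass over the wordlist, so keep the list short.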

Tune signal, not noise

Auto-calibration handles most of the junk, but apps that return 200 for everything need manual filtering. Inspect the size/word/line distribution and filter the dominant bucket:

# How many results per status code?
jq '.results[].status' hosts/app.example.com/recon/ffuf.json | sort | uniq -c

# Filter by response size if a wildcard 200 is flooding results
ffuf -u "$url/FUZZ" -w recon/wordlist-web-content.txt -fs 1234   # filter that exact size (1234 is a placeholder)
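The status count above only shows part of the picture; to see the size/word/line buckets directly, a couple of jq one-liners help. They assume ffuf's JSON output, where each entry under .results carries status, length (bytes), words, and lines; the function names are hypothetical.

```shell
# shape_buckets FFUF_JSON [N]   (hypothetical helper)
# Counts results by status, size (bytes), words, and lines; prints the N most common shapes.
shape_buckets() {
  jq -r '.results[] | "\(.status) \(.length) \(.words) \(.lines)"' "$1" \
    | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# dominant_size FFUF_JSON   (hypothetical helper)
# Prints the most common response size, a candidate value for ffuf's -fs.
dominant_size() {
  jq '[.results[].length] | group_by(.) | max_by(length) | .[0]' "$1"
}
```

If one shape dominates, rerun with -fs "$(dominant_size hosts/app.example.com/recon/ffuf.json)", or use -fw / -fl when the size varies but the word or line count doesn't. dominant_size prints null for an empty result set.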

Arsenic’s as-prune-ffuf does exactly this after the fact — trimming the dominant status/size bucket out of a bloated results file so what’s left is signal.
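The following is an independent sketch of the same idea, not as-prune-ffuf's actual implementation: group results by (status, size) and drop the largest group if it is big enough to look like a wildcard. prune_ffuf and its default threshold of 10 are assumptions.

```shell
# prune_ffuf IN OUT [MIN]   (hypothetical helper; not Arsenic's as-prune-ffuf)
# Removes the single largest (status, size) bucket from an ffuf JSON file,
# but only if that bucket holds at least MIN results (default 10).
# OUT must differ from IN: the redirect truncates OUT before jq reads IN.
prune_ffuf() {
  jq --argjson min "${3:-10}" '
    (.results | group_by([.status, .length]) | max_by(length)) as $top
    | if ($top | length) >= $min
      then ($top[0] | [.status, .length]) as $key
           | .results |= map(select([.status, .length] != $key))
      else .
      end
  ' "$1" > "$2"
}
```

For example, prune_ffuf ffuf.json ffuf-pruned.json leaves the original file untouched for comparison.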

What to chase

From the results, prioritize:

  • Auth panels & admin paths (/admin, /manager, /wp-admin).
  • Source/secrets leakage (/.git/, /.env, /config.php.bak, /backup/).
  • APIs (/api, /swagger, /graphql) — often under-protected.
  • Anything 403: the path likely exists and someone restricted access to it.
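To pull those priorities straight out of a results file, a jq filter works. triage_ffuf is hypothetical and its regex is an illustrative starting point, not a complete list; it matches on the full URL, so a hostname like api.example.com will also trip the /api pattern.

```shell
# triage_ffuf FFUF_JSON   (hypothetical helper)
# Prints status, size, and URL (tab-separated) for results matching the priorities above:
# 401/403 responses, or URLs that look like admin panels, leaked source/secrets, or APIs.
# The regex is illustrative; extend it for your target.
triage_ffuf() {
  jq -r '
    .results[]
    | select(.status == 401 or .status == 403
             or (.url | test("admin|manager|\\.git|\\.env|\\.bak|backup|/api|swagger|graphql"; "i")))
    | "\(.status)\t\(.length)\t\(.url)"
  ' "$1"
}
```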

Discovered endpoints and the technologies you fingerprinted both feed the Vulnerability Hunting phase.

Last modified July 4, 2026: Post/mobi (#71) (ff64902)