Why does my target’s sitemap change between scans?
Symptoms
Two or more scans of the same target may show different paths/URLs under the Sitemap section of the scan report. This can happen even when the scans have exactly the same configuration.
Root Cause
In Invicti Enterprise, the Sitemap section of a scan report is intended to show, efficiently, the paths where vulnerabilities were found. A common mistake is to treat the Sitemap feature as a tool for visualizing the nested structure of a target.
That is not its intended use. Apart from paths where vulnerabilities were found, the sitemap can list different paths for the same target from one scan to the next.
This is because, before listing a path and its subpaths, the Invicti Enterprise sitemap considers both the responses the scan received for that path and the order in which the scan sent its requests.
Scenario 1: Scanner receives an early 404 response
Suppose the scanner receives a 404 (Not Found) response for https://example.com/pages before it visits or crawls any subpaths of https://example.com/pages. In this case, https://example.com/pages and the paths beneath it are not listed on the sitemap, even though they may still be crawled and scanned.
Scenario 2: Scanner receives an early 200 response
Suppose instead that the scanner receives a 200 (OK) response for https://example.com/pages/pages1 before it visits or crawls the parent path https://example.com/pages, and the parent path then returns a 404 (Not Found) response. In this case, the subpath is still listed on the sitemap.
The key factor is the order in which the scanner sends requests. When the scanner requests a subpath before its parent path, the sitemap in the scan report may look different from that of a scan in which the parent path was requested first.
The request order for any given path can vary from scan to scan; the scanner is built this way for performance reasons. It continuously collects discovered links (paths) in its cache of URLs to visit and crawl, and during a scan it may request a subpath before its parent path, or vice versa.
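To make the effect of request order concrete, here is a minimal Python sketch of the two scenarios. It is a simplified, hypothetical model built only from the rules described in this article, not Invicti's actual implementation. The function names `build_sitemap` and `is_under`, the input format (a list of `(url, status)` pairs in the order the requests were sent), and the assumption that every status code in this article's "not listed" list behaves like the 404 in Scenario 1 are all illustrative assumptions.

```python
# Hypothetical, simplified model of the sitemap rules described above.
# Not Invicti's implementation; for illustration only.

# Assumption: these status codes hide a path (and, if received first,
# everything requested beneath it afterwards) from the sitemap.
HIDING_STATUSES = frozenset({401, 403, 404, 500, 503})


def is_under(url: str, prefix: str) -> bool:
    """Return True if `url` is `prefix` itself or a subpath of it."""
    return url == prefix or url.startswith(prefix.rstrip("/") + "/")


def build_sitemap(responses: list[tuple[str, int]]) -> list[str]:
    """Replay (url, status) pairs in request order and return the listed URLs."""
    hidden_prefixes: list[str] = []
    listed: set[str] = set()
    for url, status in responses:
        if status in HIDING_STATUSES:
            # Later requests at or under this path will not be listed,
            # but anything already listed stays listed (Scenario 2).
            hidden_prefixes.append(url)
        elif not any(is_under(url, p) for p in hidden_prefixes):
            listed.add(url)
    return sorted(listed)


parent = "https://example.com/pages"
child = "https://example.com/pages/pages1"

# Scenario 1: the parent returns 404 before the child is requested.
print(build_sitemap([(parent, 404), (child, 200)]))  # []

# Scenario 2: the child returns 200 before the parent returns 404.
print(build_sitemap([(child, 200), (parent, 404)]))  # ['https://example.com/pages/pages1']
```

In this model the same two responses produce different sitemaps purely because of their order, which is why the sitemap alone is not a reliable measure of what a scan covered.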
No Scan Coverage Impact
Note that this behavior does not affect scan coverage. The one exception to the scenarios above is a vulnerability on a parent path or a subpath: a path with a vulnerability is always listed on the sitemap, because showing such paths is the intended purpose of the Sitemap feature.
Because of these differences, comparing sitemaps between scans can give a misleading impression of scan coverage.
To compare coverage across multiple scans, use each scan's crawled URL list instead.
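If you have exported the crawled URL list of each scan, a short script can show which URLs only one of the scans reached. The sketch below is a minimal Python example that assumes each list has been saved as a plain-text file with one URL per line; the file names and the functions `load_crawled_urls` and `compare_coverage` are hypothetical and not part of Invicti.

```python
from pathlib import Path


def load_crawled_urls(path: str) -> set[str]:
    """Read a crawled URL list with one URL per line (assumed export format)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_coverage(scan_a: set[str], scan_b: set[str]) -> dict[str, list[str]]:
    """Split two crawled URL sets into URLs unique to each scan and URLs in both."""
    return {
        "only_in_a": sorted(scan_a - scan_b),
        "only_in_b": sorted(scan_b - scan_a),
        "in_both": sorted(scan_a & scan_b),
    }


# Usage with hypothetical export files:
# result = compare_coverage(load_crawled_urls("scan_a_crawled.txt"),
#                           load_crawled_urls("scan_b_crawled.txt"))

result = compare_coverage(
    {"https://example.com/", "https://example.com/pages"},
    {"https://example.com/"},
)
print(result["only_in_a"])  # ['https://example.com/pages']
```

URLs are compared as exact strings, so you may need to normalize details such as trailing slashes or query strings before comparing.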
Invicti Standard Behaviour
Invicti Standard does not follow the rules above. It always includes every path it was able to visit, regardless of the order of its requests or the responses it received.
For this reason, a scan imported from Invicti Enterprise into Invicti Standard may show a different sitemap from the one shown in Invicti Enterprise.
HTTP Response Codes in the Sitemap
Endpoints not listed
Endpoints that return any of the following HTTP status codes are not included in the sitemap: 401, 403, 404, 500, 503.
Endpoints listed
Endpoints that return a positive HTTP status code, such as 200, 201, or 302 (among others), are included in the sitemap, subject to the request-order behavior described above.
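The status code rules above can be expressed as a small lookup. This is a minimal Python sketch under two assumptions: that "positive" means a 2xx or 3xx code, and that codes this article does not mention (for example 400 or 502) are reported as undocumented rather than guessed. The function name `sitemap_listing` is hypothetical, and the sketch ignores request order, so "listed" only means the status code itself does not exclude the endpoint.

```python
# Status codes this article lists as keeping an endpoint off the sitemap.
NOT_LISTED = frozenset({401, 403, 404, 500, 503})


def sitemap_listing(status: int) -> str:
    """Classify an HTTP status code by its effect on sitemap inclusion."""
    if status in NOT_LISTED:
        return "not listed"
    # Assumption: "positive" status codes are the 2xx and 3xx ranges.
    if 200 <= status < 400:
        return "listed"
    # Codes the article does not cover are not guessed.
    return "undocumented"


for code in (200, 201, 302, 404, 502):
    print(code, sitemap_listing(code))
```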