Robots.txt Detected
Invicti detected a Robots.txt file with potentially sensitive content.
Review the paths listed in Robots.txt, and ensure that any sensitive ones are correctly protected by means of authentication.
Robots.txt is only used to instruct search robots which resources should be indexed and which should not; it does not restrict access to them.
User-Agent: *
Allow: /web/
Disallow: /
Please note that when you use the instructions above, search engines will not index your website except for the specified directories.
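The effect of these rules can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the sample paths are illustrative assumptions, and real search engines may resolve conflicting rules differently from this parser, which applies the first matching rule.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, kept in the same order.
ROBOTS_TXT = """\
User-Agent: *
Allow: /web/
Disallow: /
"""


def is_allowed(path, user_agent="*"):
    """Return True if the rules above permit user_agent to fetch path.

    is_allowed is a hypothetical helper for this sketch, not part of any tool.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions chosen for illustration.
    for path in ("/web/index.html", "/admin/", "/"):
        print(path, is_allowed(path))
```

Note that the disallowed paths remain readable by anyone who requests Robots.txt directly, which is why sensitive locations should not be listed there.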
If you want to hide a certain section of the website from search engines, the X-Robots-Tag header can be set in the response to tell crawlers whether the file should be indexed or not:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
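To make the header format concrete, here is a rough sketch of how a header value could be split into an optional crawler name and its directives. The function parse_x_robots_tag and the KNOWN_DIRECTIVES set are assumptions made for this example; the set is not exhaustive, and real crawlers may parse the header differently.

```python
# Directive names that can appear without a crawler prefix.
# This list is an assumption for the sketch and is not exhaustive.
KNOWN_DIRECTIVES = {
    "all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "noimageindex", "notranslate", "unavailable_after",
    "max-snippet", "max-image-preview", "max-video-preview",
}


def parse_x_robots_tag(value):
    """Split a header value into (crawler_or_None, [directives]).

    Hypothetical helper: 'googlebot: nofollow' is treated as a rule scoped
    to googlebot, while 'noindex, nofollow' applies to every crawler.
    """
    head, sep, rest = value.partition(":")
    if sep and head.strip().lower() not in KNOWN_DIRECTIVES:
        agent, directive_text = head.strip().lower(), rest
    else:
        agent, directive_text = None, value
    directives = [d.strip().lower() for d in directive_text.split(",") if d.strip()]
    return agent, directives


if __name__ == "__main__":
    print(parse_x_robots_tag("googlebot: nofollow"))
    print(parse_x_robots_tag("otherbot: noindex, nofollow"))
```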
By using X-Robots-Tag, you don't have to list these files in your Robots.txt.
It is also not possible to prevent media files from being indexed by using Robots Meta Tags, because meta tags can only be placed in HTML documents. X-Robots-Tag resolves this issue as well.
For Apache, the following snippet can be put into httpd.conf or an .htaccess file to prevent crawlers from indexing multimedia files without exposing them in Robots.txt (the Header directive requires mod_headers to be enabled):
<Files ~ "\.pdf$">
# Don't index PDF files.
Header set X-Robots-Tag "noindex, nofollow"
</Files>
<Files ~ "\.(png|jpe?g|gif)$">
# Don't index image files.
Header set X-Robots-Tag "noindex"
</Files>
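The matching logic of the Apache snippet can be mirrored in a few lines of Python to see which file names would receive which header. This is only an illustrative sketch: RULES and x_robots_tag_for are hypothetical names, Python's re module is not identical to the regex engine Apache uses, and the file names are made up.

```python
import re

# (pattern, header value) pairs mirroring the two <Files> blocks above.
# The dot is escaped so that it matches a literal "." and not any character.
RULES = [
    (re.compile(r"\.pdf$"), "noindex, nofollow"),
    (re.compile(r"\.(png|jpe?g|gif)$"), "noindex"),
]


def x_robots_tag_for(filename):
    """Return the X-Robots-Tag value for filename, or None if no rule matches."""
    for pattern, value in RULES:
        if pattern.search(filename):
            return value
    return None


if __name__ == "__main__":
    # Sample file names are assumptions chosen for illustration.
    for name in ("report.pdf", "photo.jpeg", "index.html", "notapdf"):
        print(name, x_robots_tag_for(name))
```

With an unescaped dot (".pdf$"), a name such as "notapdf" would also match, which is why the patterns escape it.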