@habitualTartare

@habitualTartare@lemmy.world

https://en.wikipedia.org/wiki/Robots.txt

robots.txt should cover any polite web crawlers, but compliance is voluntary.

https://platform.openai.com/docs/gptbot
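As a rough sketch of what that could look like: the rules below block the `GPTBot` user agent (the token OpenAI documents at the link above) and leave everything else allowed. The example.com URL, the `SomeOtherBot` name, and the `is_allowed` helper are just placeholders I made up for illustration. The Python part only checks how a well-behaved crawler would read the rules; it does nothing to stop a crawler that ignores them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders, not real targets.
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

The text inside `ROBOTS_TXT` is what would go in the `robots.txt` file at the root of the site.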

You might have to put it behind a CAPTCHA or some other kind of challenge to severely limit automated access.

It’s not realistic to assume it won’t get scraped eventually, for example by someone paying people to bypass CAPTCHAs or by web crawlers that don’t respect robots.txt. I also don’t know whether Google and Microsoft bundle their AI data collection with their search crawling, or whether there is a way to opt out of it that doesn’t also remove your site from web search.