• 0 Posts
  • 14 Comments
Joined 1Y ago
Cake day: Jun 23, 2023


robots.txt is 100% honor-based. Well-known bots like Googlebot, Bingbot, etc. do honor it. But there are also plenty of bots that ignore it completely.

I would hope the bots used to collect LLM training data honor it, but there's no way to know for certain. And all it takes is one bot ignoring it for the content of your website to end up in some random set of training data…
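To illustrate why it's honor-based: the check happens entirely inside the crawler. Here's a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot name `ExampleAIBot`, and the `fetch`/`honor_robots` helpers are all hypothetical. The point is the `honor_robots` flag: a crawler that skips the check gets the page anyway, and nothing in robots.txt can stop it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a made-up AI crawler entirely,
# and keep every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch(user_agent: str, url: str, honor_robots: bool = True) -> str:
    # The server enforces nothing here; whether robots.txt is
    # consulted at all is entirely up to the crawler's author.
    if honor_robots and not polite_can_fetch(user_agent, url):
        return "skipped"
    return "fetched"  # a real crawler would send the HTTP request here
```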


Try using “curl -A” to specify a User-Agent string that matches Chrome or Firefox.
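The curl form is `curl -A "<browser User-Agent string>" <url>`. If you're scripting it instead, the same idea looks roughly like this with Python's standard `urllib.request`. The Chrome string below is only an example (version numbers go stale fast, so copy a current one from your own browser), and `browser_like_request` is a hypothetical helper name. Also keep in mind that some bot-management tools look at more than the User-Agent header, so this alone may not be enough on every site.

```python
import urllib.request

# Example desktop Chrome User-Agent; the version numbers are illustrative.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def browser_like_request(url: str) -> urllib.request.Request:
    """Build a request that identifies itself as desktop Chrome."""
    return urllib.request.Request(url, headers={"User-Agent": CHROME_UA})


def fetch_status(url: str) -> int:
    """Send the browser-like request and return the HTTP status code."""
    with urllib.request.urlopen(browser_like_request(url), timeout=10) as resp:
        return resp.status
```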


I hope you realize that virtually every CDN provider does the exact same thing in similar ways. Sites that use Akamai, AWS, Google Cloud, Fastly, etc. all give those companies access to unencrypted content, because the CDN terminates TLS at its edge servers in order to cache and inspect traffic. It's just how CDNs work…



I got royally screwed by the federal government personnel data breach that happened a number of years ago. As a result I have free identity theft monitoring with a really good company for the foreseeable future.


Hate to break it to you, but all the major CDN providers do the exact same thing. My employer runs multiple websites, mainly for US and European users. We use Akamai for both CDN and WAF services. For any CDN or WAF to operate properly, it needs access to unencrypted content. Akamai's WAF tools include what they call Bot Manager, which can identify traffic from over 1,000 known bots and can also classify unknown ones. It works partly through browser fingerprinting, TLS session fingerprinting, and other proprietary fingerprinting techniques.
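Akamai's own fingerprinting is proprietary, so I can't show that. But to give a feel for how TLS fingerprinting works in general, here's a sketch of JA3, a published open method: take a few fields from the client's TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats), join them into a string, and MD5 it. Browsers and bot libraries tend to produce different hashes even when they send the same User-Agent. GREASE values (RFC 8701) are skipped because clients randomize them. The field values in the usage below are made up for illustration, not captured from a real client.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    """Join ClientHello fields in JA3 order, dropping GREASE values."""
    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """MD5 hex digest of the JA3 string: the fingerprint itself."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 11, 10], [29, 23], [0])` gives `"771,4865-4866,0-11-10,29-23,0"`, since the two GREASE values are dropped.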

So any time you visit a large website, you're likely being fingerprinted and otherwise analyzed by the CDN and security tools used by that site.


If you're going to attempt this sort of thing, then why go through CA or CO? Why not go through a GDPR country directly?


Different tools for different things…

uBlock, etc. are browser extensions, so they only block ads inside the browser.

Pi-hole blocks DNS lookups for advertising domains. That blocks ads, tracking, etc. not only on my browser-based systems but also on other connected devices like smart TVs, media players, etc.
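The reason it covers the TV too is that every device on the network asks the same DNS server for addresses. If the name is on the blocklist, the server answers with an unroutable address (Pi-hole's default mode replies with 0.0.0.0), so the ad request never goes anywhere, no matter which app made it. Here's a minimal sketch of just that lookup decision. The blocklist, the fake "upstream" answers (which use the 192.0.2.0/24 documentation range), and the function names are hypothetical; real Pi-hole pulls in large curated lists and forwards unblocked queries to a real upstream resolver.

```python
from typing import Optional

SINKHOLE_ADDRESS = "0.0.0.0"

# Hypothetical data for illustration only.
BLOCKLIST = {"ads.example.net", "tracker.example.com"}
UPSTREAM = {"example.org": "192.0.2.10", "news.example.org": "192.0.2.20"}


def resolve(name: str, blocklist: set[str],
            upstream: dict[str, str]) -> Optional[str]:
    """Answer a DNS A query the way a simple sinkhole would."""
    # DNS names are case-insensitive; also drop a trailing root dot.
    name = name.lower().rstrip(".")
    if name in blocklist:
        # Blocked: hand back an address that goes nowhere.
        return SINKHOLE_ADDRESS
    # Not blocked: use the upstream answer (None if it doesn't exist).
    return upstream.get(name)
```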



Followed by no JavaScript, no browser plug-ins, etc.


Crooks getting their infallible legal advice from Hollywood screenwriters…


If it truly does have a keylogger then that’s really bad as it means they have access to your passwords and any other sensitive data you might type. How certain are you that it includes a keylogger?


It's a laptop owned by your school, so they can install spyware if they want to. More importantly, the school likely has policies against removing or otherwise tampering with it. You would be wise to find out what they will do if you violate this policy. It could be anything from a slap on the wrist to expulsion.

Any decent IT department will eventually figure out if you disable it. If the spyware is any good, they'll know fairly quickly once it stops "phoning in."
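"Phoning in" usually means the agent on the laptop checks in with a central server on a schedule, and the server flags any machine that goes quiet. This is a rough sketch of that server-side check, not how any particular monitoring product works; the device IDs, timestamps, and the one-hour threshold are made up.

```python
from datetime import datetime, timedelta, timezone


def silent_devices(last_check_in: dict[str, datetime], now: datetime,
                   max_silence: timedelta) -> list[str]:
    """Return IDs of devices that haven't checked in within max_silence."""
    return sorted(
        device
        for device, seen in last_check_in.items()
        if now - seen > max_silence
    )


# Usage: laptop-b's agent was disabled three hours ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "laptop-a": now - timedelta(minutes=5),
    "laptop-b": now - timedelta(hours=3),
}
flagged = silent_devices(last_seen, now, timedelta(hours=1))
```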


It's been great for me: servers all over the world, and the connection remains rock solid. I also like that they support port forwarding, which can be a godsend for some torrents.