Hate to break it to you, but all the major CDN providers do the exact same things. My employer runs multiple websites, mainly for US and European users, and we use Akamai for both CDN and WAF services. For any CDN and/or WAF to operate properly, it needs access to unencrypted content. Part of Akamai's WAF toolset is what they call Bot Manager, which can identify traffic from over 1,000 known bots and can also classify unknown ones. It works in part through browser fingerprinting, TLS session fingerprinting, and other proprietary fingerprinting techniques.
So any time you visit a large website you’re likely being fingerprinted and otherwise analyzed by the CDN and security tools used by those sites.
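To make the TLS part concrete: one widely known approach is the JA3 method, which hashes a handful of ClientHello fields into a stable fingerprint. This is a minimal sketch of that idea, not Akamai's actual (proprietary) implementation, and the field values below are purely illustrative:

```python
import hashlib

def ja3_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    # JA3 joins the values within each ClientHello field with dashes,
    # concatenates the five fields with commas, then takes the MD5
    # hex digest of the resulting string.
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only -- not captured from a real browser.
fp = ja3_fingerprint(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(fp)
```

The point is that two clients running the same TLS stack produce the same digest before any application data is exchanged, which is why it's useful for spotting bots that spoof their User-Agent string.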
It’s a laptop owned by your school, so they can install spyware if they want to. More importantly the school likely has policies against removing or otherwise tampering with it. You would be wise to find out what they will do if you violate this policy. It could be anything from a slap on the wrist to expulsion.
Any decent IT department will eventually figure out if you disable it. If the spyware is any good, they'll know fairly quickly once it stops phoning home.
robots.txt is 100% honor-based. Well-known bots like Googlebot, Bingbot, etc. definitely honor it. But there are also plenty of bots that completely ignore it.
I would hope the bots used to collect LLM training data honor them, but there's no way to know for certain. And all it really takes is one bot ignoring it for the content of your website to end up in a random set of training data…
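You can see how voluntary the whole thing is with Python's standard-library parser. A polite crawler calls `can_fetch()` before requesting a URL; a rude one simply never asks. The robots.txt content and hostnames here are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt as a site might publish it: everyone is kept
# out of /private/ except Googlebot, which is allowed everywhere.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler checks before fetching...
print(rp.can_fetch("SomeBot", "https://example.com/private/page"))    # False
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # True

# ...but nothing here is enforced. A scraper that never calls
# can_fetch() fetches /private/ just fine.
```

Which is exactly the problem: the file is a request, not an access control, so the only real defenses are server-side (WAF rules, rate limiting, bot detection like the tools mentioned above).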