A place to discuss privacy and freedom in the digital world.
Privacy has become a very important issue in modern society, with companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.
In this community everyone is welcome to post links and discuss topics related to privacy.
Some Rules
- Posting a link to a website containing tracking isn’t great, if contents of the website are behind a paywall maybe copy them into the post
- Don’t promote proprietary software
- Try to keep things on topic
- If you have a question, please try searching for previous discussions, maybe it has already been answered
- Reposts are fine, but should have at least a couple of weeks in between so that the post can reach a new audience
- Be nice :)
Related communities
Chat rooms
much thanks to @gary_host_laptop for the logo design :)
- 0 users online
- 57 users / day
- 383 users / week
- 1.5K users / month
- 5.7K users / 6 months
- 1 subscriber
- 2.97K Posts
- 74.6K Comments
- Modlog
Anyone have an eli5 explanation of how AITA works? What patterns could be captured and how would that lead to identification or data siphoning?
One example:
By observing that when someone visits site X, it loads resources A, B, C, etc in a specific order with specific sizes, then with enough distinguishable resources loaded like that someone would be able to determine that you’re loading that site, even if it’s loaded inside a VPN connection. Think about when you load Lemmy.world, it loads the main page, then specific images and style sheets that may be recognizable sizes and are generally loaded in a particular order as they’re encountered in the main page, scripts, and things included in scripts. With enough data, instead of writing static rules to say x of size n was loaded, y of size m was loaded, etc, it can instead be used with an AI model trained on what connections to specific sites typically look like. They could even generate their own data for sites in both normal traffic and the VPN encrypted forms and correlate them together to better train their model for what it might look like when a site is accessed over a VPN. Overall, AI allows them to simplify and automate the identification process when given enough samples.
Mullvad is working on enabling their VPN apps to: 1. pad the data to a single size so that the different resources are less identifiable and 2. send random data in the background so that there is more noise that has to be filtered out when matching patterns. I’m not sure about 3 to be honest.
Thanks very much, I believe I understand that part now, like a fingerprint to associate to site components like pulled in js, css, etc. I still don’t understand, though, how they associate that to a particular user of a VPN. Does each request done through a VPN include some sort of identifier for each of us or is AI also doing something to put these requests in a particular user’s bucket?
I think it was more targeting the client ISP side, than the VPN provider side. So something like having your ISP monitor your connection (voluntarily or forced to with a warrant/law) and report if your connection activity matches that of someone accessing a certain site that your local government might not like for example. In that scenario they would be able to isolate it to at least individual customer accounts of an ISP, which usually know who you are or where to find you in order to provide service. I may be misunderstanding it though.
Edit: On second reading, it looks like they might just be able to buy that info directly from monitoring companies and get much of what they need to do correlation at various points along a VPN-protected connection’s route. The Mullvad post has links to Vice articles describing the data that is being purchased by governments.