When people say Local AI, they mean things like the Free / Open Source Ollama (https://github.com/ollama/ollama/). You can read the source code and check that it doesn’t phone home, and you completely control when and if you upgrade it. If you don’t like something in the code base, you can also fork it and start your own version. The actual models (e.g. Mistral is a popular one) used with Ollama are commonly represented in GGML format, which doesn’t even carry executable code - only massive multi-dimensional arrays of numbers (tensors) that represent the parameters of the LLM.
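To illustrate that last point, here is a minimal sketch that peeks at the header of a model file in GGUF (the newer container format in the GGML family, which I believe Ollama uses for its model blobs - check your own files; the filename below is hypothetical). The header is just a magic string, a version, and counts of tensors and metadata entries - there is no machine code in the file for the runtime to execute.

    import struct

    # Assumed GGUF header layout: 4-byte magic, uint32 version,
    # uint64 tensor count, uint64 metadata key/value count, little-endian.
    with open("mistral-7b-instruct.Q4_0.gguf", "rb") as f:
        magic = f.read(4)
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(4 + 8 + 8))

    print(magic, version, n_tensors, n_kv)
    # e.g. b'GGUF' 3 291 24 - only tensor descriptors and metadata follow.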

Not trusting that the output is correct is reasonable. But in terms of trusting the software not to spy on you, when it is FOSS it is no different to trusting any other FOSS software not to spy on you (e.g. the Linux kernel, etc…). There is some risk of an xz-style supply-chain attack on the code base, but I don’t think the risks are materially different for ‘AI’ compared to any other software.


Blockchain is great for when you need global consensus on the ordering of events (e.g. Alice gave all her 5 ETH to Bob first, so a later transaction to give 5 ETH to Charlie is invalid). It is an unnecessarily expensive solution just for archival, since it necessitates storing the data on every node forever.

Ethereum charges ‘gas’ fees per transaction which helps ensure it doesn’t collapse under the weight of excess usage. Blocks have transaction limits, and transactions have size limits. It is currently working out at about US$7,500 per MB of block data (which is stored forever, and replicated to every node in the network). The Internet Archive have apparently ~50 PB of data, which would cost US$371 trillion to put onto Ethereum (in practice, attempting this would push up the price of ETH further, and if they succeeded, most nodes would not be able to keep up with the network). Really, this is just telling us that blockchain is not appropriate for that use case, and the designers of real world blockchains have created mechanisms to make it financially unviable to attempt at that scale, because it would effectively destroy the ability to operate nodes.
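Rough arithmetic behind those figures (a back-of-the-envelope sketch; the per-MB cost obviously moves with gas prices and the ETH exchange rate):

    # Back-of-the-envelope cost of putting ~50 PB on-chain at the quoted
    # rate of roughly US$7,500 per MB of block data.
    cost_per_mb_usd = 7_500
    archive_bytes = 50e15              # ~50 PB
    archive_mb = archive_bytes / 1e6
    print(f"${archive_mb * cost_per_mb_usd:.3g}")
    # ~3.75e14, i.e. on the order of hundreds of trillions of US dollars,
    # in the same ballpark as the figure quoted above.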

The only real reason to use an existing blockchain anyway would be the theory that it is too big to fail due to legitimate business use cases, and that censorship-resistant data is too hard to remove from it. However, if the majority of its use became censorship-resistant data sharing, with ordinary transactions in the minority, I doubt that would stop authorities going after node operators and so on.

The real problems that an archival project faces are:

  • The cost of storing and retrieving large amounts of data. That could be decentralised using a solution where not all data is stored on a chain - for example, IPFS, which addresses content by its hash (see the sketch after this list).
  • The problem of curating data: deciding what is worth archiving, and what is a true-to-source archive vs a fake copy. This probably requires either a centralised trusted party, or maybe a voting system.
  • The problem of censorship. Anonymity and opaqueness about what is on a particular node can help - but they might in some cases undermine the other goals of archival.
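A toy illustration of the content-addressing idea behind IPFS (simplified on purpose - real CIDs add multihash/multibase encodings on top, and real nodes are distributed rather than a single dict):

    import hashlib

    # Content addressing: the "address" of a blob is derived from its bytes,
    # so any node holding a copy can serve it and anyone can verify it.
    def address(blob: bytes) -> str:
        return hashlib.sha256(blob).hexdigest()

    store: dict[str, bytes] = {}          # stand-in for whichever nodes hold the data

    def put(blob: bytes) -> str:
        addr = address(blob)
        store[addr] = blob
        return addr

    def get(addr: str) -> bytes:
        blob = store[addr]
        assert address(blob) == addr      # integrity check: data can't be silently swapped
        return blob

    addr = put(b"scanned page 1 of some archived document")
    print(addr, get(addr)[:12])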

This is absolutely because they pulled the emergency library stunt, and they were loud as hell about it. They literally broke the law and shouted about it.

I think that you are right as to why the publishers picked them specifically to go after in the first place. I don’t think they should have done the “emergency library”.

That said, the publishers arguments show they have an anti-library agenda that goes beyond just the emergency library.

Libraries are allowed to scan/digitize books they own physically. They are only allowed to lend out as many as they physically own though. Archive knew this and allowed infinite “lend outs”. They even openly acknowledged that this was against the law in their announcement post when they did this.

The trouble is that the publishers are not just going after them for infinite lend-outs. The publishers are arguing that they shouldn’t be allowed to lend out any digital copies of a book they’ve scanned from a physical copy, even if they lock away the corresponding numbers of physical copies.

Worse, they got a court to agree with them on that, which is where the appeal comes in.

The publishers want it to be that physical copies can only be lent out as physical copies, and for digital copies the libraries have to purchase a subscription for a set number of library patrons and concurrent borrows, specifically for digital lending, and with a finite life. This is all about growing publisher revenue. The publishers are not stopping at saying the number of digital copies lent must be less than or equal to the number of physical copies, and are going after archive.org for their entire digital library programme.


The best option is to run the models locally. You’ll need a good enough GPU - I have an RTX 3060 with 12 GB of VRAM, which is enough to do a lot of local AI work.

I use Ollama, and my favourite model to use with it is Mistral-7b-Instruct. It’s a 7 billion parameter model optimised for instruction following, and it is usable with 4-bit quantisation, so the model takes about 4 GB of storage.
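The storage figure follows pretty directly from the parameter count (rough arithmetic, ignoring the exact per-block overhead of the quantisation scheme):

    # ~7 billion parameters at 4 bits each, plus a little overhead for
    # the scales/metadata that real quantisation formats add per block.
    params = 7e9
    bits_per_param = 4
    print(params * bits_per_param / 8 / 1e9)   # ~3.5 GB, so ~4 GB on disk is about right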

You can run it from the command line rather than a web interface - start the container for the server, then something like docker exec -it ollama ollama run mistral gives you a command-line interface. The model performs pretty well; not quite as well on some tasks as GPT-4, but also not brain-damaged from attempts to censor it.
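The server also exposes a local HTTP API, so you can script it instead of using the interactive prompt. Something like this (assuming the default port 11434 and that you have already pulled the mistral model):

    import json
    import urllib.request

    # Ask the local Ollama server for a one-off, non-streaming completion.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "mistral",
            "prompt": "Explain 4-bit quantisation in one paragraph.",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])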

By default it keeps a local history, but you can turn that off.


Yes, but the information would need to be computationally verifiable for it to be meaningful - which basically means there is a chain of signatures and/or hashes leading back to a publicly known public key.

One of the seminal early papers on zero-knowledge cryptography, from 2001, by Rivest, Shamir and Tauman (two of the three letters in RSA!), actually used leaking secrets as its main example application of Ring Signatures: https://link.springer.com/chapter/10.1007/3-540-45682-1_32. Ring Signatures work as follows: there are n RSA public keys of members of a group, known to the public (or at least to the journalist). You want to prove that you hold the private key corresponding to one of those public keys, without revealing which one. So you sign a message using a ring signature over the ‘ring’ made up of the n public keys, which requires only one of the n private keys. The journalist (or anyone else receiving the secret) can verify the signature, but learns nothing about which of the n private keys was used.

However, the conditions for this might not exist. With more modern schemes, like zk-STARKs, more advanced things are possible. For example, emails these days are signed by mail servers with DKIM. Perhaps the leaker wants to prove to the journalist that they are authorised to send emails through Boeing’s staff-only mail server, without allowing the journalist, even collaborating with Boeing, to identify which Boeing staff member made the leak.

The journalist could provide the leaker with a large random number r1, and the leaker could come up with a secret large random number r2. The leaker computes a hash H(r1, r2), and encodes that hash in a pattern of space counts between full stops (e.g. “This is a sentence. I wrote this sentence.” encodes 3, 4 - the encoding would need to limit sentence sizes to allow encoding the hash while still looking relatively natural), then sends a message that happens to contain that encoded hash - including to somewhere where it comes back to them. Boeing’s mail servers sign the message with DKIM - but leaking that message as-is would obviously identify the leaker.

So instead the leaker uses a zk-STARK to prove that there exists a message m carrying a valid DKIM signature that verifies under Boeing’s DKIM public key, and a random number r2, such that m contains the encoded form of H(r1, r2) - without revealing m or r2 (that’s the zero-knowledge part). The proof might also need to show that the encoded hash occurred before “wrote:” in the body of the message, to prevent an imposter tricking a real Boeing staff member into including the encoded hash in a reply. Boeing and the journalist wouldn’t know r2, so they would struggle to find the message containing the hash (which they don’t know) - though they might try statistical analysis to find messages with unusual distributions of spaces per sentence, if the distribution forced by the encoding is too unusual.
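A toy sketch of just the space-count encoding step (the zk-STARK proof over the DKIM signature is far beyond a forum comment; the filler wordlist and the ‘one hex nibble per sentence’ choice here are my own arbitrary assumptions):

    import hashlib

    # Encode H(r1, r2) as the number of spaces between full stops, one hex
    # nibble per sentence (1-16 spaces). Decode by counting spaces again.
    FILLER = ["the", "report", "was", "sent", "to", "the", "team", "for",
              "review", "before", "the", "meeting", "on", "Friday"]

    def digest_to_counts(r1: bytes, r2: bytes, nibbles: int = 16) -> list[int]:
        h = hashlib.sha256(r1 + r2).hexdigest()
        return [int(c, 16) + 1 for c in h[:nibbles]]

    def encode(counts: list[int]) -> str:
        sentences = []
        for n in counts:
            words = [FILLER[i % len(FILLER)] for i in range(n + 1)]  # n spaces -> n+1 words
            sentences.append(" ".join(words).capitalize() + ".")
        return " ".join(sentences)

    def decode(text: str) -> list[int]:
        return [s.strip().count(" ") for s in text.split(".") if s.strip()]

    r1 = b"journalist-chosen-nonce"
    r2 = b"leaker-secret-nonce"
    counts = digest_to_counts(r1, r2)
    cover_text = encode(counts)
    assert decode(cover_text) == counts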


I suggest having a threat model about what attack(s) your security is protecting against.

I’d suggest this probably isn’t giving much extra security over a long unique password for your password manager:

  • A remote attacker who doesn’t control your machine, but is trying to phish you, will succeed just the same - it comes down to your practices and whether your password manager helps prevent credentials being copied to the wrong place.
  • A remote attacker who does control your machine will also not be affected. Once your password database in the password manager is decrypted, they can take the whole thing, whether or not you used a password or hardware key to decrypt it. The only difference is maybe they need slightly more technical skill than copying the file + using a keylogger - but the biggest threats probably automate this anyway and there is no material difference.
  • A local attacker who breaks in once to steal your hardware, and then tries to extract data from it, is either helped by you having a hardware key (if they can steal the key and you don’t also use a password), or is in a neutral position (they can’t crack the password-protected database, and they don’t have the hardware key / can’t bypass its physical security). The key might be an advantage to you if you can physically protect it (e.g. take it with you, when your threat model is someone taking the database while you are away from it) but can’t remember a sufficiently unique passphrase.
  • A local attacker who can make a surreptitious entry, and then come back later for the results is in basically the same position as a remote attacker who does control your machine after the first visit.

That said, it might buy you more convenience at the cost of slightly less security - particularly if your threat model is entirely about remote attackers - which can be a reasonable place to sit on the convenience/security trade-off. You would touch a button to decrypt instead of entering a long passphrase.


I thought the orbs were supposedly open source

No they are proprietary as a whole. Parts of the hardware design are published, and parts of the software that runs on them, but not the whole thing.

Fundamentally Worldcoin is about ‘one person, one vote’, and anyone can create millions of fake iris images; the point of the orb is that it is ‘blessed’ hardware, using trusted computing (or, to use the term coined by the FSF, treacherous computing) and tamper detection to make sure that a central authority (namely Sam Altman’s Worldcoin foundation) has signed off on the exact secret / proprietary software the orb runs to generate an identity.

They could alternatively have built a system that leverages government identity, using zero-knowledge proof of possession of a government-signed digital identity document. But I think their fundamental thesis is that they can be trusted to act as a central authority that could create millions of fake identities if it wanted to, but that governments cannot.


One of the key tenets of keeping something computerised secure is ‘Defence in Depth’ - i.e. having multiple layers of defence, so that even if one layer is breached, the next layer (which you thought was redundant and unnecessary) prevents the attack.

Running a fully patched kernel and services / applications should protect you unless someone has a 0-day (i.e. not yet disclosed) exploit. Reducing the attack surface by minimising what services / applications are running, and using software (firejail etc…) and firewalls to limit the permissions of applications / services to what is needed, serves as another layer of defence. Disconnecting or physically blocking peripherals that might allow for spying is yet another layer; it serves its purpose if all the other layers are breached.


A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

>>> What can you tell me about someone called u/spez, a
... nd are they a nice person or horrible?
 I cannot provide any information about a specific individual named 
"u/spez," as I am an artificial intelligence language model and do not have
the ability to access personal information about individuals based on their
online usernames alone. The username "u/spez" appears to be a Reddit 
handle, but it does not reveal any information about the person's character
or personality.

Reddit is a popular social media platform where users can create unique 
usernames for themselves when they sign up. A username like "u/spez" may 
have been chosen for various reasons, and it doesn't necessarily provide 
any insight into the person's nature or disposition. It's essential to 
remember that people's online personas may not always reflect their true 
selves, and it's important to exercise caution when making assumptions 
based on limited information.

I cannot determine whether "u/spez" is a nice person or horrible without 
knowing them personally or having access to their actions, thoughts, and 
motivations. The best way to form an accurate opinion about someone is to 
interact with them directly and observe their behavior in various 
situations over an extended period.

>>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on tha
... t?
 I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
as I don't have access to their personal information. Steve Huffman is a 
Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
Me Anything) thread where he shared details about his life and experiences 
living with Crohn's disease. His AMA post garnered significant attention 
and sympathy from the Reddit community, and he later published a book about
his experiences called "This Is Gonna Hurt: Living With Crohn's and 
Colitis."

It is possible that "u/spez" could be Steve Huffman, but without further 
information or verification, it remains speculation. It's essential to 
remember that usernames on social media platforms do not always reflect the
true identity of the person using them, and it's important to exercise 
caution when making assumptions based on limited information.

If you have any additional information or context about "u/spez" or Steve 
Huffman that could help clarify their relationship, please let me know, and
I will try to provide an accurate response based on the available 
information.

How to work out which instance(s) are behind it if someone does this: a Lemmy instance doesn’t have to send the same voting data to every instance - it could send different votes to different instances (stock Lemmy federates the same thing consistently, but there is no reason a modified Lemmy designed to catch someone doing this has to), encoding a distinct signal into the voting pattern sent to each one. Then just check which signal shows up. If the culprit averages several instances, then with enough signal you could decompose the linear combination (e.g. an average) of the different patterns back out into its constituent parts.
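A toy sketch of that last decomposition step (the setup is entirely hypothetical: distinct pseudo-random vote offsets fed to each suspect instance, and an aggregator assumed to republish a noisy weighted average of what it receives):

    import numpy as np

    rng = np.random.default_rng(42)
    n_posts, n_instances = 500, 6

    # Distinct vote-offset pattern sent to each suspect instance.
    signals = rng.integers(-2, 3, size=(n_posts, n_instances)).astype(float)

    # Suppose the anonymous aggregator secretly averages instances 1 and 4,
    # plus some noise from organic votes it also sees.
    true_weights = np.zeros(n_instances)
    true_weights[[1, 4]] = 0.5
    observed = signals @ true_weights + rng.normal(0, 0.3, size=n_posts)

    # Least squares recovers which instances' feeds it is consuming.
    est, *_ = np.linalg.lstsq(signals, observed, rcond=None)
    print(np.round(est, 2))   # weights near 0.5 at indices 1 and 4, ~0 elsewhere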



requires trusting a company not to fuck with you behind the scenes

The point of this cryptography is that you don’t have to trust the company implementing it not to do that, as long as you trust the software doing the retrieval.


I wonder whether their notice is anything more than absolute nonsense. They talk about breaches of their terms of service, which I think can be found here: https://go.he.services/tc/V1/en_GB/tc.html

The terms of service do purport to prohibit ‘reverse engineering’ of the app, which I think the developer receiving the notice may have done to understand the protocol between Haier’s service and the app. However, it looks like the developer is in Germany, and did the reverse engineering for the purpose of creating something that, in a way, competes with the app. According to https://www.twobirds.com/en/insights/2020/germany/vertraglicher-ausschluss-von-reverse-engineering, contractual provisions in Germany designed to prevent reverse engineering to create a competing independent program after the original is already available to the public are not valid.

Maybe they are saying that the developer is unlawfully interfering with their business by inducing others to breach the contract. However, the terms of service don’t appear to prohibit connecting to Haier’s services from a competing app (at least I can’t find anything in them that does).

They don’t really clearly define what their problem / claimed cause of action is. Maybe this is just an intimidation tactic against something they don’t like, but they have no real legal case - in which case perhaps the community around it could band together to create a legal defence fund, and have Haier laughed out of court.

Disclaimer: Not intended as legal advice.

Edit: And better yet would be if they could find a way to intercept the traffic between the devices and Haier and replace Haier in that protocol. Then there is no option for Haier to try to restrict who can use the servers on their side. I assume the devices have a set of Certificate Authorities they trust, and it is not possible to get a trusted certificate without modifying the device somehow though.


I’d suggest not buying anything from Haier. I had a fridge from them, and it barely lasted 5 years. I used their official service programme to try to get it fixed (so as to get it sorted without the repairer blaming the fridge and the manufacturer blaming the repairs), and even the person they sent out (who didn’t exclusively work for Haier, but was part of their repair programme) recommended getting another fridge, and making the next one a brand other than Haier.

The fact that they are now claiming that letting consumers control their own appliances harms the company just shows how out of touch they are with what their consumers want - and definitely reaffirms to me that this is not a brand worth buying.


The main criterion to evaluate a phone should be how easy it is to install your own recovery and system. Pretty much all vendor-provided distributions from any major vendor (regardless of which country) are going to make decisions in the interests of the manufacturer (including violating privacy, making battery management decisions that are more about planned obsolescence than battery life, not letting the owner have root access to install a real firewall, etc…).

Xiaomi is perhaps the Chinese vendor most often recognised as custom-system friendly - at least they have an official path to root - but that official path to rooting hardware you have already purchased is rather dystopian. It involves downloading a Windows-only tool (or a reverse-engineered third-party tool) that talks to their servers, creating an account with them, and handing over lots of PII. Then you have to “Apply” to them to unlock your own bootloader, and give a reason. Then they make you wait a variable amount of time (sometimes measured in weeks) between when the software first tried to unlock the phone and when their system will allow you to unlock the bootloader. They will not reduce the wait time if you contact their support and beg nicely for them to graciously let you restore your system onto a new phone that you bought from them with your own money, replacing an identical model that broke. Eventually, after making you wait, when you try again after the minimum time, their system generates a certificate, signed by them, that allows your phone to transition to ‘unlocked bootloader’ mode and lets you flash what you like.

As such, I’d not really recommend the Chinese vendors unless you find one that doesn’t make you jump through such ridiculous hoops. While I never recommend giving Google any of your PII, if you just want a phone to install your own system on, I’d recommend Google over Xiaomi etc… if it is within budget; they at least recognise that if you buy the hardware from them, you should have the right to install privacy-respecting software on it immediately (they do make you click past a warning that the bootloader is unlocked on every boot, but that is pretty minor - two quick button presses you anticipate in advance per boot).

One pro tip: once you have flashed a custom system, get something like F-Droid installed as your app store, and install a good firewall from it (AFWall+ or similar) - many apps you might install are not privacy-respecting, and a firewall helps. Also install battery management software (ACCA is good). Manufacturers optimise for day-1 marketable battery capacity, even though that trashes within a couple of years a battery that could otherwise last a decade. Using only 5% - 85% of the manufacturer’s battery capacity - i.e. automatically stopping charging at 85% and shutting down at 5% instead of 0% - will make the battery last many times longer, and modern LiPo batteries last surprisingly well per 85% charge if you aren’t running lots of software that wastes battery on anti-features.
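For what it’s worth, tools like ACCA mostly work by writing to kernel-exposed charging-control files. A minimal sketch of the idea (the sysfs path and whether the threshold knob exists at all vary a lot by device and kernel - that path is an assumption here, and ACCA itself handles the many vendor-specific variants):

    # Cap charging at 85% via the generic sysfs knob some kernels expose.
    # Requires root; path and support vary by device/kernel.
    THRESHOLD_FILE = "/sys/class/power_supply/battery/charge_control_end_threshold"

    def set_charge_limit(percent: int) -> None:
        with open(THRESHOLD_FILE, "w") as f:
            f.write(str(percent))

    if __name__ == "__main__":
        set_charge_limit(85)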


This would be very hard to protect against; if the attacker controls Linode and Hetzner, it is likely they also have access to the disks and memory of the virtual servers, and not just the network. So extracting the private key for the real certificate is probably also on the table as an option for the attacker, and would be much harder to detect.

As they say in the article, end-to-end encryption such as OTR is probably important to avoid getting caught in dragnets like this.


Data being public (and privacy in general) shouldn’t be ‘all or none’. The problem is people joining the dots between individual bits of data to build a profile, not necessarily the individual bits of data.

If you go out in public, someone might see you and recognise you, and that isn’t considered a privacy violation by most people. They might even take a photo or video that captures you in the background, and that, in isolation, isn’t considered a problem either (there is no expectation of privacy in a public place). But if someone sets out to do similar things at mass scale (e.g. by scraping, or networking cameras, or whatever) and pieces together a profile of all the places you go in public, then that is a terrible privacy violation.

Now you could similarly say that people who want privacy should never leave home, and otherwise people are careless and get what they deserve if someone tracks their every move in public spaces. But that is not a sustainable option for the majority of the world’s population.

So ultimately, the problem is the gathering and collating of publicly available personally identifiable information (including photos) in ways people would not expect and don’t consent to, not the existence of such photos in the first place.



Phones have a unique equipment identifier number (IMEI) that they share with towers. Changing SIM changes the subscriber ID (IMSI) but not the IMEI (manufacturers don’t make it easy to change the IMEI). So thieves (and anyone else) with the phone could be tracked by the IMEI anyway even if they do that, while leaving the phone on.

In practice, the bigger reason they don’t get caught every time if they have inadequate opsec practices is that in places where phone thefts are common, solving them is probably not a big priority for local police. Discarding the SIM probably doesn’t make much difference to whether they get caught.


Here’s another source about 2 month wait times sometimes, if you don’t believe me: https://www.xda-developers.com/xiaomi-2-month-wait-unlock-bootloader/. It has never personally been 2 months for me, but it has been over a week before for me, and their support team refused when I asked nicely to shorten it despite the fact my daily driver phone was broken and I couldn’t restore my LineageOS from backup - I just had to wait. That’s why I don’t buy Xiaomi stuff any more.

The wait time is determined by their servers, which send a cryptographically signed certificate, specific to the serial number of the device, that the bootloader reads. The key used to sign the certificate stays on their servers; the client just calls the server and either gets a response saying to wait this much longer, or one containing the certificate. Xiaomi explicitly call it ‘apply for unlocking’ (e.g. see the title of https://en.miui.com/unlock/index.html) - as in, they think it is their right to decide who gets to decide what runs on the hardware I’ve bought from them, and we mere consumers must come begging to them and ‘apply’ to unlock.

You don’t even have to use it

The bootloader is designed not to boot anything except MIUI without the certificate from the unlocking tool. While there are open source clients (like https://github.com/francescotescari/XiaoMiToolV2) they still work by calling Xiaomi’s server to get the unlock code, so if you want to run anything except Xiaomi’s MIUI (which is a bad idea from a privacy perspective), you kind of do have to use it (at least their server). The only way around it would be if someone found a vulnerability in the bootloader or the processor itself that allows for the ‘treacherous computing’ aspect of the boot to be bypassed without the certificate - and as far as I’m aware there isn’t a reliable approach yet for that.


Wait times are as high as 2 months (depending on how old the phone model is, etc…), and even as a regular Xiaomi customer, their support never seem to allow anyone to skip the wait, even if for example they broke their old phone and want to set up a new one like the old one (ask me how I know). During that period, MIUI is like a data collection honeypot, sucking up your PII and serving you ads.

It might be ‘normal’ now to Xiaomi customers to wait to be able to unlock the phones that they have paid for and own (perhaps in the same sense someone in an abusive relationship might consider getting hit ‘normal’ because it has been happening for a while), but the idea that the company who sold you the phone gets some say on when you get the ‘privilege’ of running what you like on it, and make you jump through frustrating hoops to control your own device, is certainly not okay.

If they just wanted to stop resellers adding non-Xiaomi-sanctioned malware / bloatware to phones before selling them on, making the bootloader clearly indicate that it is unlocked (as Google does, for example) would be enough. Or they could make a different brand for phones that are unlocked, using the same hardware except with a different logo, and let people choose whether they want unlocked or walled garden.

However, they make money off selling targeted ads based on information they collect - so I’m sure that they probably don’t want to do any of those things if they don’t have to, because they might disrupt their surveillance capitalism.


Xiaomi phones used to be good for custom ROMs, but now they try to stop you unlocking the bootloader by making you wait an unreasonable amount of time after first registering the device with them before you can unlock. Many of the other vendors are even worse.

So from that perspective, Pixel devices are not a terrible choice if you are going to flash a non-stock image.


The proposal doesn’t say what the interface between the browser and the OS / hardware is. They mention (but don’t elaborate on) modified browsers. Google’s track record includes:

  1. Creating the SafetyNet software and the Play Integrity API, which produce ‘attestations’ that the device is running manufacturer-supplied software. These can be passed for now (at a lower ‘integrity level’) with software like LineageOS combined with Magisk and Universal SafetyNet Fix (Magisk by itself used to be enough, but then Google hired the Magisk developer and soon after that was dropped). However, those workarounds rely on making the device pretend to be an earlier device that doesn’t have ARM TrustZone configured, and one day the net is going to close. So these APIs actively take control away from users over what OS they can run on their phone, if they want to use Google and third-party services (Google Pay, many apps).
  2. Requiring that Android apps be signed, and creating a separate tier of ‘trusted’ Android apps needed to create a browser. For example, to implement WebAuthn with hardware support on Android (as Chrome does), you need to call com.google.android.gms.fido.fido2.Fido2PrivilegedApiClient, and Google doesn’t even provide a way to apply to get allowlisted for it. Mozilla and Google are, for example, allowed to build software that uses that API - but if you want to run your own modified browser and call that API on hardware you own? Good luck convincing Google to add you to the allowlist.
  3. Locking down extension APIs in Chrome to make it unsuitable for things they don’t like, like Adblocking, as in: https://www.xda-developers.com/google-chrome-manifest-v3-ad-blocker-extension-api/.

So if Google can make it so you can’t run your own OS, and their OS won’t let you run your own browser (and BTW Microsoft and Apple are on a similar journey), and their browser won’t let you run an adblocker, where does that leave us?

It creates a ratchet effect where Google, Apple, and Microsoft can compete with each other, and the Internet is usable from their browsers running unmodified systems sold by them or their favoured vendors, but any other option becomes impractical as a daily driver. They can effectively stack things against there ever being a new operating system / distro to compete with them, by making their web properties unusable from anything else and promoting that as the standard. This is a massive distortion of the open web from where it is now.

What would fix this is a regulation that: if hardware has private or secret keys embedded into it, the manufacturer must provide the end user with those keys; and if it has unchangeable public keys embedded, and requires software to be signed against them to boot or to access some hardware, the manufacturer must provide the corresponding private keys to end users. If that were the law in a few states big enough that manufacturers won’t just ignore them, it would shut down this sort of scheme.


It is a bit concerning if Mozilla Corporation (which is ultimately supposed to serve the goals of its shareholder, the Mozilla Foundation) is developing things behind closed doors that are not exploitable security bugs. The reason Bugzilla supports confidential bugs is so that 0-days aren’t available for anyone to browse, and that justification doesn’t seem to apply in this case.