Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.
Spent many years on Reddit and then some time on kbin.social.
Not necessarily. Curation can also be done by AIs, at least in part.
As a concrete example, NVIDIA’s Nemotron-4 is a system specifically intended for generating “synthetic” training data for other LLMs. It consists of two separate LLMs: Nemotron-4 Instruct, which generates text, and Nemotron-4 Reward, which evaluates the outputs of Instruct to determine whether they’re good to train on.
Humans can still be in that loop, but they don’t necessarily have to be. And the AI can help them in that role so that it’s not necessarily a huge task.
The term “model collapse” gets brought up frequently to describe this, but it’s commonly very misunderstood. There actually isn’t a fundamental problem with training an AI on data that includes other AI outputs, as long as the training data is well curated to maintain its quality. That needs to be done with non-AI-generated training data already anyway, so it’s not really extra effort. The research paper that popularized the term “model collapse” used an unrealistically simplistic approach: it just recycled all of an AI’s output into the training set for subsequent generations of AI, without any quality control or additional training data mixed in.
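As a toy illustration of why that curation step matters, here’s a sketch of a generate-then-filter loop. The `quality_score` function here is just a stand-in for a real reward model like Nemotron-4 Reward, and the samples and threshold are made up:

```python
def quality_score(sample):
    # Stand-in for a reward model: a real system would score coherence,
    # helpfulness, etc. Here we just pretend longer samples are better.
    return len(sample)

def curate(generated, threshold=5):
    """Keep only generated samples whose score clears the bar."""
    return [s for s in generated if quality_score(s) >= threshold]

# Hypothetical model outputs of varying quality.
generated = ["good long answer", "ok reply", "meh", "junk", "another solid answer"]
curated = curate(generated)
print(curated)  # the low-scoring "meh" and "junk" never enter the training set
```

The "model collapse" paper effectively skipped the `curate` step entirely and fed everything back in, which is what made its results so dramatic.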
Apparently, I am. People actually want this
Thank you for recognizing this. It gets quite frustrating in threads like these, about new AI tools being deployed, when people declare “nobody wants this!” and I try to explain that there are actually people who do want it. I find many AI tools to be quite handy.
I tend to get vigorously downvoted at that point, as if that would make the demand “go away” somehow. But sticking heads in sand doesn’t accomplish anything except to make people increasingly out of touch.
I don’t know what you’re suggesting is going on here; those images you linked don’t work as far as I can tell. Firefox throws a security certificate error. Are you hosting them yourself and collecting IP addresses from people who click on them? If so, that’s not exactly a Lemmy-specific flaw. That’s just basic Internet 101.
Here’s DAI’s peg over time. Over the past year it’s had a high point of $1.0012 and a low of $0.9979, neither extreme lasting more than a brief spike. Seems like a pretty good peg to me. The mechanism by which it maintains its peg is complex, but fully transparent since it happens entirely on-chain.
Here’s LUSD, another similarly algorithmically-pegged stabletoken. It’s smaller than DAI so it’s a bit less stable; it had one spike this year where it went all the way up to $1.029. But the mechanism is much simpler, so if you’re having trouble understanding DAI it might be an easier place to start.
All you’re saying here is “nuh-uh! I don’t believe you!” Which isn’t particularly useful.
I could dig up the addresses of MakerDAO or Liquity vaults, and you could examine them directly using Etherscan and see which tokens back them. But I somehow get the impression that that would be a waste of my time. Is there literally anything that could convince you, before I put any more work into trying?
They call it proof of stake, but it’s proof of ownership. It’s proving you own coins. That’s it. Edit: I think you thought I was talking about proof of authority?
No, there is a distinction here, and it’s a very important one.
If you’re using proof of ownership then there’s no way of penalizing the owners who are validating the chain if they misbehave. That’s somewhat more like what Bitcoin uses, actually - proof of ownership of mining rigs, in a sense. If a Bitcoin miner 51% attacks the chain then after the attack is done they still have their mining rigs and can continue to attempt to attack it if they want.
With proof of stake, the resource in question - the tokens, in Ethereum’s case - is put up as a stake. I.e., the tokens are placed under the control of the blockchain’s validation system, so if the validator tries pulling some kind of funny business their stake can be slashed. Someone who attacks Ethereum has to burn their stake in the process, which would cost them tens of billions of dollars and prevent them from attempting future attacks.
You can own millions of Ether and that’s meaningless as far as validation goes. It’s only once you put them up as a stake that you get “skin in the game.”
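Here’s a toy sketch of that distinction. This illustrates the concept only, not Ethereum’s actual staking mechanics (the numbers and the slashing fraction are made up):

```python
class Validator:
    """Toy illustration: merely owning tokens grants no validation power and
    carries no risk; only tokens explicitly put up as stake can be slashed."""

    def __init__(self, owned):
        self.owned = owned   # tokens just sitting in a wallet
        self.staked = 0      # tokens locked under the protocol's control

    def stake(self, amount):
        assert amount <= self.owned
        self.owned -= amount
        self.staked += amount

    def slash(self, fraction):
        # Misbehavior burns part of the stake; wallet holdings are untouched,
        # since they were never under the protocol's control.
        penalty = self.staked * fraction
        self.staked -= penalty
        return penalty

v = Validator(owned=100)
v.stake(32)
burned = v.slash(0.5)
print(v.owned, v.staked, burned)  # 68 16.0 16.0
```

The point is in the last line: the attacker’s remaining wallet balance is irrelevant, because only what was staked was ever at risk - and in a “proof of ownership” scheme there’d be nothing locked up to burn at all.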
You’re right, it was not designed to support an idea that didn’t exist when it was designed. But upgrades to improve Lightning have been proposed and made it into the protocol.
You were earlier touting Bitcoin’s lack of protocol upgrades as a key feature. Now it’s performing upgrades?
The problem with Bitcoin’s upgrades is that they’ve made “no hard forks” into a religious tenet, so whenever they try to do anything new they have to squish it in as a soft fork somehow built on top of the existing foundations. The existing foundations aren’t well suited to this kind of thing, though, since they were designed 15 years ago. So it makes for some very labored and inefficient design, as in the case of Lightning.
Layer 2s on something like Ethereum, which was designed from the ground up to support them and which continues to add new features making them more efficient and feature-rich, are far easier and cheaper to work with.
I don’t know about Eth’s long-term future as a decentralized platform when centralization continues to increase and a conspiracy, hack, or government pressure on Hetzner and Amazon could impact over half the nodes on the network.
It’s important to call out that nodes in general are not important for validating the chain; it doesn’t matter who’s controlling them. You can run your own node and there’s nothing those other non-validating nodes can do to tamper with your view of the network. The worst they could do is stop sending you updates, which would be obvious, and you could then go hunting for replacement feeds.
Bitcoin’s protocol has not meaningfully changed in 15 years.
Well, yes, exactly. That’s the problem. There have been innumerable innovations and improvements in the field over those 15 years, but Bitcoin ossified early and so it’s got none of them.
Ethereum is centralized AF. The majority of the supply was sold during the pre-mine, and now that “proof of ownership” runs the network, the risk of a 51% attack is significant.
You’ve got a very inaccurate and skewed view of this. Most significantly, it’s not “proof of ownership,” it’s “proof of stake.” Proof of ownership and proof of stake are distinct technologies that operate in different manners. Ethereum is not proof of ownership.
You’re clearly not very familiar with how Ethereum’s proof of stake system operates, because “51% attack” is not meaningful there. There’s nothing magical about the 51% threshold in Ethereum’s system of staking. There is a magical threshold at 66%: if you’ve got more than that you can prevent “finality” from happening, which will in turn cause some disruption to the chain. More importantly, though, it doesn’t prevent blocks from continuing to be processed and doesn’t allow stakers to forge blocks. It’s a highly theoretical attack since no stakers or staking pools are anywhere remotely close to that sort of dominance, and even if one did do that there’d still be mechanisms by which it could be slashed.
Now that Bitcoin lightning is out and mature, transaction speed and chain capacity is no longer the limiting factor.
Lightning has been an entirely predictable disappointment. The problem is that Bitcoin was not designed to support something like Lightning, and that very feature you touted above - Bitcoin’s complete ossification of protocol upgrades 15 years ago - means it can’t be made to support it. Lightning’s total capacity is $300 million. Ironically there’s thirty times more Bitcoin being transacted on the Ethereum network in the form of WBTC than there is Bitcoin being transacted in Lightning.
If you’re interested in layer-2 solutions then Ethereum’s recent updates have been all about providing better support for that kind of thing, using many cryptographic advances that came along in those 15 years. Some of them even incorporate Monero-like privacy systems, such as Aztec.
Some stablecoins are centralized, but it’s not a fundamental requirement of how they operate. Stabletokens such as DAI or Liquity’s LUSD are run without a central company. They cannot “rug” you because they’re based on smart contracts.
They are often poorly regulated or unregulated entirely
Isn’t that kind of the point?
so you have no reason to trust their claims
Smart contract code can be audited by anyone and trusted to run exactly as it’s written.
They are, at best, pegging their value to a currency which is designed to lose 2-3% of its value per year due to inflation
Stablecoins aren’t required to peg to any specific measure of value (I assume you’re referring to US dollars?). There are stabletokens pegged to gold, for example, if you really want something like that.
Since US dollars work just fine for commerce, though, using a stabletoken that’s pegged to US dollars works fine for commerce too.
There’s an entire category of cryptocurrency designed specifically for the use case you’re asking for: the stablecoins. They are pegged to reference values using a variety of techniques. US dollars are a common denomination, since the dollar is already frequently seen as a global reserve currency, but if you really want, there are stablecoins pegged to other things as well.
If you want a more specific example, I typically use DAI as a go-to example since it doesn’t depend on third party trust like some of the more commonly-used ones (such as Tether).
Ah yes, must keep that war on drugs going, it’s totally worth sacrificing everyone’s privacy to make sure the Devil’s Cabbage is kept off the streets. Reefer Madness is epidemic.
And human trafficking, yes, we can’t have people sending remittances to their families in destitute foreign countries so that they might be able to afford to immigrate too. So many poor foreigners trying to get in!
Or maybe this is actually too complicated an issue to dismiss with a simple “if people have done nothing wrong they have nothing to hide?”
Even if you trained the AI yourself from scratch you still can’t be confident you know what the AI is going to say under any given circumstance. LLMs have an inherent unpredictability to them. That’s part of their purpose; they’re not databases or search engines.
if I were to download a pre-trained model from what I thought was a reputable source, but was man-in-the middled and provided with a maliciously trained model
This is a risk for anything you download off the Internet; even source code could be MITMed to give you something with malicious stuff embedded in it. And no, I don’t believe you’d read and comprehend every line of it before you compile and run it. You need to verify checksums.
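Checksum verification is the same whether it’s a model file or a source tarball: hash what you downloaded and compare against the digest published on the (HTTPS-protected) release page. A minimal sketch, using a stand-in file since the filenames here are made up:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so big model files don't have to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: write a stand-in "model" file, then verify it against a known digest.
# In practice, `expected` comes from the publisher's release page, not from
# the same channel you downloaded the file through.
with open("demo_model.bin", "wb") as f:
    f.write(b"pretend these are model weights")

expected = hashlib.sha256(b"pretend these are model weights").hexdigest()
actual = sha256_of("demo_model.bin")
assert actual == expected, "Checksum mismatch - do not load this model!"
print("checksum ok")
```

If the published digest and your download came over the same compromised connection, of course, the checksum proves nothing - which is why signed releases are better still.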
As I said above, the real security comes from the code that’s running the LLM model. If someone wanted to “listen in” on what you say to the AI, they’d need to compromise that code to have it send your inputs to them. The model itself can’t do that. If someone wanted to have the model delete data or mess with your machine, it would be the execution framework of the model that’s doing that, not the model itself. And so forth.
You can probably come up with edge cases that are more difficult to secure, such as a troubleshooting AI whose literal purpose is messing with your system’s settings and whatnot, but that’s why I said “99% of the way there” in my original comment. There’s always edge cases.
Ironically, as far as I’m aware it’s based on research done by some AI decelerationists over on the Alignment Forum who wanted to show how “unsafe” open models were, in the hopes that there’d be regulation imposed to prevent companies from distributing them. They demonstrated that the “refusals” trained into LLMs could be removed with this method, allowing it to answer questions they considered scary.
The open LLM community responded by going “coooool!” and adapting the technique as a general tool for “training” models in various other ways.
That would be part of what’s required for them to be “open-weight”.
A plain old binary LLM model is somewhat equivalent to compiled object code, so redistributability is the main thing you can “open” about it compared to a “closed” model.
An LLM model is more malleable than compiled object code, though, as I described above there’s various ways you can mutate an LLM model without needing its “source code.” So it’s not exactly equivalent to compiled object code.
Fortunately, LLMs don’t really need to be fully open source to get almost all of the benefits of open source. From a safety and security perspective it’s fine because the model weights don’t really do anything; all of the actual work is done by the framework code that’s running them, and if you can trust that due to it being open source you’re 99% of the way there. The LLM model just sits there transforming the input text into the output text.
From a customization standpoint it’s a little worse, but we’re coming up with a lot of neat tricks for retraining and fine-tuning model weights in powerful ways. The most recent big development I’ve heard of is abliteration, a technique that lets you isolate a particular “feature” of an LLM and either enhance it or remove it. The first big use of it is to modify various “censored” LLMs to remove their ability to refuse to comply with instructions, so that all those “safe” and “responsible” AIs like Goody-2 can be turned into something that’s actually useful. A more fun example is MopeyMule, a LLaMA3 model that has had all of his hope and joy abliterated.
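The core linear-algebra trick, as I understand it, is projecting a “feature direction” out of the model’s activations. Here’s a toy numpy sketch; the dimensions and the “refusal direction” are made up, since in real abliteration the direction is estimated by contrasting activations on harmful vs. harmless prompts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: pretend we've already estimated a unit-length
# "refusal direction" in the model's hidden space.
hidden_dim = 8
refusal_dir = rng.normal(size=hidden_dim)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(activation, direction):
    """Remove the component of an activation along the given (unit) direction."""
    return activation - np.dot(activation, direction) * direction

act = rng.normal(size=hidden_dim)
cleaned = ablate(act, refusal_dir)

# After ablation, the activation has (numerically) zero projection
# onto the refusal direction - the "feature" is gone.
print(np.dot(cleaned, refusal_dir))
```

Enhancing a feature instead of removing it is the same idea with the sign flipped: add a multiple of the direction rather than subtracting the projection.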
So I’m willing to accept open-weight models as being “nearly as good” as a full-blown open source model. I’d like to see full-blown open source models develop more, sure, but I’m not terribly concerned about having to rely on an open-weight model to make an AI system work for the immediate term.
The thing that drives me nuts is that I really do value that baby they’re carrying around. It is precious. But I don’t want to give the Internet Archive money just to funnel into the pockets of their lawyers and settlement payments to big publishers due to these unrelated quixotic battles.
I was hoping that the IA would have learned a lesson from losing this court case, they should have settled as soon as they could. I’m sure the publishers don’t want the bad publicity of “destroying” the Internet Archive, they just want them to stop blatantly violating their copyrights. But this appeal suggests that they haven’t learned that lesson yet.
In an ideal world there’d either be some kind of leadership shakeup at the IA to get rid of whoever was behind this stunt, or some kind of alternative IA-like organization appears to pick up the archive before the IA goes broke and its collection ends up being sold off to the highest bidder. Or simply destroyed.
They don’t need to do anything so drastic. They just need to stop doing things that blatantly provoke legal attacks like this. Their “National Emergency Library” was a foolish stunt that is endangering their primary objective of information preservation; they wouldn’t have been sued if they’d just kept on carrying on as they were before.
Except it’s not a threat to the future of all libraries, it’s a threat to the future of “libraries” that decide to completely ignore copyright and give out an unlimited number of copies of ebooks. Basically turning themselves into book-focused piracy sites.
I’m incredibly frustrated with Internet Archive for bringing this on themselves. It is not their mandate to fight copyright, that’s something better left in the hands of activist organizations like the EFF. The Internet Archive’s mandate is to archive the Internet, to store and preserve knowledge. Distributing it is secondary to that goal. And picking unnecessary fights with big publishing houses like this is directly contrary to that goal, since now the Internet Archive is in danger.
It’s like they’re carrying around a precious baby and they decided it was a good idea to start whacking a bear with a stick. Now the bear is eating their leg and they’re screaming “oh my god help me, the bear is threatening this baby!” Well yeah, but you shouldn’t have brought a baby with you when you went on a bear-whacking expedition. You should have known exactly what that bear was going to do.
This is indeed one of the things cryptocurrencies exist for, but social media denizens around these parts have long conditioned themselves to hate it.
So a rock and a hard place, it seems. Which is more hated; the big data-harvesting corporation co-founded by Elon Musk, or a big bad NFT-hosting blockchain?
For people who are concerned about data harvesting I would recommend something like Monero or Aztec over Bitcoin, though. Bitcoin’s basically obsolete at this point, coasting on name recognition and inertia, and has no built-in privacy features.
No, it’s opt-in. If you do nothing you won’t have it.