Joined 1Y ago
Cake day: Jun 12, 2023


Android is sending a ton of data, though, even if you’re not doing anything internet-related. It also kinda reacts to “okay, google”, which wouldn’t really be possible if it wasn’t listening.

Now, it obviously doesn’t keep a continuous, lossless audio stream from the phone to some google server. But, it could be sending text parsed from audio locally, or just snippets of audio when the thing detects speech. Relatively normal stuff to collect for analytics purposes, actually.

Now, data like that could “easily” get “misplaced”, of course, and end up in the ad-shoveling machine… Not necessarily at Google’s hands: could be any app, really. Facebook, TikTok, a random free-to-play Candy Crush clone, etc. But if that data gets into the interwoven clusterfuck of advertisement might, it will likely end up having an effect on the ads shown to the user.


I’ve noticed that too. Intentionally veered a conversation into a different topic and, lo and behold, I got a “relevant recommendation” a short time later. That was, not entirely coincidentally, the same day I unlocked the bootloader and flashed a de-googled ROM.


Not once did I claim that LLMs are sapient, sentient or even have any kind of personality. I didn’t even use the overused term “AI”.

LLMs, for example, are something like… a calculator. But for text.

A calculator for pure numbers is a pretty simple device, all the logic of which can be designed by a human directly.

When we want to create a solver for systems that aren’t as easily defined, we have to resort to other methods. E.g. “machine learning”.

Basically, instead of designing all the logic entirely by hand, we create a system which can end up in a finite, yet practically unlimited, number of states, each of which defines behavior different from the others. By slowly tuning the model using existing data and checking its performance we (ideally) end up with a solver for something a human mind can’t even break up into building blocks, due to the sheer complexity of the given system (such as a natural language).

And like a calculator that can derive that 2 + 3 is 5, despite the fact that the number 5 is never mentioned in the input and that particular formula was not a part of the suite of tests used to verify that the calculator works correctly, a machine learning model can figure out that “apple slices + batter = apple pie”, assuming it has been tuned (aka trained) right.
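
To make the “tune it with data, then check it on something it hasn’t seen” idea concrete, here’s a toy sketch in Python. It’s nowhere near an LLM: the “model” is just three numbers, and the function names, the learning rate and the number of passes are all made up for the illustration. It gets tuned on sums of small numbers with 2 + 3 deliberately left out, and then gets asked for 2 + 3 anyway.

```python
import random

def make_examples():
    """All sums a + b for a, b in 0..9, except the held-out pair (2, 3)."""
    return [(a, b, a + b) for a in range(10) for b in range(10) if (a, b) != (2, 3)]

def train_adder(examples, epochs=500, lr=0.005, seed=0):
    """Tune three numbers (w1, w2, bias) with plain stochastic gradient descent.

    epochs, lr and seed are arbitrary picks for this toy, not recommendations.
    """
    rng = random.Random(seed)
    w1, w2, bias = rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for a, b, target in data:
            # How far off is the current guess for this example?
            error = (w1 * a + w2 * b + bias) - target
            # Nudge each number a little in the direction that shrinks the error.
            w1 -= lr * error * a
            w2 -= lr * error * b
            bias -= lr * error
    return w1, w2, bias

def predict(model, a, b):
    w1, w2, bias = model
    return w1 * a + w2 * b + bias

model = train_adder(make_examples())
# (2, 3) was never in the tuning data, yet the answer lands very close to 5.
print(predict(model, 2, 3))
```

Nobody wrote “add the two inputs” into that code. The rule falls out of the tuning, which is the whole point, just on a laughably small scale.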


Learning is, essentially, “algorithmic copy-paste”. The vast majority of things you know, you’ve learned from other people or other people’s works. What makes you more than a copy-pasting machine is the ability to extrapolate from that acquired knowledge to create new knowledge.

And currently existing models can often do the same! Sometimes they make pretty stupid mistakes, but they often do, in fact, manage to end up with brand new information derived from old stuff.

I’ve tortured various LLMs with short stories, questions and riddles, which I’ve written specifically for the task and which I’ve asked the models to explain or rewrite. Surprisingly, they often get things either mostly or absolutely right, despite the fact it’s novel data they’ve never seen before. So, there’s definitely some actual learning going on. Or, at least, something incredibly close to it, to the point it’s nigh impossible to differentiate it from actual learning.


It’s illegal if you copy-paste someone’s work verbatim. It’s not illegal to, for example, summarize someone’s work and write a short version of it.

As long as overfitting doesn’t happen and the machine learning model actually learns general patterns instead of memorizing training data, it should be perfectly capable of generating data that’s not copied verbatim from humans. Whom, exactly, is a model plagiarizing if it generates a summarized version of some work you give it, particularly if that work is novel and was created or published after the model was trained?
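
If you want a rough way to tell “copied verbatim” from “reworded”, one crude heuristic is to look for the longest run of consecutive words that an output shares with the source. The sketch below does that with Python’s standard `difflib`. The function name and the sample sentences are invented for the illustration, and this is in no way a legal test for plagiarism, just a quick sanity check.

```python
from difflib import SequenceMatcher

def longest_shared_run(source, output):
    """Length, in words, of the longest word sequence found in both texts."""
    a = source.lower().split()
    b = output.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size

# Made-up sample texts, purely for illustration.
source = "the quick brown fox jumps over the lazy dog near the old river bank"
copied = "he wrote that the quick brown fox jumps over the lazy dog yesterday"
summary = "a fox leaps over a dog by the river"

print(longest_shared_run(source, copied))   # long run: a big chunk is lifted as-is
print(longest_shared_run(source, summary))  # short run: same idea, different wording
```

A memorizing (overfitted) model would tend to produce the first kind of output, while one that picked up general patterns should mostly produce the second.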