• 4 Posts
  • 7 Comments
Joined 3M ago
Cake day: Jan 29, 2025

cross-posted from: https://scribe.disroot.org/post/1835374

> DeepSeek-R1 is a blockbuster open-source model that is now at the top of the U.S. App Store.
>
> As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other influence.
>
> Today we’re publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.
>
> ...

That’s a real issue:

Censorship and isolation as China bans thousands of mobile apps

… ironically, TikTok, along with other globally popular social media platforms, is also unavailable in China. Conversely, ByteDance tailor-made a local version, Douyin, for Chinese users, to comply with the country’s stringent censorship rules. In fact, TikTok is not an isolated case. Alibaba’s popular messaging platform, Ding Talk, is also unavailable in China, and its local version is called Ding Ding.



> we don’t see similar articles for OpenAI and other US-based AI tools.

I don’t know what kind of media you consume, but I read such articles all the time. And as I already said here, there is still a difference, as surveillance and censorship are much harsher in China than anywhere else.

(It’s amazing. I’m really new on Lemmy, but it seems whataboutery is a thing here …)


There are plenty of reasons to differentiate between democracies and autocracies, but I agree that it’s not surprising. This is just the next step of a totally over-hyped technology imo. Everyone gets excited about the performance and the PR announcements while no one even knows what the training data is.


cross-posted from: https://scribe.disroot.org/post/1834950

> Privacy activists are warning about the invasive nature of DeepSeek, which collects a trove of personal user information that could be handed over to the Chinese government
>
> People, however, just don’t care.
>
> Luke de Pulford, co-founder of the Inter-Parliamentary Alliance on China (IPAC), shared screenshots from the Chinese AI chatbot’s privacy policy, which stated data it collects is stored in “secure servers located in the People’s Republic of China.”
>
> ...
>
> “Just fyi, @deepseek_ai collects your IP, keystroke patterns, device info, etc etc, and stores it in China, where all that data is vulnerable to arbitrary requisition from the [Chinese] State,” said de Pulford, leader of IPAC, a global group of lawmakers who seek to hold China accountable for democratic abuses.
>
> “Anticipating tedious whataboutery: the difference between this and free-world social media apps is that you can enforce your data rights in rule of law countries. This is not the case in China,” said de Pulford.

Chinese app DeepSeek blocked on Apple and Google app stores in Italy over data privacy concerns
cross-posted from: https://scribe.disroot.org/post/1834743

> The Italian regulator, known as the Garante, said on Tuesday it wanted to know what personal data is collected, from which sources, for what purposes, on what legal basis and whether it is stored in China. It gave DeepSeek and its affiliated companies 20 days to respond.

The guys at HF (and many others) appear to have a different understanding of Open Source.

As the Open Source AI Definition says, among other things:

Data Information: Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system. Data Information shall be made available under OSI-approved terms.

  • In particular, this must include: (1) the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies; (2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.

Code: The complete source code used to train and run the system. The Code shall represent the full specification of how the data was processed and filtered, and how the training was done. Code shall be made available under OSI-approved licenses.

  • For example, if used, this must include code used for processing and filtering data, code used for training including arguments and settings used, validation and testing, supporting libraries like tokenizers and hyperparameters search code, inference code, and model architecture.

Parameters: The model parameters, such as weights or other configuration settings. Parameters shall be made available under OSI-approved terms.

  • The licensing or other terms applied to these elements and to any combination thereof may contain conditions that require any modified version to be released under the same terms as the original.

These three components (data, code, and parameters) shall be released under the same conditions.


Is DeepSeek Open Source?

Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model

Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.

The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loath to reveal its secret sauce.


> I feel safer knowing that my data is not in a country where the company can use it against me

Where is this country that can’t use your data against you?

