The issues with Google’s personalised search results are, imo:

  1. Not only is it not opt-in, but you can’t even opt out of it. Personalised search results should be opt-in and disabled by default.
  2. The data kept on you is used to sell you ads.
  3. The data kept on you will be handed over to state entities fairly easily.

Given those three problems, how feasible would it be to self-host a search engine that personalises your results to show you things that are more relevant to you? Self-hosting would avoid issues 1 and 2, since you would presumably have made those decisions yourself. Issue 3 would also be improved: you could host it offshore if you are concerned about your domestic state, and if you were legally compelled to hand over data, you could make the personal choice about whether to accept the consequences of refusing. A big company, by contrast, will obviously comply immediately and not attempt to fight it, even on legal grounds.

A basic use case: say you’re a programmer and you search for “ruby”. You would want the first result to be the programming language’s website rather than the Wikipedia page for the gemstone. You could just search for “ruby programming language” on any privacy-respecting search engine, but it’s a small quality-of-life (QoL) improvement not to have to think about the different ways an ambiguous query like that could be interpreted.
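One way to get that QoL improvement without handing a profile to a search company is to keep a small interest profile on your own machine and rewrite ambiguous queries before they are sent anywhere. Below is a minimal sketch, assuming a hand-written disambiguation table and a front end that takes the query as a `q` URL parameter. The table, the interest names, the function names and the `searx.example.org` address are all hypothetical, and this is not how Google or any real engine personalises results.

```python
# Hypothetical sketch: local, client-side query personalisation.
# The table and interest names below are made-up examples.
from urllib.parse import urlencode

# Hypothetical table: ambiguous term -> {interest: context words to append}.
DISAMBIGUATION = {
    "ruby": {"programming": "programming language", "geology": "gemstone"},
    "python": {"programming": "programming language", "biology": "snake"},
    "rust": {"programming": "programming language", "chemistry": "corrosion"},
}


def personalise_query(query: str, interests: list[str]) -> str:
    """Append disambiguating context for ambiguous terms, based on local interests."""
    lowered = query.lower()
    additions: list[str] = []
    for word in lowered.split():
        senses = DISAMBIGUATION.get(word, {})
        # Use the first declared interest that matches a known sense of the word,
        # skipping context the user already typed or that was already added.
        for interest in interests:
            context = senses.get(interest)
            if context and context not in lowered and context not in additions:
                additions.append(context)
                break
    return " ".join([query] + additions)


def build_search_url(base_url: str, query: str) -> str:
    """Build a GET URL for a (hypothetical) front end that reads the query from `q`."""
    return f"{base_url}?{urlencode({'q': query})}"


if __name__ == "__main__":
    q = personalise_query("ruby", ["programming"])
    print(q)  # ruby programming language
    print(build_search_url("https://searx.example.org/search", q))
```

The profile never leaves your machine, but the rewritten query itself still does, so whichever engine receives it sees “ruby programming language” just as if you had typed it.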

I know I’m not exactly hitting the mark, but have you looked at Kagi? You can personalize the weighting of results from certain sites. You can also add lenses, which let you steer results towards forums, programming, academia, etc.

To me it was a bit like reliving the early days of Google, with the “don’t be evil” mantra still intact.

I’d also add that it appears to be privacy-respecting.

It has been good for me so far. If someone sees a reason I should run away from this, please let me know why and what we should all use instead. I’d appreciate it!
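The per-site weighting idea can also be applied locally, on top of whatever results an engine returns. Below is a minimal sketch of generic re-ranking, assuming each result arrives as a URL plus a relevance score. The weight table, class and function names are hypothetical, and this is an illustration of the general technique, not how Kagi actually implements its site weighting.

```python
# Hypothetical sketch: re-rank search results with local per-site weights.
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    score: float  # relevance score from whichever engine returned the result


# Hypothetical weights: >1 boosts a site, <1 demotes it, 0 blocks it entirely.
SITE_WEIGHTS = {
    "ruby-lang.org": 2.0,
    "stackoverflow.com": 1.5,
    "pinterest.com": 0.0,
}


def site_weight(url: str, weights: dict[str, float]) -> float:
    """Return the weight for a URL's host, also matching its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for domain, weight in weights.items():
        if host == domain or host.endswith("." + domain):
            return weight
    return 1.0  # unlisted sites keep their original score


def rerank(results: list[Result], weights: dict[str, float]) -> list[Result]:
    """Drop blocked sites and sort the rest by weighted score, highest first."""
    weighted = []
    for r in results:
        w = site_weight(r.url, weights)
        if w > 0:
            weighted.append(Result(r.url, r.score * w))
    return sorted(weighted, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    results = [
        Result("https://en.wikipedia.org/wiki/Ruby", 0.9),
        Result("https://www.ruby-lang.org/en/", 0.6),
        Result("https://www.pinterest.com/some-pin", 0.8),
    ]
    for r in rerank(results, SITE_WEIGHTS):
        print(r.url, round(r.score, 2))
```

Matching on the hostname suffix means a weight for `ruby-lang.org` also covers `www.ruby-lang.org`, while an unrelated host such as `notruby-lang.org` is left alone.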

@communism@lemmy.ml (OP)

Kagi’s an interesting one. The main reason I don’t use it is that you have to have an account, which de-anonymises you. I know they have their “privacy pass” feature, but that seems to essentially rely on trusting that they aren’t tying your private searches to an account. Also, $10/month for a search engine is just pretty steep for my budget.

Self-hosting a search engine is very hard. The scraping, indexing and storage requirements are immense. You could definitely self-host a front end (with your QoL improvements), but the back-end search engines (Bing/Google/etc.) will be able to track you all the same.

@communism@lemmy.ml (OP)

That’s a good point. I forgot that things like SearXNG are only front ends, so to add personalisation to them I assume you’d have to modify the queries you send to Bing/Google/etc., rather than doing what Google etc. do with whatever algorithm they use to produce search results.

Is there even any open-source indexing software available?

Max-P

There’s YaCy. I ran a node for a while, but it ended up filling my server’s drive just indexing German Wikipedia, and the results were terrible.

And it’s still not private because you have to broadcast the query across the network.

None that I’m aware of. There are web scrapers, and I guess you could just scrape the web, dump the results into a Postgres database and use that for indexing, though I’m guessing you’d eventually want something more tuned/custom. But even if that existed, there is the discovery problem: how do you find the sites to scrape? Bing and Google both let site operators submit URLs, but that isn’t going to scale to self-hosting.
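The “dump it into a database and index it” step is the conceptually simple part. Below is a toy sketch of the core data structure, an inverted index with a simple TF-IDF-style score. It is kept in memory instead of Postgres so it runs with only the standard library, and it assumes each URL is added once. The class name, URLs and page text are made up, and it does nothing about crawling or the discovery problem.

```python
# Hypothetical sketch: a tiny in-memory inverted index with TF-IDF-style scoring.
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {document URL: how often the term appears in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, url: str, text: str) -> None:
        """Index one scraped page (assumes each URL is added only once)."""
        self.doc_count += 1
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][url] = freq

    def search(self, query: str, limit: int = 10) -> list[str]:
        """Return up to `limit` URLs ranked by summed tf * idf over query terms."""
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms get a larger weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for url, tf in docs.items():
                scores[url] += tf * idf
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [url for url, _ in ranked[:limit]]


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("https://www.ruby-lang.org/", "Ruby is a dynamic programming language")
    idx.add("https://en.wikipedia.org/wiki/Ruby", "Ruby is a red gemstone, a variety of corundum")
    print(idx.search("ruby programming"))
```

For a real deployment, PostgreSQL’s built-in full-text search (`tsvector`/`tsquery` with a GIN index and `ts_rank`) would do this job better than hand-rolled code. The hard parts the thread mentions, finding pages to crawl and the storage needed to hold them, stay unsolved either way.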

Yeah, exactly. I just think that making anonymous requests to Google or Bing is private enough for me.

Echedelle (she/her)

Stract, Marginalia, Wiby, Mwmbl, etc.

The first two are NLnet-funded, and the second one (Marginalia) is one of the best developed, despite using Java rather than Rust. The developer seems to take development very seriously.

Self-hosting a search engine is unfortunately not feasible given the amount of data and power required, not to mention getting access to the data in the first place (crawling it yourself or using another index).

For privacy and customization there is Kagi, which is amazing and very customizable, but requires a paid subscription. You are a customer rather than the product, though.


A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society. With companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.

Some Rules

  • Posting a link to a website containing tracking isn’t great; if the contents of the website are behind a paywall, maybe copy them into the post
  • Don’t promote proprietary software
  • Try to keep things on topic
  • If you have a question, please try searching for previous discussions; maybe it has already been answered
  • Reposts are fine, but should have at least a couple of weeks in between so that the post can reach a new audience
  • Be nice :)


much thanks to @gary_host_laptop for the logo design :)
