I like what you guys are doing, but your ranking algorithm is weak. Results are basically unusable, and I think that's because you're not incorporating domain popularity into the ranking score.
I've been looking for a search engine for AI/LLM use, but unfortunately search providers are very expensive and significantly restrict how their data may be used. It's come to the point where I'm thinking of building my own search provider.
Crawling and indexing are not the issue: they're basically solved problems. Ranking is where the real difficulty hides.
Ranking should be implemented so that the pages a user is most likely to be interested in come first.
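As a rough illustration of what "popularity-aware ranking" could mean in practice, here is a minimal sketch that blends a text-relevance score (e.g. from BM25) with a log-scaled domain popularity prior. The weights, the visit-count ceiling, and the URLs are all illustrative assumptions, not a tested formula.

```python
# Hypothetical sketch: re-rank candidate results by blending text relevance
# with a domain popularity prior. alpha and the ~1B-visit normalization
# ceiling are assumptions chosen for illustration only.
import math

def blended_score(relevance: float, domain_visits: int, alpha: float = 0.7) -> float:
    """Combine relevance with a log-scaled popularity prior in [0, 1]."""
    popularity = math.log1p(domain_visits) / math.log1p(1_000_000_000)
    return alpha * relevance + (1 - alpha) * min(popularity, 1.0)

results = [
    ("https://example-blog.test/seo-spam", 0.92, 120),       # "relevant" but obscure
    ("https://docs.example.test/guide", 0.85, 40_000_000),   # slightly less relevant, very popular
]
ranked = sorted(results, key=lambda r: blended_score(r[1], r[2]), reverse=True)
```

With this blend, the popular documentation site outranks the obscure but keyword-stuffed page, which is the behavior the post is asking for.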
Major search providers have nailed this quite well, but because their data is private and their terms forbid using it, other search providers are at a significant disadvantage, having to start from scratch. As a result, search is centralized and is basically a monopoly among the major providers, since starting a new search engine requires huge resources.
I think this problem could be solved by democratizing, crowdsourcing, and open-sourcing the data needed to build a domain popularity index.
Unfortunately this is still a difficult problem, because such data can be manipulated, but at least it should improve search results.
One such data source would be domain likes/dislikes/blocks and search-result URL clicks and ignores. If multiple search providers open-sourced this data, results would improve, since a shared dataset would provide better coverage.
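The feedback signals above could be aggregated into a per-domain score roughly like this. The field names, the weight on blocks, and the use of Laplace smoothing are my assumptions, sketched only to show the shape of the idea; merging data open-sourced by several providers then reduces to element-wise addition of counters.

```python
# Illustrative sketch of turning crowdsourced feedback into a per-domain
# popularity score. Signal names and weights are assumptions; Laplace
# smoothing keeps low-volume domains from swinging to extreme scores.
from dataclasses import dataclass

@dataclass
class DomainFeedback:
    likes: int = 0
    dislikes: int = 0
    blocks: int = 0
    clicks: int = 0    # user clicked the result
    ignores: int = 0   # result was shown but skipped

def popularity_score(f: DomainFeedback) -> float:
    """Score in [0, 1]: smoothed fraction of positive signals.
    Blocks are treated as strongly negative (assumed weight 3)."""
    positive = f.likes + f.clicks
    negative = f.dislikes + f.ignores + 3 * f.blocks
    return (positive + 1) / (positive + negative + 2)  # Laplace smoothing

def merge(a: DomainFeedback, b: DomainFeedback) -> DomainFeedback:
    """Combine feedback shared by two providers: element-wise addition."""
    return DomainFeedback(a.likes + b.likes, a.dislikes + b.dislikes,
                          a.blocks + b.blocks, a.clicks + b.clicks,
                          a.ignores + b.ignores)
```

A domain with no feedback scores a neutral 0.5, and more shared data moves the score toward the true positive/negative ratio, which is why pooled datasets should help coverage.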
Another data point could come from website operators open-sourcing their analytics.
Another way I thought of building this would be mapping domain names, via WHOIS, to company names and then to yearly company revenue. This would make cheating harder and would penalize copycat websites. Unfortunately it would also penalize non-profits, so those might need a different data source.
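The domain-to-registrant-to-revenue pipeline could look something like the sketch below. A real WHOIS parser and a company-financials dataset are assumed to exist; here both are stubbed with toy lookup tables (all names and figures are made up) so only the shape of the pipeline is shown. Note how a privacy-proxied copycat domain with no registrant organization bottoms out at zero.

```python
# Hypothetical sketch of domain -> WHOIS registrant org -> revenue signal.
# WHOIS_ORG and ANNUAL_REVENUE_USD are stand-ins for real data sources;
# every entry below is fabricated for illustration.
import math

WHOIS_ORG = {
    "example.com": "Example Corp",
    "examp1e-deals.com": None,  # copycat: privacy-proxied, no registrant org
}
ANNUAL_REVENUE_USD = {
    "Example Corp": 50_000_000,
}

def revenue_trust(domain: str) -> float:
    """Log-scaled trust signal in [0, 1]; unknown registrants score 0.
    The ~$1T normalization ceiling is an arbitrary assumption."""
    org = WHOIS_ORG.get(domain)
    revenue = ANNUAL_REVENUE_USD.get(org, 0) if org else 0
    return math.log1p(revenue) / math.log1p(1_000_000_000_000)
```

The log scaling keeps the signal from being dominated by the largest corporations, though as the post notes, non-profits would still need a separate signal since revenue understates their legitimacy.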
Anyway, my point is for multiple search engines to work together on sharing and building such a dataset to improve result ranking. It would also let researchers experiment with new algorithms and approaches, as I don't know of any such public dataset. There are analytics companies that collect a lot of data, but it's always private and sold commercially.