Al-kides/Orion-Search-Engine

Orion Search Engine
Explore web pages with less bloat



🚀 What is this?
A retro-futuristic web crawler and search engine that feels like exploring a star map. Built with Python, SQLite, and Flask, featuring:

  • Web spider that crawls up to 10,000 pages by default (change max_pages in web_spider.py)
  • Cosmic-themed UI with animated constellations and cyberpunk aesthetics
  • PageRank algorithm powering search results
  • Self-contained database (no cloud services required)

Perfect for researchers, hobbyists, or anyone who wants to understand web crawling fundamentals without corporate tracking. Add your own websites to seed_urls to widen or narrow what the search covers.
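
To make the crawling idea concrete, here is a minimal sketch of a breadth-first URL frontier that never queues the same URL twice. It is not the code in web_spider.py: the function name crawl_order, the get_links callable, and the in-memory fake_web are all assumptions made for illustration, and no network requests are made.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl_order(seed_urls, get_links, max_pages=10_000):
    """Return URLs in breadth-first crawl order, visiting each URL at most once.

    get_links is a hypothetical callable: URL -> iterable of hrefs on that page.
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    frontier = deque(seeds)                 # URLs waiting to be fetched
    seen = set(seeds)                       # every URL ever queued
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and strip "#fragment" so that
            # /about and /about#top count as the same page.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Tiny in-memory "web" standing in for real HTTP fetches.
fake_web = {
    "https://a.example/": ["/about", "https://b.example/"],
    "https://a.example/about": ["/", "#top"],
    "https://b.example/": ["https://a.example/"],
}
order = crawl_order(["https://a.example/"], lambda u: fake_web.get(u, []))
```

Starting from fewer seeds keeps the frontier small, which is why picking only one or two seed URLs is a sensible way to begin.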


🌌 Installation
You’ll need:

  • Python 3.10+
  • Basic terminal skills (e.g. python source.py / python3 source.py)
  1. Clone this repo:

    git clone https://github.com/Euclidae/Orion-Search-Engine.git
    cd Orion-Search-Engine
  2. Install requirements:

    # requests is often already installed, but I have been told it sometimes isn't, so it is listed here too.
    pip install requests beautifulsoup4 flask networkx
  3. Start crawling (pick 1-2 seed URLs initially):

    # Run this before you run web_search.py
    python web_spider.py
  4. Calculate PageRank (run after crawling):

    python pagerank.py
  5. Launch the search portal:

    # P.S. I am apolitical, so do not read anything into my choice of BBC as a seed URL. It was among the first to come to mind.
    python web_search.py

Visit http://localhost:5000 on your web browser to begin exploring.


🔭 Key Components

  1. web_spider.py

    • Starts from seed URLs (BBC, CNN, CS50 Manual by default)
    • Stores raw HTML and cleaned text in crawled_pages.db
    • Avoids duplicate visits using a URL frontier
  2. web_search.py

    • Flask server with cyberpunk-themed templates
    • Searches cleaned content using SQL LIKE queries
    • Ranks results using pre-computed PageRank
  3. pagerank.py

    • Builds link graph using NetworkX
    • Updates PageRank scores in database
  4. static/

    • Dark forest background image (Forest-Dark.png)
    • CSS animations for constellation effects
    • Retro terminal-style fonts
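
As a rough illustration of how the search step described for web_search.py can work, the sketch below runs a LIKE query over cleaned text and orders the matches by a stored PageRank score. The table name pages comes from the error message in Troubleshooting; the column names url, content, and pagerank, and the search function itself, are assumptions and may not match the real schema.

```python
import sqlite3


def search(conn, query, limit=10):
    """Return (url, pagerank) rows whose text contains query, best-ranked first."""
    # Escape LIKE wildcards so a literal % or _ in the query is matched as text.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT url, pagerank FROM pages "
        "WHERE content LIKE ? ESCAPE '\\' "
        "ORDER BY pagerank DESC LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()


# In-memory stand-in for crawled_pages.db with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, content TEXT, pagerank REAL)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("https://a.example/", "orion is a constellation", 0.2),
        ("https://b.example/", "the Orion nebula", 0.5),
        ("https://c.example/", "nothing relevant here", 0.9),
    ],
)
results = search(conn, "orion")
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so "orion" also matches "Orion". Passing the query as a bound parameter rather than formatting it into the SQL string avoids injection from the search box.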

🌠 Customization

  • Add seed URLs: Edit seed_urls in web_spider.py
  • Change visual theme: Modify style.css (try neon colors!)
  • Improve ranking: Adjust the PageRank damping factor in pagerank.py
  • Add stopwords: Cleaner content = better search results
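
To show what the damping factor actually changes, here is a dependency-free power-iteration sketch of PageRank. The project itself builds its graph with NetworkX (where the damping factor is the alpha argument of networkx.pagerank), so this is not the code in pagerank.py; the function signature and the example graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its damped rank evenly among its links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page graph: three pages point at "hub".
graph = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "lonely": ["hub"]}
default_scores = pagerank(graph)            # damping = 0.85
flat_scores = pagerank(graph, damping=0.0)  # link structure ignored entirely
```

With damping set to 0 every page gets the same score, and as it approaches 1 the scores depend more and more on link structure alone, so the value is a trade-off between the two.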

🌍 Ethics & License

  • Use freely: MIT License - modify, share, or build commercial projects
  • Respect robots.txt: Add parsing logic if crawling public sites
  • Storage warning: 10k pages ≈ 500MB local storage
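
One way to add the robots.txt parsing mentioned above is Python's standard urllib.robotparser module. The sketch below is an assumption about how it could be wired in, not something the spider already does; the helper name build_robots_checker and the user agent string OrionSpider are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="OrionSpider"):
    """Return a function that says whether user_agent may fetch a URL.

    robots_txt is the text of a site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


# Example rules, supplied inline so nothing is downloaded.
rules = """\
User-agent: *
Disallow: /private/
"""
allowed = build_robots_checker(rules)
can_read_news = allowed("https://example.com/news")
can_read_private = allowed("https://example.com/private/notes.html")
```

In a real crawl the rules would come from each site's own /robots.txt, which RobotFileParser can download itself via set_url() followed by read(); the spider would then skip any URL for which the check returns False.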

📡 Troubleshooting

"Table pages has no column"
Delete crawled_pages.db and re-run the spider. Schema updates require a fresh database.
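
Before deleting anything, it can help to confirm that the schema really is out of date. The sketch below lists which expected columns are missing from a table using SQLite's PRAGMA table_info. The column names used here are assumptions; check web_spider.py for the ones the project actually creates.

```python
import sqlite3


def missing_columns(conn, table, expected):
    """Return the names in expected that are not columns of table."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated, so only pass trusted values.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [name for name in expected if name not in existing]


# Simulate an old database that predates two (assumed) columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, html TEXT)")
missing = missing_columns(conn, "pages", ["url", "html", "content", "pagerank"])
```

Pointing sqlite3.connect at crawled_pages.db instead of ":memory:" runs the same check against the real file; a non-empty result means the database predates the current schema.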

Slow crawling
Lower max_pages, or set a timeout on each request so unresponsive servers do not stall the crawl.

Missing constellation effects
Use Chrome or Firefox and make sure hardware acceleration is enabled.


🌌 Why "Orion"?
Named after the grand archer, Super Orion (https://typemoon.fandom.com/wiki/Super_Orion). I'd name it Gilgamesh (https://typemoon.fandom.com/wiki/Gilgamesh), but that just sounds stupid, not gonna lie.

🌕 Final Note
This isn’t Google - it’s a learning tool that prioritizes transparency over speed. Expect quirks, enjoy the retro vibe, and maybe add your own constellation patterns to the CSS.

May your searches always find starlight.


Euclidae | 2024 | GitHub
Built with Python, insomnia, and a fascination with space-age retro

P.S. I borrowed the background from Jakoolit's Hyprland wallpapers. Bye.
