WebCrawlerX - A Web Crawler, Indexer & Search Engine
Web App • Completed
Project Details
Overview
WebCrawlerX is a custom-built search engine that discovers, indexes, and searches web pages using hand-written data structures. Instead of relying on external search libraries, it implements the core mechanics of indexing and retrieval from the ground up.
The Problem
The goal was to build a working web discovery and indexing system that traverses linked web pages efficiently and returns ranked search results, without relying on production-grade search platforms such as Elasticsearch or Solr.
Approach
- BFS Traversal: Implemented a Breadth-First Search crawler with a configurable depth limit, so a crawl stays bounded instead of following links indefinitely.
- Trie Data Structure: Built a custom Trie (prefix tree) for keyword matching and autocomplete; locating a prefix takes time proportional to the prefix length, not to the number of indexed words.
- Inverted Indexing: Created a hand-built inverted index that maps each keyword to the URLs containing it, so a query looks up matching pages directly instead of scanning every page.
- Ranking Math: Developed a frequency-based ranking algorithm that uses Merge Sort to order search results by relevance.
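The inverted index and frequency-based ranking from the list above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `pages` dictionary, the `tokenize` helper, and the function names are all assumptions, and the score here is simply the total number of times the query words occur on a page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each keyword to {url: occurrence count}.

    `pages` is a hypothetical {url: page_text} dict standing in for crawled data.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def merge_sort(items, key):
    """Return a new list sorted in descending order of key(item)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # ">=" takes the left item first on ties, which keeps the sort stable.
        if key(left[i]) >= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def search(index, query):
    """Return (url, score) pairs, highest total keyword frequency first."""
    scores = {}
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] = scores.get(url, 0) + count
    return merge_sort(list(scores.items()), key=lambda pair: pair[1])


# Made-up sample data for illustration only.
pages = {
    "https://example.com/a": "python crawler python index",
    "https://example.com/b": "python search engine",
    "https://example.com/c": "django web interface",
}
index = build_inverted_index(pages)
results = search(index, "python")
# results == [("https://example.com/a", 2), ("https://example.com/b", 1)]
```

Merge Sort runs in O(n log n) time regardless of input order and is stable, so pages with equal scores keep the order in which they were found.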
Technical Stack
| Layer | Technology |
|---|---|
| Backend | Django 5.x, Python 3.12 |
| Scraping | BeautifulSoup4, Requests |
| Database | SQLite (Django ORM) |
| Structures | Custom Trie, Inverted Index, Adjacency List |
| Frontend | HTML5, CSS3, Vanilla JavaScript |
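To make the "Custom Trie" entry in the table concrete, here is a small sketch of a prefix tree with autocomplete. The class and method names (`Trie`, `insert`, `contains`, `autocomplete`) and the `limit` parameter are illustrative assumptions, not taken from the project.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a word, creating nodes for any characters not yet present."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """Exact-match lookup."""
        node = self._find(word)
        return node is not None and node.is_word

    def autocomplete(self, prefix, limit=10):
        """Return up to `limit` stored words that start with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack and len(results) < limit:
            node, word = stack.pop()
            if node.is_word:
                results.append(word)
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], word + ch))
        return results


trie = Trie()
for keyword in ["crawl", "crawler", "index", "indexer", "search"]:
    trie.insert(keyword)

suggestions = trie.autocomplete("craw")
# suggestions == ["crawl", "crawler"]
```

Reaching the prefix node costs one dictionary lookup per character of the prefix; the remaining work depends only on how many completions are collected.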
What I learned
- Data Structures: A custom Trie, a graph stored as an adjacency list, and a queue to drive BFS
- Algorithms: Breadth-First Search (BFS), Merge Sort, and building an inverted index by hand
- Web Scraping: Parsing HTML with BeautifulSoup and handling HTTP requests reliably
- Full-Stack: Connecting the algorithmic core to a Django-based web interface
- Database Modeling: Relational models for crawled pages and their indexed metadata
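The queue-driven BFS over an adjacency list mentioned above might look something like the sketch below. To keep it self-contained, the link graph is an in-memory dictionary with made-up paths; in a real crawler the neighbor lookup would presumably be replaced by fetching each page and extracting its links. The name `bfs_crawl` and its parameters are assumptions for illustration.

```python
from collections import deque


def bfs_crawl(graph, seed, max_depth):
    """Return {url: depth} for every page within max_depth links of seed.

    `graph` is an adjacency list: {url: [linked urls]}. Because dicts keep
    insertion order, the result also records the order of discovery.
    """
    depths = {seed: 0}      # doubles as the visited set
    queue = deque([seed])   # FIFO queue gives level-by-level traversal
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue        # depth limit reached: record the page, do not expand it
        for link in graph.get(url, []):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


# Hypothetical link graph, including a cycle back to /home.
link_graph = {
    "/home": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/home"],
    "/blog": ["/blog/post-1"],
    "/docs/api": ["/docs/api/v2"],
}
visited = bfs_crawl(link_graph, "/home", max_depth=2)
# "/docs/api/v2" sits three links from the seed, so it is never visited.
```

Tracking depth alongside the visited set means the crawl terminates even when pages link back to each other, and the depth limit bounds how far it spreads from the seed.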



