Lumar (formerly Deepcrawl) is a cloud-based web crawler that identifies technical SEO issues across websites. Unlike desktop crawlers that consume local hardware resources, Deepcrawl runs entirely on remote servers: log in at deepcrawl.com and run crawls without slowing down your computer.
How to Start a Crawl
Step 1: Domain Configuration
Click "New Project," enter your domain name and project name, optionally activate JavaScript rendering if your site renders content client-side, then click "Save and Continue."
Step 2: Sources Configuration
The Sources screen offers seven configuration options for comprehensive crawl coverage:
- Website: Enable subdomain and HTTPS/HTTP page crawling
- Sitemaps: Deepcrawl auto-detects sitemaps; upload custom ones if needed
- Backlinks: Integrate Majestic backlink data or upload manual backlink lists for richer reports
- Google Search Console: Connect to include unlinked or inaccessible pages in crawls
- Analytics: Connect Analytics to discover unlinked pages and add Analytics data to reports
- Log Summary: Import log file analysis from tools like Splunk or Logz.io, or upload logs manually
- URL Lists: Upload custom URL lists for targeted, specific-page crawling
Step 3: Limits Configuration
Set your crawl rate (URLs per second), define the maximum total URLs to crawl, and configure email notifications. Limiting crawl rate prevents server overload during the crawl.
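The trade-off between crawl rate and crawl size can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and the figures are assumptions, and a real crawl will usually take longer because of server response times and rendering.

```python
import math

def estimate_crawl_seconds(max_urls: int, urls_per_second: float) -> int:
    """Lower-bound crawl time: the URL cap divided by the crawl rate, rounded up."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return math.ceil(max_urls / urls_per_second)

# Hypothetical project: a 100,000-URL cap crawled at 3 URLs per second.
seconds = estimate_crawl_seconds(100_000, 3)
print(f"{seconds / 3600:.1f} hours")  # prints "9.3 hours"
```

Lowering the rate protects the server but lengthens the crawl proportionally, so halving the rate roughly doubles the estimate.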
Step 4: Advanced Settings
Select additional domains and subdomains, define URL inclusion and exclusion paths, choose a user agent for crawling, then click "Start Crawl." Monitor real-time URL counts during crawl progress and receive an email notification upon completion.
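To reason about inclusion and exclusion paths before starting a crawl, it can help to test a few URLs against the rules you plan to enter. The sketch below is not Deepcrawl's matching logic: it assumes simple path-prefix matching and assumes that an exclusion always wins over an inclusion. The function name and example paths are hypothetical.

```python
from urllib.parse import urlparse

def in_scope(url, include_paths, exclude_paths):
    """Return True if the URL's path passes the include/exclude prefix rules."""
    path = urlparse(url).path or "/"
    # Assumption: an exclusion rule always takes precedence.
    if any(path.startswith(prefix) for prefix in exclude_paths):
        return False
    # Assumption: no inclusion rules means every remaining path is in scope.
    if not include_paths:
        return True
    return any(path.startswith(prefix) for prefix in include_paths)

include = ["/blog/"]
exclude = ["/blog/tag/"]
for url in ("https://example.com/blog/post-1",
            "https://example.com/blog/tag/seo",
            "https://example.com/shop/item"):
    print(url, in_scope(url, include, exclude))
```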
Dashboard Overview
The dashboard presents critical sections for immediate site health assessment:
- Primary Issues: Broken links, unlinked pages, and 4xx errors with clickable details for each
- Statistics: Primary pages, duplicate pages, 200/non-200 status breakdown, and indexability data
- Changes: Comparison with previous crawl results for tracking improvements over time
- Status Code Sections: 200 (healthy) pages, non-200 pages requiring attention, and uncrawled URLs
- Indexability: Non-indexable pages with reasons, and orphan pages needing internal linking
- Trend Analysis: Historical graphs tracking duplicate, non-200, and non-indexable pages
Page Categories
Indexable vs. Non-Indexable Pages
Indexable Pages shows all pages eligible for indexing with a unique/duplicate breakdown. Non-Indexable Pages shows pages blocked from indexing with specific reasons — canonical tag pointing elsewhere, noindex tag, redirected pages, and more.
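The reasons listed above can be pictured as a short series of checks on each page. This is a minimal sketch under stated assumptions, not the tool's actual logic: the `Page` fields, the order of the checks, and the treatment of other non-200 responses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None

def non_indexable_reason(page: Page) -> Optional[str]:
    """Return why a page is non-indexable, or None if it looks indexable."""
    if 300 <= page.status < 400:
        return "redirected"
    if page.status != 200:
        # Assumption: any other non-200 response is also treated as non-indexable.
        return f"non-200 status ({page.status})"
    if page.noindex:
        return "noindex tag"
    if page.canonical and page.canonical != page.url:
        return "canonical points elsewhere"
    return None

pages = [
    Page("https://example.com/a", 200),
    Page("https://example.com/b", 200, noindex=True),
    Page("https://example.com/c", 200, canonical="https://example.com/a"),
    Page("https://example.com/d", 301),
]
for page in pages:
    print(page.url, non_indexable_reason(page) or "indexable")
```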
200 vs. Non-200 Pages
200 Pages are served successfully and are candidates for indexing. Non-200 Pages require attention: review and resolve 404 and 500 errors. Uncrawled URLs are pages blocked from crawling, with the reason shown (e.g., a robots.txt Disallow directive).
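To double-check why a URL shows up as uncrawled, you can test it against your robots.txt rules locally. The sketch below uses Python's standard-library `urllib.robotparser`. The rules and URLs are made-up examples, and this parser does plain prefix matching, so it may not mirror how a particular crawler interprets wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def is_blocked(url, user_agent="*"):
    """Return True if the parsed robots.txt rules disallow this URL."""
    return not parser.can_fetch(user_agent, url)

for url in ("https://example.com/admin/users",
            "https://example.com/cart",
            "https://example.com/products/1"):
    print(url, "blocked" if is_blocked(url) else "allowed")
```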
Primary and Duplicate Pages
Primary Pages are indexable, unique pages representing your core site content. Duplicate Pages are pages that share a title or description with another page, or whose content is substantially similar to it. Rewriting titles and meta descriptions is the recommended way to differentiate them.
Content Analysis Sections
Content issues are among the most common technical SEO problems. Deepcrawl identifies them systematically:
- Missing Titles: Pages without title tags — add unique, keyword-rich titles immediately
- Short Titles: Title tags below optimal length — expand to include target keywords
- Max Title Length: Oversized title tags appearing truncated in search results — shorten appropriately
- Duplicate Titles: Identical title tags across pages — create unique titles for all indexed pages
- Missing Descriptions: Missing meta description tags — add to improve CTR in search results
- Duplicate Descriptions: Identical meta descriptions — create unique descriptions for key pages
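The title checks above can be approximated on an exported list of URLs and titles. This is a rough sketch, not the tool's method: the length thresholds are assumed values chosen for illustration, duplicates are compared case-insensitively by assumption, and the function name is hypothetical.

```python
from collections import defaultdict

MIN_TITLE_LEN = 30  # assumed threshold for a "short" title
MAX_TITLE_LEN = 60  # assumed threshold for an over-long title

def audit_titles(titles_by_url):
    """Group URLs by title issue: missing, short, long, duplicate."""
    issues = defaultdict(list)
    urls_by_title = defaultdict(list)
    for url, title in titles_by_url.items():
        text = (title or "").strip()
        if not text:
            issues["missing"].append(url)
            continue
        if len(text) < MIN_TITLE_LEN:
            issues["short"].append(url)
        elif len(text) > MAX_TITLE_LEN:
            issues["long"].append(url)
        urls_by_title[text.lower()].append(url)
    # A title used by more than one URL is a duplicate on every one of them.
    for urls in urls_by_title.values():
        if len(urls) > 1:
            issues["duplicate"].extend(urls)
    return dict(issues)

report = audit_titles({
    "/a": "",
    "/b": "Home",
    "/c": "Home",
    "/d": "x" * 70,
})
print(report)
```

The same pattern applies to meta descriptions by swapping in description text and suitable thresholds.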
Content Quality Sections
- Empty Pages: Blank pages — redirect to relevant content or add substantive content
- Thin Pages: Minimal content pages — develop content to enhance user and bot experience
- Missing H1 Tags: Pages without H1 elements — add unique H1s to all indexed pages
- Multiple H1 Tag Pages: Pages with multiple H1s — consolidate to a single H1 per page
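A quick way to spot-check the H1 findings on a single page is to count its H1 elements. The sketch below uses Python's standard-library `html.parser`. The class and function names are hypothetical, and it inspects only the raw HTML, so H1s injected by JavaScript would not be counted.

```python
from html.parser import HTMLParser

class H1Counter(HTMLParser):
    """Counts opening <h1> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "h1":
            self.count += 1

def h1_status(html):
    """Return 'missing', 'ok', or 'multiple' based on the number of H1s."""
    counter = H1Counter()
    counter.feed(html)
    if counter.count == 0:
        return "missing"
    if counter.count == 1:
        return "ok"
    return "multiple"

print(h1_status("<html><body><h1>Title</h1><h1>Second</h1></body></html>"))
```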
Technical Issue Sections
- Canonical to Non-200: Canonical tags referencing pages that don't return 200 — update URLs or canonical tags
- Redirect Chains: Multiple sequential redirects consuming crawl budget — consolidate to direct redirects
- HTTP Pages: Pages served over insecure HTTP. Migrate them to HTTPS to avoid browser security warnings and a possible ranking disadvantage
- Broken Sitemap Links: Sitemap entries not returning 200 status — update or remove stale entries
- Non-Indexable URLs in Sitemaps: Remove non-indexable pages from sitemaps unless intentional
- All Broken Redirects: Redirects whose destinations return errors. Fix the destination or remove the redirect
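Consolidating a redirect chain means pointing every source URL straight at its final destination. The sketch below illustrates the idea on a hypothetical redirect map, for example one built from an exported report. The function name, the hop limit, and the example paths are assumptions.

```python
def resolve_chain(start, redirects, max_hops=10):
    """Follow a source-to-target redirect map and return every URL visited."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:
            break  # redirect loop: stop instead of cycling forever
        seen.add(target)
    return chain

# Hypothetical redirect map: /old -> /interim -> /new
redirects = {"/old": "/interim", "/interim": "/new"}

chain = resolve_chain("/old", redirects)
print(chain)  # ['/old', '/interim', '/new']

# Flattened map: every source points directly at its final destination.
flattened = {src: resolve_chain(src, redirects)[-1] for src in redirects}
print(flattened)  # {'/old': '/new', '/interim': '/new'}
```

Any chain longer than two entries contains at least one intermediate hop that can be removed.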
Search Performance Integration
When Google Search Console is connected, Deepcrawl identifies Indexable Pages Without Search Impressions: indexable pages that received no impressions in Search Console. These pages either need SEO optimization or should be evaluated for crawl budget management.
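Conceptually, this report is a comparison between the set of indexable URLs and the impression counts per URL. The sketch below shows that comparison on made-up data. The function name and input shapes are assumptions, and URLs absent from the impressions data are treated as having zero impressions.

```python
def pages_without_impressions(indexable_urls, impressions_by_url):
    """Return indexable URLs with zero (or no recorded) impressions, sorted."""
    return sorted(
        url for url in indexable_urls
        if impressions_by_url.get(url, 0) == 0
    )

# Hypothetical inputs: crawl output and a Search Console export.
indexable = {"/pricing", "/blog/post-1", "/legacy-page"}
impressions = {"/pricing": 1200, "/blog/post-1": 0}

print(pages_without_impressions(indexable, impressions))
# ['/blog/post-1', '/legacy-page']
```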
Running Deepcrawl crawls regularly and resolving the reported problems in priority order maintains website health and search visibility. Some issues are minor technical problems, but others, such as 500 errors, redirect chains, and missing canonical tags, significantly impact SEO performance and require immediate attention.


