Lumar (Formerly Deepcrawl) SEO Tool Guide

Lumar (formerly Deepcrawl) is a cloud-based web crawler that identifies technical SEO issues across websites. Unlike desktop crawlers, which consume local hardware resources, Deepcrawl runs entirely on remote servers: log in at deepcrawl.com and run crawls without slowing down your computer.

How to Start a Crawl

Step 1: Domain Configuration

Click "New Project," enter your domain name and a project name, optionally enable JavaScript rendering if your site renders content client-side, then click "Save and Continue."

Step 2: Sources Configuration

The Sources screen offers seven configuration options for comprehensive crawl coverage:

Website: Enable subdomain and HTTPS/HTTP page crawling
Sitemaps: Deepcrawl auto-detects sitemaps; upload custom ones if needed
Backlinks: Integrate Majestic backlink data or upload manual backlink lists for richer reports
Google Search Console: Connect to include unlinked or inaccessible pages in crawls
Analytics: Connect Analytics to discover unlinked pages and add Analytics data to reports
Log Summary: Import log file analysis from tools like Splunk or Logz.io, or upload logs manually
URL Lists: Upload custom URL lists for targeted, specific-page crawling
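
The reason for combining sources can be sketched in a few lines of Python: a URL that appears in a sitemap (or any other source) but is never reached by following links is a candidate unlinked page. This is only an illustration of the concept, not how Deepcrawl implements it. The function names and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def unlinked_urls(source_urls, linked_urls):
    """URLs known from a source (sitemap, analytics, ...) but not found via links."""
    return sorted(set(source_urls) - set(linked_urls))


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-landing-page</loc></url>
</urlset>"""

# Hypothetical set of URLs the crawler reached by following internal links.
crawled_by_links = {"https://example.com/"}

print(unlinked_urls(sitemap_urls(SITEMAP), crawled_by_links))
# prints ['https://example.com/old-landing-page']
```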

Step 3: Limits Configuration

Set your crawl rate (URLs per second), define the maximum total number of URLs to crawl, and configure email notifications. Limiting the crawl rate helps avoid overloading your server during the crawl.
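
When choosing these two limits, it helps to estimate how long a crawl will take. The Python sketch below assumes the crawler sustains the configured rate for the whole crawl, so the result is a lower bound rather than a prediction. The function names are hypothetical.

```python
import math


def estimate_crawl_seconds(max_urls, urls_per_second):
    """Lower bound on crawl time in whole seconds at a constant crawl rate."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return math.ceil(max_urls / urls_per_second)


def format_duration(seconds):
    """Format a number of seconds as 'Hh Mm Ss'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# Example: a 100,000-URL limit crawled at 3 URLs per second.
print(format_duration(estimate_crawl_seconds(100_000, 3)))
# prints 9h 15m 34s
```

Raising the rate shortens the crawl but sends more requests per second to your server, which is the trade-off this step asks you to make.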

Step 4: Advanced Settings

Select additional domains and subdomains, define URL inclusion and exclusion paths, choose a user agent for crawling, then click "Start Crawl." Monitor real-time URL counts during crawl progress and receive an email notification upon completion.
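
As a rough mental model of inclusion and exclusion paths, the Python sketch below treats both as simple path prefixes and lets exclusions win over inclusions. Both choices are assumptions made for illustration; the tool's actual matching rules and precedence may differ, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def in_crawl_scope(url, include_prefixes, exclude_prefixes):
    """Decide whether a URL falls inside the configured crawl scope.

    Assumptions: rules are path prefixes, exclusions take precedence,
    and an empty include list means "include everything".
    """
    path = urlparse(url).path or "/"
    if any(path.startswith(prefix) for prefix in exclude_prefixes):
        return False
    if not include_prefixes:
        return True
    return any(path.startswith(prefix) for prefix in include_prefixes)


include = ["/blog/"]
exclude = ["/blog/drafts/"]
print(in_crawl_scope("https://example.com/blog/post-1", include, exclude))    # True
print(in_crawl_scope("https://example.com/blog/drafts/x", include, exclude))  # False
print(in_crawl_scope("https://example.com/shop/", include, exclude))          # False
```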

Dashboard Overview

The dashboard presents critical sections for immediate site health assessment:

Primary Issues: Broken links, unlinked pages, and 4xx errors with clickable details for each
Statistics: Primary pages, duplicate pages, 200/non-200 status breakdown, and indexability data
Changes: Comparison with previous crawl results for tracking improvements over time
Status Code Sections: 200 (healthy) pages, non-200 pages requiring attention, and uncrawled URLs
Indexability: Non-indexable pages with reasons, and orphan pages needing internal linking
Trend Analysis: Historical graphs tracking duplicate, non-200, and non-indexable pages

Page Categories

Indexable vs. Non-Indexable Pages

Indexable Pages shows all pages eligible for indexing with a unique/duplicate breakdown. Non-Indexable Pages shows pages blocked from indexing with specific reasons — canonical tag pointing elsewhere, noindex tag, redirected pages, and more.
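
The reasons listed above can be expressed as a small decision function. The Python sketch below covers only the three reasons named in this guide and checks them in an assumed order; the real report includes more reasons, and the function name and reason labels are hypothetical.

```python
def non_indexable_reason(url, status, canonical=None, meta_robots=""):
    """Return why a page is non-indexable, or None if none of the checks fail.

    Only three reasons are modelled: redirect, noindex, and a canonical
    tag that points to a different URL.
    """
    if 300 <= status < 400:
        return "redirected"
    # A meta robots value such as "noindex, follow" is a comma-separated list.
    directives = {d.strip().lower() for d in meta_robots.split(",")}
    if "noindex" in directives:
        return "noindex tag"
    if canonical and canonical != url:
        return "canonical points elsewhere"
    return None


page = "https://example.com/a"
print(non_indexable_reason(page, 301))                                     # redirected
print(non_indexable_reason(page, 200, meta_robots="noindex, follow"))      # noindex tag
print(non_indexable_reason(page, 200, canonical="https://example.com/b"))  # canonical points elsewhere
print(non_indexable_reason(page, 200, canonical=page))                     # None
```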

200 vs. Non-200 Pages

200 Pages are served successfully and are available for indexing. Non-200 Pages require attention: review and resolve 404 and 500 errors. Uncrawled URLs are pages that could not be crawled, with the reason shown (e.g., a robots.txt Disallow directive).
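
To check for yourself which URLs a robots.txt file blocks, Python's standard urllib.robotparser module can evaluate the rules offline. The sketch below parses robots.txt content supplied as a string. The helper name, the sample rules, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt content disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

candidates = [
    "https://example.com/",
    "https://example.com/private/report.html",
]
print(blocked_urls(ROBOTS_TXT, candidates))
# prints ['https://example.com/private/report.html']
```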

Primary and Duplicate Pages

Primary Pages are indexable, unique pages representing your core site content. Duplicate Pages are those sharing titles, descriptions, or substantially similar content — title and meta description optimization is recommended to differentiate them.

Content Analysis Sections

Content issues are among the most common technical SEO problems. Deepcrawl identifies them systematically:

Missing Titles: Pages without title tags — add unique, keyword-rich titles immediately
Short Titles: Title tags below optimal length — expand to include target keywords
Max Title Length: Oversized title tags appearing truncated in search results — shorten appropriately
Duplicate Titles: Identical title tags across pages — create unique titles for all indexed pages
Missing Descriptions: Missing meta description tags — add to improve CTR in search results
Duplicate Descriptions: Identical meta descriptions — create unique descriptions for key pages
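
The title checks above amount to a few simple rules, sketched here in Python. The 30- and 60-character thresholds are illustrative assumptions, not Deepcrawl's defaults (search engines truncate by rendered width, not character count), and the function name is hypothetical. The same approach applies to meta descriptions with different thresholds.

```python
from collections import Counter

MIN_TITLE_CHARS = 30  # assumed lower bound, for illustration only
MAX_TITLE_CHARS = 60  # assumed upper bound, for illustration only


def audit_titles(pages):
    """Map each URL to a list of title issues.

    `pages` maps URL -> title text (None or "" when the tag is missing).
    Duplicates are detected case-insensitively after trimming whitespace.
    """
    titles = {url: (title or "").strip() for url, title in pages.items()}
    counts = Counter(t.lower() for t in titles.values() if t)
    report = {}
    for url, title in titles.items():
        issues = []
        if not title:
            issues.append("missing")
        else:
            if len(title) < MIN_TITLE_CHARS:
                issues.append("short")
            if len(title) > MAX_TITLE_CHARS:
                issues.append("long")
            if counts[title.lower()] > 1:
                issues.append("duplicate")
        report[url] = issues
    return report


sample = {
    "/a": "Home",
    "/b": "Home",
    "/c": None,
    "/d": "A reasonably descriptive page title here",
}
for url, issues in audit_titles(sample).items():
    print(url, issues)
```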

Content Quality Sections

Empty Pages: Blank pages — redirect to relevant content or add substantive content
Thin Pages: Minimal content pages — develop content to enhance user and bot experience
Missing H1 Tags: Pages without H1 elements — add unique H1s to all indexed pages
Multiple H1 Tag Pages: Pages with multiple H1s — consolidate to a single H1 per page
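
A quick way to spot-check a single page for the two H1 issues above is to count its H1 start tags. The Python sketch below uses the standard html.parser module. The class and function names are hypothetical, and it inspects static HTML only, so headings injected by JavaScript are not seen.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Counts <h1> start tags; HTMLParser lowercases tag names for us."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1


def classify_h1(html):
    """Return 'missing', 'ok', or 'multiple' based on the number of H1s."""
    counter = H1Counter()
    counter.feed(html)
    counter.close()
    if counter.count == 0:
        return "missing"
    if counter.count == 1:
        return "ok"
    return "multiple"


print(classify_h1("<body><p>No heading here</p></body>"))      # missing
print(classify_h1("<body><h1 class='t'>Title</h1></body>"))    # ok
print(classify_h1("<body><h1>One</h1><H1>Two</H1></body>"))    # multiple
```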

Technical Issue Sections

Canonical to Non-200: Canonical tags referencing pages that don't return 200 — update URLs or canonical tags
Redirect Chains: Multiple sequential redirects consuming crawl budget — consolidate to direct redirects
HTTP Pages: Insecure pages; migrate them to HTTPS to avoid browser security warnings and a possible ranking disadvantage
Broken Sitemap Links: Sitemap entries not returning 200 status — update or remove stale entries
Non-Indexable URLs in Sitemaps: Remove non-indexable pages from sitemaps unless intentional
All Broken Redirects: Redirects whose destination returns an error; fix the destination or remove the redirect
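
Consolidating a redirect chain means pointing each source URL straight at the final destination. The Python sketch below works on a plain mapping of source to target, such as one exported from a crawl report. The function names and paths are hypothetical, and redirect loops are reported separately rather than flattened.

```python
def resolve_redirect(start, redirects):
    """Follow `redirects` from `start`.

    Returns (last_url, hops, looped). `looped` is True when the chain
    revisits a URL, in which case there is no final destination.
    """
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            return current, len(seen), True
        seen.append(current)
    return current, len(seen) - 1, False


def flatten_chains(redirects):
    """Map each source that takes more than one hop to its final destination."""
    flattened = {}
    for source in redirects:
        final, hops, looped = resolve_redirect(source, redirects)
        if hops > 1 and not looped:
            flattened[source] = final
    return flattened


redirects = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}
print(flatten_chains(redirects))
# prints {'/a': '/c'}
```

Here /a reaches /c in two hops, so it should redirect to /c directly. /b is already a single hop, and /x and /y form a loop that has to be fixed by hand.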

Search Performance Integration

When Google Search Console is connected, Deepcrawl reports Indexable Pages Without Search Impressions: indexable pages that received no impressions in Search Console. These pages either need SEO optimization or should be reviewed as part of crawl budget management.
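
Conceptually, this report joins two data sets: the indexable URLs from the crawl and the per-URL impression counts from Search Console. A minimal Python sketch of that join follows. It assumes the impressions have already been exported into a dictionary, that URLs are written identically in both data sets, and that a URL absent from the export had zero impressions. The function name and sample data are hypothetical.

```python
def pages_without_impressions(indexable_urls, impressions_by_url):
    """Return indexable URLs with zero (or no recorded) impressions, sorted."""
    return sorted(
        url
        for url in set(indexable_urls)
        if impressions_by_url.get(url, 0) == 0
    )


indexable = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/archive/2015",
]
impressions = {
    "https://example.com/": 1200,
    "https://example.com/pricing": 0,
}
print(pages_without_impressions(indexable, impressions))
# prints ['https://example.com/archive/2015', 'https://example.com/pricing']
```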

Regular Deepcrawl crawls, with issues resolved in priority order, help maintain website health and search visibility. While some issues are minor technical problems, others, such as 500 errors, redirect chains, and missing canonical tags, significantly impact SEO performance and require immediate attention.