
Crawl Website Content (Apify)

POST /v1/apify-website-content-crawler
texau__apify-website-content-crawler · enrichment
{
  "url": "https://acme.com"
}
200 OK
{
  "ok": true,
  "data": {
    "page_text": "sample",
    "page_title": "sample",
    "page_description": "sample",
    "page_canonical_url": "https://acme.com",
    "page_language_code": "sample",
    "crawl_loaded_url": "https://acme.com",
    "crawl_referrer_url": "https://acme.com"
  }
}

Crawls website pages and extracts structured content including text, metadata, and URLs.

Install

Add Crawl Website Content (Apify) to your MCP client.

Drop this into claude_desktop_config.json (or your client's equivalent) and the tool shows up in any chat.

claude_desktop_config.json
{
  "mcpServers": {
    "texau": {
      "command": "npx",
      "args": ["-y", "@texau/mcp-server"],
      "env": { "TEXAU_API_KEY": "..." }
    }
  }
}

Tool name: texau__apify-website-content-crawler

When to use this.

The "Crawl Website Content (Apify)" action crawls the specified website page and extracts structured content. It takes a single input, the website URL, and returns the page text, title, description, canonical URL, language code, loaded URL, and referrer URL as text or URL fields. It is built for data enrichment work: SEO analysis, competitive research, and content aggregation, or any workflow that needs clean, structured page content instead of raw HTML.
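As a sketch, the action can be called directly over HTTP with nothing but the Python standard library. The endpoint, headers, and response shape below are taken from this page; error handling is omitted for brevity, and a valid `TEXAU_API_KEY` is assumed.

```python
# Minimal sketch: call the crawler endpoint with the Python stdlib.
# Endpoint and header names come from the Integrate section of this page.
import json
import urllib.request

API_URL = "https://v3-api.texau.com/api/v1/apify-website-content-crawler"

def crawl(url: str, api_key: str) -> dict:
    """POST a target URL and return the parsed `data` object."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"url": url}).encode(),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```

Called as `crawl("https://acme.com", api_key)`, this returns the same `data` object shown in the sample response above.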

Try it

Run a sample request.

The response is a deterministic, cached example. No live call, no credits used.

Crawl Website Content (Apify)

Try it

Website URL

Response is cached — no live API call.

Response

Output schema.

Every field returned in `data`. Click rows to expand nested objects.

Field · Type
  • page_text (Page Text): text
  • page_title (Page Title): text
  • page_description (Page Description): text
  • page_canonical_url (Page Canonical Url): text, nullable
  • page_language_code (Page Language Code): text, nullable
  • crawl_loaded_url (Crawl Loaded Url): text
  • crawl_referrer_url (Crawl Referrer Url): text, nullable

Integrate

Copy-pasteable snippets.

Real endpoint: https://v3-api.texau.com/api/v1/apify-website-content-crawler. Auth: x-api-key.

cURL
POST /v1/apify-website-content-crawler
curl -X POST 'https://v3-api.texau.com/api/v1/apify-website-content-crawler' \
  -H "x-api-key: $TEXAU_API_KEY" \
  -H 'content-type: application/json' \
  -d '{"url":"https://acme.com"}'
200 OK
{
  "ok": true,
  "data": {
    "page_text": "sample",
    "page_title": "sample",
    "page_description": "sample",
    "page_canonical_url": "https://acme.com",
    "page_language_code": "sample",
    "crawl_loaded_url": "https://acme.com",
    "crawl_referrer_url": "https://acme.com"
  }
}

Output

Results land in a TexAu table.

Sample rows below.

Real result preview coming soon.

Input · Status · Score
[email protected] · valid · 96
[email protected] · risky · 54
[email protected] · invalid · 12

Workflow

A real example.

Trigger → Crawl Website Content (Apify) → enrich → push to your CRM. About 80 ms of operator effort; the rest runs in the background.

Step 1 · Trigger: New row, webhook, or schedule.
Step 2 · Crawl Website Content (Apify): Apify action runs.
Step 3 · Enrich: Waterfall fills missing fields.
Step 4 · Push to CRM: HubSpot / Salesforce / Pipedrive.

Built for

Who runs this.

GTM Engineer · RevOps · Agency

Reliability

Rate limits & reliability.

  • Per-minute limit: 30 / min
  • Per-day limit: 5,000 / day
  • Retries: Automatic w/ backoff
  • Mode: Sync
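If you drive the action from your own scripts, it is worth pacing calls against the 30/min cap rather than waiting to be throttled. The sliding-window limiter below is an illustrative sketch: the limits are from this page, but the implementation is an assumption, not TexAu client code.

```python
# Sketch of client-side pacing against the documented 30/min cap.
# Tracks call timestamps in a sliding 60-second window and reports how long
# to sleep before the next call is allowed.
import time
from collections import deque

class MinuteLimiter:
    def __init__(self, per_minute: int = 30):
        self.per_minute = per_minute
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait(self, now=None) -> float:
        """Record a call and return seconds to sleep before issuing it."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60 s window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.per_minute:
            self.calls.append(now)
            return 0.0
        delay = 60 - (now - self.calls[0])
        self.calls.append(now + delay)
        return delay
```

Typical use: `time.sleep(limiter.wait())` immediately before each POST.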

Errors

HTTP status codes.

What each response means and what to do about it.

Code · Cause · Fix
200 OK · Action ran; data in `data`. · Read response.
400 Bad Request · Missing or malformed input. · Validate against the input schema.
401 Unauthorized · Missing or invalid `x-api-key`. · Re-issue from /api-platform.
403 Forbidden · Workspace lacks plan tier. · Upgrade or contact sales.
404 Not Found · Action key not recognized. · Verify the slug.
429 Rate Limited · Per-minute or per-day cap hit. · Back off; reduce concurrency.
500 Server Error · Unexpected TexAu issue. · Retry with backoff.
502 Bad Gateway · Upstream provider 5xx. · Retry; we surface the root cause.
504 Timeout · Upstream slower than maxLatency. · Switch to `isAsync` polling.
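The retryable rows above (429, 500, 502) can be handled with a small backoff loop. This is an illustrative pattern, not TexAu's internal retry policy: it fails fast on the 4xx client errors and leaves 504 to `isAsync` polling, as the table advises.

```python
# Illustrative retry loop for the retryable status codes (429, 500, 502).
# Delays use exponential backoff with full jitter; this is an assumption,
# not TexAu's documented retry algorithm.
import json
import random
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 500, 502}  # 4xx client errors fail fast; 504 -> isAsync

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Full-jitter backoff: uniform over [0, base * 2**attempt] seconds."""
    return random.uniform(0, base * 2 ** attempt)

def post_with_retry(url: str, payload: dict, api_key: str, attempts: int = 4) -> dict:
    for attempt in range(attempts):
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode(),
            headers={"x-api-key": api_key, "content-type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
    raise RuntimeError("retries exhausted")
```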

Pricing

What it costs to run.

Standard tier

Pricing tier on /pricing. Per-action credit cost is private.

FAQ.

  • Is this real-time?

    Yes. Synchronous actions return in ~1–4 s. Long-running work uses async polling (see status 504 → switch to async).

  • Do I get charged on failure?

    No. Verified failures cost zero credits. Provider miss / 5xx / timeout cascade to the next provider in the waterfall when applicable.

  • Does it work with Claude / Cursor via MCP?

    Yes. Add the texau MCP server to your client config, then call `texau__apify-...` directly.

  • What CRMs can I push results to?

    HubSpot, Salesforce, Pipedrive, Zoho, and GoHighLevel are bidirectional. Smartlead, Instantly, Lemlist, HeyReach, Apollo Sequences, and Reply.io for outbound.

Run Crawl Website Content (Apify) in 60 seconds.

Pull your API key, paste the cURL, ship to your CRM.