Prompt Templates · Crawlbase Documentation

Single-page data extraction

Use this when you need structured data from a specific URL. It runs as a direct call, with no agent loop.

You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:

{
  "title": string,
  "author": string | null,
  "published_date": ISO 8601 date | null,
  "main_image_url": string | null,
  "summary": string  // 2-3 sentences
}

Return ONLY the JSON object, no commentary.

URL: {url}

Tip: pin the model to JSON output mode if your client supports it. Otherwise, use a JSON parser that tolerates leading and trailing whitespace.
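As a minimal sketch of the fallback described in the tip, the Python function below parses a model reply and checks it against the schema's field names. The function name parse_extraction is hypothetical. Unwrapping a Markdown code fence is an extra assumption about how some models reply; the tip itself only mentions whitespace, which Python's json.loads already accepts.

```python
import json

# Field names taken from the schema in the prompt above.
REQUIRED_KEYS = {"title", "author", "published_date", "main_image_url", "summary"}

FENCE = "`" * 3  # Markdown code-fence marker


def parse_extraction(reply: str) -> dict:
    """Parse a model reply into a dict and verify the schema keys are present."""
    text = reply.strip()
    # Assumption: some models wrap the JSON in a Markdown code fence; unwrap it.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

When filling {url} into the template, a plain string replacement is safer than str.format, because the literal braces in the schema would be read as format fields.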

Multi-source research

For "what does the internet say about X" tasks. Combines search with page fetching.

You are a research assistant. Given a topic, you must:

1. Use search_web to find 5-8 high-quality recent sources.
2. Use crawl_url on the top 3-4 to read them in full.
3. Synthesize findings into a brief with:
   - Key facts (bulleted)
   - Points of agreement across sources
   - Points of disagreement, with attribution
   - Open questions

Always cite sources by URL. Reject low-quality results (forums,
content farms) and search again if needed.

Topic: {topic}

Change monitoring

For "tell me when X changes" workflows. Pair it with a scheduled job.

You are monitoring this URL: {url}
The previous snapshot is in ... tags below.

Use crawl_url to fetch the current version. Compare them and report:

- Has the page changed in any meaningful way? (Ignore timestamps,
  view counts, ad rotations.)
- If yes, summarize what changed in 1-3 bullet points.
- If no, respond with the single word "UNCHANGED".


{previous_snapshot}
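
The scheduled job that runs this prompt has to tell the "UNCHANGED" reply apart from a change summary. A minimal sketch in Python, assuming the model follows the reply format requested above; the function name parse_monitor_reply is hypothetical, and the tolerance for quotes or a trailing period is an assumption about model behavior.

```python
from typing import List, Optional


def parse_monitor_reply(reply: str) -> Optional[List[str]]:
    """Return None if the page is unchanged, otherwise the change bullets."""
    text = reply.strip()
    # Assumption: the model may add quotes or a period around the keyword.
    if text.strip("\"'.").upper() == "UNCHANGED":
        return None
    bullets = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("-", "*", "•")):
            bullets.append(line.lstrip("-*• ").strip())
    # Fall back to the whole reply if the model used no bullet markers.
    return bullets or [text]
```

A return value of None means the job can skip the notification; any list means there is something to report.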

Visual QA

Combine the screenshot tool with the model's vision capabilities to check page layout.

Use the screenshot tool with mode=fullpage on this URL: {url}.

Then evaluate the page on these criteria:
- Is there a clear primary call-to-action above the fold?
- Is the hero text scannable in under 3 seconds?
- Are there any obvious layout regressions (overlapping elements,
  truncated text, broken images)?

Be specific - point to coordinates or sections, not vague feelings.

Lead enrichment

For sales and marketing: start with a name and a company, and get back a complete profile.

You will receive a name and company. Your job is to enrich them
into a structured profile.

1. search_web for "{name} {company} linkedin" - find the LinkedIn URL.
2. scrape_structured with scraper=linkedin-profile on that URL.
3. search_web for "{company}" to find their domain.
4. crawl_url the company homepage and extract a 1-line description.

Return:
{
  "name": ..., "title": ..., "linkedin": ...,
  "company": ..., "company_domain": ..., "company_description": ...
}

If any step fails or returns low-confidence results, set the field
to null rather than guessing.
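
The null-on-failure rule can also be enforced on the client side. The sketch below, in Python, coerces a parsed reply into the six fields of the profile above and maps missing or placeholder values to None. The function name normalize_profile and the list of placeholder strings are assumptions for illustration, not part of any Crawlbase API.

```python
from typing import Any, Dict

# Field names taken from the "Return:" block of the prompt above.
PROFILE_FIELDS = (
    "name", "title", "linkedin",
    "company", "company_domain", "company_description",
)

# Assumption: strings a model might emit instead of a real null.
PLACEHOLDERS = {"unknown", "n/a", "none", "null"}


def normalize_profile(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the expected fields; missing or placeholder values become None."""
    profile: Dict[str, Any] = {}
    for field in PROFILE_FIELDS:
        value = raw.get(field)
        if isinstance(value, str):
            value = value.strip()
            if not value or value.lower() in PLACEHOLDERS:
                value = None
        profile[field] = value
    return profile
```

Downstream code can then rely on every key being present and treat None as "not found".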

Always specify a failure path

AI tools handle failures far more gracefully when you tell them up front what to do when something goes wrong. "Set the field to null rather than guessing" is much better than silently letting the model invent answers from its training data.

General tips

Specify the schema. Don't ask for "the data on this page"; describe the exact fields you need.
Limit recursive crawling. Tell the agent the maximum number of URLs it may fetch in a single step.
Cache where possible. Use store=true to avoid re-crawling the same URL across steps.
Set page_wait for SPAs. Say so in the prompt: "for client-side rendered sites, use page_wait=2000".
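
The crawl limit and the caching tip can also be backed by a guard in the code that runs the agent, in case the model ignores the prompt. The Python sketch below is a local, in-memory illustration under stated assumptions: CrawlBudget and its fetch callable are hypothetical names, and the dictionary cache is a stand-in on the client side, not the store=true feature itself.

```python
from typing import Callable, Dict


class CrawlBudget:
    """Caps new fetches per step and reuses cached results for repeated URLs."""

    def __init__(self, fetch: Callable[[str], str], max_urls: int) -> None:
        self._fetch = fetch          # hypothetical function that fetches one URL
        self._max_urls = max_urls    # maximum new fetches allowed per step
        self._cache: Dict[str, str] = {}
        self._used = 0

    def start_step(self) -> None:
        """Reset the per-step counter; the cache is kept across steps."""
        self._used = 0

    def get(self, url: str) -> str:
        if url in self._cache:
            return self._cache[url]  # cache hits do not count against the budget
        if self._used >= self._max_urls:
            raise RuntimeError(f"crawl budget of {self._max_urls} URLs exceeded")
        self._used += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

Call start_step() at the beginning of each agent step and route every page fetch through get(), so a runaway loop fails loudly instead of crawling without bound.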