Multi-Source Workflow Automation with Clay
Combine multiple enrichment providers, web scraping, APIs, and AI analysis in Clay to create unified lead intelligence workflows.
A client once asked me: “Can you find all Series B SaaS companies in fintech that raised funding in the last 6 months, are hiring for RevOps roles, use HubSpot, and have 50-200 employees?” I stared at my screen thinking about how many tools I’d need to cobble together—Crunchbase for funding, LinkedIn for jobs, BuiltWith for tech stack, Clearbit for firmographics. It would take weeks of manual work.
Then I rebuilt the entire workflow in Clay. It took four hours to set up, ran in 20 minutes, and returned 147 perfectly qualified accounts with complete enrichment. That’s the power of multi-source orchestration.
Why Single-Source Workflows Fall Short
The Data Coverage Gap
No single enrichment provider has complete data. Clearbit might have great firmographics but weak contact data. ZoomInfo has contacts but limited technographics. LinkedIn has hiring data but no funding information.
The Accuracy Problem
One data source can be wrong. Cross-referencing multiple sources improves accuracy—if three providers say a company has 150 employees and one says 500, you can trust the consensus.
The Freshness Challenge
Some data sources update faster than others. Combining real-time scraping with cached enrichment APIs gives you both speed and freshness.
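The consensus idea above can be sketched as a small helper (hypothetical, not a Clay built-in): taking the median of reported values keeps one outlier from skewing the result the way a plain average would.

```javascript
// Hypothetical consensus helper: the median resists a single outlier,
// whereas averaging [150, 148, 152, 500] would report ~238 employees.
function consensusValue(values) {
  const nums = values.filter((v) => typeof v === "number" && !Number.isNaN(v));
  if (nums.length === 0) return null;
  const sorted = [...nums].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}
```

In a Clay formula column the same logic would run over the per-provider employee counts before writing the synthesized value.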
The Clay Multi-Source Advantage
1. Parallel Execution
Clay runs multiple enrichment providers simultaneously, so fetching data from five sources takes roughly the same wall-clock time as fetching from one.
2. Waterfall Logic
Start with free sources and escalate to paid providers only when necessary, optimizing cost without sacrificing quality.
3. Data Synthesis
Merge outputs from multiple providers into single clean records, handling conflicts and filling gaps intelligently.
4. Conditional Branching
Route leads through different workflows based on data quality, enrichment results, or classification outcomes.
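A minimal sketch of the waterfall pattern outside Clay's own column syntax (provider names, costs, and the "sufficient data" check are all illustrative):

```javascript
// Illustrative waterfall: try providers cheapest-first, stop at the first
// one that returns both core fields. Each provider here is a plain function
// standing in for a Clay enrichment column.
function waterfallEnrich(domain, providers) {
  let totalCost = 0;
  for (const { name, cost, fetchData } of providers) {
    totalCost += cost;
    const data = fetchData(domain);
    // "Sufficient" means both core fields are present; adjust per use case.
    if (data && data.company_name && data.employee_count) {
      return { ...data, source: name, total_cost: totalCost };
    }
  }
  return { source: "none", total_cost: totalCost };
}
```

In Clay itself this shows up as conditional run settings on each enrichment column rather than an explicit loop.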
Building Multi-Source Workflows in Clay
Use Case 1: Comprehensive Company Enrichment
Objective: Enrich a list of domains with maximum data coverage at minimum cost.
Clay Table Structure:
Column A: Domain (input)
Column B: Clearbit Company Data
Column C: Apollo Org Data
Column D: LinkedIn Company Scrape
Column E: Website Metadata Scrape
Column F: BuiltWith Tech Stack
Column G: Crunchbase Funding Data
Column H: Recent Job Postings (from Greenhouse API)
Column I: Synthesized Output
Waterfall Enrichment Pattern:
- Free Tier: Website Scraping
// Clay HTTP Request Column
const domain = {{Domain}};
const url = `https://${domain}`;
// Fetch homepage (fetch returns a Response; read the body as text)
const response = await fetch(url);
const html = await response.text();
// Extract meta tags
const companyName = html.match(/<meta property="og:site_name" content="([^"]+)"/)?.[1];
const description = html.match(/<meta name="description" content="([^"]+)"/)?.[1];
// Detect employee hints
const employeeMatch = html.match(/(\d+)\+?\s*employees/i);
const employeeHint = employeeMatch ? parseInt(employeeMatch[1]) : null;
return {
company_name: companyName,
description: description,
employee_hint: employeeHint,
source: 'website_scrape',
cost: 0
};
- Budget Tier: Apollo (if website scrape incomplete)
// Clay formula column - conditional API call
if ({{Website Scrape.employee_hint}} === null || {{Website Scrape.company_name}} === null) {
// Website scrape insufficient - call Apollo
return APOLLO_ORG({domain: {{Domain}}});
} else {
// Website scrape sufficient - skip Apollo
return {skipped: true, reason: "sufficient data from free source"};
}
- Premium Tier: Clearbit (high-value leads only)
// Conditional Clearbit enrichment
if (
{{Lead Score}} > 70 &&
({{Apollo.employee_count}} === null || {{Website Scrape.employee_hint}} === null)
) {
return CLEARBIT_COMPANY({domain: {{Domain}}});
} else {
return {skipped: true, reason: "low value lead or sufficient data"};
}
- Synthesis: Merge All Sources
// Clay formula: Intelligent data merging
const sources = [
{{Clearbit Company Data}},
{{Apollo Org Data}},
{{Website Scrape}}
];
// Company name - prefer Clearbit, fallback to others
const companyName =
sources.find(s => s.name)?.name ||
sources.find(s => s.company_name)?.company_name ||
{{Domain}};
// Employee count - use consensus or most recent
const employeeCounts = sources
  .map(s => s.employee_count || s.employees || s.employee_hint)
  .filter(n => n != null); // drop both null and undefined
const employeeCount = employeeCounts.length > 0
? Math.round(employeeCounts.reduce((a, b) => a + b) / employeeCounts.length)
: null;
// Industry - merge and deduplicate
const industries = sources
.flatMap(s => s.industry || s.industries || [])
.filter((v, i, a) => a.indexOf(v) === i);
return {
  company_name: companyName,
  employee_count: employeeCount,
  industries: industries,
  data_sources: sources.filter(s => s && s.source && !s.skipped).map(s => s.source).join(', '),
  // Completeness = share of core fields that ended up populated
  data_completeness: Math.round(
    ([companyName, employeeCount, industries.length > 0].filter(Boolean).length / 3) * 100
  )
};
Use Case 2: Hiring Intent Detection
Objective: Identify companies actively hiring for specific roles as a buying signal.
Clay Workflow:
Step 1: Job Board Scraping
// Clay HTTP API column - Greenhouse Job Board API
// Assumption: the board token is guessed from the domain (acme.com -> "acme");
// in practice you may need to store the token per company.
const domain = {{Domain}};
const boardToken = domain.split('.')[0];
const jobsResponse = await fetch(
  `https://boards-api.greenhouse.io/v1/boards/${boardToken}/jobs?content=true`
);
if (jobsResponse.ok) {
  const { jobs } = await jobsResponse.json();
  // Filter for relevant roles
  const revOpsJobs = jobs.filter(job =>
    job.title.match(/revenue operations|sales operations|marketing operations/i) ||
    (job.departments || []).some(d => d.name.match(/revenue|operations/i))
  );
  return {
    total_openings: jobs.length,
    revops_openings: revOpsJobs.length,
    job_titles: revOpsJobs.map(j => j.title),
    job_details: revOpsJobs,
    source: 'greenhouse'
  };
}
return {no_jobs_found: true};
Step 2: LinkedIn Jobs Scraping
// Clay enrichment - LinkedIn Company Jobs
const companyLinkedIn = {{LinkedIn Company URL}};
// Use Clay's LinkedIn integration
const linkedInJobs = LINKEDIN_COMPANY_JOBS({
company_url: companyLinkedIn,
job_title_keywords: ["Revenue Operations", "Sales Operations", "RevOps"],
posted_within_days: 30
});
return linkedInJobs;
Step 3: AI Analysis of Job Descriptions
// Clay GPT-4 column
const jobs = [
  ...({{Greenhouse Jobs.job_details}} || []),
  ...({{LinkedIn Jobs.results}} || [])
];
if (jobs.length === 0) {
return {hiring_intent: "none"};
}
// AI Prompt
const prompt = `
Analyze these job postings to determine hiring intent and pain points:
${jobs.map(job => `
Title: ${job.title}
Description: ${(job.description || job.content || '').substring(0, 500)}
`).join('\n---\n')}
Assess:
1. Is this a new role or backfill? (new roles indicate growth/change)
2. What problems are they trying to solve?
3. What tools do they mention?
4. What's the urgency level?
5. What's the seniority of the role?
Output JSON:
{
"intent_level": "high|medium|low",
"role_type": "new|backfill|expansion",
"pain_points": ["array"],
"mentioned_tools": ["array"],
"budget_indicator": "high|medium|low",
"urgency": "immediate|normal|low"
}
`;
return CLAY_GPT4(prompt);
Step 4: Hiring Intent Score
// Clay formula: Calculate intent score
const greenhouseJobs = {{Greenhouse Jobs.revops_openings}} || 0;
const linkedinJobs = {{LinkedIn Jobs.results}}?.length || 0;
const aiAnalysis = {{AI Job Analysis}};
let score = 0;
// Multiple openings = strong signal
if (greenhouseJobs + linkedinJobs >= 3) score += 40;
else if (greenhouseJobs + linkedinJobs >= 2) score += 25;
else if (greenhouseJobs + linkedinJobs >= 1) score += 15;
// New role vs backfill
if (aiAnalysis.role_type === "new") score += 20;
else if (aiAnalysis.role_type === "expansion") score += 15;
// Seniority indicates budget
if (aiAnalysis.budget_indicator === "high") score += 15;
else if (aiAnalysis.budget_indicator === "medium") score += 10;
// Urgency
if (aiAnalysis.urgency === "immediate") score += 10;
return {
hiring_intent_score: score,
hiring_signal: score >= 60 ? "strong" : score >= 30 ? "moderate" : "weak"
};
Use Case 3: Technographic + Funding Overlap
Objective: Find companies using specific tech stacks who recently raised funding (high buying power + potential tool churn).
Step 1: BuiltWith Tech Stack Detection
// Clay BuiltWith integration
const domain = {{Domain}};
const techStack = BUILTWITH({
domain: domain,
categories: [
"crm",
"marketing-automation",
"analytics",
"customer-data-platform"
]
});
return techStack;
Step 2: Crunchbase Funding Data
// Clay Crunchbase integration
const companyName = {{Company Name}};
const fundingData = CRUNCHBASE({
company_name: companyName,
include: ["funding_rounds", "investors", "valuation"]
});
// Filter for recent funding
const recentRounds = fundingData.funding_rounds?.filter(round => {
const roundDate = new Date(round.announced_on);
const sixMonthsAgo = new Date();
sixMonthsAgo.setMonth(sixMonthsAgo.getMonth() - 6);
return roundDate > sixMonthsAgo;
});
return {
...fundingData,
recent_funding: recentRounds,
recent_funding_total: recentRounds?.reduce((sum, r) => sum + (r.money_raised_usd || 0), 0) || 0
};
Step 3: Competitive Tool Analysis
// Clay formula: Identify replacement opportunities
const techStack = {{BuiltWith.technologies}};
const competitors = ["pipedrive", "zoho-crm", "monday-crm"]; // Your competitor tools
// Find the first competitor tool once, then derive both flags from it
const competitorMatch = techStack.find(tech =>
  competitors.some(comp => tech.name.toLowerCase().includes(comp))
);
const usingCompetitor = Boolean(competitorMatch);
const competitorTool = competitorMatch?.name;
const fundingAmount = {{Crunchbase.recent_funding_total}};
const hasFunding = fundingAmount > 0;
return {
using_competitor: usingCompetitor,
competitor_tool: competitorTool,
has_recent_funding: hasFunding,
replacement_opportunity: usingCompetitor && hasFunding,
opportunity_score: usingCompetitor && hasFunding ? 90 :
usingCompetitor ? 60 :
hasFunding ? 40 : 20
};
Use Case 4: Social Listening + Web Scraping
Objective: Monitor company blogs and social media for change signals.
Step 1: RSS Feed Monitoring
// Clay HTTP column - Parse company blog
const domain = {{Domain}};
// Try common blog URLs
const blogUrls = [
`https://${domain}/blog/feed`,
`https://blog.${domain}/feed`,
`https://${domain}/rss`
];
let posts = [];
let feedUrl = null;
for (const url of blogUrls) {
  try {
    const response = await fetch(url);
    if (!response.ok) continue;
    // parseFeedPosts is a hypothetical helper that extracts
    // {title, published} entries from an RSS/Atom XML string
    posts = parseFeedPosts(await response.text());
    feedUrl = url;
    break;
  } catch (e) {
    continue;
  }
}
// Analyze recent posts for signals
const recentPosts = posts.filter(p => {
  const postDate = new Date(p.published);
  const thirtyDaysAgo = new Date();
  thirtyDaysAgo.setDate(thirtyDaysAgo.getDate() - 30);
  return postDate > thirtyDaysAgo;
});
return {
  recent_post_count: recentPosts.length,
  recent_titles: recentPosts.map(p => p.title),
  blog_url: feedUrl
};
Step 2: LinkedIn Activity Scraping
// Clay LinkedIn integration
const companyLinkedIn = {{LinkedIn Company URL}};
const recentPosts = LINKEDIN_COMPANY_POSTS({
company_url: companyLinkedIn,
limit: 10,
days_back: 30
});
// Look for change signals
const changeKeywords = [
"excited to announce",
"new leadership",
"joining the team",
"raising our Series",
"expanding into",
"new office"
];
const changeSignals = recentPosts.filter(post =>
changeKeywords.some(keyword =>
post.text.toLowerCase().includes(keyword.toLowerCase())
)
);
return {
total_posts: recentPosts.length,
change_signals: changeSignals,
change_signal_count: changeSignals.length
};
Step 3: AI Signal Classification
// Clay GPT-4 analysis
const blogPosts = {{Blog Feed.recent_titles}};
const linkedInPosts = {{LinkedIn Activity.change_signals}};
const prompt = `
Analyze these company communications for buying signals:
Blog Posts:
${blogPosts.join('\n')}
LinkedIn Posts:
${linkedInPosts.map(p => p.text.substring(0, 200)).join('\n---\n')}
Identify signals indicating:
1. Growth/scaling challenges
2. New leadership (likely to change tools)
3. Funding/expansion (budget available)
4. Product launches (may need new infrastructure)
5. Hiring surges (ops challenges)
Output JSON:
{
"growth_signals": ["array"],
"leadership_changes": ["array"],
"expansion_signals": ["array"],
"buying_likelihood": "high|medium|low",
"recommended_timing": "immediate|1-3_months|3-6_months"
}
`;
return CLAY_GPT4(prompt);
Advanced Orchestration Patterns
Conditional Workflow Branching
Use Clay’s filtering to create dynamic workflows:
IF {{Data Completeness}} < 60%:
→ Run additional enrichment (Clearbit)
ELSE:
→ Skip to next step
IF {{Industry}} = "Financial Services":
→ Use specialized fintech data provider
ELSE:
→ Use standard enrichment
IF {{Employee Count}} > 1000:
→ Route to enterprise research queue
ELSE:
→ Continue automated enrichment
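The rules above can be sketched as a single routing function (thresholds and route names mirror the rules but are illustrative):

```javascript
// Illustrative router: each rule above is an independent decision,
// so return them as separate fields rather than one branch.
function routeLead(lead) {
  return {
    needs_extra_enrichment: lead.data_completeness < 60,
    data_provider:
      lead.industry === "Financial Services" ? "fintech_specialist" : "standard",
    queue: lead.employee_count > 1000 ? "enterprise_research" : "automated",
  };
}
```

In Clay, each decision maps to a conditional run setting or a filtered view rather than a single function.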
Sequential Enrichment with Dependencies
Step 1: Get company LinkedIn URL from Clearbit
↓
Step 2: Use LinkedIn URL to scrape company page
↓
Step 3: Extract employee LinkedIn URLs from company page
↓
Step 4: Enrich individual employees with Apollo
↓
Step 5: Identify decision makers based on titles
↓
Step 6: Find personal emails using Hunter.io
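The dependency chain can be sketched as an async pipeline where each step reads the accumulated context and returns fields to merge (the step contents here are placeholders):

```javascript
// Illustrative sequential pipeline: a step returning null/undefined
// short-circuits the rest of the chain and flags the row as incomplete,
// since later steps depend on earlier outputs (e.g. the LinkedIn URL).
async function runPipeline(seed, steps) {
  let context = { ...seed };
  for (const step of steps) {
    const result = await step(context);
    if (!result) return { ...context, incomplete: true };
    context = { ...context, ...result };
  }
  return context;
}
```

Clay models the same thing with column references: a column that reads `{{LinkedIn Company URL}}` simply waits until that column has a value.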
Parallel Processing with Aggregation
Column 1: Clearbit (firmographics)
Column 2: BuiltWith (tech stack)
Column 3: Crunchbase (funding)
Column 4: LinkedIn (hiring)
Column 5: Website scrape (content)
↓
All run simultaneously
↓
Column 6: Aggregate and synthesize
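The fan-out step can be sketched with Promise.allSettled, so one slow or failing provider doesn't block aggregation (the provider functions are placeholders):

```javascript
// Illustrative parallel fan-out: every provider starts at once; failures
// are dropped from the merged result instead of rejecting the whole batch.
async function enrichInParallel(domain, providers) {
  const results = await Promise.allSettled(
    providers.map(({ name, fetchData }) =>
      Promise.resolve(fetchData(domain)).then((data) => ({ name, data }))
    )
  );
  const merged = {};
  for (const r of results) {
    if (r.status === "fulfilled" && r.value.data) merged[r.value.name] = r.value.data;
  }
  return merged;
}
```

This mirrors why five Clay columns run in roughly the time of one: each provider call is independent until the synthesis column reads them all.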
Cost Optimization
Track costs across providers:
// Clay formula: Calculate total enrichment cost
const costByProvider = {
clearbit: {{Clearbit Company Data.skipped}} ? 0 : 1.50,
apollo: {{Apollo Org Data.skipped}} ? 0 : 0.10,
builtwith: {{BuiltWith.skipped}} ? 0 : 0.25,
crunchbase: {{Crunchbase.skipped}} ? 0 : 0.50,
hunter: {{Hunter.io.skipped}} ? 0 : 0.04,
linkedin: 0, // Using scraping, not API
website: 0 // Free
};
const totalCost = Object.values(costByProvider).reduce((a, b) => a + b, 0);
return {
  cost_breakdown: costByProvider,
  total_cost: totalCost,
  // Guard against a zero completeness score
  cost_per_data_point: {{Data Completeness}} > 0 ? totalCost / {{Data Completeness}} : null
};
FAQ
Q: How many data sources should I use per workflow?
A: Start with 3-4 core sources that cover different data types (firmographic, technographic, intent). Add more sources only when specific data gaps appear. More isn’t always better—focus on quality over quantity.
Q: How do I handle conflicting data from multiple sources?
A: Use consensus logic (majority rules), recency (newest data wins), or hierarchy (trust Clearbit over website scrape). For critical fields like employee count, average multiple sources.
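Hierarchy-based resolution can be sketched as a merge where the first source with a value wins (the trust order is illustrative):

```javascript
// Illustrative hierarchy merge: orderedSources is most-trusted first;
// for each field, the first source that has a non-null value wins,
// and less-trusted sources only fill the gaps.
function mergeByHierarchy(fields, orderedSources) {
  const out = {};
  for (const field of fields) {
    const winner = orderedSources.find((s) => s && s[field] != null);
    out[field] = winner ? winner[field] : null;
  }
  return out;
}
```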
Q: What’s the best order for enrichment providers in a waterfall?
A: Always start with free sources (website scraping, public APIs). Then budget APIs (Hunter, FullContact). Reserve expensive APIs (Clearbit, ZoomInfo) for leads that pass quality thresholds.
Q: How do I prevent hitting rate limits across multiple providers?
A: Use Clay’s built-in rate limiting features. For custom APIs, add delay formulas between batches. Monitor provider-specific limits and set up alerts when approaching caps.
Q: Can I use Clay for real-time enrichment or only batch processing?
A: Both. For batch, upload a CSV and let Clay process it. For real-time, use Clay’s API or webhook triggers to enrich leads as they arrive in your CRM.
Q: How do I test multi-source workflows without burning API credits?
A: Run on small test batches (10-20 rows) first. Use Clay’s “Run on sample” feature. For expensive APIs, add skip conditions during testing, then remove them for production.
Q: What’s the optimal balance between automation and manual review?
A: Automate data fetching and merging 100%. Use AI for first-pass analysis. Add manual review for high-value leads (scores >80) before outreach. This typically means reviewing 20% of leads while automating 80%.
Multi-source workflow automation in Clay transforms scattered data into unified intelligence. Start simple with 2-3 sources, prove the value, then systematically add sources to fill gaps. The goal isn’t to use every possible data provider—it’s to get complete, accurate data at the lowest cost.
Need Implementation Help?
Our team can build this integration for you in 48 hours. From strategy to deployment.
Get Started