Case Study
3x Outreach Engagement by Backtesting CRM Data



Background & Challenge
Company details have been redacted to protect our client's privacy.
An early-stage B2B SaaS startup had put thoughtful work into defining their ideal customer profile (ICP). Their scoring model was sophisticated — factoring in company size, support team headcount, funding stage, tech stack, and geography.
But despite the strong logic behind their targeting, outbound wasn’t working.
Engagement was low. Credit utilization for list-building via Clay was high. There was a clear misalignment between who they thought was their best-fit customer and who was actually responding.
The question became:
“Is our ICP scoring hypothesis actually predicting real-world results?”
Moving From Assumptions to Evidence: How We Backtested the ICP
Blossomer led a systematic backtesting process to answer two key questions:
1. Correlation-Level Insight: Do higher-scoring accounts and leads actually convert at higher rates?
2. Feature-Level Analysis: Which specific signals are the true drivers of engagement and conversion?
Our goal was to replace guesswork with data and determine whether the existing scoring model held up when tested against real outcomes.
Step 1: Gather Clean Historical Data
We merged data across HubSpot (Deals, Contacts, Companies) and Apollo (Sequences, Contact Engagement) to create a unified, account-level outcome table. Each row represented a company, enriched with:
- Company name
- Contact-level score (0–5)
- Account-level score (0–10)
- Tier bucket based on scoring
- Whether any email was opened (1/0)
- Whether any email was clicked (1/0)
- Whether any email received a reply (1/0)
- Whether a meeting was booked (1/0)
- Whether the deal reached key stages such as Qualified or Piloting (1/0)
- Contact title
Outcome Definition: We labeled a "conversion" as any deal reaching: Appointment Scheduled, Qualified to Buy, Decision Maker Bought-In, Piloting, Contract Sent, or Closed Won.
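To make the pipeline concrete, here is a minimal sketch of how such a merge can be assembled in pandas. The file names, column names, and join key below are illustrative assumptions, not the client's actual schema.

```python
import pandas as pd

# Illustrative inputs (assumed names): HubSpot company export, HubSpot deals,
# and Apollo contact-level engagement, all keyed by company domain.
companies = pd.read_csv("hubspot_companies.csv")    # company name, scores, tier, domain
deals = pd.read_csv("hubspot_deals.csv")            # deal_stage per company
engagement = pd.read_csv("apollo_engagement.csv")   # opened/clicked/replied/meeting per contact

# Deal stages counted as a "conversion"
CONVERTED_STAGES = {
    "Appointment Scheduled", "Qualified to Buy", "Decision Maker Bought-In",
    "Piloting", "Contract Sent", "Closed Won",
}

# Roll contact-level engagement up to the account level: any contact counts
account_engagement = engagement.groupby("company_domain", as_index=False).agg(
    opened_email=("opened", "max"),
    clicked_email=("clicked", "max"),
    replied_email=("replied", "max"),
    meeting_booked=("meeting_booked", "max"),
)

# An account converted if any of its deals reached a key stage
deals["converted"] = deals["deal_stage"].isin(CONVERTED_STAGES).astype(int)
account_outcomes = deals.groupby("company_domain", as_index=False)["converted"].max()

# One row per company: scores + engagement flags + conversion outcome
df = (
    companies
    .merge(account_engagement, on="company_domain", how="left")
    .merge(account_outcomes, on="company_domain", how="left")
    .fillna({"opened_email": 0, "clicked_email": 0, "replied_email": 0,
             "meeting_booked": 0, "converted": 0})
)
df.to_csv("scoring_backtest.csv", index=False)
```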
Step 2: Score vs. Outcome Correlation
We asked: Is there a meaningful relationship between the lead/account scores and actual outcomes like replies or meetings booked?
```python
from scipy.stats import pointbiserialr
import pandas as pd

# Load merged dataset
df = pd.read_csv("scoring_backtest.csv")

# Correlation between lead score and replies
correlation, p_value = pointbiserialr(df['lead_score'], df['replied_email'])
print(f"Lead Score ↔ Reply Correlation: {correlation:.2f} (p={p_value:.3f})")
```
🔍 What We Found:
- Correlation between lead score and meetings booked was weak (r < 0.2).
- Certain "Tier 1" accounts had no engagement at all.
- Some "Tier 2" or "Tier 3" accounts outperformed "Tier 1."
This suggested that the original scoring model was not predictive — and needed rethinking.
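The tier-level gaps were visible with a simple group-by on the merged table. This is a sketch; the tier and outcome column names below are assumptions consistent with the fields described in Step 1.

```python
# Average engagement and conversion rates per tier bucket; if the original
# scoring model were predictive, Tier 1 should lead on every column.
tier_summary = (
    df.groupby("tier")[["replied_email", "meeting_booked", "converted"]]
      .mean()
      .sort_index()
)
print(tier_summary.round(3))
```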
Step 3: Feature-Level Signal Strength Analysis
To uncover which specific signals actually drove engagement, we used logistic regression:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Candidate ICP signals and the outcome we want them to predict
features = ['employee_count', 'founding_year_recent', 'cx_hiring', 'recent_funding']
X = df[features]
y = df['meeting_booked']

# Hold out 20% of accounts to sanity-check the fit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

print("Feature Coefficients:")
for feature, coef in zip(features, model.coef_[0]):
    print(f"{feature}: {coef:.2f}")

# How well the signals predict booked meetings on held-out accounts
print(classification_report(y_test, model.predict(X_test)))
```
We confirmed that newer companies with recent CX hiring and active CX team growth were far more likely to engage.
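One caveat on reading the coefficients: they are only comparable across features when the inputs share a scale. A minimal variant that standardizes the features first, using the same assumed columns as above, looks like this:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Standardize features so coefficient magnitudes are comparable across signals
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

coefs = pipeline.named_steps["logisticregression"].coef_[0]
for feature, coef in sorted(zip(features, coefs), key=lambda pair: -abs(pair[1])):
    print(f"{feature}: {coef:+.2f}")
```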
Feature Insights
| Signal | Predictive Strength |
|---|---|
| Founded After 2020 | Strong |
| Recent CX Executive Hire | Strong |
| CX Team Growth (15%+ MoM) | Strong |
| Funding Stage Alone | Weak |
| Employee Count Alone | Weak |
Step 4: Reweighting the Scoring Model
Based on the analysis, we:
- Increased weight on validated signals: founding year, CX hiring, CX team growth
- Decreased weight on weak predictors: funding stage alone, employee count alone
- Removed non-predictive criteria
The scoring model shifted from theory-based to evidence-backed.
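In practice, reweighting is just a change to the scoring function itself. The sketch below is hypothetical; the weights, thresholds, and field names are illustrative, not the client's production values.

```python
# Hypothetical reweighted account score (0-10): validated signals dominate,
# weak predictors survive only as a small tie-breaker.
def score_account(account: dict) -> float:
    score = 0.0
    if account.get("founded_year", 0) >= 2020:           # validated: recent founding
        score += 3.0
    if account.get("recent_cx_exec_hire", False):         # validated: recent CX exec hire
        score += 3.0
    if account.get("cx_team_growth_mom", 0.0) >= 0.15:    # validated: 15%+ MoM CX team growth
        score += 3.0
    if account.get("recent_funding", False):              # weak: kept as a light tie-breaker
        score += 0.5
    return min(score, 10.0)

print(score_account({"founded_year": 2022, "recent_cx_exec_hire": True,
                     "cx_team_growth_mom": 0.20}))  # -> 9.0
```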
🚀 Results: Focused Targeting, Real Outcomes
- Tripled engagement rates on outbound campaigns
- 80% reduction in Clay credit utilization (thanks to tighter, more precise lists)
- Sharper focus on high-fit conversations with decision-makers who were actually ready to engage
🎯 Key Takeaway
Even the best-designed ICP is still a hypothesis until you validate it. By systematically backtesting historical data, we helped this startup move from sophisticated guesswork to measurable, repeatable targeting — unlocking better conversations, higher engagement, and less wasted effort.
Start Building Your Outbound System
Book a free discovery call to understand how we work. You'll receive actionable strategies upfront, then decide if we're the right fit. We move fast, with no lengthy onboarding delays.
Blossomer, LLC © 2024, All Rights Reserved