
Cold Email A/B Testing: What to Test and How to Measure Results

Affiliate disclosure: some links in this article are partner links. If you start a paid plan through them, imisofts may earn a commission at no extra cost to you. We only recommend tools we actually use to run client campaigns.

Most people run cold email campaigns, get mediocre results, and move on.

At imisofts, we run 100+ A/B tests monthly. Each test teaches us something. Compound those lessons, and you get 65-75% open rates and 3-5% reply rates.

This is our A/B testing framework.

The Cold Email A/B Testing Hierarchy

Not all tests have equal impact. We prioritize:

  1. Subject line (highest impact)
  2. Opening line (second highest)
  3. Value statement (medium impact)
  4. CTA (medium-low impact)
  5. Send time (lowest impact)

Never test low-impact variables first. You'll waste samples before finding the real wins.

Testing Subject Lines (Highest Impact)

Subject lines determine open rate. Open rate determines everything else.

Test structure:

Variant A (baseline): [Current subject line that gets 45% opens]

Variant B (test): [New subject line with different formula]

Sample size: 50-100 prospects each

Duration: 5-7 days (long enough to cover day-of-week effects)

Example test:

Variant A: "hi john, noticed you launched [product]" (45% open rate)

Variant B: "quick thought on [industry]" (52% open rate)

Winner: Variant B (+7 percentage points)

This 7-point improvement adds up at scale. Across 10,000 prospects, that's 700 additional opens. 700 additional opens means 7-35 additional replies (at a 1-5% reply rate).

That's 7-35 additional conversations from changing one subject line.
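The arithmetic above can be sketched as a small helper. It follows the article's simplification of applying the reply rate to the additional opens; the function name `projected_gain` and its parameters are illustrative assumptions, not part of any tool.

```python
def projected_gain(prospects, open_rate_a, open_rate_b, reply_rate):
    """Estimate extra opens and extra replies when variant B replaces A.

    Assumption (taken from the article's example): the reply rate is
    applied to the additional opens only.
    """
    extra_opens = round(prospects * (open_rate_b - open_rate_a))
    extra_replies = extra_opens * reply_rate
    return extra_opens, extra_replies


# Numbers from the subject line example: 45% vs. 52% opens, 10,000 prospects.
extra_opens, low = projected_gain(10_000, 0.45, 0.52, 0.01)
_, high = projected_gain(10_000, 0.45, 0.52, 0.05)
print(f"{extra_opens} extra opens, {round(low)}-{round(high)} extra replies")
```

Swapping in your own list size and open rates shows quickly whether a given lift is worth a dedicated test.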

Subject line tests we run:

  • Personalization (first name + achievement vs. generic)
  • Question format vs. statement format
  • Lowercase vs. Title Case
  • Short vs. specific
  • Different achievement angles

Testing Opening Lines (Second Highest Impact)

Opening line determines whether they read past the first sentence.

Test structure:

Email 1, Variant A: [Opening line A] + [rest of email unchanged]

Email 1, Variant B: [Opening line B] + [rest of email unchanged]

Sample size: 100 prospects each

Duration: 5-7 days (long enough to cover day-of-week effects)

Metric: Read-through rate, the share of openers who read past the first line (hard to track without advanced tools)

Alternative: Track reply rate as proxy for "engagement with content"

Example test:

Variant A: "I noticed you launched [product] last month." (1.2% reply rate)

Variant B: "Most SaaS founders spend 15 hours/week on prospecting. You probably do too." (1.8% reply rate)

Winner: Variant B (+0.6 percentage points reply rate)

0.6 points seems small. Across 10,000 prospects, it's 60 additional replies.

Opening line tests we run:

  • Specific personalization vs. general observation
  • Problem-first vs. achievement-first
  • Curiosity gap vs. direct statement
  • Industry pattern vs. company-specific observation

Testing Value Statements (Medium Impact)

Value statement is your chance to prove relevance before the pitch.

Test structure:

Email 1, Variant A: [Value statement A]

Email 1, Variant B: [Value statement B]

Metric: Email 1 reply rate or Email 2 open rate (if they engage with Email 1, they'll engage with Email 2)

Example test:

Variant A: "We help SaaS teams automate their prospecting and save 12 hours/week." (2% Email 1 reply)

Variant B: "One of your competitors just booked 8 qualified deals this month using [tactic]." (2.8% Email 1 reply)

Winner: Variant B (social proof outperforms direct benefit)

This changes your entire Email 1 strategy. Across campaigns, social proof hooks outperform benefit hooks by 20-40% in relative reply rate (in the example above, 2% to 2.8% is a 40% relative lift).
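Because this article quotes both percentage-point gains and relative gains, it helps to compute the two side by side. A minimal sketch, with `lift` as a hypothetical helper name:

```python
def lift(baseline, variant):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = (variant - baseline) * 100
    relative_pct = (variant - baseline) / baseline * 100
    return absolute_points, relative_pct


# Value statement example: 2% vs. 2.8% Email 1 reply rate.
points, relative = lift(0.02, 0.028)
print(f"{points:.1f} points absolute, {relative:.0f}% relative")
```

Stating which of the two you mean avoids confusing a 0.8-point gain with a 40% relative gain when comparing tests.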

Value statement tests we run:

  • Direct benefit vs. social proof
  • Specific metric vs. general statement
  • Industry pattern vs. company-specific observation
  • Problem-agitation vs. opportunity-excitement

Testing CTAs (Medium-Low Impact)

CTA wording has lower impact than subject/opening, but still matters.

Test structure:

Email 2, Variant A: "[CTA A] or reply with your timeline."

Email 2, Variant B: "[CTA B] or reply with your timeline."

Metric: Email 2 reply rate

Example test (SaaS):

Variant A: "Book a 15-min strategy call" (3.2% reply)

Variant B: "Are you open to a quick conversation?" (2.8% reply)

Winner: Variant A (direct booking link outperforms vague ask)

But this varies by industry. Medicare-focused campaigns might see opposite results (phone-first CTAs outperform calendar links).

CTA tests we run:

  • Direct booking link vs. "reply to schedule"
  • Soft ask vs. hard ask
  • Specific time ("15 min") vs. vague ("quick call")
  • Phone number vs. calendar link (varies by industry)

Testing Send Times (Lowest Impact)

When you send affects open rate, but much less than what you send.

Test structure:

Group A: Send on Tuesday, 10 AM

Group B: Send on Thursday, 10 AM

Metric: Open rate

Our data across 50M+ emails:

Tuesday: 45% open rate

Wednesday: 47% open rate

Thursday: 48% open rate

Friday: 40% open rate

Monday: 38% open rate

Best day: Thursday, 10 AM

Worst day: Monday, 10 AM

Difference: ~10 percentage points

That matters, but nowhere near as much as subject line testing (which can change open rate by 30+ points).

Send time tests we run:

  • Weekday vs. weekend
  • Morning vs. afternoon vs. evening
  • Time zone-specific sends
  • Industry-specific patterns (e.g., healthcare sees higher open rates on Fridays due to weekly planning)

How to Measure Statistical Significance

You don't need a PhD in statistics. Here's the simple rule:

Sample size of 50+ per variant. If you see a difference of 5+ percentage points, treat it as a likely winner, then confirm it with the more rigorous check below.

More rigorous approach:

Use a binomial test calculator.

Example:

  • Variant A: 45 opens out of 100 (45%)
  • Variant B: 52 opens out of 100 (52%)
  • Difference: 7 percentage points

Question: Is this real or random?

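Plug the numbers into the calculator. If the p-value is below 0.05, the difference is statistically significant at the 95% confidence level and you can trust the result. If it is above 0.05, keep collecting samples before declaring a winner.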
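If you would rather script the check than use an online calculator, one common choice for comparing two variants is a two-proportion z-test. That test is an assumption swapped in here; the article only names a binomial test calculator. The sketch below uses only the Python standard library, and `two_proportion_p_value` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two open (or reply) rates.

    Uses the pooled two-proportion z-test with a normal approximation,
    which is reasonable when each variant has a few dozen events or more.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Example from above: 45 opens out of 100 vs. 52 opens out of 100.
p_value = two_proportion_p_value(45, 100, 52, 100)
print(f"p-value = {p_value:.2f}")
```

For the 45/100 vs. 52/100 example this gives a p-value of roughly 0.32, which is above 0.05. At 100 prospects per variant, a 7-point gap is a promising signal but not yet statistically significant, so it is worth extending the test before rolling the variant out widely.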

For cold email, we use this rule of thumb:

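Sample < 50 per variant: Don't trust the result. Run more samples.

Sample 50-100 per variant: If the difference is > 5 percentage points, probably real.

Sample 100-200 per variant: If the difference is > 3 percentage points, probably real.

Sample 200+ per variant: If the difference is > 2 percentage points, probably real.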
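To see how sample size and detectable difference trade off, a standard power calculation can estimate how many prospects each variant needs. This is a sketch under stated assumptions: a two-sided test at alpha = 0.05 with 80% power, neither of which comes from the article, and `sample_size_per_variant` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate prospects needed per variant to detect rate_a vs. rate_b.

    Standard normal-approximation formula for comparing two proportions.
    alpha and power are conventional defaults, assumed for illustration.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (rate_a + rate_b) / 2
    spread_null = sqrt(2 * pooled * (1 - pooled))
    spread_alt = sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return ceil(numerator / (rate_a - rate_b) ** 2)


# Subject line example: 45% vs. 52% open rate.
needed = sample_size_per_variant(0.45, 0.52)
print(f"about {needed} prospects per variant")
```

For the 45% vs. 52% example it returns roughly 800 prospects per variant, so the thresholds above are best read as a quick screen for promising variants. Larger gaps need far fewer prospects, which is one more reason to test high-impact variables first.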

The Weekly Testing Cycle

Monday: Review last week's tests. Declare winners.

Tuesday-Wednesday: Roll out winning variant to 50% of new prospects.

Wednesday-Thursday: Run new tests on remaining 50%.

Friday: Measure results.

Monday: Repeat.

This weekly cycle compounds. Each week you find one new winning variant. Month 1, you're at baseline. Month 3, you're 30-40% above baseline.
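The 50/50 split in the cycle above can be made repeatable by hashing each prospect into a bucket, so the same prospect always lands in the same group. This is a minimal sketch, not how any particular sending platform works; `assign_group`, the group names, the week label format, and the even A/B split inside the test half are all assumptions for illustration.

```python
import hashlib

GROUPS = ("winner_rollout", "test_a", "test_b")


def assign_group(prospect_email: str, week_label: str) -> str:
    """Deterministically place a prospect into one of three groups."""
    key = f"{week_label}:{prospect_email.strip().lower()}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable number from 0 to 99
    if bucket < 50:
        return "winner_rollout"  # 50% receive last week's winning variant
    if bucket < 75:
        return "test_a"          # 25% receive this week's variant A
    return "test_b"              # 25% receive this week's variant B


print(assign_group("jane@example.com", "2024-W01"))
```

Including the week label in the hash key reshuffles prospects each week, so one segment of the list is not always stuck in the test half.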

What Not to Test

Don't test too many things at once

Wrong: Test subject line, opening line, CTA, send time simultaneously.

You won't know which variable won. Testing several variables at once is called "multivariate testing," and it requires much larger sample sizes to isolate each variable's effect.

Right: Test one variable per week.

Subject line week 1. Opening line week 2. CTA week 3. You learn faster and with smaller samples.

Don't test on tiny samples

Wrong: Test on 10 people per variant.

Too much variance. Random chance plays a huge role.

Right: Test on 50+ people per variant minimum.

This gives signal above noise.

Don't declare winners too early

Wrong: Run test for 24 hours. Declare winner.

Time of day matters. Day of week matters. One day isn't enough.

Right: Run test for 5-7 days minimum.

This accounts for daily/weekly patterns.

Testing Template: What We Track

| Element | Variant A | Variant B | Winner | Notes |
|---------|-----------|-----------|--------|-------|
| Subject Line | hi john, noticed [product] | quick thought on [industry] | B | +7 points open rate |
| Opening Line | I noticed... | Most [industry]... | B | Engagement higher |
| Value Statement | Direct benefit | Social proof | B | Social proof +0.8 points reply rate |
| CTA | Book call | Reply to schedule | A | Direct link better |
| Send Time | Tuesday 10 AM | Thursday 10 AM | B | Thursday +3 points open rate |
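If you keep this log in a spreadsheet, the same columns can be written out as CSV from a script. The sketch below mirrors the table's five columns; the class name `AbTestRecord` and the `to_csv` helper are hypothetical, not part of any tool named in this article.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AbTestRecord:
    """One row of the testing template."""
    element: str
    variant_a: str
    variant_b: str
    winner: str
    notes: str


def to_csv(records):
    """Serialize test records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AbTestRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


log = [
    AbTestRecord("Subject Line", "hi john, noticed [product]",
                 "quick thought on [industry]", "B", "+7 points open rate"),
    AbTestRecord("CTA", "Book call", "Reply to schedule", "A", "Direct link better"),
]
csv_text = to_csv(log)
print(csv_text)
```

Appending one record per finished test gives you a running history to review in the Monday step of the weekly cycle.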

Tools for A/B Testing

At imisofts, we use:

  • Instantly (built-in A/B testing)
  • SmartLead (rotation + analytics)
  • Clay + Apollo (data merge + manual testing)
  • Custom scripts (for complex multivariate tests)

Most platforms now offer native A/B testing. Use it.

Results: What Testing Gets You

Baseline campaign (no testing):

  • Subject: 35% open
  • Reply: 1.5%

After 3 months of weekly testing:

  • Subject: 55% open (+20 points)
  • Opening: better engagement
  • CTA: better conversion
  • Reply: 3.5% (+2 points)

That 2-point improvement in reply rate is massive. It more than doubles your replies.

What We Recommend at imisofts

We run A/B testing for all managed clients:

  • Weekly testing cycles
  • Subject line, opening, CTA, send time
  • Multivariate testing for scaled campaigns
  • Statistical significance validation
  • Monthly optimization reports

Packages range from $497/month (Management with testing) to $2,450/year (Enterprise with full testing suite).

Explore imisofts Cold Email Packages

Frequently Asked Questions

How many prospects do I need per variant?

Minimum 50 per variant. With 50-person samples, a 5-point difference is meaningful. Larger samples (100+) let you detect smaller differences (2-3 points).

How long should a test run?

Minimum 5-7 days. This accounts for daily and weekly variations in open rates and behavior. 24-hour tests are unreliable.

What should I test first?

Subject line (highest impact), then opening line, then value statement, then CTA, then send time. Don't test low-impact variables first.

Can I test multiple variables at once?

Not recommended. Test one variable per week. Testing multiple variables makes it impossible to know which one caused the change.

How do I know if a result is statistically significant?

Use a binomial test calculator. If the p-value is below 0.05, your result is statistically significant (95% confidence). As a cold email rule of thumb, a difference of 5+ percentage points with 50+ prospects per variant is likely real.
