DTC Healthcare Marketing Data Analysis and Advanced Attribution Modeling

Status

Done

Published on

09/01/2025

Overview

This project explores how a direct-to-consumer healthcare brand can evaluate multi-channel marketing performance and decide where to allocate budget. Healthcare marketing presents unique challenges such as complex user journeys, higher compliance standards, and the need for measurable ROI.

The goal of this analysis is to measure how different channels contribute to conversions and revenue, and to demonstrate how attribution models can dramatically shift strategic decisions.

Recommendations

EXPLORATORY DATA ANALYSIS SNAPSHOT

Detailed Report

Objectives

Build a realistic synthetic dataset (~10k sessions over 3 months) for a direct-to-consumer (DTC) healthcare technology company.
Analyze campaign performance across Paid Social, Paid Search, Organic, Email, Referral, Direct, and Retargeting.
Compare attribution models (Last-Click, First-Click, Linear, Time-Decay, and later Shapley/Markov).
Develop actionable insights for budget optimization.
Deliver findings through Power BI dashboards and consulting-style slide decks.

Process

1. Defining the Problem

At the start, I outlined core business questions a marketing team would need answered: Which channels are most influential in driving conversions? How should spend be allocated between awareness campaigns and retargeting efforts? From there, I translated the marketing problem into data and analytics requirements. It’s important to track not only whether a user converted, but also the series of touches they experienced along the way. That entails modeling user journeys with enough detail to capture multi-touch sequences and attributing conversion value fairly across them.

Key points:

Business need: reveal true drivers of conversions, not just last-touch channels
Metrics: sessions, conversions, CPA, ROAS, funnel drop-off rates
Conversion events: purchases, telehealth bookings, newsletter signups
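
To make two of the metrics above concrete, here is a minimal sketch of CPA (cost per acquisition) and ROAS (return on ad spend). The function names and the sample numbers are illustrative assumptions, not values from the project dataset.

```python
from typing import Optional


def cpa(spend: float, conversions: int) -> Optional[float]:
    """Cost per acquisition: spend divided by conversions (None if there are none)."""
    return spend / conversions if conversions else None


def roas(revenue: float, spend: float) -> Optional[float]:
    """Return on ad spend: attributed revenue divided by spend (None if spend is zero)."""
    return revenue / spend if spend else None


# Made-up campaign totals, for illustration only.
example_spend = 1200.0
example_conversions = 48
example_revenue = 3600.0

example_cpa = cpa(example_spend, example_conversions)
example_roas = roas(example_revenue, example_spend)
print(example_cpa, example_roas)
```

Returning None for a zero denominator keeps campaigns with no conversions or no spend from raising errors or showing misleading zeros in an aggregate table.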

2. Data Simulation & Schema Design

I designed and built a synthetic DTC healthcare marketing dataset, using prompt engineering with ChatGPT-5 to simulate realistic GA4 campaign, session, and conversion data. The process included defining business requirements, iteratively refining the dataset structure through testing, and validating the schema in Postgres. This enabled end-to-end marketing attribution and campaign performance analysis in Jupyter and Power BI. To streamline downstream analysis, I created helper views for quick aggregations, ordered journeys, and linking spend to outcomes.

Data Generation / Requirements Synthesis: prompt-engineered a mock dataset with realistic distributions, constraints, and IDs covering users, sessions, events, conversions, ad clicks, and Semrush keyword metrics.
Data Modeling / Schema Design (DDL): in Postico, designed tables, primary/foreign keys, and normalized entities (campaigns, channels, sessions, events, conversions), and added indexes.
Data Loading (ETL) / Data Quality Checks: loaded CSVs into Postgres and performed validation in Postico, including sanity checks, null handling, and quick exploratory queries.

SQL helper views to streamline analysis:

session_channel_map → links sessions to channels
conversion_path → aggregates user-level journeys
campaign_costs → ties ad spend to conversions
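
The project builds these as SQL views in Postgres. As a rough Python stand-in for what a view like conversion_path produces, the sketch below groups session rows by user, orders them by time, and keeps the channel sequence up to the first conversion. The row layout, sample data, and function name are assumptions made for illustration, not the actual schema.

```python
from collections import defaultdict

# Hypothetical session rows: (user_id, session_start, channel, converted)
sessions = [
    ("u1", "2025-06-01T09:00", "Paid Social", False),
    ("u1", "2025-06-05T12:10", "Paid Search", True),
    ("u1", "2025-06-03T18:30", "Email", False),
    ("u2", "2025-06-02T08:15", "Organic", False),
    ("u2", "2025-06-02T20:40", "Direct", True),
    ("u3", "2025-06-04T11:00", "Referral", False),
]


def build_conversion_paths(rows):
    """Return {user_id: [channels in time order]} for users who converted."""
    by_user = defaultdict(list)
    for user_id, started_at, channel, converted in rows:
        by_user[user_id].append((started_at, channel, converted))

    paths = {}
    for user_id, touches in by_user.items():
        touches.sort()  # ISO-8601 timestamps sort chronologically as strings
        path = []
        for _, channel, converted in touches:
            path.append(channel)
            if converted:
                paths[user_id] = path  # stop at the first conversion
                break
    return paths


paths = build_conversion_paths(sessions)
print(paths)
```

Users who never convert (u3 here) are left out, since attribution only assigns credit along converting journeys.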

3. Exploratory Data Analysis

I conducted exploratory data analysis to understand the baseline dynamics of the dataset. This included profiling the user base by demographics, checking session volume by channel, and visualizing behavioral events across the funnel. This EDA step provided context for interpreting attribution results later. For example, if a channel shows a low last-click conversion rate but is heavily represented at the awareness stage of the funnel, that discrepancy suggests its value may be better captured by a linear or other multi-touch model.

users: summarized by demographics (age, gender, device, country)
sessions: analyzed by channel mix, cost distribution, and time range
conversions: identified types and their average values
events: reviewed event distributions (page views, add-to-cart, checkout)
Produced introductory visualizations in Jupyter Notebook (sessions by channel, conversions by type, funnel drop-offs)
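
As one way to compute the funnel drop-offs mentioned above, the following sketch counts distinct sessions reaching each stage and reports the share lost between consecutive stages. The event names, sample rows, and function name are hypothetical and only loosely mirror the event types listed here.

```python
# Assumed funnel order; the event names are illustrative.
STAGES = ["page_view", "add_to_cart", "checkout"]

# Hypothetical event rows: (session_id, event_name)
events = [
    ("s1", "page_view"), ("s1", "add_to_cart"), ("s1", "checkout"),
    ("s2", "page_view"), ("s2", "add_to_cart"),
    ("s3", "page_view"),
    ("s4", "page_view"),
]


def funnel_drop_off(rows, stages):
    """Return (from_stage, to_stage, sessions_in, sessions_out, drop_off_rate) tuples."""
    reached = {stage: set() for stage in stages}
    for session_id, name in rows:
        if name in reached:
            reached[name].add(session_id)  # count each session once per stage

    counts = [len(reached[stage]) for stage in stages]
    report = []
    for i in range(1, len(stages)):
        prev, cur = counts[i - 1], counts[i]
        rate = 1 - cur / prev if prev else None
        report.append((stages[i - 1], stages[i], prev, cur, rate))
    return report


report = funnel_drop_off(events, STAGES)
for row in report:
    print(row)
```

Counting distinct sessions rather than raw events avoids inflating a stage when one session fires the same event several times.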

4. Initial Attribution Analysis

With the data staged and explored, I implemented three baseline attribution models to demonstrate how credit allocation shifts:

Last-Click: Assigned all credit to the final channel. As expected, Paid Search and Direct dominated, but this clearly under-represented early-funnel channels.
First-Click: Assigned all credit to the first touchpoint. This highlighted Organic Search and Paid Social as key demand-generating channels but ignored the importance of closers.
Linear Attribution: Spread credit evenly across all touches. This model revealed that Email, which is hardly influential under last-click, was actually an important assist channel.

Comparing these three models side by side provided tangible evidence that relying on a single model can mislead strategy.
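
The three rules can be sketched in a few lines of Python. This is a minimal illustration, assuming each converting journey is an ordered list of channels with a conversion value; the paths and values are made up and are not results from the project dataset.

```python
from collections import defaultdict


def last_click(path):
    """All credit to the final touch."""
    return {path[-1]: 1.0}


def first_click(path):
    """All credit to the first touch."""
    return {path[0]: 1.0}


def linear(path):
    """Equal credit to every touch; repeated channels accumulate."""
    share = 1.0 / len(path)
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += share
    return dict(credit)


def attribute(journeys, model):
    """Sum attributed conversion value per channel under the given model."""
    totals = defaultdict(float)
    for path, value in journeys:
        for channel, weight in model(path).items():
            totals[channel] += weight * value
    return dict(totals)


# Hypothetical converting journeys: (ordered channels, conversion value)
journeys = [
    (["Paid Social", "Email", "Email", "Paid Search"], 80.0),
    (["Organic", "Direct"], 60.0),
]

for model in (last_click, first_click, linear):
    print(model.__name__, attribute(journeys, model))
```

With these made-up journeys, Email receives no credit under first-click or last-click but 40.0 under linear, which is the kind of shift described above.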

Attribution Model Process:

Step 1: Validate data quality with last-click and first-click.
Step 2: Add linear and time-decay to show how credit shifts.
Step 3: Introduce advanced models like Markov chains or Shapley value if the business case demands deeper accuracy.
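
For the time-decay model in Step 2, one common formulation weights each touch by an exponential half-life, so touches closer to the conversion earn more credit. The sketch below assumes a 7-day half-life and a journey expressed as (channel, days before conversion) pairs; both the parameter and the sample journey are illustrative assumptions rather than the settings used in this project.

```python
def time_decay(touches, half_life_days=7.0):
    """Split one conversion's credit across touches with exponential decay.

    touches: list of (channel, days_before_conversion) pairs.
    A touch half_life_days before the conversion gets half the raw weight
    of a touch at conversion time; weights are then normalized to sum to 1.
    """
    raw = [(channel, 0.5 ** (days / half_life_days)) for channel, days in touches]
    total = sum(weight for _, weight in raw)
    credit = {}
    for channel, weight in raw:
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit


# Hypothetical journey: Paid Social 14 days out, Email 7 days out, Paid Search on the day.
example_credit = time_decay([("Paid Social", 14), ("Email", 7), ("Paid Search", 0)])
print(example_credit)
```

In this example the raw weights are 0.25, 0.5, and 1.0, so the normalized shares are 1/7, 2/7, and 4/7. A shorter half-life pushes the result toward last-click, and a longer one pushes it toward linear.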
