This is the data pipeline I built to combine third-party data, consumer touchpoints, and media data in preparation for consumer analysis.



Data Pipeline Overview:

1. Data Ingestion Layer

Tooling: AWS Glue or Apache Airflow to orchestrate ingestion from APIs or S3 buckets, with dbt for transformations once the data has landed.


2. Data Standardization & ID Matching

  • ID Resolution (Identity Graph via LiveRamp):

    • Map user touchpoints to internal user IDs or hashed emails via LiveRamp’s IdentityLink.

    • Handle cross-device mapping so that exposures on different devices resolve to the same user.

  • Normalization:

    • Standardize timestamps, campaign names, and metrics.

    • Align taxonomy across platforms (e.g., "Facebook" = "Meta", unify country or segment tags).
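
The normalization rules above can be sketched in a few small helpers. This is a minimal illustration using only the Python standard library, not the pipeline's actual code: the alias table, the function names, and the choice of SHA-256 over a trimmed, lowercased email are all assumptions, and the hashing scheme LiveRamp expects for a given integration should be checked against its own documentation.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical alias table: maps platform spellings to one canonical channel name.
CHANNEL_ALIASES = {
    "facebook": "Meta",
    "fb": "Meta",
    "meta": "Meta",
    "google ads": "Google",
    "adwords": "Google",
}


def normalize_channel(raw: str) -> str:
    """Return the canonical channel name, or the trimmed input if it is unknown."""
    key = raw.strip().lower()
    return CHANNEL_ALIASES.get(key, raw.strip())


def normalize_timestamp(raw: str) -> str:
    """Parse an ISO 8601 timestamp and return it in UTC.

    Assumption: timestamps without an offset are already in UTC.
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def hash_email(email: str) -> str:
    """Trim and lowercase an email, then SHA-256 hash it for use as a join key."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
```

Unknown channel names pass through unchanged rather than raising, so new platforms show up in the data and can be added to the alias table later.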

Key Tables Created:

  • user_metadata

  • liveramp_touchpoints

  • media_campaigns


3. Data Integration Layer (Unified Model)

Create a central fact table for analysis:

FACT_SUBSCRIPTION_TOUCHPOINTS

- user_id

- subscription_status

- subscription_start_date

- churn_date

- campaign_id

- media_channel

- exposure_time

- impression_type (view/click)

- creative_id

- device_type

Joining Strategy:

  • Join liveramp_touchpoints with user_metadata via hashed ID

  • Join touchpoints with media_campaigns via campaign_id and creative_id
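
The two joins above might look like the following, shown here against an in-memory SQLite database so the sketch runs on its own. The staging-table column names (hashed_id and so on) and the sample rows are assumptions for illustration. So is the choice of an inner join to user_metadata, which drops touchpoints with no matched user, and a left join to media_campaigns, which keeps touchpoints whose campaign metadata is missing.

```python
import sqlite3

# All column names on the three staging tables are assumptions for illustration.
SCHEMA_AND_SAMPLE_DATA = """
CREATE TABLE user_metadata (
    hashed_id TEXT PRIMARY KEY, user_id TEXT, subscription_status TEXT
);
CREATE TABLE liveramp_touchpoints (
    hashed_id TEXT, campaign_id TEXT, creative_id TEXT,
    exposure_time TEXT, impression_type TEXT
);
CREATE TABLE media_campaigns (
    campaign_id TEXT, creative_id TEXT, media_channel TEXT
);
INSERT INTO user_metadata VALUES
    ('h1', 'u1', 'active'),
    ('h2', 'u2', 'churned');
INSERT INTO liveramp_touchpoints VALUES
    ('h1', 'c1', 'cr1', '2024-03-01T08:00:00', 'view'),
    ('h1', 'c1', 'cr2', '2024-03-02T09:00:00', 'click'),
    ('h2', 'c2', 'cr9', '2024-03-03T10:00:00', 'view'),
    ('h3', 'c1', 'cr1', '2024-03-04T11:00:00', 'view');
INSERT INTO media_campaigns VALUES
    ('c1', 'cr1', 'Meta'),
    ('c1', 'cr2', 'Meta');
"""

FACT_QUERY = """
SELECT u.user_id, u.subscription_status, t.campaign_id, m.media_channel,
       t.exposure_time, t.impression_type, t.creative_id
FROM liveramp_touchpoints AS t
JOIN user_metadata AS u
  ON u.hashed_id = t.hashed_id
LEFT JOIN media_campaigns AS m
  ON m.campaign_id = t.campaign_id AND m.creative_id = t.creative_id
ORDER BY t.exposure_time
"""


def build_fact_rows():
    """Load the sample tables and return the joined fact rows as tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_SAMPLE_DATA)
        return conn.execute(FACT_QUERY).fetchall()
    finally:
        conn.close()


fact_rows = build_fact_rows()
for row in fact_rows:
    print(row)
```

In this sample, the touchpoint for hashed ID h3 is dropped because no user matches it, and the touchpoint for campaign c2 is kept with an empty media_channel.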


4. Attribution & Analysis Layer

Use modeling and rules-based methods to determine how exposures influence subscriptions:

  • Attribution Models: data-driven (Shapley value, Markov chain, logistic regression) or rules-based (time-decay)

  • Subscription Funnel Analysis: impression → visit → sign-up → churn/renewal

  • Media Mix Modeling or Incrementality Testing layered on top
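
As one concrete instance of the attribution models listed above, here is a minimal sketch of time-decay attribution for a single conversion. Each exposure gets a weight that halves for every half_life_days between the exposure and the conversion, and the weights are normalized so the credit sums to 1. The function name, the input shape, and the 7-day default half-life are assumptions, not values taken from the pipeline.

```python
from collections import defaultdict
from datetime import datetime


def time_decay_credit(touchpoints, conversion_time, half_life_days=7.0):
    """Split one conversion's credit across channels with exponential time decay.

    touchpoints: list of (media_channel, exposure_time) tuples with datetime values.
    Exposures that happen after the conversion are ignored.
    """
    weights = defaultdict(float)
    for channel, exposure_time in touchpoints:
        age_days = (conversion_time - exposure_time).total_seconds() / 86400.0
        if age_days < 0:
            continue  # exposure came after the conversion, so it earns no credit
        weights[channel] += 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {channel: weight / total for channel, weight in weights.items()}


conversion = datetime(2024, 3, 15)
path = [
    ("Meta", datetime(2024, 3, 1)),    # 14 days before: weight 0.25
    ("Google", datetime(2024, 3, 8)),  # 7 days before: weight 0.5
    ("Meta", datetime(2024, 3, 15)),   # same day: weight 1.0
]
credit = time_decay_credit(path, conversion)
print(credit)
```

With these sample exposures the raw weights total 1.75, so Meta receives 1.25 / 1.75 of the credit and Google receives 0.5 / 1.75.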

Tools:

  • Python (SHAP, scikit-learn)

  • SQL-based dashboards (Mode, Looker, Tableau)

  • ML Pipelines (SageMaker, Vertex AI)


5. Data Output & Reporting

  • Dashboards: visualize ROI by channel, LTV by cohort, subscription uplift

  • Data Export: model scores pushed back to media platforms (for retargeting or suppression)

  • Reports for marketing, product, and finance teams
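
For the ROI-by-channel view mentioned above, a dashboard query could reduce to something like this sketch. It assumes ROI is defined as (attributed revenue - spend) / spend and that rows arrive as (media_channel, spend, attributed_revenue) tuples. Both the definition and the input shape are assumptions, and the figures are made up for illustration.

```python
from collections import defaultdict


def roi_by_channel(rows):
    """Return ROI per channel from (media_channel, spend, attributed_revenue) rows.

    Channels with zero total spend are skipped because ROI is undefined for them.
    """
    spend = defaultdict(float)
    revenue = defaultdict(float)
    for channel, row_spend, row_revenue in rows:
        spend[channel] += row_spend
        revenue[channel] += row_revenue
    return {
        channel: (revenue[channel] - spend[channel]) / spend[channel]
        for channel in spend
        if spend[channel] > 0
    }


sample_rows = [
    ("Meta", 1000.0, 1500.0),
    ("Meta", 500.0, 750.0),
    ("Google", 800.0, 600.0),
    ("Organic", 0.0, 300.0),  # no spend, so it is excluded from the result
]
roi = roi_by_channel(sample_rows)
print(roi)
```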


Sample Use Cases Enabled

  • Attribution of subscription to media touchpoints

  • Churn risk analysis based on exposure paths

  • LTV prediction by channel/creative

  • Incremental impact of specific campaigns