pipelinedata — EMILY ZHAO

This is the data pipeline I built to combine third-party data, consumer touchpoints, and media data in preparation for consumer analysis.

GitHub: https://github.com/emilyzdata/pipelinedata

Data Pipeline Overview:

Tooling: AWS Glue and Apache Airflow to orchestrate ingestion from APIs or S3 buckets, with dbt for in-warehouse transformations.

ID Resolution (Identity Graph via LiveRamp):
- Map user touchpoints to internal user IDs or hashed emails via LiveRamp’s IdentityLink.
- Handle cross-device mapping so that touchpoints from different devices resolve to the same user.
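
As a rough illustration of the hashed-email side of this step, the sketch below normalizes an email, hashes it with SHA-256, and looks up an internal user ID. It does not call LiveRamp; `hash_email`, `resolve_user_id`, and the `id_map` lookup table are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Dict, Optional


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a trimmed, lowercased email."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def resolve_user_id(email: str, id_map: Dict[str, str]) -> Optional[str]:
    """Look up an internal user ID by hashed email.

    `id_map` is a hypothetical mapping of hashed email -> internal user ID.
    Returns None when the email is unknown.
    """
    return id_map.get(hash_email(email))


# Hypothetical lookup table keyed by hashed email.
id_map = {hash_email("jane.doe@example.com"): "user_001"}

# The same person seen on two devices, with differently formatted emails,
# resolves to one internal user ID.
phone_user = resolve_user_id("Jane.Doe@Example.com ", id_map)
laptop_user = resolve_user_id("jane.doe@example.com", id_map)
```

Normalizing before hashing matters because the hash of `Jane.Doe@Example.com` differs from the hash of `jane.doe@example.com`, which would otherwise split one user into two.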

Normalization:
- Standardize timestamps, campaign names, and metrics.
- Align taxonomy across platforms (e.g., "Facebook" = "Meta", unify country or segment tags).
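
The normalization rules above might look roughly like this in Python. This is a minimal sketch using only the standard library. The alias table, the function names, and the assumption that timestamps without a timezone are already in UTC are all illustrative, not taken from the pipeline itself.

```python
from datetime import datetime, timezone

# Hypothetical alias table: maps platform spellings to one canonical channel.
CHANNEL_ALIASES = {"facebook": "Meta", "fb": "Meta", "meta": "Meta"}


def normalize_channel(name: str) -> str:
    """Map a raw channel name to its canonical label, if one is known."""
    cleaned = name.strip()
    return CHANNEL_ALIASES.get(cleaned.lower(), cleaned)


def normalize_campaign_name(name: str) -> str:
    """Lowercase a campaign name and collapse whitespace into underscores."""
    return "_".join(name.strip().lower().split())


def normalize_timestamp(value: str) -> str:
    """Parse an ISO 8601 timestamp and return it as an ISO string in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


channel = normalize_channel(" Facebook ")            # "Meta"
campaign = normalize_campaign_name("Spring  Sale 2024")  # "spring_sale_2024"
ts = normalize_timestamp("2024-03-01T12:00:00+02:00")    # "2024-03-01T10:00:00+00:00"
```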

Key Tables Created:

Create a central fact table for analysis:

FACT_SUBSCRIPTION_TOUCHPOINTS

- user_id

- subscription_status

- subscription_start_date

- churn_date

- campaign_id

- media_channel

- exposure_time

- impression_type (view/click)

- creative_id

- device_type
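
For reference, one row of this table could be modeled as follows. The column names come from the list above; the Python types, the nullable dates, and the validation of `impression_type` are assumptions made for the sketch, and `SubscriptionTouchpoint` is a hypothetical class name.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class SubscriptionTouchpoint:
    """One row of FACT_SUBSCRIPTION_TOUCHPOINTS (types are assumed)."""

    user_id: str
    subscription_status: str
    subscription_start_date: Optional[date]  # None if never subscribed
    churn_date: Optional[date]               # None if still active
    campaign_id: str
    media_channel: str
    exposure_time: datetime
    impression_type: str                     # "view" or "click"
    creative_id: str
    device_type: str

    def __post_init__(self) -> None:
        # Reject impression types outside the two listed in the schema.
        if self.impression_type not in ("view", "click"):
            raise ValueError(f"unknown impression_type: {self.impression_type!r}")


# Example row with made-up values.
row = SubscriptionTouchpoint(
    user_id="user_001",
    subscription_status="active",
    subscription_start_date=date(2024, 1, 15),
    churn_date=None,
    campaign_id="cmp_42",
    media_channel="Meta",
    exposure_time=datetime(2024, 1, 10, 9, 30),
    impression_type="click",
    creative_id="cr_7",
    device_type="mobile",
)
```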

Joining Strategy:

Use modeling and rules-based methods to determine how exposures influence subscriptions:

- Attribution models: Shapley, Markov, logistic regression, or time-decay
- Subscription funnel analysis: from impression → visit → sign-up → churn/renew
- Media mix modeling or incrementality testing layered on top
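
To make one of the rules-based options concrete, here is a minimal sketch of time-decay attribution for a single converting user. Each exposure gets a weight that halves every `half_life_days` before the subscription, and the weights are normalized into credit shares per channel. The function name, the 7-day half-life default, and the input shape (a list of `(media_channel, exposure_time)` pairs) are assumptions for illustration, not the pipeline's actual implementation.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def time_decay_attribution(
    touchpoints: List[Tuple[str, datetime]],
    conversion_time: datetime,
    half_life_days: float = 7.0,
) -> Dict[str, float]:
    """Split one conversion's credit across channels with exponential decay.

    Returns a mapping of channel -> credit share; shares sum to 1.0,
    or the mapping is empty if no exposure precedes the conversion.
    """
    weights: Dict[str, float] = defaultdict(float)
    for channel, exposure_time in touchpoints:
        age_days = (conversion_time - exposure_time).total_seconds() / 86400
        if age_days < 0:
            continue  # ignore exposures that happened after the conversion
        weights[channel] += 0.5 ** (age_days / half_life_days)

    total = sum(weights.values())
    if total == 0:
        return {}
    return {channel: weight / total for channel, weight in weights.items()}


# Made-up example: a Meta exposure on the conversion day and a Search
# exposure exactly one half-life (7 days) earlier.
credits = time_decay_attribution(
    [("Meta", datetime(2024, 1, 15)), ("Search", datetime(2024, 1, 8))],
    conversion_time=datetime(2024, 1, 15),
)
```

In this example the Meta exposure has weight 1 and the Search exposure weight 0.5, so Meta receives two thirds of the credit and Search one third.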

Tools:

- Dashboards: visualize ROI by channel, LTV by cohort, and subscription uplift
- Data export: model scores pushed back to media platforms (for retargeting or suppression)
- Reports for marketing, product, and finance teams