BPS Flashcards

(3 cards)

1
Q
  • Led the end-to-end design and implementation of a robust data infrastructure using Airflow and dbt, powering executive dashboards and reporting systems across the organization. Reduced manual reporting cycles by 70%.
A

One of the accomplishments I’m most proud of was completely redesigning our data infrastructure from the ground up.

When I started, the organization was running everything through Windows Task Scheduler on a Virtual Desktop Infrastructure (VDI), which was really limiting because it didn’t offer 24/7 availability or the processing power to keep up with our scale.

The biggest pain point was that when data wasn’t available because of job failures, teams across the organization were spending hours each week pulling data from raw files, manipulating it in Excel, and creating presentation material on the fly.

After understanding this, I created a report on the issue and how it could be solved, and presented my proposal to move to a modern tech stack built around Linux, Airflow, GCP, dbt, and Snowflake.

We set up two servers, each running an instance of Airflow, with one orchestrating in production and the other in development.
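
To make that concrete, here’s a minimal sketch of what one of those daily jobs on the production instance could look like. The DAG id, schedule, paths, and commands are hypothetical stand-ins, not the actual jobs:

```python
# Hypothetical sketch of a daily orchestration job on the production Airflow
# instance; DAG id, schedule, script paths, and dbt project path are made up.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_transactions_load",       # placeholder name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 6 * * *",          # run every morning at 6:00
    catchup=False,
) as dag:
    # Land the day's raw extracts in the GCS raw-data bucket.
    load_raw = BashOperator(
        task_id="load_raw_files",
        bash_command="python /opt/etl/load_raw_to_gcs.py",
    )

    # Build the dbt models that feed the reporting tables in Snowflake.
    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /opt/dbt --target prod",
    )

    load_raw >> run_dbt
```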

I moved us from using shared drives to GCP buckets for raw data storage.
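
As a rough illustration of that raw-storage step (the bucket name and paths below are placeholders, not the real ones), a load script would push each extract to the bucket along these lines:

```python
# Hypothetical sketch of landing a raw extract in a GCS bucket;
# the bucket name and object path are placeholders.
from google.cloud import storage

def upload_raw_file(local_path: str, object_name: str) -> None:
    """Upload one raw extract from the server to the raw-data bucket."""
    client = storage.Client()                   # uses default credentials
    bucket = client.bucket("raw-data-bucket")   # placeholder bucket name
    blob = bucket.blob(f"raw/{object_name}")
    blob.upload_from_filename(local_path)

upload_raw_file("/data/extracts/transactions_2024-01-15.csv",
                "transactions/transactions_2024-01-15.csv")
```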

I implemented dbt for the data transformations and modeling, moving us away from a slow pandas-based implementation.

Instead of using on-prem MSSQL, I moved us to Snowflake, and from there I worked closely with different teams to build Power BI dashboards that pulled directly from our models in Snowflake.

The impact was significant: we reduced manual reporting cycles by 70% through faster and more robust ETL processing, and sped things up immensely overall, with transaction jobs going from 3 hours to 15 minutes.

The transition wasn’t without challenges though. There was definitely a learning curve for some team members, and I spent a lot of time training people and documenting processes to make sure everyone felt comfortable with the new system.

2
Q
  • Built and deployed an ML-based Random Forest model using scikit-learn to impute missing retail traffic data, automating historical backfill and daily updates within a scheduled pipeline.
A

We were tracking daily foot traffic across all our stores: basically, how many people entered each location each day.

The problem was that our traffic counting system wasn’t 100% reliable. We didn’t know exactly why or how the data failed to be captured, and we’d end up with missing data for certain days at certain stores.

This was really problematic because our analytics teams needed complete datasets to do meaningful trend analysis and forecasting.

I decided to build a machine learning model to predict what the traffic should have been on those missing days. I used a random forest regressor, which trains a set of decision trees on different samples of our existing traffic data and averages their predictions to produce the value used to fill each gap.
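
A stripped-down scikit-learn sketch of that idea; the feature columns are hypothetical stand-ins, not the model’s actual inputs:

```python
# Hypothetical sketch of the imputation model: train a random forest on days
# where the counters did capture traffic. Each tree fits a different bootstrap
# sample of that history, and the forest averages the trees' predictions.
# Column names and features are stand-ins, not the real schema.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["store_id", "day_of_week", "month", "is_holiday", "sales_count"]

def fit_traffic_model(history: pd.DataFrame) -> RandomForestRegressor:
    known = history[history["traffic"].notna()]   # only days with real counts
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(known[FEATURES], known["traffic"])
    return model
```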

Whenever the retail traffic data was due to update, the pipeline would automatically detect missing values, run the model to generate predictions, and flag the imputed data so analysts knew which values were estimates versus actual measurements.
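
And a sketch of that detect-and-flag step, assuming a fitted model like the one above (column names are again placeholders):

```python
# Hypothetical sketch of the pipeline step: find the gaps, fill them with the
# model's estimates, and flag imputed rows so analysts can tell estimates
# from actual counter readings.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["store_id", "day_of_week", "month", "is_holiday", "sales_count"]

def backfill_traffic(df: pd.DataFrame, model: RandomForestRegressor) -> pd.DataFrame:
    missing = df["traffic"].isna()                # detect the gaps
    if missing.any():
        df.loc[missing, "traffic"] = model.predict(df.loc[missing, FEATURES])
    df["traffic_is_imputed"] = missing            # flag which values are estimates
    return df
```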

The business impact was that our analytics teams could run reports with confidence, knowing they had complete datasets to work with.

It also freed up time that people were spending manually dealing with these data gaps.

3
Q

Tell me about your experience with Bass Pro Shops

A

At Bass Pro Shops, I work as an ETL Developer focused on building and maintaining the data infrastructure that supports reporting and analytics across the organization.

My main responsibility is managing ETL pipelines that handle around half a million transactional events each day. These come from a mix of sources, including our primary ERP system (JDA), the Canadian SAP system, and flat files from the call center.

In terms of tech stack, I use Airflow for orchestration, dbt for transformations, and Snowflake as our cloud data warehouse.

Aside from development, I also maintain and monitor the overall health of our data infrastructure, like our servers and Airflow instances; respond to data quality issues; and support other teams in accessing the data they need for additional ad hoc analysis and reporting.
