Behavioral Flashcards

1
Q

Describe a data engineering problem you worked on. What were some challenges that you faced?

A

One of the biggest challenges I faced was deduplicating data while I was working at BooksBnB. Users would upload a CSV of their Airbnb financial data, and we would ingest it into MySQL and, later, AWS.

Situation - The biggest issue was that CSVs from Airbnb did not have a primary key for each reservation. Say someone uploaded their reservations for January through March. If they came back in April and uploaded another CSV covering February through April, we needed to write the new data but not the old. We could not simply ingest everything from the last updated reservation to the current date, because reservations change with refunds and the user could be adding a new property. It was even harder if the user changed a property name, say from "the downtown cottage" to "the central cottage".

Task - We had to recognize that, although the property name was different, the reservations were the same, so we should not add them again.

Action - While we were on MySQL, I created a workflow for the user: if CSV_upload_count > 1 and property != user_properties, they were asked whether this property was new. If they clicked no, they were shown a list of their user_properties and asked to match it up. This let us tell a new property apart from a renamed one. User error was still an issue, so if they went through the workflow wrong we had a button that let them erase their last upload, which worked well. With that property info we could then match on guest name, property, check-in date and payment. If they all matched, we knew it was a reservation to skip.
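The matching step above can be sketched in Python. This is a minimal illustration, not the code we ran: the Reservation class, the aliases mapping (renamed property to stored name) and the function names are all hypothetical, and the "property != user_properties" check is read here as "the name is not among the user's known properties".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record shape; the Airbnb CSV itself has no primary key.
    guest_name: str
    property_name: str
    check_in: date
    payment: float


def needs_property_prompt(csv_upload_count, property_name, user_properties):
    """Ask the user about a property only on repeat uploads with an unknown name."""
    return csv_upload_count > 1 and property_name not in user_properties


def dedup_key(reservation, aliases):
    """Build the composite key: guest name, property, check-in date, payment."""
    # Map a renamed property back to the name already stored for the user.
    canonical = aliases.get(reservation.property_name, reservation.property_name)
    return (
        reservation.guest_name.strip().lower(),
        canonical,
        reservation.check_in,
        round(reservation.payment, 2),
    )


def new_reservations(existing, uploaded, aliases):
    """Return only the uploaded reservations whose key is not already stored."""
    seen = {dedup_key(r, aliases) for r in existing}
    fresh = []
    for r in uploaded:
        key = dedup_key(r, aliases)
        if key not in seen:
            seen.add(key)  # also drops duplicates inside the same upload
            fresh.append(r)
    return fresh
```

A reservation re-uploaded under the new property name produces the same key as the stored one once the alias is applied, so it is skipped.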

Result - We solved the deduplication problem with this property workflow, but CSV uploads were still not fully reliable, because sometimes information we had never seen would pop up, such as a guest with an extra payment or a dispute on a reservation. So CSV uploads worked 95% of the time, but 1 out of 20 would break the pipelines I had built to ingest them.

Once we migrated we switched things up a bit. Postgres has table inheritance, which made it easy for us to first create a new table from the user’s new CSV and then check it against existing data, instead of doing both simultaneously. This increased upload speed.
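The "load first, then check" idea can be sketched as follows. Note the swap: this uses Python's built-in sqlite3 with a plain staging table and an anti-join, not Postgres table inheritance, and the table names, column names and rows are made up for illustration.

```python
import sqlite3


def load_upload(conn, rows):
    """Stage the uploaded rows, then copy over only the ones not already stored."""
    conn.execute("DELETE FROM staging_upload")
    conn.executemany("INSERT INTO staging_upload VALUES (?, ?, ?, ?)", rows)
    # Anti-join: keep staged rows with no matching reservation.
    conn.execute(
        """
        INSERT INTO reservations
        SELECT s.guest, s.property, s.check_in, s.payment
        FROM staging_upload AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM reservations AS r
            WHERE r.guest = s.guest
              AND r.property = s.property
              AND r.check_in = s.check_in
              AND r.payment = s.payment
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reservations (guest TEXT, property TEXT, check_in TEXT, payment REAL);
    CREATE TABLE staging_upload (guest TEXT, property TEXT, check_in TEXT, payment REAL);
    INSERT INTO reservations VALUES ('Ann Lee', 'Downtown Cottage', '2023-02-10', 250.0);
    """
)
load_upload(
    conn,
    [
        ("Ann Lee", "Downtown Cottage", "2023-02-10", 250.0),  # already stored
        ("Bob Ray", "Downtown Cottage", "2023-04-01", 300.0),  # new
    ],
)
total_rows = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
```

Separating the raw load from the comparison means the CSV can be written in bulk before any matching logic runs.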

2
Q
Talk about a time you noticed a discrepancy in company data or an inefficiency in data processing. What did you do?
A

Situation: For the CSV upload portion of BooksBnB it would take us, and I’m embarrassed to admit it, over 5000 milliseconds to upload and process the data in MySQL. This was because of the fragmentation we experienced.

Task: Speed up the uploads of CSV information.

Action: This was a big selling point when I was pitching the company on moving to AWS. With AWS I could split the BI and platform databases. So once we did move to AWS, all the user side exists in PostgreSQL while the BI side of things gets funneled right into our S3 data lakes using Glue crawlers.

Result: When I was done with this, upload time went to under 1000 ms, which is 5X faster.

3
Q

You’re asked to develop a new product. Where would you begin?

A

I would start by talking to my colleagues and trying to understand their roles. Namely, how could access to new or existing data make their jobs easier or more efficient?

That’s why I love data engineering: it is multidisciplinary. You need to have the technical skills (programming, database design and so forth), but you also need to be able to communicate with people and understand the position they are in, so that you can take initiative and get them the data they need.

4
Q

Why did you leave the Tallit? What is going on at BooksBnB?

A

The Tallit was a 10-month contract. The contract ended, and I had already set up the pipelines they needed.

At BooksBnB our biggest issue was that we thought Airbnb’s API would open up so that we could connect directly to it. But it’s still closed. Without that API it’s hard to make users pay. With the API we could connect directly to the data and there would be no work for the user. Now we are trying to work with guestly or hostly to become an extension on their platform that would run accounting for their users. We saw that the beta worked and we got users, but that partnership would be the future. To be honest, though, I’m looking to join a bigger team.

5
Q

Tell me about a time you exceeded expectations on a project. What did you do, and how did you accomplish it?

A

Situation: Once our queries on MySQL were taking over 5 seconds, I knew that we needed to migrate to AWS for EC2 and also for automatic scalability of our database. To make the case, I logged all of the slow queries on our site and compared them to customer churn. Then I met with our executives and showed that there was a clear correlation between users being affected by slow queries and their activity on the platform.

Task: I was tasked with keeping our production database running while migrating everything to AWS, and was given a month to figure it out.

Action: I figured out that with AWS DMS we could move the data while keeping the production database running until we had completely switched over. To do this I set up a clone S3 bucket of our write queries so that we could capture everything separately during the transfer. Then I set up the instance and connected everything to switch over. The biggest issue I had was breaking and then re-creating the foreign keys for the tables. This is because when transferring I had to store the connections in a separate metadata table and then recall them after the transfer.

Result: The actual transfer only took about 5 minutes, and it took me a week to test and debug. Most of this time was spent manually going through and updating our queries, since Aurora MySQL does not support some of the syntax that MySQL does and vice versa. Once I was able to get 100,000 rows to transfer seamlessly, I knew we were ready. So I saved 3 weeks on the given time.

6
Q

Tell me about a time you did poorly on a project. How did you handle it?

A

Situation: The Tallit was a double-sided marketplace, so we needed to match the vendors we had with the customers signing up. Let’s say, for example, that you were a florist that served a particular size of party. We wanted to make sure that you were matched up correctly with people looking for florists like you.

Task: Send customers daily emails on vendors that might match their priorities.

Action: I set this pipeline up, which was rather simple. It took inputs from a user workflow, matched them to vendors and then put that all in an email.

Result: I did not think it through and fully debug it. What I didn’t think about is that there was no quality check for vendors, who could automatically sign up with a free profile. So let’s say you’re a terrible florist with no website. If you signed up for the platform, you would be in the daily emails to customers. That is bad because you might be a bad fit for them, and then the user loses trust in the platform. So what happened was that we sent unqualified vendors to customers.

To fix this I connected to the Google business API so that vendors only went into automated emails if they had above a 4-star rating on Google. This worked really well. The lesson here, I think, is to always think through edge cases before you ship a product.
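The quality gate itself is small. Here is a minimal Python sketch of it, assuming the ratings have already been fetched and attached to each vendor record; the vendor dictionaries, the "rating" field and the function name are hypothetical, and no real API call is shown. Treating a vendor with no rating as ineligible, and excluding exactly 4.0, are assumptions about how "above a 4 star" was applied.

```python
def eligible_vendors(vendors, min_rating=4.0):
    """Keep only vendors whose rating is strictly above min_rating.

    Vendors with no rating at all (for example, no Google listing) are
    excluded, which is the edge case the first version of the pipeline missed.
    """
    return [
        v for v in vendors
        if v.get("rating") is not None and v["rating"] > min_rating
    ]


# Made-up sample data for illustration only.
vendors = [
    {"name": "Bloom Studio", "rating": 4.7},
    {"name": "No Website Florals", "rating": None},
    {"name": "Average Petals", "rating": 3.2},
]
emailed = [v["name"] for v in eligible_vendors(vendors)]
```

Only the vendors returned by this filter would be passed on to the daily email step.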

7
Q

Describe a time you had to explain a complex subject to a non-technical person.

A

When we migrated to AWS I had to explain to our CFO why we needed the budget for it and why MySQL would not cut it anymore.

I always feel that as a data engineer your job is to get data to where it needs to go in the organization. To do that you have to understand the role of the person the data is going to. What data would help them day to day? What KPIs are they trying to hit, and what data could you get them to help with that?

For our CFO I knew that talking about runtime was not going to be a compelling argument. But how that slow runtime actually affected our bottom line would.

I made a log that compared the slow runtime of our ingestion to users bouncing off of the site. It turned out that users who waited more than 2 seconds for their CSV to upload were 5X more likely not to come back to the platform that week. Once he saw that the slow speeds were losing users, he was totally on board.
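The comparison behind that number can be sketched in Python. This is an illustration of the calculation only: the session records, field names and function name are hypothetical, and the sample data is made up rather than taken from our logs.

```python
def no_return_ratio(sessions, threshold_ms=2000):
    """Compare how often slow-upload users fail to return versus fast-upload users.

    Each session is a dict with "upload_ms" (upload time in milliseconds) and
    "returned" (whether the user came back that week). The result is the
    no-return rate of the slow group divided by that of the fast group.
    """
    slow = [s for s in sessions if s["upload_ms"] > threshold_ms]
    fast = [s for s in sessions if s["upload_ms"] <= threshold_ms]
    if not slow or not fast:
        raise ValueError("need at least one slow and one fast session")
    slow_rate = sum(not s["returned"] for s in slow) / len(slow)
    fast_rate = sum(not s["returned"] for s in fast) / len(fast)
    if fast_rate == 0:
        raise ValueError("fast group has no lost users; ratio is undefined")
    return slow_rate / fast_rate


# Made-up data: 2 of 4 slow sessions and 1 of 10 fast sessions did not return.
sessions = (
    [{"upload_ms": 5200, "returned": False}] * 2
    + [{"upload_ms": 3100, "returned": True}] * 2
    + [{"upload_ms": 900, "returned": False}] * 1
    + [{"upload_ms": 800, "returned": True}] * 9
)
ratio = no_return_ratio(sessions)
```

Framing the result as a single ratio of lost users is what made it meaningful to a non-technical audience, compared with quoting raw runtimes.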

8
Q
How would you describe your communication style?
A

I think it’s important as a technical person to be assertive about the parts of a task or sprint that you think may be hard for you to complete on time.

Whenever we have sprint meetings I make sure to speak up if I feel that I won’t be able to hit the deadline provided. I also try to provide a plan so that I can receive the extra resources or help I need to make sure we are on time with our deliverables.

9
Q

Tell me a time when your colleagues disagreed with your approach. What did you do to address their concerns?

A

Thankfully I have not experienced much conflict with coworkers. I generally get along well with everyone.

But there was an issue with a recently hired colleague who did not want to be on sprint cycles at all. During our sprint meetings every two weeks they would not pay attention, and we were having trouble organizing their deliverables, which in turn had a cascading effect and was really hurting our team.

I feel that I am good at resolving conflict, so I reached out to them on a call and asked openly why they thought the two-week cycle and our current project management weren’t working. After hearing them out, it turned out that they thought sprint cycles were a way to micromanage developers, and they had had a bad experience with this at their previous company. After listening I explained my point of view and asked if we could bring in our project manager, who was great. Then the 3 of us had a meeting and we were able to hash everything out, which really was just a misunderstanding. I have now worked closely with that coworker for 3 years and we work together really well.

10
Q

What is your greatest weakness?

A

I think the biggest issue facing really all developers and data engineers is that things change so quickly that you are always falling behind unless you are keeping up with the latest technologies.

So to combat this I make sure that I am reading and using new technologies when I become interested in them. I think if you are not doing this then you can easily fall behind in this industry because the landscape can completely change in 6 months unless you are staying on top of things.

11
Q

What is your greatest strength?

A

I have never been afraid to take on new challenges that I know nothing about. I am a completely self-taught data engineer, and coder for that matter. So I am used to having gaps in my knowledge and working on my own to fix them.

Most recently I had barely used AWS before we migrated. But I knew that if I took the time to be patient and work through problems I would soon have some kind of mastery of it.

I think this is so important for data engineering because most likely you are going to have gaps in your knowledge that you can’t be afraid to tackle.

12
Q

What was a time you received feedback on your work you didn’t like and how did you handle it?

A

For the most part I try to take in as much stakeholder information and talk through expectations as much as I can before starting on a project, so that this does not happen.

But of course this has happened. One time that pops to mind is when we were designing the schema for our PostgreSQL DB that handles the platform I designed

13
Q

Learning new technologies

A