comtech onsite Flashcards
(16 cards)
How did you approach error handling in your metal binder jetting project?
Error handling was done through code. We had data that was batched and pulled to an edge node over Ethernet—basically an Ubuntu server that acted as the staging layer.
The edge node received the batches via a simple REST API (Flask) I set up.
For example, if a Pi re-sent buffered data, the code running on the edge node would detect and skip duplicates using a hash of timestamp + sensor ID + value.
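A minimal sketch of that idea, assuming a Flask endpoint like the one described (the route name, field names, and the in-memory seen set are placeholders, not the actual implementation):

```python
import hashlib
from flask import Flask, jsonify, request

app = Flask(__name__)
seen = set()  # in practice this would be a table or cache on the edge node

def reading_key(reading):
    """Stable hash of timestamp + sensor ID + value, used to spot duplicates."""
    raw = f"{reading['timestamp']}|{reading['sensor_id']}|{reading['value']}".encode()
    return hashlib.sha256(raw).hexdigest()

@app.route("/readings", methods=["POST"])
def ingest():
    accepted = []
    for reading in request.get_json():
        key = reading_key(reading)
        if key in seen:
            continue  # duplicate from a re-sent buffered batch, skip it
        seen.add(key)
        accepted.append(reading)
    # ... stage `accepted` for loading ...
    return jsonify({"accepted": len(accepted)})
```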
How do you, as a Software Engineer, typically approach error handling and defensive coding?
My approach to error handling is to make sure errors are handled gracefully and that debugging and maintenance work is as easy as possible.
The first thing I consider comes before any error happens: making my code modular and easy to change. For example, if an error occurs in how we transform data before loading, I should be able to make changes there and nowhere else.
Second, when an error occurs, I ensure that the process can continue if it is still possible. This can be done through retrying or adaptive processing.
Lastly, documentation and visibility are critical. It is important to be able to understand the code and to get visibility into how it executes through logging at various levels: DEBUG, INFO, WARNING, etc.
This is what helps us identify how the error happened and how it can be replicated.
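As a hypothetical illustration of the retry and logging points (not code from a specific project), a small helper that retries a flaky step with backoff and logs each failure:

```python
import logging
import time

logger = logging.getLogger(__name__)

def with_retries(step, attempts=3, base_delay=1.0):
    """Retry a flaky step with exponential backoff, logging each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            logger.warning("attempt %d/%d failed", attempt, attempts, exc_info=True)
            if attempt == attempts:
                raise  # out of retries; let the caller decide how to degrade
            time.sleep(base_delay * 2 ** (attempt - 1))
```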
When it came to printer data, did you ever face situations where sensor readings were erratic? How did you handle outliers or calibration?
Luckily, we didn’t run into erratic data.
However, if that had been an issue, we could have implemented a primitive identification method in SQL.
If a value deviates from the previous one by more than a given threshold, a TRUE flag is written into a deviation column in the final table. This at least lets us see where deviations started when we look at the time series.
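The card describes doing this in SQL; the same threshold check is sketched here in Python with pandas purely to illustrate the logic (column names and the threshold are hypothetical):

```python
import pandas as pd

def flag_deviations(readings: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Mark rows whose value jumps more than `threshold` from the previous reading."""
    readings = readings.sort_values("timestamp").copy()
    readings["deviation"] = (readings["value"] - readings["value"].shift(1)).abs() > threshold
    return readings
```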
When it came to printer data, what kind of queries or KPIs did researchers rely on most from this data?
Our team primarily used the telemetry to correlate environmental stability with print quality.
For example, we used the time series data to calculate average and peak internal temperature, humidity fluctuation, and airflow consistency across time intervals.
These aggregates were then joined with the data we produced from our test rigs, and that's how we tried to find correlations.
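A rough pandas sketch of those aggregations (column names and the hourly interval are illustrative, not the actual queries we ran):

```python
import pandas as pd

def environmental_kpis(telemetry: pd.DataFrame, interval: str = "1h") -> pd.DataFrame:
    """Per-interval KPIs: average/peak internal temperature, humidity
    fluctuation, and airflow consistency (as standard deviations)."""
    telemetry = telemetry.set_index(pd.to_datetime(telemetry["timestamp"]))
    return telemetry.groupby(pd.Grouper(freq=interval)).agg(
        avg_temp=("temperature", "mean"),
        peak_temp=("temperature", "max"),
        humidity_fluctuation=("humidity", "std"),
        airflow_consistency=("airflow", "std"),
    )
```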
How did you handle Modbus register mapping and decoding?
The PLC exposed specific holding registers for state codes.
On the edge node, my Python script used pymodbus.client to poll those registers and decode the 16-bit values into integers.
Each decoded state could then be paired with a timestamp recorded by the edge node for comparison.
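A minimal sketch of that polling, assuming the pymodbus 3.x client API (the IP address, register address, and slave id are placeholders, and keyword names differ slightly between pymodbus versions):

```python
from pymodbus.client import ModbusTcpClient  # pymodbus 3.x import path

client = ModbusTcpClient("192.168.0.10", port=502)  # PLC reachable over Ethernet
client.connect()

# Poll the holding register the PLC exposes for its state code.
result = client.read_holding_registers(address=0, count=1, slave=1)
if not result.isError():
    state_code = result.registers[0]  # 16-bit register decoded as an unsigned integer
    print(state_code)

client.close()
```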
Tell me about a project where you worked with Python in a performance-critical system. How did you ensure efficiency and stability?
At Bass Pro Shops, I worked on a project that involved processing high-volume transactional data.
The core challenge was that our transaction historization system—used to track and update sales records—was taking over 3 hours to run and often failed under peak loads, directly impacting reporting and business decisions.
I tackled this by first documenting the entire process to understand it end to end, then using a timing context manager to identify performance bottlenecks by execution time.
I found that row comparisons were being done inefficiently, and the data transformation library in use was memory-heavy and poorly suited to scale.
To address this, I rewrote the historization logic using hashing to enable fast, one-to-one row comparisons.
I introduced partitioning strategies to divide the data by key dimensions, which drastically reduced the size of the working dataset in each operation.
I also replaced the legacy transformation library with a modern Python library that reduced memory usage and improved compute efficiency.
As a result, I brought down processing time from over 3 hours to just 15 minutes—consistently.
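A simplified sketch of the hashing idea (column names are hypothetical; this is not the actual Bass Pro code):

```python
import hashlib
import pandas as pd

def row_hashes(df: pd.DataFrame, tracked_cols: list[str]) -> pd.Series:
    """Hash the tracked columns of each row so current and historical records
    can be compared one-to-one instead of column by column."""
    concatenated = df[tracked_cols].astype(str).apply("|".join, axis=1)
    return concatenated.map(lambda s: hashlib.md5(s.encode()).hexdigest())

# A row has changed if its hash differs from the one stored on the last run:
# changed = current[row_hashes(current, tracked_cols).values != historical["row_hash"].values]
```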
Explain Ethernet and Modbus
Ethernet is the infrastructure: the physical and network connection between the PLC and the edge node.
Modbus is the protocol used for communication: how data is structured and interpreted.
It is the protocol that runs on top of Ethernet. It defines how the PLC and the edge node structure their messages to read from or write to registers, coils, or memory locations in the PLC.
Have you worked with any IoT protocols like MQTT or Modbus? If not, how comfortable would you be picking these up quickly?
Yes, I’ve worked directly with Modbus during my time at the National Research Council.
On our thermal test bench project, we used a Siemens PLC to monitor the precise timing of our cycles during operation.
While I haven’t used MQTT specifically, I’m confident in my ability to learn it quickly, just as I have with Modbus, Ethernet, and data engineering.
You mentioned .NET desktop development. Do you have any experience with modern front-end frameworks like React or Vue?
While my core experience with UI development has been in .NET—particularly using WinUI for building a desktop training application at Public Safety Canada—I’ve also explored modern front-end frameworks like React.
I haven't used React in a production environment, but I'm familiar with its component-based architecture and how it fits into full-stack workflows.
I’m comfortable picking up new frameworks quickly and already have experience integrating front ends with back-end systems.
Given my track record adapting to technical stacks, I’m confident I could ramp up on React or Vue and contribute meaningfully to any front-end effort.
What is MQTT?
MQTT (Message Queuing Telemetry Transport) is a publish/subscribe messaging protocol.
A publisher sends some kind of data to a topic (sensor_data)
A subscriber will subscribe to the topic,
and an intermediary (broker) will deliver the data.
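A toy illustration using paho-mqtt's 1.x-style API (the broker hostname is a placeholder; the topic mirrors the sensor_data example above):

```python
import paho.mqtt.client as mqtt
import paho.mqtt.publish as publish

# Subscriber: asks the broker for everything published to the topic.
def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect("broker.example.com", 1883)
subscriber.subscribe("sensor_data")
subscriber.loop_start()  # handle network traffic in a background thread

# Publisher: sends one reading to the topic; the broker routes it to subscribers.
publish.single("sensor_data", '{"temperature": 21.5}', hostname="broker.example.com")
```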
Describe the process of a CI/CD pipeline
Configure a YAML configuration file for your repo and set up Actions.
When you push code or open a merge request, the pipeline runs automatically using that configuration.
It checks out your code, installs dependencies, and runs your tests.
If all steps succeed, your code is considered ready for deployment.
What is containerization?
Containerization is a technology that packages an application and all its dependencies into a container.
Containers ensure that the application runs the same way regardless of where it is deployed, eliminating issues caused by differences in environments.
With containers, deployments are managed using tools like Kubernetes.
A container is a running instance of a container image.
Can you describe what an embedded system is, and how development in that environment differs from general-purpose software?
An embedded system is a specialized computer system designed to perform a dedicated function or set of functions, often as part of a larger device.
Embedded system development is more hardware-focused, resource-constrained, and often requires real-time operation, making it quite different from general-purpose software development.
Given a Linux-based embedded system with no GUI, how would you go about debugging a service that’s crashing during startup?
Check the system logs (e.g. journalctl for a systemd service) and the stack trace; if possible, use a debugger to step through execution line by line.
Make sure the system isn't running out of memory or exhausting other resources.
Have you worked with time-series databases like InfluxDB? If not, how would you model temperature sensor data arriving every second?
I haven’t worked directly with InfluxDB, but I’ve designed systems that handled high-frequency time-series data.
At NRC, I built a data acquisition system for a metal 3D printer where sensors captured temperature, humidity, and airflow every few seconds.
While we used PostgreSQL for storage, I designed the schema to store each sensor reading with a timestamp, sensor ID, and value, with indexes on time and sensor type to support downsampling and analytics.
If I were modeling 1-second interval temperature data in a time-series database, I'd model it as a tuple of (timestamp, sensor_id, temperature), where each tuple captures that moment in time alongside its values. This should also be faster and simpler in theory.
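A tiny sketch of that model (the sensor name and reading are made up):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class TemperatureReading:
    """One point per second: timestamp is the series axis, sensor_id is the
    tag/dimension, and temperature is the measured value."""
    timestamp: datetime
    sensor_id: str
    temperature: float

reading = TemperatureReading(datetime.now(timezone.utc), "chamber_01", 182.4)
```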
If we asked you to deploy a Pi-based telemetry solution tomorrow, what’s your checklist before deployment?
If I had to deploy a Pi-based telemetry solution tomorrow, my checklist would include:
- Sensor & Interface Validation – Confirm sensor specs, wiring, and protocol (e.g. Modbus, I2C, SPI), and validate communication via test scripts.
- OS & Dependencies – Set up a lightweight, hardened Linux image (like Raspberry Pi OS Lite), install necessary libraries (e.g. pymodbus, psutil), and disable unnecessary services.
- Data Pipeline Setup – Configure telemetry polling intervals, buffering logic, and fallback mechanisms in case of connectivity loss (a small sketch follows this list).
- Networking & Security – Assign static IP or configure DHCP reservation, secure SSH, set up firewall rules, and test connectivity to the upstream node.
- Storage & Logging – Ensure local buffering for temporary outages, implement log rotation, and monitor disk health.
- Monitoring & Alerts – Include heartbeat reporting, resource monitoring, and push alerts (e.g. via Teams webhook or email) on failure.
- Final Field Test – Simulate end-to-end flow from sensor read to data reception and dashboard update before physical deployment.
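For the Data Pipeline Setup item, a hypothetical polling loop with local buffering on connectivity loss (the paths, interval, and read_sensor/send_upstream callables are placeholders):

```python
import json
import os
import time

POLL_INTERVAL = 5  # seconds between sensor reads
BUFFER_PATH = "/var/lib/telemetry/buffer.jsonl"

def buffer_locally(reading):
    """Append a reading to the on-disk buffer during a connectivity loss."""
    with open(BUFFER_PATH, "a") as f:
        f.write(json.dumps(reading) + "\n")

def flush_buffer(send_upstream):
    """Replay buffered readings once the upstream node is reachable again."""
    if not os.path.exists(BUFFER_PATH):
        return
    with open(BUFFER_PATH) as f:
        for line in f:
            send_upstream(json.loads(line))  # any re-sent duplicates are handled upstream
    os.remove(BUFFER_PATH)

def poll_loop(read_sensor, send_upstream):
    """Poll at a fixed interval; fall back to local buffering when sends fail."""
    while True:
        reading = read_sensor()          # e.g. a Modbus or I2C read
        try:
            flush_buffer(send_upstream)  # drain any backlog first
            send_upstream(reading)
        except ConnectionError:
            buffer_locally(reading)
        time.sleep(POLL_INTERVAL)
```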