SUMMARY:

  • Staff-level engineer with deep expertise in cloud infrastructure, Kubernetes, and platform reliability, driving stability and scalability across high-growth environments
  • Proven track record in DevOps, monitoring, observability, and incident response, with direct ownership of P0/P1 operational management and secure systems architecture
  • Strong experience leading cross-functional platform and SRE teams, mentoring engineers into senior roles, and scaling organizations through hiring and process improvements
  • Skilled in consolidating disjointed systems into modern, best-practice architectures that reduce operational complexity and improve reliability
  • Adept at vendor management, compliance (SOC 2, PCI, HIPAA), and security best practices, ensuring infrastructure meets both regulatory and business needs
  • Effective communicator and collaborator across engineering, product, operations, and executive leadership, representing engineering in strategic initiatives and acquisition due diligence

EXPERIENCE:

Staff Software Engineer, Platform

Atelio (by FIS) (via acquisition of Bond Financial Technologies), USA (Remote)

06/2021 - 02/2025

Responsibilities:

  • Responsible for the operation and maintenance of our cloud infrastructure, reliability, observability, and monitoring
  • Oversaw and managed (player/coach methodology) the ongoing maintenance of Kubernetes, Terraform, and cloud resources
  • Owned organization-wide incident response plans and procedures; served as primary escalation point for all P0/P1 incidents
  • Provided mentorship and guidance to junior and mid-level engineers for effective management of projects with appropriate prioritization and communication
  • Conducted research and comparative analysis of potential software vendors, and made build-vs-buy decisions
  • Managed relationships and contracts with external software vendors (AWS, Fastly, Datadog, VGS, StrongDM, Vanta)
  • Represented the Engineering organization at organization-wide leadership meetings during the post-acquisition period
  • Collaborated cross-functionally with product teams to deprecate redundant systems and duplicated functionality, reducing operational complexity
  • Reconciled existing infrastructure and tooling into appropriate Terraform projects
  • Assisted and supplemented Product Engineering with new feature development based on priorities and required timelines

Key Accomplishments:

  • Architected and oversaw the consolidation of disjoint application microservices into a modern and best-practice application, reducing data workflow failures by ~95%
  • Grew Platform organization to 4 full-time engineers + 2 additional sub-teams (IT Management and Technical Escalation Engineering)
  • Spearheaded cross-functional initiatives with Product Engineering, Operations, and Leadership to bootstrap new organizations for IT Management and Technical Escalation Engineering responsibilities
  • Coordinated and managed the hiring process of 12 Senior+ Engineers to various Product Engineering teams between February and April 2025
  • Coordinated with internal and external security contacts on acquisition, management, and maintenance of SOC 2 Type 2 certification
  • Architected the project plan for, and provided ongoing implementation guidance on, an updated approach to secure data encryption and storage across all application and data storage systems
  • Introduced and educated the engineering team on the use of Datadog for observability and monitoring
  • Coordinated with stakeholders across multiple organizations in the effort to transition all employees and systems to FIS-managed hardware and requirements
  • Delivered technical overview presentations and Q&A sessions during the due diligence phase of the acquisition process

Senior Software Engineer - Site Reliability Engineering

Fullstory, Austin, TX, USA (Remote)

02/2019 - 06/2021

Responsibilities:

  • Responsible for the maintenance and functionality of an internally built deployment orchestration system
  • Managed production and pre-production Kubernetes environments
  • Managed day-to-day operational issues and scaling of our internal Prometheus-based monitoring systems

Key Accomplishments:

  • Migrated node scheduling from a job-based to an attribute-based model, improving the utilization of compute resources and reducing scheduling delays
  • Evangelized the introduction of a Service Mesh across the engineering organization

Senior Software Engineer - Site Reliability Engineering

Yonder (formerly New Knowledge), Austin, Texas, USA

02/2019 - 02/2020

Responsibilities:

  • Owned the prioritization and execution of all DevOps, Infrastructure, and Site Reliability requirements
  • Maintained multiple Kubernetes clusters for both production and staging workloads
  • Worked with individual Product Engineering leads to reduce operational complexity and streamline our engineering process
  • Actively worked to reduce existing overengineered solutions and improve engineering productivity

Key Accomplishments:

  • Executed a cloud migration strategy that moved all live stateless and stateful workloads from Azure to AWS without downtime
  • Worked with product engineering to standardize and restructure web scraping infrastructure to improve engineering velocity and reduce operational overhead by ~95%

Senior Software Engineer - Infrastructure

Pixlee, Austin, Texas, USA

11/2018 - 02/2019

Responsibilities:

  • Updated the development workflow of core applications to include modern and professional software engineering practices
  • Designed and developed reproducible and automated developer environments based in a Kubernetes environment
  • Identified and communicated fundamental issues in the existing configuration management, and developed a safe migration plan to correct the issues
  • Identified and communicated issues in the current production infrastructure which negatively impacted system cost, reliability, and operational insight
  • Delivered a safe, long-term plan to migrate to Kubernetes in order to reduce the infrastructure bloat, consolidate services, improve reliability, and ease operational burden

Staff Software Engineer

Cratejoy, Austin, Texas, USA

01/2018 - 10/2018

Responsibilities:

  • Managed our production Kubernetes infrastructure, staging environments, and CI/CD pipelines
  • Interfaced with individual product teams in order to plan for upcoming deployment, monitoring, and tooling needs
  • Migrated our central application deployments to team-specific automated deployments
  • Developed internal services to ease the development of user-facing products

Key Accomplishments:

  • Architected and managed the development of the Cratejoy Custom Domain SSL feature (with Let's Encrypt)

Senior Software Engineer

Cratejoy, Austin, Texas, USA

02/2015 - 01/2018

Responsibilities:

  • Formed and led our Site Reliability Engineering team in order to prioritize stability, reliability, performance, and ease of development
  • Identified, investigated, and resolved platform-wide performance and reliability issues
  • Developed and released a reliable internal Traffic Analysis system (with full granularity), used throughout the company to make business-critical decisions
  • Developed and maintained features for the Merchant Tools section of the Cratejoy Platform

Key Accomplishments:

  • Developed internal support for, and implemented, an Engineering On-Call rotation and emergency response playbook
  • Led a migration of our internal infrastructure from Ansible-managed machines to Kubernetes
  • Designed, implemented, and rolled out PayPal support for all storefronts, which is used by 1,000+ merchants and accounts for ~15% of platform purchase volume
  • Received multiple internal awards, including company-wide 'Impact of the Quarter' (Q4 2016) and 'Engineering Values' (Q1 2017)

EDUCATION:

Bachelor of Computer Science

Honours Computer Science Co-op, Psychology Minor

University of Waterloo, Waterloo, Ontario

KEYWORDS:

This section exists for ATSs. If you are a human, you can ignore this section.

Kubernetes · CNCF · Cloud Architecture · EKS · AWS · GCP · Terraform · Terraform Cloud · IaC · genAI · Generative AI · Cloudflare · Fastly · CDN · API Gateway · OpenAI · Anthropic · Mistral · Ollama · Git · GitHub · CI/CD · GitHub Actions · GHA · CircleCI · Buildkite · ArgoCD · Python · Flask · Django · FastAPI · Docker · containerd · Container Runtimes · gRPC · protobuf · SQL · PostgreSQL · pgSQL · RDS · ClickHouse · monitoring · observability · SLO · SLA · Datadog · Sentry · Prometheus · Grafana · Loki · OpenTelemetry · HashiCorp Vault · AWS KMS · AWS SSM · service mesh · Istio · Linkerd · tokenization · VGS · Skyflow · encryption · security · SOC2 · PCI · MongoDB · Apache Kafka · Redpanda · Event Driven Architecture · Node.js · Express.js · Golang · React · Vercel · Netlify · Nginx · Envoy · Keycloak · Auth0 · Clerk · IAM · OAuth · OIDC