Platform Architect HPC & AI

Join Us as a Platform Architect — Shape the Future of AI and High-Performance Computing

We’re on the hunt for a visionary and systems-minded Platform Architect to define and drive the architectural strategy behind our next-generation, high-performance, AI-powered computing platform. This is a pivotal role—one that sits at the heart of scaling a secure, fault-tolerant, and cloud-native infrastructure designed to power advanced modeling, AI/ML workflows, and data-intensive workloads across life sciences and other innovation-focused industries.

You’ll be the guiding force behind the technical foundation of our platform, translating complex requirements into scalable, resilient systems. With deep expertise in distributed systems, AI infrastructure, and HPC environments, you’ll align technology architecture with regulatory, operational, and business priorities.

Working closely with cross-functional teams—engineering, product, and compliance—you’ll help shape a platform that’s not only powerful and future-proof but also reliable, supportable, and elegantly designed for real-world use.

Responsibilities:

  • Develop and guide an architectural blueprint that supports modular, scalable, and secure delivery of HPC and AI capabilities.
  • Design for resilience, fault tolerance, and operational durability to ensure platform services are stable and supportable at scale.
  • Translate emerging scientific and business needs into infrastructure strategies that prioritize reliability, usability, and maintainability.
  • Collaborate with engineering, infrastructure, product, and compliance teams to ensure architectural alignment with implementation and operational goals.
  • Lead technical design reviews and act as an advisor on systems-level challenges, promoting clarity and coherence across teams.
  • Foster shared understanding of platform design tradeoffs, emphasizing outcomes that improve the experience of users and those who support the platform.
  • Define infrastructure requirements for reproducible, on-demand, and GxP-compliant compute environments.
  • Ensure that security, observability, and operational control are embedded into platform architecture from the outset.
  • Guide the use of containerization, orchestration, and service mesh technologies (e.g., Kubernetes, Istio, Argo) in collaboration with engineering teams.
  • Architect scalable infrastructure for the full AI/ML lifecycle, including model training, deployment, and real-time inference.
  • Evaluate and integrate emerging HPC and AI technologies (e.g., accelerators, AI agents, distributed frameworks) to enhance long-term platform capability.
  • Define workload orchestration strategies that balance performance, cost-efficiency, and operational resilience.
  • Perform feasibility and sustainability impact assessments for proposed architectures, including risk analysis, cost implications, and long-term maintainability.
  • Represent architectural perspectives in customer engagements and business development efforts where platform design is a key differentiator.
  • Collaborate with stakeholders to scope and shape technical solutions that align with product vision and customer requirements.
  • Identify systemic architectural or operational issues and drive improvements that benefit both internal teams and external users.
  • Please note that this job description is not meant to be all-inclusive. Other duties may be assigned.

Qualifications:

  • 10+ years of experience in software or platform architecture, including 5+ years in HPC, large-scale compute infrastructure, or AI platform development.
  • Strong understanding of cloud-native architecture (AWS, Azure, or GCP), container technologies, and orchestration frameworks.
  • Experience designing infrastructure that is resilient, fault-tolerant, and easy to operate, especially in regulated or high-stakes environments.
  • Background in supporting AI/ML workflows (e.g., TensorFlow, PyTorch) and integrating workflow orchestration tools (e.g., Airflow, Nextflow, Argo Workflows).
  • Familiarity with distributed systems and job scheduling (e.g., Slurm, HTCondor) in both research and production environments.
  • Technical fluency across multiple languages and systems (e.g., Python, Go, R, Linux-based infrastructure).
  • Strong communication and systems-thinking skills with a track record of collaborative problem solving.

Preferred Qualifications:

  • Familiarity with GxP compliance, 21 CFR Part 11, or regulated computing frameworks.
  • Background in scientific computing, pharma R&D, or life sciences infrastructure.
  • Exposure to AI agent orchestration frameworks (e.g., LangChain, NVIDIA NeMo, AutoGen).
  • Experience with semantic data platforms or data lakehouse architecture.

Education and Experience:

  • Bachelor’s degree in computer science, engineering, or a related field – or equivalent work experience with demonstrable expertise in platform-scale architecture.
  • Experience collaborating across disciplines including engineering, infrastructure, networking, and security.
  • Certifications in cloud, security, or systems architecture are preferred.

Physical Demands

The job frequently requires working at a computer terminal, standing or sitting, and the ability to operate the computer with proficiency.

Work Environment

The work environment is quiet with no adverse conditions.

Metrum Research Group offers competitive salaries and an excellent benefits package. You can read more about us by clicking the link at the top of this page, ‘Company Website’.

Metrum Research Group is an Equal Opportunity Employer

Metrum Research Group EEO Statement

MetrumRG believes that innovation is cultivated when we challenge each other with new ideas and perspectives. MetrumRG is an equal opportunity employer that is committed to building a diverse and inclusive team. All employment decisions are based on qualifications, merit, and business needs, and we prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws.

MetrumRG is committed to providing equal employment opportunities and reasonable accommodations for candidates and employees with disabilities. We encourage all qualified candidates to apply for positions within our organization. If you require a reasonable accommodation for the application or interview process because of a medical condition, please contact Scotti Rylands or our Talent and Culture Department, (860)735-7043 x-622, or message us and we will work with you to meet your needs.