Data Engineer

6 days ago


Cambridge, Cambridgeshire, United Kingdom Cyted Health Full time £45,000 - £65,000 per year
Job Summary

As a Data Engineer at Cyted, you'll build the data infrastructure that powers our diagnostics and research. You'll transform experimental workflows into reliable, production-grade data pipelines, implementing reproducible ingestion and analysis processes (primarily using Nextflow) and developing automation and orchestration for both operational and research workloads.

You'll establish strong data governance and observability practices, ensuring datasets are versioned, catalogued, and fully traceable from source to output. Security and compliance will be embedded in everything you design, meeting the standards required for regulated healthcare and diagnostics environments.

You'll work closely with computational biologists in R&D and software engineers in the Technology team to translate scientific and product requirements into scalable, maintainable solutions. Alongside delivery, you'll maintain clear technical documentation, contribute to code reviews, and help raise engineering standards across the team.

Working Pattern and Location

This is a full-time role with a standard 37.5-hour working week. The role holder may be required to work flexibly.

The Data Engineer will be based at Cyted's Head Office: Ground Floor, Building 3, Old Swiss, 149 Cherry Hinton Road, Cambridge, United Kingdom, CB1 7BX.

What you will be doing

Pipeline Design and Development

  • Build, maintain, and optimise scalable data ingestion and analysis pipelines using workflow engines such as Nextflow.

  • Translate scientific and analytical prototypes into robust, reproducible, and automated workflows suitable for production use.

  • Create modular, testable components and establish clear versioning to ensure reproducibility across environments.

Data Architecture and Governance

  • Design and maintain data models, storage solutions, and metadata catalogues that support efficient querying and lineage tracking.

  • Implement and enforce data governance practices, including data classification, retention policies, and access control frameworks.

  • Maintain comprehensive lineage tracking (e.g., with OpenLineage or equivalent) and ensure auditability of all datasets.

Automation, Monitoring, and Reliability

  • Develop orchestration and scheduling frameworks to automate both operational and R&D pipelines.

  • Implement observability practices — monitoring, alerting, and automated recovery — to ensure high reliability and performance.

  • Drive continuous improvement in efficiency, scalability, and cost optimisation of data workflows across AWS/GCP/Azure.

Security and Compliance

  • Embed security-by-design principles into all data handling, including encryption, authentication, and secrets management.

  • Ensure all pipelines and data stores comply with regulatory requirements relevant to diagnostics and healthcare (e.g., ISO 27001, ISO 13485, CLIA/CAP, GDPR).

  • Contribute to technical documentation and evidence for audits and certification processes.

Collaboration and Communication

  • Partner with computational biologists and product engineers to define data requirements and shape infrastructure decisions.

  • Provide technical mentorship and guidance to team members on data engineering best practices.

  • Document systems and processes through runbooks, design specifications, and operational guides.

  • Contribute to code reviews, internal knowledge-sharing sessions, and cross-functional project planning.

Innovation and Continuous Improvement

  • Evaluate and integrate new technologies to improve data processing, observability, and scalability.

  • Identify and remove bottlenecks in the data lifecycle — from ingestion to reporting — to accelerate insight generation.

  • Support the adoption of modern DevOps and MLOps approaches for scientific and product data pipelines.

How we work

At Cyted, how we work is just as important as what we're building. Our values shape how we collaborate, innovate, and deliver for patients and partners. As our Data Engineer, you'll bring these values to life from day one.

We care deeply about data integrity, patient outcomes, and the clinicians who rely on our insights. In this role, care means building systems that are accurate, traceable, and resilient - because real people depend on the results we generate. You'll take pride in clean code, reproducible pipelines, and the knowledge that every dataset you shape contributes to earlier, better diagnosis.

We expect you to own your work and your contributions to the function with confidence and curiosity. You'll be responsible for designing and maintaining the infrastructure that connects our science, operations, and technology. You'll take initiative, move with purpose, and be trusted to make critical decisions that keep our data ecosystem secure, scalable, and compliant.

We aim high. We're scaling fast, working across complex regulated environments, and pushing boundaries in how data accelerates diagnostics. You'll be empowered to build with ambition - optimising workflows, streamlining automation, and helping define what great data engineering looks like in healthcare.

You'll be expected to dive deep into the science, the systems, and the standards. You'll understand the technical and regulatory nuance behind every workflow, and you'll be just as comfortable debugging a Nextflow pipeline as you are explaining architecture decisions to cross-functional teams. You won't just maintain systems; you'll actively improve them.

We encourage everyone to challenge and commit. You'll help shape how we work as a data-led company, questioning assumptions, sharing ideas, and being open to better ways. But once we align, you'll deliver with clarity, ownership, and precision.

And most of all, we deliver. This is a role for someone who thrives on progress, who builds with intent and sees impact in every successful workflow run, every insight delivered, and every patient outcome improved.

This is how we work at Cyted, and if this sounds like the environment where you'll do your best work, we'd love to speak with you.

Person Specification

We're looking for a skilled, proactive Data Engineer who's ready to build and scale the infrastructure that powers our scientific and operational insights. The ideal candidate will bring experience working with complex, regulated datasets, a strong grasp of modern data engineering tools and best practices, and the curiosity to solve problems at the intersection of biology and technology. You'll be hands-on, adaptable, and motivated to design systems that are reliable, compliant, and built to grow in a fast-paced, purpose-driven environment.

To succeed in this role, you'll bring:
  • A degree in Computer Science, Bioinformatics, Computational Biology, or a related field—or equivalent practical experience

  • 2–3 years of industry experience working in a regulated data environment (e.g., biotech, healthtech, or clinical diagnostics)

  • Proven experience designing and maintaining reliable data pipelines on AWS, GCP, or Azure

  • Strong proficiency in Python, with solid Linux/Bash fundamentals

  • Hands-on experience with at least one workflow engine (e.g., Nextflow, Snakemake)

  • Familiarity with version control systems (Git, GitHub) and CI/CD best practices

  • Working knowledge of regulatory frameworks (CLIA, CAP, IVD, ISO 27001, ISO 13485) and audit readiness requirements

  • Understanding of NGS data, associated tools, and standard QC practices

  • Experience with data cataloguing and governance platforms (e.g., DataHub), lineage tracking (e.g., OpenLineage), and access control management

  • Knowledge of Infrastructure-as-Code (e.g., Terraform), identity and access management (IAM), secrets management, and cloud cost optimisation at scale

  • Exposure to the R programming language and genomics workflows such as RNA-seq, single-cell, or structural variant/CNV pipelines

  • A strong focus on testing, monitoring, and observability to ensure data integrity and reliability

  • Clear, concise communication and a collaborative approach to problem-solving

Benefits
  • Salary in the range of £45,000 - £65,000 per annum, depending on your skills and experience
  • 25 days' holiday per holiday year, plus public holidays
  • Pension scheme
  • An annual learning and development budget
  • Medical insurance including dental and optical cover
  • Life/critical illness cover
  • Social events including Christmas and Summer parties
  • Cycle to work scheme
  • Electric Vehicle Scheme
  • Sabbatical after 4 years of service

  • Data Engineer

    2 weeks ago


    Cambridge, Cambridgeshire, United Kingdom Mackenzie Jones Full time £40,000 - £80,000 per year

    Data Engineer. Permanent. T6/MN/ Hybrid - 2 Days Onsite Weekly - Cambridgeshire. Must be Eligible to work in the UK. International Manufacturing organisation is seeking to secure a Data Engineer. Member of a small Data Engineering Team which is part of a much larger IT function. Role: Data Movement & Transformation processes between...

  • Data Engineer

    1 week ago


    Cambridge, Cambridgeshire, United Kingdom Axiom Software Solutions Limited Full time £60,000 - £100,000 per year

    Position: Data Engineer. Location: Cambridge / Luton, UK (Hybrid 2-3 days onsite in a week). Duration: Long Term B2B Contract. Job Description: The ideal candidate with a minimum of 5+ years of experience having strong experience working with Snowflake, DBT, Python, and AWS to deliver ETL/ELT Pipelines using different resources. • Proficiency in Snowflake data...

  • Data Engineer

    2 weeks ago


    Cambridge, Cambridgeshire, United Kingdom Cyted Health Full time £45,000 - £65,000 per year

    About Us: We are a leading gastrointestinal health company delivering minimally invasive diagnostics to transform access to esophageal care. Our EndoSign test combines a simple, swallowable device with cutting-edge laboratory biomarkers and analytics to detect esophageal cancer and its precursor, Barrett's esophagus. Operating across the US and UK...

  • Data Engineer

    4 days ago


    Cambridge, Cambridgeshire, United Kingdom Royal Society of Chemistry Full time £51,400 - £80,000 per year

    Circa Salary - Salary Plan, 51,400.00 GBP Annual. The Royal Society of Chemistry (RSC) has a great opportunity for a Data Engineer on a 7-month fixed term contract, maternity cover with the potential to be extended. In this role you will be responsible for developing and maintaining the RSC data warehouse. The role is highly technical and hands-on and involves...

  • Staff Data Engineer

    2 weeks ago


    Cambridge, Cambridgeshire, United Kingdom Arm Full time £60,000 - £100,000 per year

    Are you seeking an exciting and meaningful role at the forefront of the technology industry? Are you motivated by learning new things and putting that knowledge into action? We are seeking a highly skilled and motivated data engineer to join our Productivity Engineering group. You will be part of a team based in Cambridge (UK), working closely with...


  • Cambridge, Cambridgeshire, United Kingdom SoCode Recruitment Full time £60,000 - £100,000 per year

    Do you enjoy working closely with a tight-knit team? Do you want to work in a business where making a difference is at the heart of their goals? I'm supporting a rapidly scaling medical technology innovator in their search for a Senior Data Engineer to help design and build a next-generation unified lakehouse platform on Databricks. This is a fantastic...


  • Cambridge, Cambridgeshire, United Kingdom GSK Full time $136,950 - $228,250

    Office name: South San Francisco 611 Gateway Blvd, Cambridge 300 Technology Square. Posted Date: Dec 4 2025. The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop...


  • Cambridge, Cambridgeshire, United Kingdom Eclectic Recruitment Ltd Full time £40,000 - £80,000 per year

    Our client, a pioneering Robotics and Technology company, is recruiting for a Robotic Data Engineer on a full-time permanent basis. This is a hybrid role with occasional travel. Key duties will include but are not limited to: Review current data reporting process to optimise this to an automated process. Testing, debugging and verifying C++ code for data...

  • Senior Data Engineer

    2 weeks ago


    Cambridge, Cambridgeshire, United Kingdom KDR Talent Solutions Full time £70,000 - £90,000 per year

    Role: Senior Data Engineer (Databricks / AWS / Lakehouse). Location: Cambridge, UK (Flexible Hybrid Working). Salary: £70,000 - £90,000 basic + Comprehensive Benefits Package. Are you a Data Engineer who wants to build systems that truly matter? Are you an expert in Databricks, looking for a challenge beyond just operating an existing platform? I'm hiring for a...


  • Cambridge, Cambridgeshire, United Kingdom Simmons & Simmons Full time £60,000 - £120,000 per year

    The role: We are looking for a Legal & Data Engineer to join our growing team. The role of the Legal & Data Engineer is a commercially-focused, client-facing position supporting and developing services for our clients. The Legal & Data Engineer will work closely with our Senior Legal & Data Engineer to identify, scope, price and implement services that bring...