Data Engineer

2 weeks ago

Oakington, Cambridgeshire, United Kingdom | Optimum Patient Care | Full time | £60,000 - £120,000 per year

About Optimum Patient Care

Optimum Patient Care has been supporting chronic disease management and research in primary care for over 15 years. We are a not-for-profit social enterprise that provides a range of services to over 1,000 GP surgeries across England, Scotland, Wales and Northern Ireland. Our services include clinical audit and review services, IG-compliant data extraction services, and research support services. They are provided at no cost to participating practices and are freely available to all GP surgeries across the UK. Our anonymised research database, the OPC Research Database (OPCRD), is one of the largest electronic health record databases in the world for research use, with over 17 million anonymised patients.

Job Description

We are seeking a Data Engineer with established commercial experience (intermediate level) to join our growing data infrastructure and research systems team. This role focuses on managing and improving large-scale SQL Server ETL processes that handle GP clinical data for real-world research. This is a hands-on role for a problem-solver who works autonomously in T-SQL and ETL development and is looking to deepen their expertise. You'll work with high-volume datasets, evolving data feeds, and complex import processes, while contributing to our evaluation of, and expansion into, PostgreSQL clustered platforms as an additional environment.

Key Responsibilities

Manage and improve automated SSIS-based ETL pipelines importing data from multiple GP clinical systems (~700 imports/month, 150+ large scripts).
Continuously adapt import processes to handle changing file formats and evolving specifications.
Execute data consistency and validation checks across multiple databases.
Develop and run historic fix scripts to identify and correct data issues.
Perform general DBA tasks, including backup management, integrity checks, and performance monitoring.
Confidently handle server-level file operations, including data import/export and directory management.
Optimise partitioned tables and queries for very large datasets (10⁹+ rows, 50+ TB total).
Work closely with research and data science teams to ensure data aligns with OMOP CDM and SNOMED CT standards.

Key Requirements

As an intermediate-level engineer, you will already have a strong foundation and be capable of independently managing complex tasks. We are looking for:

Deep, hands-on expertise in SQL Server development, including advanced T-SQL (stored procedures, functions, complex queries) and experience building/maintaining ETL pipelines, preferably using SSIS or other recognised technology.
Demonstrable commercial experience (e.g., ~3-6 years) in a data-focused role (Data Engineering, BI Development, Database Administration) working with very large databases (VLDBs).
Proven experience managing and querying data in complex sectors where data volume and quality are critical (e.g., clinical/health data, finance, insurance, or large-scale e-commerce), preferably in at least two different environments.
A strong command of database optimisation, including query tuning, effective indexing strategies (e.g., clustered, non-clustered, columnstore), and the ability to read and interpret query execution plans to diagnose performance issues.
Evidence of managing a mature, script-heavy ETL environment and adapting it to new requirements.
A systematic approach to problem-solving and root cause analysis, particularly for complex data quality and pipeline failures.
Comfortable with server-level operations, including Windows Server file/directory management and using utilities for data import/export.
Strong communication skills, with the ability to collaborate effectively with technical, research, and data science teams.

Desirable Skills

Experience with PostgreSQL or distributed database architectures (e.g. Citus).
Familiarity with OMOP CDM, SNOMED CT, or clinical research data models.
Experience building or customising ETL logic in C# or Java.

Future Projects

Migration of one major research database to a PostgreSQL clustered platform, expanding our data infrastructure rather than replacing SQL Server.
Development of OMOP-compatible data pipelines to strengthen interoperability and support advanced research workflows.

Our Environment

50+ TB of structured research data across multiple SQL Server environments.
Data collected from GP practices nationwide for medical and population health research.
Collaborative technical team with a strong emphasis on data quality, performance, and innovation.
A global research organisation working in partnership with leading pharmaceutical companies and producing high-impact internal research.

Why Join Us

You'll help shape and maintain the data infrastructure underpinning some of the UK's most significant clinical research. This is a hands-on role ideal for a capable data engineer ready to deepen their expertise across large-scale SQL Server systems and emerging PostgreSQL clusters.

Job Types: Full-time, Permanent

Benefits:

Additional leave
Casual dress
Company pension
Cycle to work scheme
Free flu jabs
Paid volunteer time
Sick pay
