Lead Data Quality Engineer
University of Chicago
About the Department
The Center for Translational Data Science (CTDS) at the University of Chicago is a research center whose mission is to develop the discipline of translational data science and apply it to impactful problems in biology, medicine, healthcare, and the environment. We envision a world in which researchers have ready access to the data and tools they need to make data-driven discoveries that increase our scientific knowledge and improve the quality of life. We architect ecosystems of large-scale commons of research data, computing resources, applications, tools, and services for the broader research community to use data at scale to pursue scientific inquiry and accelerate discovery. Learn more at https://gdc.cancer.gov/, https://gen3.org/, https://stats.gen3.org/, and https://ctds.uchicago.edu/.
This at-will position is wholly or partially funded by contractual grant funding which is renewed under provisions set by the grantor of the contract. Employment will be contingent upon the continued receipt of these grant funds and satisfactory job performance.
The Lead Data Quality Engineer is a problem solver with an extensive background in data integrity and testing who ensures that high-quality data and metadata are distributed to the cancer research community. This is an opportunity to elevate your leadership skills while working with one of the world's largest collections of harmonized cancer genomic data. The role focuses on the Genomic Data Commons, which is at the forefront of both cutting-edge research and production systems supporting cancer research. As the lead engineer for data quality and integrity, you will join a team of engineers developing innovative technologies in the pursuit of discovery through data-driven cancer research. You will lead data quality efforts related to data integration, higher-level data products, and distribution to the cancer research community, working as a leader across multiple teams to build and automate frameworks for anomaly detection, reporting, and alerting. You will be the subject matter expert not only in the data itself but also in the systems used to interrogate the data and identify gaps in data quality. Because data and metadata quality has a broad scope, you are expected to work collaboratively and exemplify leadership across teams to determine priorities and the best methods for achieving objectives.
Lead the design of the data QA infrastructure and execution of testing protocols to validate pipelines, integrated datasets, and data products.
Use a combination of exploratory, regression, and automated testing to ensure data quality standards. Assess appropriate inclusion/exclusion of data based on project requirements.
Lead the team in the evaluation, maintenance, and development of data dictionaries, and use data specifications and code to validate data quality.
Lead the team in data release planning and implementation based on sponsor and collaborator requirements and data availability.
Proactively identify potential data issues and downstream impact. Identify existing data issues and perform research and root cause analyses to determine resolution. Work collaboratively with software engineers, bioinformaticians, and partners to achieve and verify resolution.
Establish and maintain processes and standards to improve data quality assurance and implement efficiencies in data management.
Define measurements and metrics, and prepare and present routine data reports to the project team and partners.
Lead the data acquisition and integration planning efforts including data modeling, data dictionary definitions, and data harmonization pipeline development.
Maintain a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system.
Define data quality and integrity criteria and implement a comprehensive data quality management plan, leading key data QC efforts collaboratively across all phases of the data management life cycle.
Technical Writing - Use knowledge and expertise to create, edit, and enhance system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials. Stay abreast of existing and emerging technologies and QC tools in the cancer genomics space.
Lead the development of new systems, features, and tools. Solve complex problems and identify opportunities for technical improvement and performance optimization. Review and test code to ensure appropriate standards are met.
Utilize in-depth technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
Act as a technical consultant and resource for faculty research, teaching, and/or administrative projects.
Lead or coordinate teams or projects for activities relating to software support and/or development.
Perform other related work as needed.
Education: Minimum requirements include a college or university degree in a related field.
Bachelor's degree in Computer Science, Informatics, Bioinformatics, Biological Sciences, or related field.
Master’s or doctoral degree in Computer Science, Informatics, Bioinformatics, Biological Sciences, or related field highly preferred.
Seven (7) years of experience working in data quality and integrity engineering or testing.
Expertise with data modeling, analysis, design, development, testing, and documentation.
Expertise with data quality standards and practices.
Advanced experience writing and executing data-centric test cases to validate data.
Advanced experience writing, reading, and interpreting database queries, and working with other database artifacts.
Advanced experience with biospecimen and clinical data curation.
Up-to-date experience with advanced high-throughput genomic technologies.
Advanced experience providing bioinformatics services and support.
Advanced experience using NCI datasets (TCGA, TARGET, and CGCI).
Advanced experience using graph and NoSQL databases.
Advanced knowledge of Python.
Advanced knowledge of Linux/Unix systems and basic shell scripting.
Ability to lead in a collaborative team environment and build strong partnerships.
Ability to make a dedicated effort to acquire knowledge of new programming languages, statistical and computational methods, and background in the research area.
Ability to prioritize, manage workloads, and lead a team to meet critical project milestones and deadlines.
Oversee efforts to maintain confidentiality related to sensitive matters such as strategic initiatives, trade secrets, quiet periods, and scientific discoveries yet to be put in the public domain.
Ability to take a broad plan and break it into incremental tasks and oversee the completion of each task, delegating when necessary.
Ability to lead teams and projects, working productively with internal and external partners, to ensure accountability for deliverables and outcomes.
Ability to persuade others to adopt new structures or systems to meet objectives, and to be receptive to new ideas and points of view.
Ability to effectively work with management and maintain authority to successfully coordinate the team.
$120,600 - $163,000
Cover Letter (preferred)
When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.
Scheduled Weekly Hours
Drug Test Required
Health Screen Required
Motor Vehicle Record Inquiry Required
The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender, gender identity, national or ethnic origin, age, status as an individual with a disability, military or veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.
Staff job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.
We seek a diverse pool of applicants who wish to join an academic community that places the highest value on rigorous inquiry and encourages a diversity of perspectives, experiences, groups of individuals, and ideas to inform and stimulate intellectual challenge, engagement, and exchange.
All offers of employment are contingent upon a background check that includes a review of conviction history. A conviction does not automatically preclude University employment. Rather, the University considers conviction information on a case-by-case basis and assesses the nature of the offense, the circumstances surrounding it, the proximity in time of the conviction, and its relevance to the position.
The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.