Empowering Research - ATAC-seq pipeline development

Photo by Zulian Firmansyah on Unsplash

I am excited to share a recent project I’ve been working on that showcases my skills in pipeline development, project management, and handling ATAC-seq data. Let’s dive into the details!

Unveiling Insights and Driving Collaborative Momentum

The aim of my project was to develop an automated pipeline for exploring chromatin accessibility using ATAC-seq data. The project was meant not only to challenge my technical expertise but also to help expand the Bioinformatics footprint at Ann & Robert H. Lurie Children’s Hospital of Chicago.

As part of the Bioinformatics team, I help support the research enterprise at Lurie Children’s. Our team’s goal is to develop reproducible bioinformatic analysis pipelines for common analyses and common experimental designs, and we strive to offer these automated pipelines as packages for the research community. At the heart of my project lies the prospect of fostering new and engaging collaborations: by offering collaborative services and showcasing what our analyses can do, we aim to push the boundaries of research at the institution.

ATAC-seq and Its Significance

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is like a magnifying glass for the genome: it lets us see which regions of DNA are open and accessible, and therefore potentially active. By revealing these regions, it helps researchers pinpoint the genes that play crucial roles in health and disease.

For example, consider efforts to understand the genetic drivers behind conditions like cancer. ATAC-seq can reveal DNA regions with abnormal accessibility patterns, shedding light on the molecular origins of disease. This knowledge can steer the development of targeted therapies and interventions, ultimately improving patient outcomes.

The ATAC-seq Workflow

For bioinformatic scientists, handling ATAC-seq data involves a sequence of steps. The raw reads are first pre-processed (for example, quality-checked and trimmed of adapter sequence) and then aligned to a reference genome. With alignment complete, the next step is identifying “peaks”: regions of the genome where aligned reads pile up significantly above the background level. Essentially, peaks mark the spots where DNA is accessible, revealing key areas of genetic activity.

These peaks hold immense significance. They often coincide with regulatory elements, such as promoters and enhancers, that are open and accessible. These elements play a crucial role in controlling gene expression, the process by which genes are activated to perform their functions. By analyzing peaks closely, and comparing them across samples or conditions, we gain insight into the regulatory landscape of the genome. This understanding helps us uncover how gene regulation responds to a cell’s environment, guiding cellular responses and orchestrating various biological processes.
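To make the idea of a peak concrete, here is a deliberately simplified, hypothetical peak caller. It splits a toy genome into fixed 100 bp bins, assumes reads are scattered uniformly (a Poisson background), and flags bins whose read counts would be very unlikely under that background. The function names, bin size, and p-value cutoff are illustrative assumptions. Real peak callers such as MACS2, which nf-core/atacseq uses, estimate background locally and account for fragment sizes, duplicates, and multiple testing, so treat this only as an illustration of the concept.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term  # accumulate P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def call_peaks(read_positions, genome_length, bin_size=100, p_cutoff=1e-5):
    """Toy peak caller (hypothetical): report bins with significant read pile-up.

    read_positions: 0-based positions (< genome_length) where reads land.
    Returns a list of (bin_start, bin_end, read_count) tuples.
    """
    n_bins = math.ceil(genome_length / bin_size)
    counts = [0] * n_bins
    for pos in read_positions:
        counts[pos // bin_size] += 1

    # Uniform background assumption: expected reads per bin if reads were
    # scattered evenly across the genome.
    lam = sum(counts) / n_bins

    peaks = []
    for b, count in enumerate(counts):
        if count > lam and poisson_sf(count, lam) < p_cutoff:
            start = b * bin_size
            peaks.append((start, min(start + bin_size, genome_length), count))
    return peaks


if __name__ == "__main__":
    # Two background reads per 100 bp bin, plus 60 reads stacked at 1,250 bp.
    reads = list(range(0, 10_000, 50)) + [1_250] * 60
    print(call_peaks(reads, genome_length=10_000))  # [(1200, 1300, 62)]
```

On this toy 10 kb genome, the background rate is 2.6 reads per bin, so only the bin spanning 1,200 to 1,300 bp, with 62 reads, stands out as a peak; every other bin holds just its two background reads.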

The bioinformatics approach I developed leverages a combination of existing software tools and custom pipelines:

  1. nf-core/atacseq v1.2.2 pipeline: The workflow kicks off with version 1.2.2 of nf-core/atacseq, a well-crafted, community-supported pipeline designed specifically for ATAC-seq analysis. It encapsulates best practices for data pre-processing, alignment, quality control, peak calling, and more.

  2. Custom-developed pipeline: Building on the results generated by nf-core/atacseq, I developed a specialized pipeline that further processes the data and generates a comprehensive report.
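Pinning the pipeline release is what makes the first stage reproducible. As a minimal sketch, assuming a small Python wrapper around the Nextflow CLI (the wrapper, its function names, and the sample file names are hypothetical, not the author’s code), the helper below renders a design sheet and builds a `nextflow run` command fixed to v1.2.2. The design-sheet columns and the `--input`, `--genome`, and `--outdir` parameter names reflect our understanding of the v1.x pipeline and should be checked against its documentation before use.

```python
import csv
import io
import shlex

# Assumed design-sheet columns for nf-core/atacseq v1.x; verify against the
# v1.2.2 usage docs before use.
DESIGN_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2"]


def design_csv(samples):
    """Render a list of sample dicts as design-sheet CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DESIGN_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()


def nextflow_command(design, genome, outdir, profile="docker", revision="1.2.2"):
    """Build (but do not run) a `nextflow run` command pinned to one release.

    `-r` and `-profile` are Nextflow options; `--input`, `--genome`, and
    `--outdir` are pipeline parameters as we understand them for v1.2.2
    (an assumption; check the pipeline's `--help` output).
    """
    return [
        "nextflow", "run", "nf-core/atacseq",
        "-r", revision,
        "-profile", profile,
        "--input", str(design),
        "--genome", genome,
        "--outdir", str(outdir),
    ]


if __name__ == "__main__":
    # Hypothetical samples: two conditions, one paired-end replicate each.
    samples = [
        {"group": "control", "replicate": 1,
         "fastq_1": "control_R1_1.fastq.gz", "fastq_2": "control_R1_2.fastq.gz"},
        {"group": "treated", "replicate": 1,
         "fastq_1": "treated_R1_1.fastq.gz", "fastq_2": "treated_R1_2.fastq.gz"},
    ]
    print(design_csv(samples))
    # After saving the CSV as design.csv, the command could be launched with
    # subprocess.run(cmd, check=True).
    cmd = nextflow_command("design.csv", "GRCh38", "results")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single shell string avoids quoting problems when paths contain spaces, while `shlex.join` still gives a copy-pasteable version for run logs.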

By integrating a community-driven pipeline with custom-developed processing, this workflow is designed to deliver accurate, reproducible, and meaningful analyses.
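The custom stage is not described in detail here, so the following is only a hypothetical sketch of the kind of post-processing that could bridge nf-core/atacseq output and a report. It parses MACS2 peak files using the first nine columns shared by the ENCODE narrowPeak and broadPeak formats, filters peaks by q-value, and computes per-sample summary statistics. The function names, the fields kept in `Peak`, and the q-value threshold are assumptions, not the author’s implementation.

```python
import statistics
import sys
from dataclasses import dataclass


@dataclass
class Peak:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int  # 0-based, exclusive
    signal: float  # column 7: signalValue (e.g. fold enrichment from MACS2)
    qvalue: float  # column 9: -log10(q-value)


def parse_peaks(lines):
    """Parse narrowPeak/broadPeak lines, using only the first nine shared columns."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/track lines
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), float(f[6]), float(f[8])))
    return peaks


def summarize(peaks, min_neg_log10_q=2.0):
    """Summary stats for peaks with q <= 10**-min_neg_log10_q (default q <= 0.01)."""
    kept = [p for p in peaks if p.qvalue >= min_neg_log10_q]
    widths = [p.end - p.start for p in kept]
    return {
        "n_peaks": len(kept),
        "median_width": statistics.median(widths) if widths else 0,
        "total_bp": sum(widths),
        "n_chromosomes": len({p.chrom for p in kept}),
    }


if __name__ == "__main__":
    # Usage: python peak_summary.py sample1.narrowPeak sample2.narrowPeak ...
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, summarize(parse_peaks(fh)))
```

Reading only the shared columns keeps the parser usable whether the pipeline was run in broad-peak or narrow-peak mode; the narrowPeak summit offset in column 10 is simply ignored.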

Conclusion

Overall, this project underlines my commitment to scientific exploration and innovation and demonstrates my capacity to lead complex projects with precision and self-reliance. I look forward to seeing the impact of this pipeline on our research!

See a sample report generated by this pipeline here.

Joy Nyaanga
Senior Bioinformatician

My interests include genomics, data science, and R.