Ankita Gandhi
Experienced SRE with over 5 years of expertise in enhancing system reliability and efficiency. Specialized in
implementing traffic routing systems, infrastructure migration, SLOs, and SRE best practices.
Email: ankitagandhi6694@gmail.com
Technologies: Python, Ansible, Shell Script, Prometheus, Grafana, Kafka, Vault, Kubernetes, Nginx, Amazon Web Services, Git
Certifications: Certified Kubernetes Administrator, VMware Certified Associate 6 - Data Center Virtualization
Work Experience
June 2021 – February 2024
Goldman Sachs Group, Inc., Site Reliability Engineer, Menlo Park, California
Managed services supporting Apple Credit Card to achieve a 99.99% SLO.
Streamlined canary deployment for targeted traffic routing.
Led business continuity planning and testing for a successful high-yield savings account launch.
Served as the designated SRE expert responsible for designing traffic routing strategies during the migration from VMware’s Pivotal Cloud Foundry to AWS infrastructure.
Designed and implemented a Grafana dashboard that consolidates multi-region and multi-platform systems into a single comprehensive view for improved monitoring and analysis.
Mentored interns throughout their tenure at the firm, making sure they were well equipped to succeed.
Managed a new hire on the team, assisting with their onboarding and helping them navigate the firm's culture and tools.
Participated in the on-call rotation to troubleshoot incidents, ensuring continuous communication and collaboration with partner teams.
Collaborated with the release management team during on-call to facilitate the smooth launch of new release versions.
Authored postmortem reports, actively enhancing incident response and system reliability through action items.
Feb 2018 – May 2021
Apple Inc., Site Reliability Engineer, Cupertino, California.
1. Maintained healthy infrastructure, including Kafka, Spark, Hadoop, and Linux servers, for over 100 fraud-prevention services.
2. Helped standardize our data pipeline to reduce the time taken to debug failures.
3. Developed automation with Python and Ansible for creating Kafka topics, which reduced operational load by 80%.
4. Drove team-wide efforts to track the SLOs of our services.
5. Led the standardization and cleanup of alerts in coordination with stakeholders, which improved our visibility into applications’ health.
6. Participated in capacity planning and onboarding of a public-facing application.
7. Represented the SRE team in the ongoing migration of infrastructure from bare metal to Kubernetes.
8. Trained new team members and actively mentored their ongoing work.
9. Participated in a weekly on-call rotation to maintain availability and troubleshoot downtime.
May 2017–Aug 2017
Nutanix Inc, Systems Reliability Engineer Intern, Durham, North Carolina.
Troubleshot and solved customers’ technical issues related to Nutanix infrastructure.
Resolved customer cases such as creating a disaster recovery cluster.
Handled various issues regarding licensing, network configuration, software updates and hardware failures.
Recreated customers’ topologies in a lab to provide effective customer support.
Suggested improvements to the existing licensing portal.
Education
Aug 2016–Dec 2017
University of Texas at Arlington, Arlington, Texas.
Master’s in Computer Science GPA: 4/4
Coursework: Secure Programming, Software Testing, Agile Software Engineering, Distributed Systems
Aug 2012–May 2016
Dwarkadas J. Sanghvi College of Engineering, Mumbai, India.
Bachelor of Computer Engineering GPA: 7.93/10
Coursework: Operating Systems, Cloud Computing, Software Engineering, Parallel and Distributed Computing, Algorithms and Data Structures