We are seeking a senior Cloud Resilience & Disaster Recovery Engineer to fortify and automate the recovery capabilities of a large-scale government cloud environment. This role is dedicated to ensuring business continuity through sophisticated multi-region architectures and automated recovery workflows across AWS and Azure.
MANDATORY REQUIREMENT: Security Clearance
An active AGSVA Baseline, NV1, or NV2 security clearance is MANDATORY. Due to the critical nature of disaster recovery and data integrity for this government client, we cannot process applications without a verifiable Australian Government clearance.
The Role
As a Resilience Specialist, you will move beyond traditional backups to build a "self-healing" and rapidly recoverable infrastructure. You will be responsible for the end-to-end automation of recovery processes, ensuring that catastrophic events have minimal impact on essential public services.
Key Responsibilities:
- DR Automation & Strategy: Design and manage automated infrastructure recovery patterns, multi-region designs, and failover orchestration to ensure seamless business continuity.
- Infrastructure as Code (IaC): Lead the transition of legacy systems into fully managed, version-controlled Terraform environments.
- Data Integrity & Protection: Implement and audit immutable backup policies and air-gapped recovery solutions to protect critical business data against ransomware and system failures.
- Resilience Engineering: Develop and maintain scalable, secure, and highly available (HA) cloud infrastructure, focusing on multi-zone network topologies.
- Runbook & Testing: Create comprehensive DR runbooks, automated test plans, and recovery strategies that can be executed with precision under pressure.
- Governance & Compliance: Operate within highly structured environments, adhering to strict change control and government governance models.
Your Technical Profile
We are looking for a veteran engineer with at least 5 years, and ideally 10 or more, of hands-on experience specifically in cloud resilience and disaster recovery.
Technical Essentials:
- Multi-Cloud Recovery: Strong functional understanding of DR services and high-availability configurations in both AWS and Azure.
- Automation Mastery: Proficiency in Terraform or ARM templates to manage resilience and DR configurations programmatically.
- Connectivity & Topologies: Deep experience with multi-region and multi-zone network architecture, including global load balancing and data replication.
- Monitoring & Diagnostics: Familiarity with cloud-native DR monitoring tools to track RPO/RTO metrics and system health.
- Process Knowledge: A solid grasp of ITIL processes (Incident, Change, and Problem Management) in a mission-critical context.
Qualifications & Certifications:
- Highly Regarded: AWS or Azure Engineering/Architecture certifications.
- Industry Context: Proven experience working in government or highly regulated environments with rigorous audit and security requirements.
- Soft Skills: Exceptional analytical skills and the ability to collaborate with cross-functional teams to enhance cloud reliability.
At Randstad Digital, we are passionate about providing equal employment opportunities and embracing diversity to the benefit of all. We actively encourage applications from any background.
...