After its latest project, EMnify is only a 99% cloud company. To meet the growing scalability and reliability requirements for the interconnection between our AWS-based deployments and multiple carriers, the BGP peerings had to be moved out of AWS. Therefore, a pair of Juniper routers was put into place. For a company that had relied entirely on cloud services so far, this alien technology posed several challenges.
We want to share how we solved the puzzle of integrating this physical equipment into our existing workflows and tools. Our use of CI/CD systems for applying changes, our use of AWS CloudWatch, Prometheus, and Grafana for monitoring, and our reluctance to run applications that require a lot of shepherding guided our search for the right glue: the glue between these pieces of iron and our cloud infrastructure.
Being used to CI/CD processes backed by automated tests, we wanted to adopt these practices here as well. As a result, configuration changes are rolled out by an automated pipeline using Ansible. We also attempted automated testing, but failed. We explain why, what we did instead, and what we envision for the future.
As with every other part of our system, we want the routers’ monitoring data to be accessible via Grafana.
With the help of pmacct and fluentbit, we can treat IPFIX flow records as if they were logs. With the help of jtimon, Prometheus stores the routers’ metrics the way we are used to, with a few custom YANG models coaxing out the data where necessary.
In summary, the integration worked very well, but we still have several lessons learned and pain points to share.