- Date Posted
- Aug. 30, 2021
- Technical & Customer Support
Gremlin’s mission is to make the Internet more reliable. We’re leading the way in the exciting, growing practice of Chaos Engineering, for enterprises like Target, Twilio and JP Morgan Chase that are building complex, distributed SaaS applications whose success depends on uptime. The Gremlin platform uncovers risks and weaknesses that aren’t addressed by traditional DevOps and IT operations processes and best practices. If paving a new path forward at the leading edge of technology sounds exciting to you, we should talk.
About the Role of DevOps Architect (Post Sales/Customer Facing)
Internally, we call this position Gremlin Reliability Specialist. Gremlin Reliability Specialists work directly with Gremlin’s top customers to help build and expand the customer’s technical expertise and business value in the practice of Reliability Engineering. Acting as an extension of the customer’s team, you will collaborate with their highly technical engineers and engineering leaders to prove out the value of Reliability Engineering. This will involve supporting the design of experiments specifically targeted at the customer’s application and architecture, and driving a business plan that will deliver more resilient systems. Finally, you will work alongside the customer’s Gremlin account and product teams to bridge the gap and help ensure their growth and successful usage of Gremlin.
Gremlin Reliability Specialists are our customers’ deepest technical experts, wielding reliability, chaos engineering, and DevOps expertise to implement Gremlin’s application in their complex, highly secure, and regulated infrastructure. But that is just the first step: our specialists take our customers further, helping them use our app, API, and application-layer products to support their use cases and reach their business goals. Gremlin Reliability Specialists are an extension of the customer and intimately understand the short- and long-term use of our product relative to our customers’ unique, infrastructure-driven use cases. Together, Gremlin’s Reliability Specialists and customers form a team to create experiments that expose failure points, design scenarios to test integrated systems and resolve reliability issues, and embed automation that keeps their systems resilient and their SRE teams informed of potential failures.
This role requires both breadth and depth of technical knowledge and experience, as well as the ability to establish and lead successful relationships with multiple personas across an organization.
In this role, you’ll get to:
- Work with an amazing team of other Gremlin Reliability Specialists to advance the practices of Reliability and Chaos Engineering in our portfolio of enterprise customers.
- Establish and maintain deep relationships with our customers, helping them increase their maturity and use of the product throughout their organization.
- Integrate Gremlin with existing customer enterprise tools:
  - Enterprise SSO authentication systems (ADFS, Okta, etc.)
  - CI/CD pipelines (Jenkins, Spinnaker, etc.)
- Perform architecture reviews with customers (application and infrastructure perspectives)
- Identify applications or services to target for Chaos Engineering experiments
- Organize, plan, and assist in running GameDays with customers
- Provide training to customers and customer teams, both directly and train-the-trainer
- Generate executive reports on Chaos Engineering test findings and make recommendations on next steps
- Develop, manage, and lead teams for customer organizations.
- Document each customer’s success criteria, then communicate and validate with each customer on an ongoing basis that value is being recognized. Consult with customers on best practices to increase value and ROI, ensuring we’re hitting our renewal and expansion targets.
- Provide a direct line of support for customers through dedicated office hours
- Align customer goals with the Gremlin account team to drive deliverables.
- Be a primary point of escalation contact for targeted customers.
- Work with other internal resources to coordinate/facilitate high level demos, workshops, and training sessions to educate customers on current features based on best practices and provide visibility into current vs. future product features and capabilities.
- Engage with the Gremlin product team on customer experiences.
- Through customer interactions, gather critical feedback on our product and relay back to internal Gremlin teams (product, tech support, marketing, sales).
- Create opportunities for customer stories, whitepapers, and blog posts among assigned customers.
- Proactively reach out to customers to drive adoption, identify expansion and add-on opportunities, and have a “good finger on the pulse” for each account.
- Act as an advocate for our customers, and invest the time to develop and enhance relationships with key stakeholders to earn “trusted advisor” status, naturally growing value, revenue, and increasing customer satisfaction.
We’ll expect you to have:
- 5+ years in SRE, DevOps, IaaS or SaaS providers, or software development
- 5+ years delivering reports and presenting in meetings with customers at both technical and executive levels
- Strong Linux and Container experience
- Knowledge of Kubernetes and OpenShift container orchestration platforms
- Familiarity with IT management frameworks such as ITIL, COBIT, or eTOM
- Experience with automation frameworks such as Puppet, Chef, or Ansible
- Experience with CI/CD tools such as Jenkins, Spinnaker, or GitHub Actions
- Experience working through a production outage
- Excellent verbal and written communication skills
- Experience with monitoring and observability tools such as Grafana, New Relic, Datadog, or CloudWatch
- Familiarity with incident management tools such as PagerDuty
- Familiarity with project management tools such as Asana, Jira, or Trello
- Familiarity with the modern software development life cycle
- Prior experience in test automation
- AWS, GCP or Azure cloud certifications
- Experience in a previous role supporting the establishment and growth of Chaos Engineering
- Experience using Gremlin for reliability testing
- If you don’t think you meet all of the criteria above but are still interested in the job, please apply. Nobody checks every box. We’re looking for candidates who are particularly strong in a few areas and have some interest and capabilities in others.
- Competitive compensation
- 401k Match
- Stock Options
- Flexible PTO
- Competitive benefits package, including medical, dental, and vision insurance
- Team activities (currently virtual due to COVID-19)
Our founders, Kolton Andrus and Matthew Fornaciari, lived and breathed incidents, on-call, and Chaos Engineering at Amazon and Netflix. As “Call Leaders,” they were responsible for guiding teams through analyzing and resolving global outages. After a decade of developing and advocating Chaos Engineering internally, in 2016 they decided to make what they had learned available to a wider set of enterprise companies and launched Gremlin.
Since then, Gremlin has built an incredible team of industry veterans and people eager to learn from one another while pushing the entire industry forward to new heights. We’re backed by top-tier investors Index Ventures, Amplify Partners, and Redpoint Ventures. Our customers love us, and we’re thrilled to be a partner in their success.
At Gremlin, we value:
- OUR CUSTOMERS - We won’t be a company if our customers aren’t thrilled. We live and die by our customers, so they come first.
- ACTION - We favor small experiments to gather data rather than over-analyzing a situation. Getting stuff done always beats talking about getting stuff done.
- CONTEXT, NOT CONTROL - We hire autonomous adults with good judgement. We provide them with the context to make smart decisions. We don’t micromanage.
- BEING VOCALLY SELF-CRITICAL - We all make mistakes, and we all have ways in which we can improve. We own that upfront, and honestly discuss ways in which we’ve personally made mistakes and can get better. Then, we encourage and help one another succeed at doing so.
- DIVERSITY, EQUITY, & INCLUSION - We are at our best when we encourage and include the thoughts and voices of people from many diverse backgrounds into our strategy and execution. We recognize that systemic racism and gender bias are real and that we aren’t perfect, so we actively work to encourage the difficult conversations, to listen, and to change as we discover our blind spots so that Gremlin is a company all of us feel proud to be a part of.
- FRUGALITY - We are working to build a profitable company and create a new practice in the industry. We spend money on the right things, like making sure employees have the tools they need to be successful and the company has what it needs; we simply choose not to waste what we have and not to buy what we don’t actually need.
- You are welcome at Gremlin for who you are. The more voices and ideas we have represented in our business, the more we will all flourish, contribute, and build a more reliable internet. Gremlin is a place where everyone is encouraged and can grow. However you identify and whatever background you bring with you, please apply if this sounds like a role that would make you excited to come into work every day. It’s in our differences that we will find the power to keep building a more reliable internet by building and designing tools used by the best companies in the world.
- We can’t wait to meet you!
Learn more about how Gremlin is defining the practice of Chaos Engineering:
- Engineering for Chaos: Preparing for Disaster | Kolton Andrus | TEDxAsburyPark https://www.youtube.com/watch?v=BasOy54QGKo