Careers - Job Details

Site Reliability Engineer

Summary:

BGI has the following contract opportunity with our direct client in Union, NJ.

Job ID/Number:

201904-2093

Posted Date:

4/5/2019

Job Location:

Union, NJ

Position Type:

Contractor

Division:

Information Technology

Description:

The Site Reliability Team at Bed Bath and Beyond is looking for a Site Reliability Engineer (SRE) who can build, instrument, troubleshoot, automate and triage highly scalable legacy and modern systems.

The candidate will join a team whose mission is to blend a variety of skill sets and work collaboratively, not only to ensure that we deliver quality but also to take an active role in determining which architectures and technologies perform, scale, and deliver services reliably.

Responsibilities

Troubleshoot issues across the entire stack - hardware, software, applications and network.

Design, build, test, and automate monitoring discovery, instrumentation, alerting, and escalation.

Document all efforts clearly, and communicate and demonstrate them to the team with ease.

On-call responsibilities.

Qualifications

Capable of responding to major/critical events and being an active participant in determining solutions and instrumentation.

Hands-on experience building fault-tolerant infrastructure and monitoring instrumentation with technologies such as Kubernetes, Kafka, Cassandra, AWS, GCP, etc.

Experience instrumenting and researching issues with CA Monitoring Suite, Nagios, InfluxDB, Grafana, Prometheus, Stackdriver, Sumo Logic, New Relic, Quantum Metric, Tealeaf, etc.

Familiarity with tools such as Puppet, Ansible, Salt, Chef, or CFEngine would be a plus.

Additional familiarity with log analysis tools such as Sumo Logic, ELK, and Splunk would also be helpful.

Practical knowledge of shell scripting and at least one scripting language (Python, Ruby).
