• Manager Cloud Operations

    Job Locations US-MA-North Reading
    Posted Date 4 weeks ago (9/23/2018 9:32 AM)
    Job ID 2018-1349
    Organization Engineering - Cloud Ops

    Cloud Operations Manager

    Tracelink is seeking an experienced Cloud Operations Manager who has risen through the ranks as a Linux Systems Administrator to join the Cloud Operations team supporting Tracelink’s Life Sciences Cloud (LSC) product.

    Reporting to the Director, Cloud Operations, the Cloud Operations Manager is a hands-on manager responsible for the day-to-day activities of the Systems Operations and Data Operations teams, in addition to technical lead responsibilities. The Manager leads the Systems and Data Operations teams, which focus on managing Tracelink’s EC2-based systems, networks, and AWS services. Additionally, the teams monitor data acquisition, reporting, and visualization, which includes the health of all LSC environments from QA through Production, the AMI build and release process, OS package repositories, system data collection agents (Fluentd, Telegraf, NRPE, CloudWatch), storage environments (Elasticsearch, InfluxDB, S3), visualization (Grafana, Kibana, Nagios), and alerting. The Manager will supervise up to 8 Cloud System Administrators.

    Key Responsibilities

    • Manage the day-to-day activities of the Systems Operations and Data Operations teams
    • Monitor and manage the health of all Tracelink processing environments
    • Design and implement logging and monitoring solutions across all environments, including Fluentd, Elasticsearch, NRPE, Telegraf, InfluxDB, CloudWatch, CloudTrail, and S3; monitoring covers EC2 instances, services running on those instances, and AWS services such as RDS, S3, Redshift, DynamoDB, ElastiCache (Redis and Memcached), CloudSearch, Lambda, ECS (Docker), and others
    • Lead operations data visualization and reporting tools such as Kibana, Grafana, and Nagios
    • Enable the process to back up and periodically test all datastores
    • Implement the Disaster Recovery strategy and conduct the annual DR test
    • Manage the AMI build/release process and the repos supporting monthly operating system updates
    • Create and maintain data feeds to other parts of the company
    • Manage Level 2 incident / on-call escalation policy
    • Manage Linux System administration work and projects, as assigned
    • Interact with all levels within the organization, including developers, support techs, managers, etc.

    Required Skills

    • Leadership experience (3-5 years) managing/mentoring team(s) in an operational SaaS environment
    • Ability to design, organize, manage and delegate work in support of assigned objectives
    • Excellent written and verbal communication skills in English
    • Demonstrated success supporting production SaaS applications
    • 3 years of experience as a Red Hat/CentOS Linux System Administrator
    • 3 years of experience with AWS services
    • Experience with SSH, Apache, and any Java servlet container (Dropwizard, Resin, Tomcat, JBoss, etc.)
    • Familiarity with monitoring and operations data visualization tools (e.g., Nagios, Kibana, Grafana)
