Cloud Platform Lead Consultant or Senior Consultant

Allstate • 16h ago

United States • Remote • Full-time • Mid Level • 2+ yrs exp

Top focus

Cloud Engineer, Platform Engineer, Cloud Architect

At Allstate, great things happen when our people work together to protect families and their belongings from life’s uncertainties. And for more than 90 years, our innovative drive has kept us a step ahead of our customers’ evolving needs, from advocating for seat belts, air bags and graduated driving laws to being an industry leader in pricing sophistication, telematics, and, more recently, device and identity protection.

Job Description

Arity is a part of the Allstate Corporation, which means we have the same innovative drive that keeps us a step ahead of our customers' evolving needs. We collect and analyze enormous amounts of data in order to provide cutting-edge solutions to companies invested in transportation.

We are considering both Lead Consultant and Senior Consultant levels.

The Team

Our engineers are fueled by a passion to impact the future of mobility. They push the boundaries of telematics and transportation tech by creating and supporting cutting-edge products.

As part of an Agile team, they are armed with the freedom to innovate and the opportunity to see projects through from start to finish. Using a variety of languages and a top-notch technology stack, our engineers make critical advances in areas like sensor technology, enterprise engineering and platform development.

Our team understands what it means to collaborate and communicate in an interconnected global team, all while having trust, transparency and empathy for the end user.

The Operational Data Management team is a specialized group within the Engineering department responsible for the reliability, performance, and scalability of Arity's data infrastructure.

We own and operate mission-critical database and streaming platforms—including Apache Cassandra, PostgreSQL, Redis, Valkey, Amazon Redshift, Google BigQuery, Amazon MSK (Kafka), and Google Pub/Sub—as well as analytics and query layers such as Starburst Galaxy and AWS Athena.

We partner closely with application development teams to tune, troubleshoot, and optimize applications that depend on these technologies, ensuring that the data platforms powering Arity's mobility insights remain highly available and performant at scale.

The Role

Arity is seeking a Cloud Platform Lead Consultant or Senior Consultant to join our Operational Data Management team within Engineering. This is a fully remote position. In this role, you will design, build, deploy, and operate cloud-native data infrastructure across AWS and Google Cloud Platform while bringing deep hands-on expertise in databases, data streaming, and distributed systems.

You will ensure the platforms that ingest, store, and serve billions of miles of driving data remain resilient, observable, and cost-efficient—directly enabling Arity's products and the customers who rely on them to make smarter transportation decisions.

The ideal candidate combines cloud engineering mastery with strong database and streaming fundamentals, advanced production-grade coding skills in Python, and demonstrated hands-on experience building AI agents and Model Context Protocol (MCP) servers to streamline DevOps workflows.

A successful candidate rapidly adopts new technologies and delivers production-ready solutions with them, guides application developers on performance improvements and code-level fixes, and independently leads root cause analysis for complex production incidents.

Key Responsibilities

Design, deploy, and manage highly available database platforms including Apache Cassandra, PostgreSQL, Redis, Valkey, Amazon Redshift, and Google BigQuery across multi-cloud environments.

Build, operate, and optimize data streaming infrastructure using Amazon MSK (Kafka), Google Pub/Sub, and Apache Flink to support real-time and batch data pipelines.

Develop and maintain infrastructure-as-code, CI/CD pipelines, and cloud automation using Python and industry-standard tooling to enable repeatable, secure deployments.

Implement comprehensive monitoring, alerting, and observability for data platform services to proactively detect and resolve issues before they impact customers.

Partner with application development teams to troubleshoot, tune, and optimize application performance, query patterns, and data access layers backed by team-managed platforms.

Administer and optimize analytics and query engines including Starburst Galaxy and AWS Athena to deliver performant, cost-effective access to large-scale datasets.

Lead incident response, root cause analysis, and post-incident reviews for production database and streaming systems; drive remediation and preventive improvements.

Participate in an on-call rotation to provide 24x7 support for mission-critical data infrastructure.

Evaluate and adopt emerging technologies, including AI agents and MCP servers, to automate operational tasks, improve developer experience, and accelerate DevOps workflows.

Contribute to capacity planning, disaster recovery, security hardening, and cost optimization initiatives across the data platform estate.

Ability to review application source code, identify root causes of performance or reliability issues, and contribute targeted fixes or optimization guidance in collaboration with development teams.

Demonstrated ability to rapidly adopt unfamiliar technologies and deliver production-ready solutions within days to a week; strong self-directed learning with a track record of picking up new platforms, frameworks, and tools independently.

Proven ability to guide and advise software development teams on application-level performance tuning, query optimization, code-level improvements, and production troubleshooting, functioning as a technical authority on data platform usage patterns.

Strong understanding of distributed systems principles including high availability, fault tolerance, consistency models, and disaster recovery.

Excellent problem-solving, communication, and documentation skills with a track record of ownership in on-call and incident management environments.

Ability to read, debug, and analyze Java application code including Spring Boot and microservice frameworks; proficiency in JVM diagnostics including heap dump analysis, GC tuning, thread dump interpretation, and connection pool (e.g., HikariCP) troubleshooting.

Required Qualifications

3-5 or more years of overall software engineering or infrastructure experience, with at least 2-4 years in site reliability engineering, DevOps, or platform engineering operating production systems at scale.

Demonstrated expertise designing, deploying, and managing cloud infrastructure on AWS and/or Google Cloud Platform, including networking, identity, and security fundamentals.

Strong hands-on experience with relational and NoSQL databases; production experience with PostgreSQL and at least one distributed database such as Apache Cassandra.

Production experience operating data streaming platforms; hands-on experience with Apache Kafka (including Amazon MSK) and a solid understanding of streaming fundamentals (partitions, consumer groups, delivery semantics, backpressure).

Advanced, production-grade Python and Shell scripting development skills, including writing, reviewing, and debugging application code, building custom automation tooling, and developing operational solutions that go well beyond basic scripting.

Strong experience with infrastructure-as-code (e.g., Terraform, Terraform Enterprise (TFE), Env0), Jenkins, Ansible, Git CI/CD pipelines, and container orchestration (e.g., Kubernetes) in production environments.

Experience implementing and automating monitoring, logging, and alerting solutions for distributed systems (e.g., Prometheus, Grafana, CloudWatch, Datadog, or equivalent), including building automated runbooks and self-healing remediation workflows.

Proven track record independently leading root cause analysis for complex production incidents that span infrastructure, databases, streaming pipelines, and application code layers.

Desired Skills

Production experience operating and troubleshooting Apache NiFi, including flow design, processor-level debugging, back-pressure configuration, cluster management, and contributing flow-level fixes and optimizations.

Hands-on operational experience with self-managed Apache Flink, including checkpoint management, state backend configuration, TaskManager memory tuning, job graph analysis, and application-level debugging of streaming jobs under backpressure.

Deep experience with Apache Cassandra, PostgreSQL, DynamoDB, Amazon Redshift, ElastiCache and/or Google BigQuery in production environments.

Advanced experience with Apache Kafka, Apache Flink, Google Pub/Sub and operating streaming workloads across both AWS and GCP.

Experience administering or optimizing Starburst Galaxy, Trino, or AWS Athena for large-scale analytics workloads.

Experience building AI agents or Model Context Protocol (MCP) servers to automate DevOps, observability, or operational workflows.

Hands-on experience with large language models (LLMs), including fine-tuning, prompt engineering, RAG pipeline development, or training custom models for operational and DevOps use cases.

Familiarity with data pipeline orchestration tools (e.g., Apache Airflow, dbt) and event-driven architectures.

Experience troubleshooting and supporting applications deployed on enterprise PaaS platforms (e.g., Cloud Foundry or equivalent), including understanding platform-level resource constraints, routing, and application lifecycle management.

Working proficiency in Golang sufficient to read production application code, interpret runtime behavior (goroutines, memory, pprof profiling), and contribute targeted performance fixes in collaboration with development teams.

AWS or Google Cloud professional-level certifications.

Experience with performance benchmarking, query plan analysis, and database capacity planning for high-throughput workloads.

Familiarity with application profiling, distributed tracing, and performance diagnostic tooling (e.g., APM, query analyzers, flame graphs) to isolate and resolve end-to-end latency issues.

Contributions to open-source database, streaming, or infrastructure projects.

Supervisory Responsibilities

This job does not have supervisory duties.

Skills

Amazon CloudWatch, Amazon ElastiCache, Amazon MQ, Amazon Web Services (AWS), Ansible (Software), Apache Airflow, Apache Cassandra, Apache Flink, Apache Kafka, Apache NiFi, Application Performance, AWS DynamoDB, Cloud Engineering, Cloud Foundry, Cloud Infrastructure, Cloud Monitoring, Cloud Native, Cloud Platform, Datadog, Data Pipelines, Data Query, Distributed Databases, Distributed Systems, Git, and 20 more

Compensation

Compensation offered for this role is 100,000.00 - 170,500.00 annually and is based on experience and qualifications.

The candidate(s) offered this position will be required to submit to a background investigation.

Joining our team isn’t just a job; it’s an opportunity. One that takes your skills and pushes them to the next level. One that encourages you to challenge the status quo. One where you can shape the future of protection while supporting causes that mean the most to you. Joining our team means being part of something bigger: a winning team making a meaningful impact.

Allstate generally does not sponsor individuals for employment-based visas for this position.

Effective July 1, 2014, under Indiana House Enrolled Act (HEA) 1242, it is against public policy of the State of Indiana and a discriminatory practice for an employer to discriminate against a prospective employee on the basis of status as a veteran by refusing to employ an applicant on the basis that they are a veteran of the armed forces of the United States, a member of the Indiana National Guard or a member of a reserve component.

For jobs in San Francisco, please click “here” for information regarding the San Francisco Fair Chance Ordinance. For jobs in Los Angeles, please click “here” for information regarding the Los Angeles Fair Chance Initiative for Hiring Ordinance.

To view the “EEO Know Your Rights” poster click “here”. This poster provides information concerning the laws and procedures for filing complaints of violations of the laws with the Office of Federal Contract Compliance Programs. To view the FMLA poster, click “here”.

This poster summarizes the major provisions of the Family and Medical Leave Act (FMLA) and tells employees how to file a complaint.

It is the Company’s policy to employ the best qualified individuals available for all jobs. Therefore, any discriminatory action taken on account of an employee’s ancestry, age, color, disability, genetic information, gender, gender identity, gender expression, sexual and reproductive health decision, marital status, medical condition, military or veteran status, national origin, race (including traits historically associated with race, including, but not limited to, hair texture and protective hairstyles), religion (including religious dress), sex, or sexual orientation that adversely affects an employee’s terms or conditions of employment is prohibited.

This policy applies to all aspects of the employment relationship, including, but not limited to, hiring, training, salary administration, promotion, job assignment, benefits, discipline, and separation of employment.

Allstate provides a comprehensive technology setup, including a laptop, monitors, headset, keyboard, and mouse.

Employees eligible to work from home also receive a monthly connectivity reimbursement to help offset internet costs. When working from home, you must have a dedicated, private workspace free from distractions, along with appropriate desk and seating.

Reliable internet is required, with minimum speeds of 50 MB download and 5 MB upload.

Required skills

AWS, Google Cloud Platform, Python, Apache Cassandra, PostgreSQL, Redis, Amazon Redshift, Google BigQuery, Amazon MSK, Kafka, Google Pub/Sub, Apache Flink, Starburst Galaxy, AWS Athena