Senior Software Engineer-Data Engineering

Caterpillar • 3h ago

Chennai, Tamil Nadu • Onsite • Full-time • Senior Level • 8+ yrs exp

Top focus

Software Engineer • Senior Software Engineer • Senior Data Engineer • Data Engineer • Software Engineer II

Career Area: Technology, Digital and Data

Job Description:

Your Work Shapes the World at Caterpillar Inc.

When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities.

We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Job Summary

We are looking for a highly motivated and experienced Data Engineer to join our data engineering team.

The ideal candidate will have a strong background in building scalable data pipelines using the AWS cloud stack and extensive hands-on experience with Snowflake. Proficiency in Python and SQL, along with graph and vector database technologies, is essential.

This role requires strong problem-solving abilities and a proactive mindset to deliver efficient, scalable, and reliable data solutions.

Key Responsibilities

Design, develop, and maintain scalable data pipelines on AWS using services such as S3, Glue, Lambda, Redshift, and EMR.
Build and optimize data warehousing solutions using Snowflake, including performance tuning and data modeling.
Write efficient and reusable code in Python and SQL for data transformation and processing.
Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements.
Develop and optimize solutions using graph databases (e.g., Neo4j, Amazon Neptune), including query design and performance tuning.
Design, build, and operate vector database solutions (e.g., Milvus, Amazon OpenSearch) to support semantic search, recommendations, RAG, and AI-driven use cases.
Integrate vector databases with LLM-based applications and AI workflows.
Monitor, troubleshoot, and improve pipeline performance and reliability.
Ensure data quality, integrity, and security across all stages of the pipeline.
Participate in code reviews, architecture discussions, and continuous improvement initiatives.

Required Qualifications

8+ years of experience in data engineering or related roles.
Strong hands-on experience with AWS cloud services, including data and AI workloads.
Deep understanding of Snowflake architecture, performance tuning, and best practices.
Advanced proficiency in Python and SQL for data pipelines, transformations, and services.
Strong understanding of graph and vector data modelling concepts and their practical applications.
Hands-on experience with graph databases (e.g., Neo4j, Neptune) and vector databases (e.g., Milvus, Amazon OpenSearch).
Experience with version control systems (e.g., Git) and Git workflows.
Experience working with Azure DevOps (AzDO) boards for backlog management in Agile environments.
Excellent analytical and problem-solving skills.
Strong communication and collaboration abilities.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

Nice to Have Skills

Knowledge of the NVIDIA ecosystem and its applications in data and AI.

Preferred Qualifications

Experience with orchestration tools such as AWS Step Functions.
Familiarity with data governance and compliance practices.
Exposure to real-time data processing frameworks (e.g., Kafka, Spark Streaming).

More Detail on Knowledge Base

Experience designing and deploying data ingestion pipelines for unstructured sources such as PDFs, Word documents, and HTML files, including text extraction, chunking strategies, and embedding generation at scale.
Hands-on expertise with vector databases, specifically Milvus, covering schema design, indexing, and optimizing write performance for large-scale embedding ingestion pipelines.
Proficiency in building Knowledge Graph ingestion pipelines using graph databases, including entity extraction, relationship modelling, and populating nodes and attributes.
Strong pipeline engineering skills in Python and frameworks for orchestrating multi-stage document processing workflows, with experience deploying and monitoring these pipelines in production environments.
Bonus: Exposure to RAPIDS libraries (cuDF, cuML, cuGraph) or CUDA-based tooling for GPU-accelerated data processing, enabling faster transformation and optimization during large-scale ingestion workflows.

Posting Dates: June 19, 2026 - June 25, 2026

Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.

Not ready to apply? Join our Talent Community.

Required skills

Python • SQL • AWS • Snowflake • graph database • vector database • Neo4j • Amazon Neptune • Milvus • Amazon OpenSearch • Git • Azure DevOps