Job Description :

Summary of the project/initiatives which describes what's being done:
• Build, modernize, and maintain the U.S. Bank AI/ML Platform and related frameworks and solutions.
• Participate and contribute to architecture and design reviews.
• Build and deploy the AI/ML platform in Azure with open-source applications (Argo, JupyterHub/Kubeflow) and/or cloud/SaaS solutions (Azure ML, Databricks).
• Design, develop, test, deploy, and maintain distributed, GPU-enabled machine learning pipelines using K8s/AKS-based Argo Workflows orchestration, collaborating with data scientists.
• Enable and support distributed data processing on the platform using Apache Spark and other distributed/scale-out technologies.
• Build ETL pipelines and ingress/egress methodologies for AI/ML use cases.
• Build highly scalable backend REST APIs for metadata management and other miscellaneous business needs (a minimal sketch follows this list).
• Deploy applications in Azure Kubernetes Service using GitLab, Jenkins, Docker, kubectl, Helm, and manifests.
• Branch, tag, and maintain versions across different environments in GitLab.
• Review code developed by other developers and provide feedback to ensure best practices (e.g., design patterns, accuracy, testability, efficiency).
• Work with relevant engineering, operations, business-line, and infrastructure groups to ensure effective architectures and designs, and communicate findings clearly to technical and non-technical partners.
• Perform functional, benchmark, and performance testing and tuning to achieve performant AI/ML workflows, interactive notebook user experiences, and pipelines.
• Assess, design, and optimize resource capacity for GPU-intensive ML workloads.
• Communicate processes and results to all parties on the product team, including engineers, the product owner, the scrum master, and third-party vendors.

Top 5-10 responsibilities for this position :
• Experience developing AI/ML platforms and frameworks (including core offerings such as model training, inferencing, and distributed/parallel programming), preferably on Kubernetes and native cloud.
• Highly skilled with the Python or Java programming languages.
• Highly skilled with database languages such as SQL and NoSQL.
• Experience designing, developing, and deploying highly maintainable, extensible, and testable distributed applications using Python and other languages.
• Experience developing ETL pipelines and REST APIs in Python using Flask or Django.
• Experience with technologies/frameworks including Kubernetes, Helm charts, notebooks, workflow orchestration tools, and CI/CD and monitoring frameworks.
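For illustration, a minimal sketch (assuming Flask, one of the two frameworks named above) of the kind of metadata REST API the responsibilities describe; the endpoint paths, payloads, and in-memory store are hypothetical placeholders, not the actual platform API.

```python
# Minimal sketch of a metadata REST API; endpoints and storage are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)
_models = {}  # hypothetical in-memory metadata store (a real service would use a database)

@app.route("/models/<model_id>", methods=["GET"])
def get_model(model_id):
    """Return stored metadata for a model, or 404 if unknown."""
    record = _models.get(model_id)
    if record is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(record)

@app.route("/models/<model_id>", methods=["PUT"])
def put_model(model_id):
    """Store or replace a model's metadata from the JSON request body."""
    _models[model_id] = request.get_json()
    return jsonify({"status": "stored"}), 201

if __name__ == "__main__":
    app.run(port=8080)
```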
Basic Qualifications :
• Bachelor's or master's degree in computer science or data science
• 6-8 years of experience in software development and with data structures/algorithms

Required Technical Qualifications / Skills :
• Experience with AI/ML open-source projects on large datasets using Jupyter, Argo, Spark, PyTorch, and TensorFlow
• Experience creating unit and functional test cases using PyTest and unittest
• Experience training and tuning machine learning models
• Experience working with JupyterHub
• Experience with database management systems such as PostgreSQL
• Experience searching, monitoring, and analyzing logs using Splunk/Kibana
• GraphQL/Swagger implementation knowledge
• Strong understanding of and experience with Kubernetes for availability and scalability of applications in Azure Kubernetes Service
• Experience building CI/CD pipelines using CloudBees Jenkins, Docker, Artifactory, Kubernetes, Helm charts, and GitLab
• Experience with tools such as JupyterHub, Kubeflow, MLflow, TensorFlow, scikit-learn, Apache Spark, and Kafka
• Experience with workflow orchestration tools such as Apache Airflow and Argo Workflows
• Familiarity with Conda, PyPI, and Node.js package builds

This role is responsible for developing, implementing, and maintaining knowledge-based or artificial intelligence application systems. The individual ensures that information is converted into a format that is digestible and easy for end users to access and utilize optimally.

ESSENTIAL FUNCTIONS :
• Designs and writes complex code in several languages relevant to the existing product stack, with a focus on automation
• Configures, tunes, maintains, and installs application systems and validates system functionality
• Monitors and fine-tunes application systems to achieve optimum performance levels, and works with hardware teams to resolve hardware and software issues
• Develops and maintains the department's knowledge database containing enterprise issues and possible resolutions
• Develops models of the task problem domain for which a system will be designed or built
• Uses models, hypotheses, and cognitive analysis techniques to elicit real problem-solving knowledge from experts
• Mediates between the expert and the knowledge base; encodes knowledge for the knowledge base
• Acts as subject matter expert for difficult or complex application problems requiring interpretation of AI tools and principles
• Researches and prepares reports and studies on various aspects of knowledge acquisition, modeling, management, and presentation
• Develops and maintains processes, procedures, models, and templates for collecting and organizing knowledge into specialized knowledge representation programs
• Acts as vendor liaison for products and services that support development tools
• Maintains the definition, documentation, training, testing, and activation of Disaster Recovery/Business Continuity Planning to meet compliance standards
• Maintains a comprehensive operating system hardware and software configuration database/library of all supporting documentation to ensure data integrity
• Acts to improve the overall reliability of systems and to increase efficiency
• Works collaboratively with cross-functional teams, using Agile/DevOps principles, to bring products to life, achieve business objectives, and serve customer needs

Required Skills : Kubernetes, Azure, Spark, Python, ML platform experience, ideally Kubeflow and/or MLflow (a minimal MLflow sketch follows this section)
Background Check : Yes
Drug Screen : Yes
Notes :
Selling points for candidate :
Project Verification Info : MSA: Restricted; Client Letter: Will Provide
Candidate must be your W2 Employee : Yes
Exclusive to Apex : No
Face to face interview required : No
Candidate must be local : No
Candidate must be authorized to work without sponsorship : No
Interview times set : No
Type of project :
Master Job Title :
Branch Code :
Saxon Global
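Because the required skills single out MLflow, here is a minimal, hypothetical sketch of the experiment tracking that ML-platform work typically involves; the experiment name, hyperparameters, and metric values are illustrative only, not taken from the client's environment.

```python
# Minimal MLflow experiment-tracking sketch; all names and values are hypothetical.
import mlflow

mlflow.set_experiment("demo-model-training")  # hypothetical experiment name

with mlflow.start_run():
    # Record the run's hyperparameters.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 10)
    # In a real pipeline these metrics would come from training/evaluation code.
    mlflow.log_metric("train_loss", 0.42)
    mlflow.log_metric("val_accuracy", 0.91)
```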
Job Type : Full-time
Skills required : Azure, Kubernetes, Jenkins, Docker, Python, NoSQL, CI/CD, PostgreSQL, Node.js
Location : Not specified
Salary : Not specified
Date Posted : February 22, 2025
Saxon Global Inc. is seeking a Machine Learning Engineer/AI Engineer to build and maintain AI/ML platforms on Azure. The role involves collaborating with data scientists and developing scalable machine learning pipelines and applications.
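As a closing illustration of the Spark-based ETL work the posting describes, a minimal PySpark sketch; the file paths, column names, and aggregation are hypothetical, not part of the actual platform.

```python
# Minimal PySpark ETL sketch; paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Ingest: read raw records (hypothetical path and schema).
df = spark.read.csv("raw/transactions.csv", header=True, inferSchema=True)

# Transform: aggregate amounts per day.
daily = df.groupBy("txn_date").agg(F.sum("amount").alias("total_amount"))

# Load: write curated output for downstream AI/ML use cases.
daily.write.mode("overwrite").parquet("curated/daily_totals")

spark.stop()
```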