Responsibilities:
• Setting up infrastructure and running large-scale data processing with Spark, Hive, Hadoop, etc. on AWS
• Assisting in sourcing, designing, integrating, and managing large, complex geospatial datasets using APIs and AWS cloud tools (EC2, S3, SageMaker)
• Building and supporting production-quality Airflow data pipelines
• Processing datasets on Google Earth Engine
• Continuous integration and deployment (CI/CD, GitHub)
• Preparing and maintaining technical documentation for datasets and deployed models (metadata, data dictionaries, code annotation, process diagrams)
• Performing exploratory data analysis and visualizing information using existing and new tools (QGIS, Python, R, etc.)
• Developing testing frameworks and tests for database and model quality and performance
Required Qualifications:
• 5+ years of experience with Dask, Ray, GDAL, or Kubernetes
• 5+ years of experience with at least one large-scale distributed data processing framework (Spark, Hadoop, Hive, etc.)
• 5+ years of experience creating and deploying data pipelines on AWS
• 5+ years of experience with AWS (Lambda, EC2, S3, managed Airflow, SageMaker, VPCs)
• Degree in Computer Science, Data Science, Engineering, Geography, Remote Sensing, or another highly quantitative discipline
• High level of proficiency in Python or R
• Ability to collaborate and communicate with both technical and non-technical stakeholders
• Ability to work independently and make key infrastructure decisions in a startup environment
• Experience with geospatial data, remote sensing techniques, satellite imagery, or computer vision
• Experience working with large raster or image datasets