Cloud-Based Data Infrastructure Engineering Intern (UC Student)

Position summary

The FOCAL Lab at UC Davis is recruiting undergraduate Cloud-Based Data Infrastructure Engineering Interns (two positions) to employ principles of big data, cloud infrastructure, and open-source software development to support the lab’s work at the intersection of forest ecology, forest mapping, and forest management. The interns will learn about and contribute to the implementation of open-source infrastructure tools and frameworks to automate large-scale processing and cataloging of drone imagery and other types of data used to answer ecological questions. Specifically, the interns will be responsible for (a) advancing the adoption of Argo Workflows for orchestrating highly parallel, multi-step drone imagery processing workflows on cloud-based Kubernetes clusters, (b) improving automated tracking and cataloging of metadata, including transitioning from files to a relational PostgreSQL database, (c) automating the archiving of data files in the CyVerse Data Store, and (d) if time, interest, and skills allow, developing a public web portal for browsing and accessing the processed and cataloged data and, ultimately, uploading new data.

A strong background in software engineering is required. A specific focus on open-source development and data infrastructure is desired. An understanding of the technical concepts underlying cloud-based workflow orchestration and databases may be helpful, but more important is an ability to learn and interpret the documentation for these tools. Members of communities traditionally underrepresented in software engineering or ecology are strongly encouraged to apply.

Hours, dates, and work location

The preferred start date is late October 2024. The position will extend through at least May 2025, with an option for extension into the summer and beyond based on funding and performance. There will be 5-15 hours of work per week depending on the intern’s preference. The position may be remote or in person (office on the UC Davis campus) depending on preference.

Compensation

Approximately $18.50/hour.

Primary duties

  • Develop drone imagery processing workflows using Argo Workflows on cloud-based Kubernetes clusters, accommodating the evolving needs of the lab’s drone imagery processing workflow.
  • Set up a PostgreSQL database in the project’s cloud computing environment and integrate it into data processing workflows.
  • Develop scripted workflows for transferring processed data to the CyVerse Data Store in a carefully organized and thoroughly documented manner.
  • Document problems through GitHub issues and respond to issues created by the supervisor and the community.
  • Propose and discuss improvements to efficiency or user experience.
  • Improve project-level documentation and example code.
  • Improve the quality of existing codebases by reorganizing, modularizing, documenting, and streamlining.
  • The intern will receive mentoring on general software development principles as well as feedback on their code through code reviews and meetings.

Minimum qualifications

  • Undergraduate student in software engineering, computer science, data science, or related field OR another field with demonstrated understanding of software engineering fundamentals
  • Familiarity using Linux and/or working in remote/cloud-based environments
  • Experience with version control such as Git
  • Desire to create easy-to-use, open-source tools for ecologists

Desired qualifications (these are a plus but not required)

  • Experience with open-source or multi-contributor Git concepts such as branching, merging, pull requests, and code review
  • Experience with R
  • Experience with Python
  • Understanding of dependency management and software packaging
  • Conceptual understanding of or experience with software for any of the following:
    • Geospatial information processing
    • Optimizing workflows for large dataset processing
    • Parallel or GPU computing
    • Computer vision or image processing
  • Experience interfacing with non-computing domain specialists (e.g. ecologists)

To apply

  • Please submit a cover letter (including your interest in the position, relevant experience, and availability dates) and a CV/resume. Combine this information into a single PDF and email it to David Russell, djrussell@ucdavis.edu, with the subject line “Cloud-Based Data Infrastructure Engineering Intern Application”.
  • The position will remain open until filled. For full consideration, apply by October 15, 2024.
  • Applicants who apply by October 15 will be notified by October 22 whether they have been selected for an interview.
  • References are not required in the initial application but we may request them after the interviews.