Machine Learning for the Natural Sciences

Consent of Instructor is not required to register.

Machine Learning and AI have matured considerably in recent years and now regularly produce remarkable results.

This class focuses on applications of ML to data from the Natural Sciences. This is the main distinction of this course.

All of the examples derive from the natural sciences, as opposed to the many ML tutorials that focus on business, Natural Language Processing (NLP), and other topics of less interest to those of us in the natural sciences.

Python is the programming language of choice for a large share of modern scientific computing, and it is the only programming language used in the course materials. Those who prefer R are welcome to work in it instead; with the help of modern LLMs you could readily do all of the examples in R. However, Python is useful for many other aspects of computing, so you are encouraged to acquire this skill. In addition, most of the work can be done using Orange on vlab.pdx.edu.

In this course we will go from introductory Machine Learning to fully developed models. Actual AI will be introduced later in the term, but because of time constraints it will receive less emphasis, with only one exercise. Deep learning will be introduced, but without exercises.

You have a choice of four different themes for this class: Rocks, Birds, Water, or Space. You will choose one of these paths and then follow a series of exercises that manipulate, extract, visualize, and model these data.

The course provides weekly quizzes that you can retake until you score 100%, regular exercises using real natural sciences data, and recorded screencasts showing how to do the actual work.

We will also use Orange, an open-source GUI for data science that also provides Python versions of the code.

This course is also ideal for graduate students hoping to make progress with their own data. If that describes you, a separate plan can be created that lets you use your own data instead of the course data sets.

This class will be taught fully online for the second time in Spring 2025. The inaugural edition was in 2024, and the class has improved based on feedback from that experience.

Syllabus: ML4NatSci
Prerequisite: You need to know some Python in order to manipulate the code. Knowledge of variable types and loops, and the ability to understand configuration parameters, is essential. If you have previous experience in another language, such as R or JavaScript, this should come easily.
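
As a rough self-check of that level, here is a minimal sketch touching variable types, a loop, and configuration parameters. It is not taken from the course materials: the parameter names, the site labels, and the temperature values are all made up for illustration. If you can read it and predict what it prints, you probably have enough Python to get started.

```python
# Hypothetical configuration parameters, similar in spirit to the settings
# a modeling script might expose. All names and values here are made up.
config = {
    "target_column": "temperature_c",  # str: which field to analyze
    "test_fraction": 0.25,             # float: share of samples held out
    "normalize": True,                 # bool: rescale values to the 0-1 range
}

# Made-up measurements standing in for a real natural sciences data set.
samples = [
    {"site": "A", "temperature_c": 11.5},
    {"site": "B", "temperature_c": 14.0},
    {"site": "C", "temperature_c": 9.5},
    {"site": "D", "temperature_c": 13.0},
]


def summarize(rows, settings):
    """Return the mean of the target column and its (optionally scaled) values."""
    values = []
    for row in rows:  # loop over the records one at a time
        values.append(float(row[settings["target_column"]]))

    mean_value = sum(values) / len(values)

    if settings["normalize"]:
        low, high = min(values), max(values)
        values = [(v - low) / (high - low) for v in values]

    return mean_value, values


mean_value, scaled = summarize(samples, config)

# int() truncates, so 4 samples * 0.25 gives 1 held-out sample.
n_test = int(len(samples) * config["test_fraction"])

print("mean:", mean_value)
print("scaled:", scaled)
print("held-out samples:", n_test)
```

Try changing "normalize" to False or "test_fraction" to 0.5 and predicting the new output before running it; adjusting settings like these is the kind of code manipulation the prerequisite refers to.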

If you need to get up to speed quickly on Python, there are thousands of resources available. Here are some that I have compiled for you:

From NASA: ARSET Python Course
Geoscience-focused: https://www.fatiando.org/learn/index.html#learn

We will also be using an open-source point-and-click machine learning tool called Orange Data Mining. Feel free to download it, play with it, and watch the videos. It is installed on the Vlab as well.