
MPC's STEAM Art Exhibit 2023

Project Title: Estimating Mean Fish Length in a Kelp Forest Tank
Contributors: Andrew Malloy & Jackson Hsu

For best results, view the charts after submitting the Google Form.

Click to reveal dot-plot self-selection results.

Click to reveal random sample dot plot results.

Sample Mean Definition

A mean is a measure of center. It is calculated by adding up the individual observations and dividing that sum by the number of observations. We often cannot observe every individual in a population, so we use a sample (a subset of the population) to estimate the population mean. This is why it is important to select a sample that represents the population well.

In our activity, we take the lengths of five fish, add them, then divide this sum by five (the number of fish in our sample). For the purpose of this activity, we use the terms mean and average interchangeably because they are calculated with the same equation. Mathematically, the sample mean is represented by the symbol X-bar (an X with a bar on top), and the equation used to calculate it is:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $X_i$ is the length of the i-th fish and $n$ is the number of fish in our sample. The symbol Σ (capital sigma) indicates that we are calculating a sum of our selected observations.

We can use our calculated sample mean as a reference point against which to compare individual data points.
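As a small illustration of the calculation described above, here is a minimal Python sketch. The five fish lengths are made-up values chosen for the example, not measurements from the exhibit, and the function name `sample_mean` is our own.

```python
# Hypothetical fish lengths in centimeters (made-up values, not exhibit data).
lengths_cm = [12.0, 18.5, 25.0, 9.5, 30.0]


def sample_mean(values):
    """Add up the observations and divide by the number of observations."""
    if not values:
        raise ValueError("a sample needs at least one observation")
    return sum(values) / len(values)


# X-bar for our sample of five fish.
x_bar = sample_mean(lengths_cm)

# Use the sample mean as a reference point: how far is each fish from it?
deviations = [length - x_bar for length in lengths_cm]

print("Sample mean (cm):", x_bar)
print("Deviation of each fish from the mean (cm):", deviations)
```

A negative deviation means a fish is shorter than the sample mean, and a positive one means it is longer. The deviations always add up to zero, apart from rounding.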

 

The Takeaway

In our activity, we want to show why it matters how a sample is taken from the population. We also want to show that different samples can produce different results. The self-selection dot plot could reveal a human bias in our choices by showing where the mean lengths of self-selected samples fall in comparison to those of randomly selected samples.

Human bias usually causes us to select samples based on size, proximity, or visual characteristics. For example, we might choose bigger or more brightly colored fish over those that are smaller or harder to see.

In statistics, we try our best to avoid bias by using random sampling. 
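The contrast between biased and random selection can be sketched with a short simulation. Everything here is an assumption made for illustration: the tank of 40 fish is randomly generated rather than taken from the exhibit, and the "biased" chooser simply picks the five largest fish, which is only one of the ways human bias could show up.

```python
import random


def sample_mean(values):
    """Add up the observations and divide by the number of observations."""
    return sum(values) / len(values)


# Fixed seed so the simulation is repeatable.
rng = random.Random(2023)

# Hypothetical tank: 40 fish with lengths between 5 and 40 cm (made-up data).
tank = [round(rng.uniform(5.0, 40.0), 1) for _ in range(40)]
population_mean = sample_mean(tank)

# Biased selection: always pick the five largest fish.
biased_sample = sorted(tank, reverse=True)[:5]
biased_mean = sample_mean(biased_sample)

# Random selection: draw many random samples of five fish (without
# replacement within a sample) and record the mean of each one.
random_means = [sample_mean(rng.sample(tank, 5)) for _ in range(1000)]
average_of_random_means = sample_mean(random_means)

print("Population mean:", round(population_mean, 2))
print("Mean of the five largest fish:", round(biased_mean, 2))
print("Average of 1000 random-sample means:", round(average_of_random_means, 2))
```

Individual random samples still differ from one another, but their means scatter around the population mean, while the biased sample lands well above it.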

 

A Real World Application

Fish populations are often estimated based on a random sample. This helps fisheries managers determine the health of fish stocks and make regulatory decisions. If we only sampled the largest fish, we would risk overestimating fish size and would fail to portray the population accurately in terms of fecundity and generational recruitment.

The ocean sport fishery is randomly sampled throughout California. Fish are measured to obtain lengths and weights, which are then compared against harvest limits that are often set in pounds for the state. When fish cannot be weighed, weights can be estimated through length-weight regressions. In this way, gathering lengths is one way to estimate the total pounds of fish harvested for a given population of sport-caught fish and to determine the status of the fishery (for example, is a species being overfished?).
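To give a sense of how a length-weight regression might work, here is a minimal Python sketch. It assumes the commonly used power-law form W = a * L^b, fitted as a straight line on log-transformed data. The data, the coefficients, the units (grams and centimeters), and the function names are all hypothetical, and this is not presented as the procedure used by any fisheries agency.

```python
import math


def fit_length_weight(lengths, weights):
    """Fit W = a * L**b by least squares on log(W) = log(a) + b * log(L)."""
    xs = [math.log(length) for length in lengths]
    ys = [math.log(weight) for weight in weights]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale
    a = math.exp(y_mean - b * x_mean)  # intercept, back-transformed
    return a, b


# Hypothetical fish that were both measured and weighed. The weights are
# generated from W = 0.01 * L**3, so the fit is exact; real data are noisy.
measured_lengths_cm = [20.0, 25.0, 30.0, 35.0, 40.0]
measured_weights_g = [0.01 * length ** 3 for length in measured_lengths_cm]

a, b = fit_length_weight(measured_lengths_cm, measured_weights_g)


def estimate_weight_g(length_cm):
    """Estimate the weight of a fish that was measured but not weighed."""
    return a * length_cm ** b


# Hypothetical fish with lengths only.
unweighed_lengths_cm = [22.0, 33.0]
estimated_total_g = sum(estimate_weight_g(length) for length in unweighed_lengths_cm)

print("Fitted a:", a, "fitted b:", b)
print("Estimated total weight of unweighed fish (g):", estimated_total_g)
```

With real measurements the points would not fall exactly on the curve, so the estimated weights would carry some uncertainty.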
 
Just like in the kelp forest tank example, it would be impractical and arguably impossible to measure every fish caught and landed in the state of California, let alone along the entire length of the Pacific coastline. A random sample allows us to gather data efficiently and in a timely manner to keep fisheries sustainable along our coast.