Hi! This is Hongyu's tutorial about Python in earth science!
I know there are numerous tutorials out there, but I would like to make this one more project-based, so that copying and pasting the code is a little easier.
CSV is a very frequently used data format in geoscience research, and it can easily be exported from Microsoft Excel or Kingsoft WPS.
This tutorial is going to teach you how to read a CSV file and make a scatter plot out of it.
First things first, we need to open Jupyter Notebook.
Open an Anaconda prompt, and then simply type the following (or just copy and paste it, then hit Enter):
conda activate Data_Analysis
If things are working properly, you should see something like
(Data_Analysis) PS C:\Users\Jeff>
The most important sign is the (Data_Analysis) prefix: it means your environment is running now!
Next, we start a new Jupyter Notebook!
type
jupyter notebook
You should see a browser window pop up with Jupyter Notebook running!
What if you do not know how to set up your environment?
Please refer to this tutorial to set up the Anaconda environment!
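If you are not sure whether the environment has everything this tutorial uses, a quick check from Python can help. This is only a sketch: the package list (pandas, matplotlib, plotly) is an assumption based on the imports that appear later in this tutorial, and find_spec only tells you that a package can be found, not that it works correctly.

```python
import importlib.util

# Packages the later examples import (an assumed list; adjust to your setup)
required_packages = ["pandas", "matplotlib", "plotly"]

# find_spec returns None when a package cannot be found in the active environment
missing_packages = [
    name for name in required_packages
    if importlib.util.find_spec(name) is None
]

if missing_packages:
    print("Missing packages:", ", ".join(missing_packages))
else:
    print("All packages found, the environment looks ready!")
```

If anything is reported as missing, install it into the Data_Analysis environment before continuing.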
After opening Jupyter Notebook, we want to load the CSV file into pandas.
Change the directory in Jupyter Notebook to the location where you stored the CSV file.
Start a new notebook (click the button labeled New in the upper right corner, then pick Jupyter Notebook).
type
import pandas as pd
This simply says you want to use pandas, under the short name pd.
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
This says you want to read a data file called Scatter_Plot_Example_Data.csv and store it in Sample_Data.
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
I know it can be confusing when nothing appears on your screen and you do not know whether the data was read or not.
The easiest way to check is to type Sample_Data
(or whatever name you used previously).
Sample_Data # just like this
As you can see, we have 30 data points, each with its error bars, and they are all stored in Sample_Data!
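Typing the name prints the whole table; for larger files, a few summary calls are often easier to read. Here is a minimal sketch, assuming the column names used in the plots later in this tutorial (X, Y, Y_error, Y_lower_error, Y_Upper_error). The values are made up, and the data is read from a string so that the example runs without the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for Scatter_Plot_Example_Data.csv (values are made up)
csv_text = """X,Y,Y_error,Y_lower_error,Y_Upper_error
1,10.2,0.5,0.3,0.6
2,11.1,0.4,0.2,0.5
3,12.5,0.6,0.4,0.7
"""

# read_csv accepts a file-like object as well as a file name
Sample_Data = pd.read_csv(io.StringIO(csv_text))

print(Sample_Data.head())          # first rows (up to 5)
print(Sample_Data.shape)           # (number of rows, number of columns)
print(list(Sample_Data.columns))   # column names used later for x, y and errors
```

With the real file, you would keep the pd.read_csv('Scatter_Plot_Example_Data.csv') call and only add the three print lines.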
Alternative CSV reading
If you do not want to change directories in Jupyter Notebook every time you use it, or you have multiple data files at different locations on your computer or server, you can read the data directly from its actual path.
For example, if the file Scatter_Plot_Example_Data.csv is at location "C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv"
you could type
import pandas as pd
Sample_Data = pd.read_csv(r'C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv') # the r prefix keeps the backslashes in the Windows path literal
This will load the Scatter_Plot_Example_Data.csv file as well
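One thing to watch with Windows paths: in an ordinary Python string a backslash starts an escape sequence, so a path like this one can fail or be misread. Two common ways around that are a raw string (the r prefix) or forward slashes. The sketch below uses the standard pathlib module only to show that both spellings point to the same file; the path itself is the example path from above.

```python
from pathlib import PureWindowsPath

# Raw string: the r prefix keeps every backslash literal
raw_path = r'C:\Users\Jeff\Data\Scatter_Plot_Example_Data.csv'
# Forward slashes are also accepted for Windows paths
forward_path = 'C:/Users/Jeff/Data/Scatter_Plot_Example_Data.csv'

# PureWindowsPath only parses the text, so this runs on any operating system
file_name = PureWindowsPath(raw_path).name
same_file = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)

print(file_name)
print(same_file)
```

Either spelling can be passed to pd.read_csv in place of the plain file name.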
If you want to make a scatter plot without error bars, pandas actually has a pretty neat function called plot.scatter
To use it and make the scatter plot, simply type
Sample_Data.plot.scatter(x="X",y="Y")
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
import matplotlib.pyplot as plot
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y')
# Formatting the labels on the axis
Figure_Scatter_Plot.set_xlabel("Time") # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
If you want to make a scatter plot with symmetric error bars, that can also be done with plot.scatter.
To make the scatter plot, use the yerr option with the name of the column that holds the errors. If you want x error bars instead, simply change it to xerr.
Sample_Data.plot.scatter(x="X",y="Y",yerr='Y_error')
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
import matplotlib.pyplot as plot
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error')
# Formatting the labels on the axis
Figure_Scatter_Plot.set_xlabel("Time") # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
A simple scatter plot can be good for illustration; however, making it look more organized often takes more time than actually making the plot! Here are some examples.
We need to turn on the grid option:
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error',grid=True)
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
import matplotlib.pyplot as plot
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr='Y_error',grid=True)
# Formatting the labels on the axis
Figure_Scatter_Plot.set_xlabel("Time") # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
If you want to make a scatter plot with asymmetric error bars, that can also be done with plot.scatter.
To make the scatter plot, use the yerr option. If you want x error bars instead, simply change it to xerr.
When the error bars are asymmetric, we need to construct a structure that holds the lower and upper errors, like
[ [ Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error'] ] ]
Therefore, to make the scatter plot, type the following:
Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]])
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
import matplotlib.pyplot as plot
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]])
# Formatting the labels on the axis
Figure_Scatter_Plot.set_xlabel("Time") # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
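If the pandas call complains about the shape of the error structure, one alternative is to draw the same kind of plot with matplotlib's errorbar function directly. The sketch below is not the method used above, just a comparison; the values are made up and stand in for the X, Y, Y_lower_error and Y_Upper_error columns. Note that here yerr takes two rows, [lower, upper], without the extra outer list.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the X, Y, Y_lower_error and Y_Upper_error columns
x = [1, 2, 3]
y = [10.0, 11.0, 12.5]
lower_error = [0.5, 0.25, 0.5]
upper_error = [1.0, 0.5, 0.75]

fig, ax = plt.subplots()
# yerr=[lower, upper]: first row is the distance below each point, second row above
bars = ax.errorbar(x, y, yerr=[lower_error, upper_error], fmt="o", capsize=3)

# Same axis labels and title as the pandas version
ax.set_xlabel("Time")
ax.set_ylabel("Water temperature")
ax.set_title("Scatter Plot of Water Temperature vs Time")
```

With the real data, you could pass Sample_Data['X'], Sample_Data['Y'] and the two error columns in place of the lists.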
A simple scatter plot can be good for illustration; however, making it look more organized often takes more time than actually making the plot! Here are some examples.
We need to turn on the grid option:
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'],Sample_Data['Y_Upper_error']]],grid=True,title='Scatter Plot of Water Temperature vs Time')
#!/usr/bin/env python
import pandas as pd
Sample_Data = pd.read_csv('Scatter_Plot_Example_Data.csv')
import matplotlib.pyplot as plot
Figure_Scatter_Plot = Sample_Data.plot.scatter(x='X',y='Y',yerr=[[Sample_Data['Y_lower_error'], Sample_Data['Y_Upper_error']]],grid=True)
# Formatting the labels on the axis
Figure_Scatter_Plot.set_xlabel("Time") # set the x axis title
Figure_Scatter_Plot.set_ylabel("Water temperature") # set the y axis title
Figure_Scatter_Plot.set_title("Scatter Plot of Water Temperature vs Time") # set the title of the figure
When it comes to making a prettier scatter plot, there are actually a lot of choices out there in Python.
For example, here is a scatter plot made with the Plotly module:
import plotly.express as plot_scatter_simple
Plotly_Figure = plot_scatter_simple.scatter(Sample_Data, x="X", y="Y",error_y="Y_error")
Plotly_Figure.update_traces(mode='markers', marker_line_width=2, marker_size=6,marker_color='blue')
Plotly_Figure.update_layout(title='Plotly Scatter Plot!',title_font_size=20)
Plotly_Figure.show()