(10 minutes)
http://swcarpentry.github.io/python-novice-gapminder/01-run-quit/index.html
Once you have your Notebook open, let's start the session.
Variables are names for values.
Note:
To assign a value to a variable, use = like this:
variable = value
The variable is created when a value is assigned to it.
For example, let's type in our names and ages:
age = 21
name = 'Mariana'
ejemplo = 5 + 5
print(age, name, ejemplo)
# 1variable = 1  # invalid: a variable name cannot start with a digit
variable1 = 1
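Variable names can only contain letters, digits, and underscores _ (often used to separate words in long names).
Variable names cannot start with a digit.
Names that start with underscores, like __real_age, have a special meaning, so we won't use them yet.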
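You can quickly check whether a string is a legal variable name with the built-in string method str.isidentifier(). Here is a minimal sketch. isidentifier() only checks the naming rules above, so it does not flag reserved words such as for. The sketch therefore also uses the standard-library keyword module, which goes slightly beyond the rules listed here.

```python
import keyword

# Names from the examples above, plus a reserved word for contrast
candidates = ["variable1", "1variable", "__real_age", "for"]

for candidate in candidates:
    # isidentifier(): letters, digits, underscores, no leading digit
    # iskeyword(): rejects reserved words such as 'for'
    usable = candidate.isidentifier() and not keyword.iskeyword(candidate)
    print(candidate, usable)
```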
Use print() to display values: this built-in function prints values as text.
Examples:
print('My name is', name, ', I am ', age)
Variables must be created before they are used; otherwise Python reports an error.
cat_name = "Salem"
print(cat_name)
Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the order in which they appear.
Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present.
age = age + 1
print('Next year I will be', age)
You can also index a string to get a single character from it.
'AB' != 'BA': ordering matters, because a string can be treated as a sequence of characters.
Python uses zero-based indexing, so the first character is at index 0.
atom_name = 'helium'
print(atom_name)
print(atom_name[0])
Quick concepts
How to slice strings:
Use [start:stop], where start is the index of the first element we want and stop is the index of the element just after the last element we want.
Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.
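The short sketch below shows that slicing leaves the original string untouched. It also shows that the length of a slice is stop - start.

```python
atom_name = 'sodium'

# Slicing returns a new string; the original is unchanged
first_three = atom_name[0:3]
print(first_three)          # sod
print(atom_name)            # sodium

# The slice length is stop - start
print(len(atom_name[1:4]))  # 3
```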
atom_name = 'sodium'
print(atom_name[0:3])
# Use len to find the length of a string
print(len('helium'))
Python is case-sensitive: upper- and lower-case letters are different, so Name and name are different variables.
Use meaningful variable names
flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')
hola = 1
HOLA = 2
print(hola, HOLA)
initial = 'left'
position = initial
initial = 'right'
print(initial, position)
If you assign a = 123, what happens if you try to get the second digit of a via a[1]?
a = 123
a[1]
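Integers are not sequences, so a[1] raises a TypeError ('int' object is not subscriptable). One possible workaround is to convert the number to a string first, as in this minimal sketch:

```python
a = 123

try:
    a[1]
except TypeError as err:
    # Integers cannot be indexed
    print('TypeError:', err)

# Convert to a string, index it, then turn the digit back into an int
second_digit = int(str(a)[1])
print(second_digit)  # 2
```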
atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])
1. What does thing[low:high] do?
2. What does thing[low:] (without a value after the colon) do?
3. What does thing[:high] (without a value before the colon) do?
4. What does thing[:] (just a colon) do?
5. What does thing[number:some-negative-number] do?
6. What happens when you choose a high value which is out of range? (e.g., try atom_name[0:15])
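Running each form on the string 'carbon' is one way to check your answers:

```python
thing = 'carbon'

print(thing[1:3])    # ar: from index 1 up to, but not including, index 3
print(thing[1:])     # arbon: from index 1 to the end
print(thing[:3])     # car: from the start up to, but not including, index 3
print(thing[:])      # carbon: the whole string
print(thing[1:-1])   # arbo: a negative stop counts back from the end
print(thing[0:15])   # carbon: an out-of-range stop is clipped, no error
```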
Use variables to store values.
Use print to display values.
Variables persist between cells.
Variables must be created before they are used.
Variables can be used in calculations.
Use an index to get a single character from a string.
Use a slice to get a substring.
Use the built-in function len to find the length of a string.
Python is case-sensitive.
Use meaningful variable names.
Existing types include:
int: positive or negative whole numbers
float: real numbers
str: text
Use the built-in function type to find out what type a value or variable has.
print(type(variable1))
print(type(52))
fitness = 'average'
print(type(fitness))
Types control what operations (or methods) can be performed on a given value.
# This works
print(5 - 3)
# Does this work? No: subtracting one string from another raises a TypeError
print('hello' - 'h')
Operators that can be used on strings:
"Adding" character strings concatenates them.
full_name = 'Ahmed'+ 'Walsh'
print(full_name)
Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.
separator = '=' * 10
print(separator)
Strings have a length, but numbers don't:
# Text
print(len(full_name))
# Number: raises a TypeError, because an int has no length
print(len(256))
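Numbers have no length, but converting between types makes these operations possible. This short sketch shows the conversions summarized in the key points further down:

```python
# len() works on the string form of a number, not on the number itself
print(len(str(256)))   # 3

# Convert a string to a number before doing arithmetic...
print(1 + int('2'))    # 3

# ...or convert a number to a string before concatenating
print(str(1) + '2')    # 12
```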
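What type of value (int, float, or str) would you use to represent each of the following? Try to come up with more than one good answer.
Number of days since the start of the year.
Time elapsed from the start of the year until now in days.
Serial number of a piece of lab equipment.
A lab specimen’s age.
Current population of a city.
Average population of a city over time.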
Which of the following will return the floating point number 2.0? Note: there may be more than one right answer.
first = 1.0
second = "1"
third = "1.1"
1. first + float(second)
2. float(second) + float(third)
3. first + int(third)
4. first + int(float(third))
5. int(first) + int(float(third))
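Running the five options directly is one way to check your answer. Option 3 fails because int() cannot parse a string that contains a decimal point:

```python
first = 1.0
second = "1"
third = "1.1"

print(first + float(second))           # option 1: 2.0
print(float(second) + float(third))    # option 2: 2.1
try:
    print(first + int(third))          # option 3
except ValueError as err:
    # int() cannot parse the text "1.1"
    print('ValueError:', err)
print(first + int(float(third)))       # option 4: 2.0
print(int(first) + int(float(third)))  # option 5: 2, an int rather than a float
```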
Every value has a type.
Use the built-in function type to find the type of a value.
Types control what operations can be done on values.
Strings can be added and multiplied.
Strings have a length (but numbers don’t).
Must convert numbers to strings or vice versa when operating on them.
Can mix integers and floats freely in operations.
Variables only change value when something is assigned to them.
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
We can create random numbers easily:
import numpy as np
from numpy import random
# randint's upper bound is excluded, so r is between 1 and 34
r = random.randint(1, 35)
print(r)
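NumPy also offers a newer Generator interface through np.random.default_rng, available in NumPy 1.17 and later. The sketch below uses it. The seed 42 is an arbitrary choice that makes the results reproducible. Generator.integers also excludes the upper bound by default, and endpoint=True includes it.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducible results

# Like randint, integers(low, high) excludes high: values are 1..34
samples = rng.integers(1, 35, size=1000)
print(samples.min(), samples.max())

# endpoint=True makes the upper bound inclusive: values are 1..35
inclusive = rng.integers(1, 35, size=1000, endpoint=True)
print(inclusive.max())
```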
NumPy arrays are a very useful data structure, very similar to lists but with a few differences.
Advantages over lists:
mylist = [1, 2, 3, 4]
mylist
np.array(mylist)
mymatrix = [[1,2,3], [4,5,6], [7,8,9]]
mymatrix
m = np.array(mymatrix)
m
type(m)
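One practical difference from lists is how arithmetic behaves. On a NumPy array, operators act element by element. On a list, + and * mean concatenation and repetition. Here is a small sketch that uses mylist from above:

```python
import numpy as np

mylist = [1, 2, 3, 4]
arr = np.array(mylist)

print(mylist * 2)   # list repetition: [1, 2, 3, 4, 1, 2, 3, 4]
print(arr * 2)      # element-wise multiplication: 2, 4, 6, 8
print(arr + arr)    # element-wise addition: 2, 4, 6, 8
print(arr.mean())   # 2.5
```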
There are many ways to create an array, let's try some:
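Here are a few common NumPy constructors to try. The sizes are arbitrary. Note that arange excludes its stop value, while linspace includes it by default:

```python
import numpy as np

print(np.zeros(3))            # three 0.0 values
print(np.ones((2, 2)))        # a 2x2 array of 1.0
print(np.arange(0, 10, 2))    # 0, 2, 4, 6, 8: the stop value 10 is excluded
print(np.linspace(0, 1, 5))   # 0, 0.25, 0.5, 0.75, 1.0: the stop value is included
print(np.eye(3))              # a 3x3 identity matrix
```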
Let's assume a is any NumPy array.
Attribute or method | Description |
---|---|
a.shape | Returns a tuple with the number of elements along each dimension |
a.ndim | Number of dimensions |
a.size | Total number of elements in the array |
a.dtype | Data type of the elements in the array |
a.T | Transposes the array |
a.flat | An iterator over the array as if it were 1-dimensional (a.flatten() returns a collapsed 1-D copy) |
a.copy() | Returns a copy of the array |
a.fill(value) | Fills the array in place with the given value |
a.reshape(shape) | Returns an array with the same data in the shape we indicate |
a.resize(shape) | Changes the shape of the array in place, without creating a copy |
a.sort() | Sorts the array in place |
# Try some!
print(m.dtype)
print(m)
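The sketch below applies several entries from the table to the 3x3 matrix m from above. It rebuilds m so the cell runs on its own.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(m.shape)          # (3, 3)
print(m.ndim)           # 2
print(m.size)           # 9
print(m.T[0])           # first row of the transpose: 1, 4, 7
print(m.reshape(1, 9))  # same data in a new shape; m itself is unchanged
print(m.flatten())      # collapsed 1-D copy: 1 through 9

c = m.copy()
c.fill(0)               # fill() modifies the copy in place; m is untouched
print(m[0, 0], c[0, 0]) # 1 0
```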
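a = np.linspace(0, 10, 11)  # 11 evenly spaced values: 0.0, 1.0, ..., 10.0
# A blank start or stop means "from the beginning" or "to the end"
# all
print(a[:])
# indices 3 up to (not including) 8: the values 3.0 to 7.0
print(a[3:8])
# indices 1 to 10 in steps of 2: the odd values 1.0, 3.0, ..., 9.0
print(a[1:11:2])
# conditionals: keep only the elements greater than 4
print(a[a > 4])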
Try it yourself
Arguments in a function
A function may take zero or more arguments
An argument is a value passed into a function
For example some of the functions we used so far have arguments
len takes exactly one argument: len(x)
int, str and float create a new value from an existing one
print takes zero or more arguments
print('before')
print()
print('after')
Commonly-used built-in functions include max(), min(), and round().
Functions can also be combined, for example by passing the result of max or min straight to print:
print(max(1, 2, 3))
print(min('a', 'A', '0'))
But these functions may only work for certain (combination of) arguments.
For example:
max and min must be given at least one argument.
And they must be given things that can meaningfully be compared.
print(max(1,4,6,9,100000))
print(min('a', 'A', 0))  # TypeError: a str and an int cannot be compared
round will round off a floating-point number:
round(3.712)
# We can specify the number of decimal places we want. Let's try:
round(3.712, 2)
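round has a default value for its second argument (the number of digits). This is why round(3.712) returns a whole number as an int. One detail worth knowing is that Python rounds exact halves to the nearest even number:

```python
print(round(3.712))            # 4 (an int)
print(round(3.712, 2))         # 3.71 (a float)
print(round(2.5), round(3.5))  # 2 4: halves round to the even neighbour
```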
Use the built-in function help to get help for a function:
help(round)
help(max)
# Forgot to close the quote marks around the string.
name = 'Feng
Try it!
name = 'Feng'
print(name)
age = 53
remaining = 100 - age  # spelled correctly; a typo such as aege would raise a NameError
print(remaining)
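Mis-spelling a variable name causes a runtime error rather than a syntax error, because Python only notices when that line actually runs. This sketch catches the resulting NameError. Here aege is the deliberately mis-spelled name.

```python
age = 53
caught = None

try:
    remaining = 100 - aege  # deliberately mis-spelled 'age'
except NameError as err:
    caught = err
    print('NameError:', err)
```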
There are 2 ways to get help in a Jupyter Notebook:
1. Place the cursor anywhere in the function invocation (i.e., the function name or its parameters), hold down Shift, and press Tab.
2. Type a function name with a question mark after it, for example:
max?
Every function returns something: if a function has no useful result to return, it usually returns the special value None.
result = print('example')
print('result of print is', result)
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
# The value is ... 2.6
print(radiance)
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))
Why don’t max and min return None when they are given no arguments?
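One way to answer: max and min need at least one value to compare, so calling them with nothing raises a TypeError instead of returning None. If a sequence might be empty, the default keyword argument (available since Python 3.4) supplies a fallback:

```python
try:
    max()
except TypeError as err:
    # max needs at least one argument to compare
    print('TypeError:', err)

print(max([], default=None))      # None instead of an error
print(min([3, 1, 2], default=0))  # 1: default is only used for empty input
```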
Use comments to add documentation to programs.
A function may take zero or more arguments.
Commonly-used built-in functions include max, min, and round.
Functions may only work for certain (combinations of) arguments.
Functions may have default values for some arguments.
Use the built-in function help to get help for a function.
The Jupyter Notebook has two ways to get help.
Every function returns something.
Python reports a syntax error when it can’t understand the source of a program.
Python reports a runtime error when something goes wrong while a program is executing.
Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.
10 min
Most of the power of a programming language is in its libraries.
A library is a collection of files (modules) that contain functions for use by other programs.
The Python standard library is an extensive suite of modules that comes with Python itself.
Many more libraries are available from PyPI (the Python Package Index).
A program must import a library module before using it.
import math
print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))
We can also use help() to learn about the content of a library module, just like we do with functions!
help(math)
Import specific items from a library module to shorten programs.
from math import cos, pi
print('cos(pi) is', cos(pi))
Create an alias for a library module when importing it to shorten programs.
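The lesson already does this with import numpy as np. The same idea applied to the math example above looks like the sketch below. In this notebook, the alias m would replace the NumPy array m defined earlier, so pick a different alias if you still need that array.

```python
import math as m  # 'm' is now a short alias for the math module

print('cos(pi) is', m.cos(m.pi))
```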
You want to select a random character from a string:
bases = 'ACTTGCTTGAC'
from random import randrange
bases = 'ACTTGCTTGAC'
print(bases[randrange(len(bases))])
Rearrange the following statements so that a random DNA base is printed and its index in the string. Not all statements may be needed. Feel free to use/add intermediate variables.
bases="ACTTGCTTGAC"
import math
import random
___ = random.randrange(n_bases)
___ = len(bases)
print("random base ", bases[___], "base index", ___)
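Here is one possible arrangement. The math module turns out not to be needed, and the names n_bases and idx are our own choice for the blanks.

```python
import random

bases = "ACTTGCTTGAC"
n_bases = len(bases)
idx = random.randrange(n_bases)  # 0 <= idx < n_bases
print("random base", bases[idx], "base index", idx)
```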
import math as m
angle = m.degrees(m.pi / 2)
print(angle)
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)
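One way to fill in the blanks is to import both names directly from math:

```python
from math import degrees, pi

angle = degrees(pi / 2)
print(angle)
```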
from math import log
log(0)  # raises a ValueError: the logarithm of 0 is undefined
Pandas is a widely used library for statistics, particularly on tabular data.
Dataframe: A 2-dimensional table whose columns have names and potentially have different data types.
#Let's import pandas
import pandas as pd
data = pd.read_csv('/home/mcubero/dataSanJose19/data/gapminder_gdp_oceania.csv')
print(data)
Our lessons store their data files in a data sub-directory, which is why the path to the file is data/gapminder_gdp_oceania.csv. If you forget to include data/, or if you include it but your copy of the file is somewhere else, you will get a runtime error that ends with a line like this:
ERROR
OSError: File b'gapminder_gdp_oceania.csv' does not exist
data = pd.read_csv('/home/mcubero/dataSanJose19/data/gapminder_gdp_oceania.csv', index_col='country')
data
Use DataFrame.info() to find out more about a dataframe:
data.info()
The DataFrame.columns variable stores information about the dataframe's columns.
Note that columns is data, not a method, so don't use () to try to call it.
print(data.columns)
Use DataFrame.T to transpose a dataframe
print(data.T)
data.T
DataFrame.describe() gets the summary statistics of only the columns that have numerical data. All other columns are ignored, unless you use the argument include='all'
data.describe()
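To see the effect of include='all' without the gapminder file, here is a sketch on a small made-up DataFrame. The column names and values are placeholders, not real data.

```python
import pandas as pd

toy = pd.DataFrame({
    'label': ['a', 'b', 'c'],   # text column
    'value': [1.0, 2.0, 3.0],   # numeric column
})

print(toy.describe())               # only the numeric 'value' column
print(toy.describe(include='all'))  # both columns; text gets count/unique/top/freq
```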
Read the data in gapminder_gdp_americas.csv (which should be in the same directory as gapminder_gdp_oceania.csv) into a variable called americas, and check the parameters to define the index.
Inspect the data.
Use help(americas.head) and help(americas.tail) to find out what these methods do.
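head and tail return the first and last n rows, where n defaults to 5. The exact contents of americas depend on your copy of the file, so this sketch uses a toy DataFrame of integers instead. It also shows one way to get the last columns, which is to transpose the frame first.

```python
import numpy as np
import pandas as pd

# Toy 6x4 frame holding the integers 0..23, standing in for americas
toy = pd.DataFrame(np.arange(24).reshape(6, 4), columns=['w', 'x', 'y', 'z'])

print(toy.head(3))        # first three rows
print(toy.tail(3))        # last three rows
print(toy.T.tail(3).T)    # last three columns, via the transpose
```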
The data for your current project is stored in a file called microbes.csv, which is located in a folder called field_data. You are doing analysis in a notebook called analysis.ipynb in a sibling folder called thesis:
your_home_directory
+-- field_data/
| +-- microbes.csv
+-- thesis/
+-- analysis.ipynb
What value(s) should you pass to read_csv to read microbes.csv in analysis.ipynb?
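analysis.ipynb runs inside thesis/, so one answer is the relative path '../field_data/microbes.csv': go up one level, then down into field_data. An absolute path to the file also works. This sketch checks the relative path with os.path and does not need the file to exist.

```python
import os.path

relative = os.path.join('..', 'field_data', 'microbes.csv')

# Resolve the path as if the notebook were running inside 'thesis'
resolved = os.path.normpath(os.path.join('thesis', relative))
print(resolved)  # field_data/microbes.csv on POSIX systems

# In the notebook itself: data = pd.read_csv('../field_data/microbes.csv')
```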
As well as the read_csv function for reading data from a file, Pandas provides a to_csv function to write dataframes to files. Applying what you’ve learned about reading from files, write one of your dataframes to a file called processed.csv. You can use help to get information on how to use to_csv.
data2 = data.copy()
# to_csv is a DataFrame method (there is no pd.to_csv); data2.to_csv? shows its help
data2.to_csv('/home/mcubero/dataSanJose19/data/processed.csv')
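You can check that to_csv worked by reading the file back. This self-contained sketch does not depend on the path above. It writes a tiny placeholder DataFrame to a temporary directory and reloads it with the same index column. The 'country' and 'value' names are illustrative only.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'value': [1, 2]}, index=pd.Index(['a', 'b'], name='country'))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'processed.csv')
    df.to_csv(path)  # the index is written as the first column by default
    reloaded = pd.read_csv(path, index_col='country')

print(reloaded)
```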
Use the Pandas library to get basic statistics out of tabular data.
Use index_col to specify that a column’s values should be used as row headings.
Use DataFrame.info to find out more about a dataframe.
The DataFrame.columns variable stores information about the dataframe’s columns.
Use DataFrame.T to transpose a dataframe.
Use DataFrame.describe to get summary statistics about data.