Session 1¶

Plotting and programming with Python 3¶

Open a Jupyter Notebook¶

(10 minutes)

http://swcarpentry.github.io/python-novice-gapminder/01-run-quit/index.html

Once you have your Notebook open let`s start with the session.

Heo

Variables and Assignment¶

(15 min)

Exercises (10min)

Variables are names for determined values.

Note:

To assing the value to a variable use = like this:

variable = value
The variable is created when a value is assingned.

For example let's type in our names and ages:

age = 21
name = 'Mariana'
ejemplo= 5+5
print(age, name, ejemplo)

21 Mariana 10

#1variable = 1
variable1 = 1

Variable names¶

Only letters, digits and underscore _ (for long names)
Do not start with a digit
It is possible to start with an underscore like __real_age, but this has a special meaning.

(We won't do that yet)

Display values with:¶

print()

This built-in function prints values as text

Examples:

print('My name is', name, ', I am ', age)

My name is Mariana , I am  21

Please remember that variables must be created before they are used!¶

Otherwise Python reports an error

cat_name = "Salem"
print(cat_name)

Salem

Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the order in which they appear.

Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present.

Variables can be reused¶

age = age + 1 
print('Next year I will be', age)

Next year I will be 22

Also, you can index to get a single character from a string.

'AB' != 'BA' -> ordering matters, because we can treat the string as a list of characters.
Python uses zero-based indexing.
Use square brackets to get a character in a specific position.

atom_name = 'helium'
print(atom_name[0])

atom_name = 'helium'
print(atom_name)
print(atom_name[0])

helium
h

Slicing to get a substring!¶

Quick concepts
- Substring:A part of a string. (1 to n number of characters)
- Element: An item on a list. In a string, the elements are the characters.
- Slice: part of a string (or any list-like thing)

How to slice strings:

[start:stop], where start is replaced with the index of the first element we want and stop is replaced with the index of the element just after the last element we want.
Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

atom_name = 'sodium'
print(atom_name[0:3])

# Use len to find the lenght of a string
print(len('helium'))

sod
6

Final considerations:¶

Python is case-sensitive upper- and lower-case letters are different. Name != name are different variables.
Use meaningful variable names

flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')

hola = 1

HOLA= 2
print(hola, HOLA)

1 2

Excercises¶

What is the final value of position in the program below? (Try to predict the value without running the program, then check your prediction.)

initial = 'left'
position = initial
initial = 'right'

initial = 'left'
position = initial
initial = 'right'
print(initial, position)

right left

If you assign a = 123, what happens if you try to get the second digit of a via a[1]?

a = 123
a[1]

----------------------------------------------------------------------
TypeError                            Traceback (most recent call last)
<ipython-input-19-d77b277d0f9d> in <module>()
      1 a = 123
----> 2 a[1]

TypeError: 'int' object is not subscriptable

What does the following program print?

atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])

atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])

atom_name[1:3] is: ar

1.What does thing[low:high] do?

2.What does thing[low:] (without a value after the colon) do?

3.What does thing[:high] (without a value before the colon) do?

4.What does thing[:] (just a colon) do?

5.What does thing[number:some-negative-number] do?

6.What happens when you choose a high value which is out of range? (i.e., try atom_name[0:15])

KEY POINTS - Variable and Assignment¶

Use variables to store values.

Use print to display values.

Variables persist between cells.

Variables must be created before they are used.

Variables can be used in calculations.

Use an index to get a single character from a string.

Use a slice to get a substring.

Use the built-in function len to find the length of a string.

Python is case-sensitive.

Use meaningful variable names.

Data Types and Type Conversion¶

(10 min)

Exercises (10min)

Every value has a type.¶

Existing types are:

Integer (int): positive or negative whole numbers
Floating point numbers (float): real numbers
Character string (str): text
- Written in either single or double quotes (but they have to match)

Use the built-in function type to find out what type a value or variable has

print(type(variable))

print(type(52))

fitness = 'average'
print(type(fitness))

<class 'float'>
<class 'str'>

Operations or methods for a given type¶

Types control what operations (or methodds) can be performed on a given value

A value’s type determines what the program can do to it.

#This works
print(5 - 3)

2

#This works?
print('hello' - 'h')

----------------------------------------------------------------------
TypeError                            Traceback (most recent call last)
<ipython-input-24-c60e4b1b64d0> in <module>()
      1 #This works?
----> 2 print('hello' - 'h')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Operators that can be used on strings
- +
- *
"Adding" characters strings concatenates them.

 full_name = 'Ahmed'+ 'Walsh'
print(full_name)

AhmedWalsh

Multiplying a character string by an integer N creates a new string that consists of that character string repeated N times.

separator = '=' * 10
print(separator)

==========

Try function len on strings and numbers

#Text
print(len(full_name))

10

#Number
print(len(256))

----------------------------------------------------------------------
TypeError                            Traceback (most recent call last)
<ipython-input-29-00ccdb9afd37> in <module>()
      1 #Number
----> 2 print(len(256))

TypeError: object of type 'int' has no len()

Excercises¶

What type of value is 3.4? How can you find out?

What type of value is 3.25 + 4?

What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in # 1, when would counting days with a floating point variable make more sense than using an integer?

Number of days since the start of the year.

Time elapsed from the start of the year until now in days.

Serial number of a piece of lab equipment.

A lab specimen’s age Current population of a city.

Average population of a city over time.

Which of the following will return the floating point number 2.0? Note: there may be more than one right answer.

first = 1.0
second = "1"
third = "1.1"

1.first + float(second)

2.float(second) + float(third)

3.first + int(third)

4.first + int(float(third))

5.int(first) + int(float(third))

2.0 * second

Key Points - Data Types and Type Conversion¶

Every value has a type.
Use the built-in function type to find the type of a value.
Types control what operations can be done on values.
Strings can be added and multiplied.
Strings have a length (but numbers don’t).
Must convert numbers to strings or vice versa when operating on them.
Can mix integers and floats freely in operations.
Variables only change value when something is assigned to them.

STRETCHING TIME!¶

Numpy¶

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

We can create random numers easily

import numpy as np

from numpy import random

r = random.randint(1, 35)
print(r)

26

A very useful data structure, very similar to lists with a few exceptions such as:

The number of elements in the array cannot be changed. (This means we cant .append() or remove)
All element in the arrays must be the type of variables
Once the array is created we can't change the data type.

Advantages over lists:

Arrays can be n-dimensional (vector, matrix, tensor)
Supports arithmetic algebraic
Arrays work faster than lists

mylist = [1,2,3,4 ]
mylist

[1, 2, 3, 4]

 np.array(mylist)

array([1, 2, 3, 4])

mymatrix = [[1,2,3], [4,5,6], [7,8,9]]
mymatrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

m = np.array(mymatrix)
m

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

 type(m)

numpy.ndarray

There are many ways to create an array, let's try some:

Let's asume a is any numpy array.

Function	Description
`a.shape`	Returns a tuple with the numer of elements per dimension
`a.ndim`	Number of dimension
`a.size`	Number of elements in an array
`a.dtype`	Data type of the elements in the array
`a.T`	Transposes the array
`a.flat`	Collapses the array in 1 dimension
`a.copy()`	Returns a copy of the array
`a.fill()`	Fills the array with a determined value
`a.reshape()`	Returns an array with the same data but in the shape we indicate
`a.resize()`	Changes the shape of the array, but this does not creates a copy of the original array
`a.sort()`	Reorders the array

# Try some!
print(m.dtype)
print(m)

int64
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Slicing, but in arrays¶

It's more flexible. Slicings follows this structure:

beginning:end:step

a = np.linspace(0, 10, 11)
#blank spaces mean "everthing" 

# all 
print(a[1:11])

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]

 #  3 to 8
print(a[3:8])

#  1 to 9 with steps of size 2 (odd)
print(a[1:11:2])

[3. 4. 5. 6. 7.]
[1. 3. 5. 7. 9.]

# conditionals 
print(a[a > 4])

[ 5.  6.  7.  8.  9. 10.]

Exercises¶

Try it yourself

Create a non squared matrix, and print the dimensions (.shape)
Use .reshape to change the shape of the array
Try . size is equal to np.prod(a.shape)

Built-in Functions and Help¶

(10 min)

Exercises (10 min)

Arguments in a function

A function may take zero or more arguments
An argument is a value passed into a function

For example some of the functions we used so far have arguments

len takes exactly one argument

len(_x_)

int, str and float create a new value from an existing one
print takes zero or more arguments
- With no arguments prints a blank line
- Must use the parethesis to use the function

print('before')
print()
print('after')

Commonly-used built-in functions include `max`, `min` and `round`¶

max()
min()
round()

You can also combine some functions

print(max(1, 2, 3))
print(min('a', 'A', '0'))

But these functions may only work for certain (combination of) arguments.

For example:

max and min must be given at least one argument.
- “Largest of the empty set” is a meaningless question.
And they must be given things that can meaningfully be compared.

print(max(1,4,6,9,100000))
print(min('a', 'A', 0))

100000

----------------------------------------------------------------------
TypeError                            Traceback (most recent call last)
<ipython-input-59-4a37fdd5cb05> in <module>()
      1 print(max(1,4,6,9,100000))
----> 2 print(min('a', 'A', 0))

TypeError: '<' not supported between instances of 'int' and 'str'

Functions may have default values for some arguments¶

round will round off a floating-point number.
By default, rounds to zero decimal places.

round(3.712)

# We can specify the number of decimal places we want. Let's try:

round(3.712, 2 )

3.71

You can use the built-in function `help` to get help for a function.¶

Every built-in has online documentation.

help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number
    
    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.

help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.

Python reports a syntax error when it can’t understand the source of a program.¶

# Forgot to close the quote marks around the string.
name = 'Feng

Try it!

name = 'Feng'
print(name)

Feng

The message indicates a problem on first line of the input (“line 1”).
- In this case the “ipython-input” section of the file name tells us that we are working with input into IPython, the Python interpreter used by the Jupyter Notebook.
The -6- part of the filename indicates that the error occurred in cell 6 of our Notebook.
Next is the problematic line of code, indicating the problem with a ^ pointer.

Python reports a runtime error when something goes wrong while a program is executing.¶

age = 53
remaining = 100 - age # mis-spelled 'age'
print(remaining)

47

Fix syntax errors by reading the source and runtime errors by tracing execution.

Getting help in Jupyter¶

There are 2 ways to get help in a Jupyter Notebook

Place the cursor anywhere in the function invocation (i.e., the function name or its parameters), hold down shift, and press tab.
Or type a function name with a question mark after it.

Every function returns something

Every function call produces some result.
If the function doesn’t have a useful result to return, it usually returns the special value None.

max?

result = print('example')
print('result of print is', result)

example
result of print is None

Excercises¶

What happens when...¶

Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.
What is the final value of radiance?

radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
# The value is ... 2.6
print(radiance)

2.6

Spot the Difference!¶

Predict what each of the print statements in the program below will print.
Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?

easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))

Why not?¶

Why don’t max and min return None when they are given no arguments?

KEY POINTS¶

Use comments to add documentation to programs.
A function may take zero or more arguments.
Commonly-used built-in functions include max, min, and round.
Functions may only work for certain (combinations of) arguments.
Functions may have default values for some arguments.
Use the built-in function help to get help for a function.
The Jupyter Notebook has two ways to get help.
Every function returns something.
Python reports a syntax error when it can’t understand the source of a program.
Python reports a runtime error when something goes wrong while a program is executing.
Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.

Libraries¶

10 min

Most of the power of a programming language is in its libraries.

Library = collection of files (modules) that functions for use by other programs.
The Python standard library is an extensive suite of modules that comes with Python itself.

Other libraries available in PyPI (the Python Package Index).

A program must import a library module before using it.

Use import to load a library module into a program’s memory.
Then refer to things from the module as module_name.thing_name. Python uses . to mean “part of”.

import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))

pi is 3.141592653589793
cos(pi) is -1.0

We can also use help() to learn about the content of a library module, just like we do with functions!

help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.6/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the inverse hyperbolic cosine of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the inverse hyperbolic sine of x.
    
    atan(...)
        atan(x)
        
        Return the arc tangent (measured in radians) of x.
    
    atan2(...)
        atan2(y, x)
        
        Return the arc tangent (measured in radians) of y/x.
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(...)
        atanh(x)
        
        Return the inverse hyperbolic tangent of x.
    
    ceil(...)
        ceil(x)
        
        Return the ceiling of x as an Integral.
        This is the smallest integer >= x.
    
    copysign(...)
        copysign(x, y)
        
        Return a float with the magnitude (absolute value) of x but the sign 
        of y. On platforms that support signed zeros, copysign(1.0, -0.0) 
        returns -1.0.
    
    cos(...)
        cos(x)
        
        Return the cosine of x (measured in radians).
    
    cosh(...)
        cosh(x)
        
        Return the hyperbolic cosine of x.
    
    degrees(...)
        degrees(x)
        
        Convert angle x from radians to degrees.
    
    erf(...)
        erf(x)
        
        Error function at x.
    
    erfc(...)
        erfc(x)
        
        Complementary error function at x.
    
    exp(...)
        exp(x)
        
        Return e raised to the power of x.
    
    expm1(...)
        expm1(x)
        
        Return exp(x)-1.
        This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
    
    fabs(...)
        fabs(x)
        
        Return the absolute value of the float x.
    
    factorial(...)
        factorial(x) -> Integral
        
        Find x!. Raise a ValueError if x is negative or non-integral.
    
    floor(...)
        floor(x)
        
        Return the floor of x as an Integral.
        This is the largest integer <= x.
    
    fmod(...)
        fmod(x, y)
        
        Return fmod(x, y), according to platform C.  x % y may differ.
    
    frexp(...)
        frexp(x)
        
        Return the mantissa and exponent of x, as pair (m, e).
        m is a float and e is an int, such that x = m * 2.**e.
        If x is 0, m and e are both 0.  Else 0.5 <= abs(m) < 1.0.
    
    fsum(...)
        fsum(iterable)
        
        Return an accurate floating point sum of values in the iterable.
        Assumes IEEE-754 floating point arithmetic.
    
    gamma(...)
        gamma(x)
        
        Gamma function at x.
    
    gcd(...)
        gcd(x, y) -> int
        greatest common divisor of x and y
    
    hypot(...)
        hypot(x, y)
        
        Return the Euclidean distance, sqrt(x*x + y*y).
    
    isclose(...)
        isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) -> bool
        
        Determine whether two floating point numbers are close in value.
        
           rel_tol
               maximum difference for being considered "close", relative to the
               magnitude of the input values
            abs_tol
               maximum difference for being considered "close", regardless of the
               magnitude of the input values
        
        Return True if a is close in value to b, and False otherwise.
        
        For the values to be considered close, the difference between them
        must be smaller than at least one of the tolerances.
        
        -inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
        is, NaN is not close to anything, even itself.  inf and -inf are
        only close to themselves.
    
    isfinite(...)
        isfinite(x) -> bool
        
        Return True if x is neither an infinity nor a NaN, and False otherwise.
    
    isinf(...)
        isinf(x) -> bool
        
        Return True if x is a positive or negative infinity, and False otherwise.
    
    isnan(...)
        isnan(x) -> bool
        
        Return True if x is a NaN (not a number), and False otherwise.
    
    ldexp(...)
        ldexp(x, i)
        
        Return x * (2**i).
    
    lgamma(...)
        lgamma(x)
        
        Natural logarithm of absolute value of Gamma function at x.
    
    log(...)
        log(x[, base])
        
        Return the logarithm of x to the given base.
        If the base not specified, returns the natural logarithm (base e) of x.
    
    log10(...)
        log10(x)
        
        Return the base 10 logarithm of x.
    
    log1p(...)
        log1p(x)
        
        Return the natural logarithm of 1+x (base e).
        The result is computed in a way which is accurate for x near zero.
    
    log2(...)
        log2(x)
        
        Return the base 2 logarithm of x.
    
    modf(...)
        modf(x)
        
        Return the fractional and integer parts of x.  Both results carry the sign
        of x and are floats.
    
    pow(...)
        pow(x, y)
        
        Return x**y (x to the power of y).
    
    radians(...)
        radians(x)
        
        Convert angle x from degrees to radians.
    
    sin(...)
        sin(x)
        
        Return the sine of x (measured in radians).
    
    sinh(...)
        sinh(x)
        
        Return the hyperbolic sine of x.
    
    sqrt(...)
        sqrt(x)
        
        Return the square root of x.
    
    tan(...)
        tan(x)
        
        Return the tangent of x (measured in radians).
    
    tanh(...)
        tanh(x)
        
        Return the hyperbolic tangent of x.
    
    trunc(...)
        trunc(x:Real) -> Integral
        
        Truncates x to the nearest Integral toward 0. Uses the __trunc__ magic method.

DATA
    e = 2.718281828459045
    inf = inf
    nan = nan
    pi = 3.141592653589793
    tau = 6.283185307179586

FILE
    /usr/local/lib/python3.6/lib-dynload/math.cpython-36m-x86_64-linux-gnu.so

Import specific items from a library module to shorten programs.

Use from ... import ... to load only specific items from a library module.
Then refer to them directly without library name as prefix.

from math import cos, pi

print('cos(pi) is', cos(pi))

cos(pi) is -1.0

Create an alias for a library module when importing it to shorten programs.

Use import ... as ... to give a library a short alias while importing it.
Then refer to items in the library using that shortened name.

Exercises¶

Exploring the Math Module¶

What function from the math module can you use to calculate a square root without using sqrt?
Since the library contains this function, why does sqrt exist?

Locating the Right Module¶

You want to select a random character from a string:

bases = 'ACTTGCTTGAC'

Which standard library module could help you?
Which function would you select from that module? Are there alternatives?
Try to write a program that uses the function.

from random import randrange
bases = 'ACTTGCTTGAC'
print(bases[randrange(len(bases))])

A

Jigsaw Puzzle (Parson’s Problem)¶

Rearrange the following statements so that a random DNA base is printed and its index in the string. Not all statements may be needed. Feel free to use/add intermediate variables.

bases="ACTTGCTTGAC"
import math
import random
___ = random.randrange(n_bases)
___ = len(bases)
print("random base ", bases[___], "base index", ___)

Importing with aliases¶

Fill in the blanks so that the program below prints 90.0.
Rewrite the program so that it uses import without as.
Which form do you find easier to read?

import math as m
angle = m.degrees(m.pi / 2)
print(angle)

90.0

Importing Specific Items¶

Fill in the blanks so that the program below prints 90.0.
Do you find this version easier to read than preceding ones?
Why wouldn’t programmers always use this form of import?

____ math import ____, ____
angle = degrees(pi / 2)
print(angle)

Reading Error Messages¶

Read the code below and try to identify what the errors are without running it.
Run the code, and read the error message. What type of error is it?

from math import log
log(0)

----------------------------------------------------------------------
ValueError                           Traceback (most recent call last)
<ipython-input-79-d72e1d780bab> in <module>()
      1 from math import log
----> 2 log(0)

ValueError: math domain error

Keypoints¶

Most of the power of a programming language is in its libraries.
A program must import a library module in order to use it.
Use help to learn about the contents of a library module.
Import specific items from a library to shorten programs.
Create an alias for a library when importing it to shorten programs.

Data Frames¶

Reading tabular data into DataFrame¶

(15 min)

Exercises (10 min)

Pandas¶

A widely know library for statistics, particularly on tabular data.

Borrows many features from R´s dataframes.

Dataframe: A 2-dimensional table whose columns have names and potentially have different data types.

#Let's import pandas
import pandas as pd

data = pd.read_csv('/home/mcubero/dataSanJose19/data/gapminder_gdp_oceania.csv')
print(data)

       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
0    Australia     10039.59564     10949.64959     12217.22686   
1  New Zealand     10556.57566     12247.39532     13175.67800   

   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
0     14526.12465     16788.62948     18334.19751     19477.00928   
1     14463.91893     16046.03728     16233.71770     17632.41040   

   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
0     21888.88903     23424.76683     26997.93657     30687.75473   
1     19007.19129     18363.32494     21050.41377     23189.80135   

   gdpPercap_2007  
0     34435.36744  
1     25185.00911

File not found¶

Our lessons store their data files in a data sub-directory, which is why the path to the file is data/gapminder_gdp_oceania.csv. If you forget to include data/, or if you include it but your copy of the file is somewhere else, you will get a runtime error that ends with a line like this:

ERROR
OSError: File b'gapminder_gdp_oceania.csv' does not exist

Use index_col to specify that a column’s values should be used as row headings.¶

Row headings are numbers (0 and 1 in this case).
Really want to index by country.
Pass the name of the column to read_csv as its index_col parameter to do this.

data = pd.read_csv('/home/mcubero/dataSanJose19/data/gapminder_gdp_oceania.csv', index_col='country')
data

Use DataFrame.info to explore a little the dataframe

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Australia to New Zealand
Data columns (total 12 columns):
gdpPercap_1952    2 non-null float64
gdpPercap_1957    2 non-null float64
gdpPercap_1962    2 non-null float64
gdpPercap_1967    2 non-null float64
gdpPercap_1972    2 non-null float64
gdpPercap_1977    2 non-null float64
gdpPercap_1982    2 non-null float64
gdpPercap_1987    2 non-null float64
gdpPercap_1992    2 non-null float64
gdpPercap_1997    2 non-null float64
gdpPercap_2002    2 non-null float64
gdpPercap_2007    2 non-null float64
dtypes: float64(12)
memory usage: 208.0+ bytes

What we know?

data is a DataFrame
It has two rows named 'Australia' and 'New Zealand'
There are Twelve columns, each of which has two actual 64-bit floating point values.
No null values, aka no missing observations.
Uses 208.0 bytes of memory

The DataFrame.columns variable stores information about the dataframe's columns¶

Note that this is data, not a method.
- Like math.pi.
- So do not use () to try to call it.
Called a member variable, or just member.

data is the dataframe, not a method, don't use () to try to call it.

print(data.columns)

Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
      dtype='object')

Use DataFrame.T to transpose a dataframe

Sometimes want to treat columns as rows and vice versa.
Transpose (written .T) doesn’t copy the data, just changes the program’s view of it.
Like columns, it is a member variable.

print(data.T)
data.T

country           Australia  New Zealand
gdpPercap_1952  10039.59564  10556.57566
gdpPercap_1957  10949.64959  12247.39532
gdpPercap_1962  12217.22686  13175.67800
gdpPercap_1967  14526.12465  14463.91893
gdpPercap_1972  16788.62948  16046.03728
gdpPercap_1977  18334.19751  16233.71770
gdpPercap_1982  19477.00928  17632.41040
gdpPercap_1987  21888.88903  19007.19129
gdpPercap_1992  23424.76683  18363.32494
gdpPercap_1997  26997.93657  21050.41377
gdpPercap_2002  30687.75473  23189.80135
gdpPercap_2007  34435.36744  25185.00911

Use DataFrame.describe to get summary statistics about data¶

DataFrame.describe() gets the summary statistics of only the columns that have numerical data. All other columns are ignored, unless you use the argument include='all'

data.describe()

Not particularly useful with just two records, but very helpful when there are thousands¶

Exercises¶

Read the data in gapminder_gdp_americas.csv (should be in the same directory as gapminder_gdp_oceania.csv)

Tip¶

check the parameters to define the index.

Inspect the data.

Use the function help(americas.head) and help(americas.tail) the answer:

What method callwill display the first three rows of this data?
What method call will display the last three columns of this data?

Reading files in other directories¶

The data for your current project is stored in a file called microbes.csv, which is located in a folder called field_data. You are doing analysis in a notebook called analysis.ipynb in a sibling folder called thesis:

your_home_directory
+-- field_data/
|   +-- microbes.csv
+-- thesis/
    +-- analysis.ipynb

What value(s) should you pass to read_csv to read microbes.csv in analysis.ipynb?

Writing Data¶

As well as the read_csv function for reading data from a file, Pandas provides a to_csv function to write dataframes to files. Applying what you’ve learned about reading from files, write one of your dataframes to a file called processed.csv. You can use help to get information on how to use to_csv.

data2 = data.copy()
#pd.to_csv
#data2.to_csv?
data2.to_csv('/home/mcubero/dataSanJose19/data/processed.csv')

Key Points¶

Use the Pandas library to get basic statistics out of tabular data.
Use index_col to specify that a column’s values should be used as row headings.
Use DataFrame.info to find out more about a dataframe.
The DataFrame.columns variable stores information about the dataframe’s columns.
Use DataFrame.T to transpose a dataframe.
Use DataFrame.describe to get summary statistics about data.

	gdpPercap_1952	gdpPercap_1957	gdpPercap_1962	gdpPercap_1967	gdpPercap_1972	gdpPercap_1977	gdpPercap_1982	gdpPercap_1987	gdpPercap_1992	gdpPercap_1997	gdpPercap_2002	gdpPercap_2007
count	2.000000	2.000000	2.000000	2.000000	2.00000	2.000000	2.000000	2.000000	2.000000	2.000000	2.000000	2.000000
mean	10298.085650	11598.522455	12696.452430	14495.021790	16417.33338	17283.957605	18554.709840	20448.040160	20894.045885	24024.175170	26938.778040	29810.188275
std	365.560078	917.644806	677.727301	43.986086	525.09198	1485.263517	1304.328377	2037.668013	3578.979883	4205.533703	5301.853680	6540.991104
min	10039.595640	10949.649590	12217.226860	14463.918930	16046.03728	16233.717700	17632.410400	19007.191290	18363.324940	21050.413770	23189.801350	25185.009110
25%	10168.840645	11274.086022	12456.839645	14479.470360	16231.68533	16758.837652	18093.560120	19727.615725	19628.685413	22537.294470	25064.289695	27497.598692
50%	10298.085650	11598.522455	12696.452430	14495.021790	16417.33338	17283.957605	18554.709840	20448.040160	20894.045885	24024.175170	26938.778040	29810.188275
75%	10427.330655	11922.958888	12936.065215	14510.573220	16602.98143	17809.077557	19015.859560	21168.464595	22159.406358	25511.055870	28813.266385	32122.777857
max	10556.575660	12247.395320	13175.678000	14526.124650	16788.62948	18334.197510	19477.009280	21888.889030	23424.766830	26997.936570	30687.754730	34435.367440

	gdpPercap_1952	gdpPercap_1957	gdpPercap_1962	gdpPercap_1967	gdpPercap_1972	gdpPercap_1977	gdpPercap_1982	gdpPercap_1987	gdpPercap_1992	gdpPercap_1997	gdpPercap_2002	gdpPercap_2007
country
Australia	10039.59564	10949.64959	12217.22686	14526.12465	16788.62948	18334.19751	19477.00928	21888.88903	23424.76683	26997.93657	30687.75473	34435.36744
New Zealand	10556.57566	12247.39532	13175.67800	14463.91893	16046.03728	16233.71770	17632.41040	19007.19129	18363.32494	21050.41377	23189.80135	25185.00911

Session 1¶

Plotting and programming with Python 3¶

Open a Jupyter Notebook¶

Variables and Assignment¶

Variable names¶

Display values with:¶

Please remember that variables must be created before they are used!¶

Variables can be reused¶

Slicing to get a substring!¶

Final considerations:¶

Excercises¶

KEY POINTS - Variable and Assignment¶

Data Types and Type Conversion¶

Every value has a type.¶

Operations or methods for a given type¶

Excercises¶

Key Points - Data Types and Type Conversion¶

STRETCHING TIME!¶

Numpy¶

Slicing, but in arrays¶

Exercises¶

Built-in Functions and Help¶

Commonly-used built-in functions include max, min and round¶

Functions may have default values for some arguments¶

You can use the built-in function help to get help for a function.¶

Python reports a syntax error when it can’t understand the source of a program.¶

Python reports a runtime error when something goes wrong while a program is executing.¶

Getting help in Jupyter¶

Excercises¶

What happens when...¶

Spot the Difference!¶

Why not?¶

KEY POINTS¶

Libraries¶

Exercises¶

Exploring the Math Module¶

Locating the Right Module¶

Jigsaw Puzzle (Parson’s Problem)¶

Importing with aliases¶

Importing Specific Items¶

Reading Error Messages¶

Keypoints¶

Data Frames¶

Reading tabular data into DataFrame¶

Pandas¶

File not found¶

Use index_col to specify that a column’s values should be used as row headings.¶

The DataFrame.columns variable stores information about the dataframe's columns¶

Use DataFrame.describe to get summary statistics about data¶

Not particularly useful with just two records, but very helpful when there are thousands¶

Exercises¶

Tip¶

Reading files in other directories¶

Writing Data¶

Key Points¶

STRETCHING TIME!¶

Recap¶

Feedback & questions¶

Commonly-used built-in functions include `max`, `min` and `round`¶

You can use the built-in function `help` to get help for a function.¶