
Learn Data Science: How to program in Python

by Ivan


Learn Data Science: How to program in Python Ivan Ocampo Python for Data Science Learn Python Programming

Workshop 1: Learn Data Science

The aim of these workshops is to teach you enough Python to apply machine learning algorithms to various datasets for data analysis. We will avoid the more complex maths behind these algorithms, leaning instead on packages such as scikit-learn that already implement them. To begin with, we will cover the very basics of Python.

This first workshop covers Python fundamentals: it introduces the programming environment that members can use, along with the basics of Python such as variables, data types, and operators. While some of the highlights will be shared here, the full workshop, including the problem sheet, can be found here.

Jupyter Notebook

Throughout these workshops I will be using the Jupyter Notebook editor. There are a lot of online tutorials on how to set this up. I recommend installing Jupyter Notebook through Anaconda; you can do so by following this guide. In this editor, code is run in code cells and text is written in Markdown cells. You can change the type of a cell using the drop-down menu at the top of the page, below the words Kernel and Widgets. The options to edit cells are also found at the top of the page: “+” to create a new cell, scissors to cut a cell, and so on.

Markdown Cells

The text you are reading now is in a Markdown cell. To edit the text, double-click on the cell. To run the cell, press Shift + Enter. These cells don’t actually run code; they mainly let you describe what your code is supposed to do, so that you can inform someone else or remind yourself of what your code is doing when you revisit it. These cells also format equations when you place them between two single dollar signs, for instance $E = mc^2$. Placing an equation between two double dollar signs puts it on a new line, like so: $$\nabla \cdot B = 0$$

Don’t worry about what these equations mean but you see what can be done!

In these workshops you will see much of a Jupyter notebook taken up by Markdown cells telling you what the next code cell is expected to do or what you should be inputting into that cell. This mix of explanatory text and runnable code is a large part of why Jupyter Notebook is often used to introduce people to Python and data science in general.

Code Cells

The other type of cell in a Jupyter notebook is a code cell which, as the name suggests, is where you actually write the code you are going to run. As with a Markdown cell, you run it by pressing Shift + Enter. When you run a cell, the output is displayed below the code cell.

Below is a code cell for you to run. Don’t worry about the code in it for now; just run it and see what happens:

print("If you are reading this, this cell has been run!")

Commenting

Inside code cells you can write notes by using the # symbol. Putting it in front of a line of code means that line won’t be run when running the cell. This is useful when you want to see briefly what removing a line of code does, without deleting the line entirely and having to rewrite it. Additionally, the # symbol is used to comment on your code so that another user, or yourself when you return to the code, can figure out what that code does. It is good practice to leave comments in your code; however, they should be concise, as overcommenting can make your code harder to read. If ever in doubt, imagine that every time you undercomment two kittens die, and every time you overcomment one kitten dies. Below is a code cell identical to the one above, save for the fact that the # symbol is in front of the line of code. If you run the code you will not see any output.

# print("If you are reading this, this cell has been run!")
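As a small sketch of the two uses described above, the cell below mixes a full-line comment, an inline comment and a commented-out line. The variable names (width, height, area, tag) are made up for this illustration; variables themselves are introduced properly in a later section.

```python
# A full-line comment: describe what the next few lines do
width = 4    # an inline comment: everything after the # is ignored
height = 5

# area = width + height   # commented out (wrong formula), so this never runs
area = width * height

# A # inside quotation marks is part of the string, not a comment
tag = "#python"

print(area)
print(tag)
```

Only the uncommented formula is used, and the # inside the string is printed like any other character.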

With that in mind, we can start to understand some of the usefulness of Python and why we use it. One of the first functionalities of Python is the print() function we used above (referred to here as the print statement). As the name suggests, it “prints” the arguments you pass to it.

First, let us see what happens if we run the original code without the print statement:

"If you are reading this, this cell has been run!"

We can see that the cell still produces output, as it did with the print statement (this time with the quotation marks shown). However, this changes when we have two lines of results we want to print:

"Line 1"
"Line 2"

From this, only the second line was displayed. This is because a Jupyter Notebook code cell will only show the value of the final expression in the cell unless told otherwise. The way around this is to put each line inside a print statement, to ensure that both pieces are output:

print("Line 1")
print("Line 2")

Commas in print statements, and indeed in all functions, are used to separate different arguments. In print statements you typically do this when “printing” different data types. Passing print("The population of London is", 8.136, "million") will output The population of London is 8.136 million. As can be seen, the arguments are printed with a space between them. In this example the arguments in quotation marks are strings (essentially a sequence of characters), while the second argument is a float (essentially a number with a decimal point). We will discuss data types in more detail later.

print("The population of London is", 8.136, "million")
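The space between arguments is only a default. As a brief sketch, print also accepts the keyword arguments sep (what goes between the arguments) and end (what is written after the last one). Both belong to the built-in print function; the values printed here are arbitrary examples.

```python
# Default behaviour: arguments are separated by a single space
print("The population of London is", 8.136, "million")

# sep replaces the space placed between arguments
print("2024", "01", "31", sep="-")

# end replaces the newline normally written after the last argument,
# so the next print continues on the same line
print("Loading", end="... ")
print("done")
```

These options are handy when the default spacing gets in the way, for example when building dates or file names.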

Now use the print statement in the code cell below to print a message of your choosing.

print()

Python Variables

While so far we have just printed what we wanted in the print statement, Python can also store information in things called variables. Essentially these act as containers for data, and the = operator is used to assign data to a specific variable. We can see this by assigning a number and a name to variables:

#store the data in variables
x = 10
y = "Peter"

#print the values held in the variables
print(x)
print(y)

We can see that we have now stored data in the variables x and y, and these can be “called” later on to output the required information. This is useful when we want to change the information later on, such as by adding one, or to use the stored value multiple times, such as using the same name over and over again.

We can see this as below:

# Defining a variable x as 5
x = 5
print("x =", x)

# Redefining a variable x as a string "string"
x = "string"
print("x =", x)

# Redefining a variable x as 7
x = 7 
print("x =", x)

# Redefining x as 8 by adding 1 (The same can be done for other mathematical operations)
x = x + 1
print("x =", x)

# Redefining x as 9 by adding 1 with the shorthand += operator (The same can be done for other mathematical operations)
x += 1
print("x =", x)
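The += shorthand has counterparts for the other arithmetic operations. A minimal sketch, using a made-up variable called total:

```python
total = 10
print("total =", total)

total -= 3   # same as total = total - 3
print("total =", total)

total *= 2   # same as total = total * 2
print("total =", total)

total /= 2   # same as total = total / 2; division always gives a float
print("total =", total)
```

Note that the final value is a float rather than an int, because the last step is a division.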

An important part of this is the variable naming convention.

A variable can have a short name, such as the x and y we have already used, or a more descriptive name such as first_name, last_name or car_age. However, there are rules for this:

  • A variable name must start with a letter or the underscore character
  • A variable name cannot start with a number
  • A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _)
  • Variable names are case sensitive

Examples that we can use are:

myvar = "Ivan"
my_var = "Ivan"
_my_var = "Ivan"
myVar = "Ivan"
MYVAR = "Ivan"
myvar2 = "Ivan"

Examples that would produce an error would include:

2myvar = "Ivan"
my-var = "Ivan"
my var = "Ivan"
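If you are unsure whether a name is allowed, one way to check without triggering an error is to ask Python. The sketch below uses the string method isidentifier() and the standard-library keyword module; the result variable names are made up for the example. It also shows one extra restriction not listed above: reserved words such as class pass the character rules but still cannot be used as variable names.

```python
import keyword

# isidentifier() checks the character rules for a name
ok_name = "my_var".isidentifier()
starts_with_digit = "2myvar".isidentifier()
has_hyphen = "my-var".isidentifier()
has_space = "my var".isidentifier()

print(ok_name, starts_with_digit, has_hyphen, has_space)

# Reserved words follow the character rules but are still not allowed
looks_valid = "class".isidentifier()
is_reserved = keyword.iskeyword("class")
print(looks_valid, is_reserved)
```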

As we get to more complicated names that contain multiple words, however, we can use different conventions to make them easier to read:

#Camel Case
#each word, except the first, starts with a capital letter
firstName = "Ivan"

#Pascal Case
#each word starts with a capital letter
FirstName = "Ivan"

#Snake case
#each word is separated by an underscore character
first_name = "Ivan"

Finally, we can also assign multiple values to multiple variables in one line:

x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)
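The same multiple-assignment syntax gives a neat way to swap two values, and a related form assigns one value to several variables at once. A short sketch with made-up names:

```python
first, second = "Orange", "Banana"

# The right-hand side is evaluated first, so no temporary variable is needed
first, second = second, first
print(first)
print(second)

# One value assigned to several variables in one line
a = b = c = 0
print(a, b, c)
```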

Can you use this to assign your first name, last name, age and course to four appropriately named variables and print out the results?

print()

Data Types

We can see above that we have defined some variables as numbers without using speech marks, while we have defined words with speech marks. This is important as it helps us to distinguish between different datatypes, which in programming is a very important concept.

Variables are used to store information, as we have already seen, but the type of that data determines what you can do with a specific variable, e.g. addition, subtraction, multiplication or list slicing. By default, Python has the following built-in datatypes:

  • Text type: str
  • Numeric Types: int, float, complex
  • Sequence Types: list, tuple, range
  • Mapping Type: dict
  • Set Types: set, frozenset
  • Boolean Type: bool
  • Binary Types: bytes, bytearray, memoryview

For our purposes here we will cover str, int, float and bool; the other datatypes are covered later on.

Firstly, we can define a variable of each type and then get the datatype of that object to see what it is:

a = 2
b = 2.0
c = "Hello World"
d = True
print(type(a))
print(type(b))
print(type(c))
print(type(d))

From this we can see that a is an int, b is a float, c is a str and d is bool.

What this means is that:

  • int is an integer value i.e. with no decimal place
  • float is a numerical value with decimal places
  • str is a string value
  • bool can only take on True or False
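Besides printing type(...), you can ask a yes/no question about a value’s type with the built-in isinstance function, which itself returns a bool. A minimal sketch; the variable names are illustrative only:

```python
age = 27
height = 1.82

# isinstance(value, type) answers "is this value of that type?"
age_is_int = isinstance(age, int)
height_is_float = isinstance(height, float)
age_is_float = isinstance(age, float)

print(age_is_int)
print(height_is_float)
print(age_is_float)
```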

To make this simple and to reduce ambiguity you can set the datatype itself:

a = int(2)
b = float(2)
c = str("Hello World")
d = bool(True)
print(type(a))
print(type(b))
print(type(c))
print(type(d))

But we can also use this method (which is known as casting) to change the datatype of a specific value. For example:

a = 2
print(type(a))
b = float(a)
print(type(b))
a = "2"
print(type(a))
b = float(a)
print(type(b))
a = True
print(type(a))
b = str(a)
print(type(b))

Of course, in order to cast a value we need to ensure that it is compatible with the target type; otherwise an error will be thrown.
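To illustrate what compatibility means when casting, the sketch below shows two casts that work and one that does not. The try/except block is only there so the cell keeps running after the error; it is not covered in this workshop, and the variable names are made up.

```python
# Casting a float to an int drops the decimal part (it does not round)
truncated = int(2.9)
print(truncated)

# A string can be cast to a number if it looks like one
parsed = float("3.5")
print(parsed)

# A string that does not look like a number cannot be cast
try:
    failed = float("hello")
except ValueError as error:
    failed = None
    print("Could not convert:", error)
```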

Can you cast the following float to an integer and a string and print out the results?

a = 27.0
print(a)
b = 
print()
c = 

What happens when you try to cast a to a bool?

d = 
print()

Basic Operations

As part of Python’s basic functionality we also have basic operations that can be performed, for example addition. Python divides operators into the following groups:

  • Arithmetic operators
  • Assignment operators
  • Comparison operators
  • Logical operators
  • Identity operators
  • Membership operators
  • Bitwise operators

So far we have already seen assignment operators in effect, by assigning a value to a variable. For now we will focus on arithmetic and comparison operators; the other operators will be introduced in future workshops.

The basic arithmetic operators include:

  • + for addition
  • - for subtraction
  • * for multiplication
  • / for division
  • % for modulus
  • ** for exponentiation
  • // for floor division

We can see how these perform below:

# Addition
print("Addition:", 2 + 2)

# Subtraction 
print("Subtraction:", 5 - 2)

# Multiplication
print("Multiplication:", 2*4)

# Division
print("Division:", 6/3)

# Powers
print("Powers:", 5**3)

# Floor division (division discarding the remainder)
print("Division without remainder:", 7//3)

# Returns remainder
print("Division returning the remainder:", 7%3)
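Two details are worth noticing in the output above: / always returns a float, even when the result is a whole number, and // and % fit together so that the original number can be rebuilt. A short sketch using the built-in divmod function, with made-up variable names:

```python
# / always gives a float; // gives an int when both inputs are ints
true_division = 6 / 3
floor_division = 6 // 3
print(true_division, type(true_division))
print(floor_division, type(floor_division))

# divmod returns the floor-division result and the remainder together
quotient, remainder = divmod(7, 3)
print(quotient, remainder)

# quotient * divisor + remainder gives back the original number
rebuilt = quotient * 3 + remainder
print(rebuilt)
```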

Be aware that it is often easy to make syntax errors when performing mathematical operations. Sometimes this involves writing something to the effect of 2x instead of 2*x: the former is a syntax error, because Python does not treat it as multiplication, while the latter multiplies a variable x by 2. In large calculations involving brackets it is very easy to make mistakes, so stay alert. Clarity of code helps make it easier to read and to spot mistakes. Rather than having an entire calculation in one messy line of code, it’s often best to split the calculation into several lines that are easier to read.
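As an illustration of splitting a calculation up, the sketch below computes the same total twice: once in a single line and once in named steps. The scenario and all names (price, quantity, discount, tax_percent) are invented for the example.

```python
price = 20
quantity = 3
discount = 5
tax_percent = 20

# Everything in one line: correct, but harder to check
total_one_line = (price * quantity - discount) * (100 + tax_percent) / 100

# The same calculation in steps with descriptive names
subtotal = price * quantity
discounted = subtotal - discount
total = discounted * (100 + tax_percent) / 100

print(total_one_line)
print(total)
```

Both versions give the same answer, but in the second one each intermediate value can be printed and checked on its own.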

In the code cell below, print the result of the following operations: $4^7$, $8 \times 123$ and $4523 / 84$

# 4 to the power of 7
print()

# 8 x 123
print()

# 4523 divided by 84
print()

Then we also have comparison operators that are used to compare values. These include:

  • == for Equal
  • != for Not Equal
  • < for Less than
  • > for Greater Than
  • >= for Greater than or equal to
  • <= for Less than or equal to

Which can be tested as follows:

#for equal
print("5 is equal to 5:", 5 == 5)

#for not equal
print("5 is not equal to 4:", 5 != 4)

#for less than
print("3 is less than 5:", 3 < 5)

#for greater than
print("5 is greater than 3:", 5 > 3)

#for greater than or equal to
print("5 is greater than or equal to 3:", 5 >= 3)

#for less than or equal to
print("3 is less than or equal to 5:", 3 <= 5)

Can you see what has happened here and why they have behaved this way?

Can you then change these so instead of True they all return False?

#for equal
print("5 is equal to 5:", 5 == 5)

#for not equal
print("5 is not equal to 4:", 5 != 4)

#for less than
print("3 is less than 5:", 3 < 5)

#for greater than
print("5 is greater than 3:", 5 > 3)

#for greater than or equal to
print("5 is greater than or equal to 3:", 5 >= 3)

#for less than or equal to
print("3 is less than or equal to 5:", 3 <= 5)
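The result of a comparison is an ordinary bool, so it can be stored in a variable and reused. A minimal sketch with a made-up variable age; it also shows that == compares values, so an int and a float holding the same number are equal:

```python
age = 20

# Store the result of a comparison (note: = assigns, == compares)
is_adult = age >= 18
print(is_adult, type(is_adult))

# An int and a float with the same value compare as equal
same_value = 2 == 2.0
print(same_value)
```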

Of course, each of these operations, and what can be done with them, will depend on the data type you are working with. While addition may work well for floats and integers, for example, it can behave very differently with strings.

For example, is the following output what you would expect?

#additions
print("Hello" + "World")

#multiplication
print("Hello world " * 3)

#comparison
print("A" == "A")
print("A" == "a")

#less than
print("A" < "a")

#greater than
print("c" > "b")
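The string comparisons above make more sense once you know that Python compares strings character by character using each character’s Unicode code point, which the built-in ord function returns. A short sketch; the variable names are illustrative:

```python
# Uppercase letters have smaller code points than lowercase letters
code_upper_a = ord("A")
code_lower_a = ord("a")
print(code_upper_a, code_lower_a)

# So "A" < "a" is the same comparison as 65 < 97
print("A" < "a", code_upper_a < code_lower_a)

# + joins strings exactly as written, so add the space yourself
greeting = "Hello" + " " + "World"
print(greeting)
```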

Further Work (Optional)

At the end of the workshops I may include content that I feel is too maths oriented for the workshops. The idea here is that you are free to explore these but they will not be covered in workshops.

Complex Numbers

In addition to the float and integer data types there is also a complex number data type. The function complex(u,v) generates a complex number with a real component u and imaginary component v. Note that j is used as the imaginary unit in Python.

# Creates a complex number z = 2 + 4j
z = complex(2,4)
print(z)

# Alternatively 
z = 2 + 4j
print(z)

# Real part of z
z_real = z.real
print(z_real)

# Imaginary part of z
z_imag = z.imag
print(z_imag)

# Returns the conjugate of the complex number
z_conj = z.conjugate()
print(z_conj)
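Complex numbers also work with the arithmetic operators covered earlier, and the built-in abs function returns the modulus. A brief sketch with made-up values:

```python
z = 3 + 4j
w = 1 - 2j

# Modulus (distance from the origin in the complex plane)
modulus = abs(z)
print(modulus)

# Addition works component by component
total = z + w
print(total)

# A number times its conjugate gives the squared modulus,
# with a zero imaginary part
product = z * z.conjugate()
print(product)
```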
