
Learn Data Science: How to workshop. Object Oriented Programming in Python
Workshop 4: What is OOP, defining a class, adding attributes, adding methods, and class inheritance in Python
Prerequisites:
- Knowledge of basic Python syntax.
- Knowledge of Python data structures.
- Knowledge of Python logic, loops and functions.
Please familiarise yourself with the previous workshops on Learning Data Science.
This fourth class is an introduction to Object Oriented Programming, in which we show you how to define a class, add attributes, add methods and use class inheritance.
Object Oriented Programming
Firstly, what is Object Oriented Programming? Primarily, it is a way of structuring your code so that both the characteristics and the behaviours of data can be bundled together into a single structure. This structure allows you to use the class as a blueprint from which to create multiple objects, following one of the main coding principles: Don’t Repeat Yourself. You can then create objects throughout your code and access the same information or functions at different points in your workflow.
This is in contrast to procedural programming, which is often used within the Data Science community, whereby code follows a sequence of steps in order to complete a task using functions and code blocks. This can be seen in the previous workshops, where we used code blocks in Jupyter notebooks to work through a sequence bit by bit.
The benefit of Object Oriented Programming, however, is that you can store data and the associated actions that you want to perform across many different entries in a simple and easy way, while creating a blueprint that can be used again and again. This is useful when you know that each instance has certain characteristics and you may want to perform the same functions on this data over and over again. An example is storing data on employees, where each has a wage and a level of experience or grade, and common actions include a work anniversary or a promotion.
1) What is object oriented programming
Object oriented programming is a way of structuring your code so that both the characteristics and the behaviours of data can be bundled together into a single structure. This single structure then allows you to use the class or object you created again and again throughout your code, following the main coding principle of Don’t Repeat Yourself (DRY). You can create these objects throughout your code, and access their information or perform certain functions once they have been created.
This is in contrast to the procedural programming we have done so far, whereby your code follows a sequence of steps in order to complete a task using functions and code blocks, which is the tradition in Data Science. You can see that here, and in previous workshops, as we use code blocks in our Jupyter notebook, and we have also learnt how to create functions.
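To make the contrast concrete, here is a minimal sketch of the same small task written in both styles. The dictionary keys, the give_raise function and the Worker class are hypothetical names chosen for this illustration and are not part of the workshop material; the class syntax itself is explained in the sections that follow.

```python
# Procedural style: the data lives in a dictionary and a separate
# function acts on it.
def give_raise(record, amount):
    record["wage"] += amount
    return record

worker_record = {"wage": 30_000, "grade": 1}
give_raise(worker_record, 1_000)

# Object oriented style: the data and the action are bundled together
# in one structure.
class Worker:
    def __init__(self, wage, grade):
        self.wage = wage
        self.grade = grade

    def give_raise(self, amount):
        self.wage += amount

worker = Worker(30_000, 1)
worker.give_raise(1_000)

# Both approaches end with the same wage
print(worker_record["wage"], worker.wage)
```

Both versions do the same job; the difference is that in the second one the action travels with the data it changes.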
The advantage of object oriented programming, however, is that it allows you to store data and the associated actions that you want to save and/or perform across multiple different entries, while allowing you to use the base definition/blueprint again and again. This is useful when your data is structured so that you will store it with the same attributes, or with overlapping attributes, and you want the same functionality associated with it. Most libraries that you will use throughout this workshop series use some form of object oriented programming, and you can take a look at their GitHub repositories when you start using them to see how they organise their code.
For our purposes, an example of where object oriented programming may be useful is the case of a firm where you have multiple employees and you want to store their work information (of course with their permission). For each employee you would want to store information such as current wage, years worked for the firm and current grade, all of which are associated with the company and how it manages its employees, while you may also want to store other information such as their age or their birthday. With this information, you may also want to add some degree of functionality for when an employee does something or when something happens. For example, you may want to give them a promotion that will increase their grade and also their pay, you may want to pay them a bonus, or they may learn a new skill such as a new language. By creating an Employee class, you can store all this data and perform these functions for every employee in your firm, reusing the same blueprint over and over again to ensure you have the same functionality.
The way this is done is by creating classes, which are used as blueprints for objects. The class describes overall what the object will be, but is separate from the object itself, which is a specific instance.
Our first task then is to create a class:
2) Defining a class
The first thing to do for this is to define a class. This is a structure that describes essentially what the object will be and acts as a blueprint for creating specific objects in the future. As we’ve already emphasised, this can be used again and again to create multiple different objects that take the same structure, have the same attributes and perform the same functions.
Classes are created using the keyword class followed by the class name and an indented block, which contains the methods of the class (these are essentially functions that belong to the class). An example of this would be as follows:
class Employee:
    pass
Here we have defined an Employee class whose body contains only the pass statement. This means we have created an empty class which currently has no functionality embedded in it. The pass statement is needed because you can’t leave the body of the class blank, otherwise you will get an error.
It is important to note here that to create a new class we have used the form class <name>, where <name> is the name of the class that we are creating. In this case, this is the Employee class. Class names follow a convention known as CamelCase, which essentially means that instead of using _ to separate words (as in snake_case), the beginning of each word is capitalised, as we have done with Employee here.
The next thing to note is that everything in the indented block under the class statement will be part of the class. Since this is an empty class, nothing will be assigned to an instance of the class, but we can run the following code to see that we can create an object of the class Employee:
juliet = Employee()
We can check that juliet is an Employee by using the following:
juliet.__class__.__name__
Here, .__class__ returns the class that the object belongs to, while .__name__ limits this to just the name of that class. In this case it is Employee.
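As a small aside, the built-in functions type() and isinstance() give the same information in a form you will often see in other code. The sketch below repeats the empty Employee class so that it runs on its own.

```python
class Employee:
    pass

juliet = Employee()

# type() returns the same class object as juliet.__class__
print(type(juliet) is juliet.__class__)

# isinstance() answers the yes/no question directly
print(isinstance(juliet, Employee))

# The class name can be read from the class returned by type()
print(type(juliet).__name__)
```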
We have thus been able to create our first class! Yay! Of course, we don’t have any functionality as of yet so it might be good to start understanding how we can make this actually useful to us:
3) Adding attributes
The next step in creating a class is to start adding attributes. This is done using the __init__() method, which is called when an instance (object) of the class is created. Essentially, this attaches attributes to any new object created from that class.
For our purposes, since we have an employee, we want to assign a wage, a grade and the number of years worked:
class Employee:
    def __init__(self, wage, grade, years_worked):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

juliet = Employee(30_000, 1, 1)
Now we can see that the code above ran smoothly and that juliet is created with a wage of £30,000, a grade of 1 and an experience of 1 year. We expect that all employees will have this type of associated information, so we can be confident that our blueprint will be useful for any employee that we create in the future.
In creating this class, however, it must be noted that all methods that are part of a class must have the self argument as their first parameter, even though it isn’t explicitly passed when the method is called. In this case, we didn’t have to pass anything for self when creating juliet, but inside __init__() we use it to assign attributes to the new instance.
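One way to see what self is doing is to call a method through the class and pass the instance in by hand. The sketch below assumes a hypothetical describe method, added only for this illustration, on top of the Employee class defined above.

```python
class Employee:
    def __init__(self, wage, grade, years_worked):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    # describe is a hypothetical method used only for this illustration
    def describe(self):
        return f"grade {self.grade}, wage {self.wage}"

juliet = Employee(30_000, 1, 1)

# The usual call: Python passes juliet in as self automatically
usual_call = juliet.describe()

# The equivalent explicit call through the class
explicit_call = Employee.describe(juliet)

print(usual_call)
print(usual_call == explicit_call)
```

The two calls return the same text, which is why self never has to be supplied in the usual form.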
We can thus see how we have added attributes to our class, and how they are set for a specific instance of the class. What about accessing them? We can access them using dot notation, meaning that we put . and the attribute name after the instance. In this case, for juliet, we can access her wage, grade and experience in the following ways:
print("Juliet's wage is:", juliet.wage)
print("Juliet's grade is:", juliet.grade)
print("Juliet has worked for", juliet.exp, "years")
As already mentioned, one of the benefits of using object oriented programming is that we can reuse this class as a blueprint to create new instances of the same class. As such, in the same way that we created the instance juliet, can you create two other Employees, Emily and Alice, with the following attributes:
Emily – Wage: £40,000, grade: 5, years worked: 5
Alice – Wage: £50,000, grade: 7, years worked: 10
Now, again, using dot notation, can you access:
- Alice’s wage
- Emily’s grade
- Alice’s years worked
Now imagine you had thousands of employees for whom you had the same information: we have the blueprint in place that will allow us to simply create thousands of objects to store this information.
What if, however, some information was missing? For example, what if we forgot to write down someone’s years worked or their grade? In this case, if we try to create an instance of the class without that information then we will get an error as follows:
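As a rough sketch of what this could look like, the records below are made-up values (they are not the exercise answers), each holding a wage, a grade and years worked, and a list comprehension turns every record into an Employee object.

```python
class Employee:
    def __init__(self, wage, grade, years_worked):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

# Made-up records for illustration: (wage, grade, years_worked)
records = [
    (25_000, 1, 0),
    (32_000, 2, 3),
    (45_000, 4, 8),
]

# Unpack each record into the Employee constructor
employees = [Employee(*record) for record in records]

# The objects can then be used together, e.g. to total the wage bill
total_wages = sum(employee.wage for employee in employees)
print(len(employees), total_wages)
```

The same two lines would work unchanged whether the list held three records or three thousand.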
williams = Employee(40_000, 2)
We can see here that we have a TypeError telling us that we are missing a positional argument, years_worked. One way around this, as when we create functions, is to specify default values, so that a new instance of the object takes the default value if we do not supply that information. For example, for a new employee/graduate we could set their basic wage as £20,000, grade as 1 and years worked for the company as 0:
#create the employee class
class Employee:
    #add the init function
    #but now set default values
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked
Now we can create a new Employee, without having to specify any characteristics at all, that will have this basic information:
#create a new employee called william
william = Employee()

#print the information
print("William's wage is:", william.wage)
print("William's grade is:", william.grade)
print("William has worked for:", william.exp, "years")
This means that we can specify only certain attributes when creating a new instance of the class, without worrying about any errors being thrown:
william = Employee(grade=2)
print("William's grade is:", william.grade)

elizabeth = Employee(years_worked=1, grade=1)
print("Elizabeth's grade is:", elizabeth.grade)
Attributes created using the __init__() method are instance attributes, as their values are specific to a particular instance of the class. Here, all Employees have a wage, a grade and years worked, but each of these will be specific to the instance of the class created and hence specific to that employee.
On the other hand, there are also class attributes, which have the same value for all class instances. These attributes are defined directly in the class body, conventionally before the __init__() method. For our purposes we can assign a company that the Employee is working for, knowing that all our employees will be working for the same company:
class Employee:
    #class attribute
    company = "Data Sci"

    #instance attributes
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked
These class attributes can be accessed the same way as instance attributes, but they will be the same for all instances of the class created. For example:
julie = Employee(35000, 3, 3)
print(julie.company)
peter = Employee(30000, 5, 5)
print(peter.company)
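The difference between the two kinds of attribute shows up when a value is changed. The sketch below is a minimal illustration using the same Employee class; the replacement company names are made up for the example.

```python
class Employee:
    # class attribute, shared by every instance
    company = "Data Sci"

    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

julie = Employee(35_000, 3, 3)
peter = Employee(30_000, 5, 5)

# Assigning on the class changes the value seen by every instance
Employee.company = "Data Sci Ltd"  # hypothetical new name
print(julie.company, "|", peter.company)

# Assigning on one instance creates an instance attribute that
# shadows the class attribute for that instance only
peter.company = "Another Firm"  # hypothetical value
print(julie.company, "|", peter.company, "|", Employee.company)
```

In practice this means a shared value is usually best changed on the class itself, since assigning through an instance only affects that one object.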
What about changing the information of an employee?
Say, for example, that we want to promote Julie, with an associated increase in wages and a move up a grade. In the same way that we can access information, we can also update it using dot notation. Suppose we want to add £5,000 to Julie’s wage and move her grade up by 1. We can do that as we would for any normal variable, as follows:
#print out the current information
print(f"Julie's current wage is: {julie.wage}")
print(f"Julie's current grade is: {julie.grade}")
print("\n")

#update the wage and grade
julie.wage += 5000
julie.grade += 1

#print out the new information
print(f"Julie's new wage is: {julie.wage}")
print(f"Julie's new grade is: {julie.grade}")
We can see that we have now been able to update Julie’s wage and grade, just like we would do with any other variable.
Using the examples of Emily and Alice you created before, can you increase Emily’s wage by £10,000 (a nice promotion), and her grade by 2, while increasing Alice’s years worked for the company by 1?
When we think about this, however, we know that promotions and increases in the years worked for the company are routine events, so giving wage and grade increases in this way would be rather inefficient. Thus we can take a look at adding methods that allow us to perform certain actions with the information contained in the object.
4) Adding methods
Now that we have introduced how to specify class and instance attributes, the next stage is to add methods.
The __init__() method is one method which we have already been introduced to, which in this case is used to assign attributes when we first create an instance of the class. We can then start to define our own methods that perform certain actions with our objects, such as changing their characteristics or making them perform certain behaviours. These methods are defined in the same way as functions, but since they are part of a class we need to make sure that self is their first parameter.
In the case of our Employees we can create a method that gives them a promotion, which comes with the standard wage increase of 10% and a grade increase of 1.
#create the employee class
class Employee:
    #add the init constructor in the same way we have done already
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    #add a promotion method
    def promotion(self):
        self.wage += 0.1 * self.wage
        self.grade += 1
Here in the promotion method we have used the self.wage attribute and, as you may remember from the previous Introduction to Python workshop, the += operator takes the original value and adds the specified value to it, in this case 10% of the original wage. For the self.grade attribute we have added 1, to simply say that the grade has increased by 1, in line with a typical promotion.
The result of this can be shown by creating an employee called William and giving him a promotion. We can check the difference by looking at his wage and grade before, and his wage and grade after.
#create the employee
william = Employee()

#Checking the original object's attributes
print("William's wage is:", william.wage)
print("William's grade is:", william.grade)
print("William has worked for", william.exp, "years\n")

#Giving William a promotion
print("William has got a promotion\n")
william.promotion()

#Checking to see that the grade and wage have changed
print("William's wage is now:", william.wage)
print("William's grade is now:", william.grade)
We can see now that instead of two lines of code, and having to specify the increase for both the wage and the grade, we can use just one line of code, making our lives much easier. The only issue is that we have specified that all promotions come with the same 10% wage increase and the same 1 grade increase. We could thus add the ability to change this by adding parameters, while keeping these values as the default, typical increase:
#create the employee class
class Employee:
    #add the init constructor in the same way we have done already
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    #add a promotion method
    #with assumed values
    def promotion(self, wage_increase=10, grade_increase=1):
        self.wage += wage_increase / 100 * self.wage
        self.grade += grade_increase
Thus, we can now try to give wage and grade increases to two different employees who are on different promotion tracks:
#create our employee information
sheila = Employee(90_000, 10, 15)
james = Employee(35_000, 2, 1)

#set up the people to give promotions to
promotions = [sheila, james]
names = ["Sheila", "James"]

#print the before information
for idx, people in enumerate(promotions):
    print(f"{names[idx]}'s current wage is {people.wage}")
    print(f"{names[idx]}'s current grade is {people.grade}")
    print("\n")

#give them promotions
sheila.promotion(wage_increase=20, grade_increase=3)
james.promotion(wage_increase=12, grade_increase=1)

#check their new wage and grade
for idx, people in enumerate(promotions):
    print(f"{names[idx]}'s new wage is {people.wage}")
    print(f"{names[idx]}'s new grade is {people.grade}")
    print("\n")
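The example above keeps the names in a separate list that has to stay in the same order as the objects. One possible alternative, sketched here as an assumption rather than as part of the workshop’s Employee class, is to store the name on the object itself. NamedEmployee and its name parameter are hypothetical and used only for this illustration.

```python
class NamedEmployee:
    # name is a hypothetical extra attribute for this sketch
    def __init__(self, name, wage=20_000, grade=1, years_worked=0):
        self.name = name
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    def promotion(self, wage_increase=10, grade_increase=1):
        self.wage += wage_increase / 100 * self.wage
        self.grade += grade_increase

sheila = NamedEmployee("Sheila", 90_000, 10, 15)
james = NamedEmployee("James", 35_000, 2, 1)

sheila.promotion(wage_increase=20, grade_increase=3)
james.promotion(wage_increase=12, grade_increase=1)

# No parallel list of names is needed: each object carries its own
for person in [sheila, james]:
    print(f"{person.name}'s new wage is {person.wage:.2f}")
    print(f"{person.name}'s new grade is {person.grade}")
```

Because the wage increase is a percentage, the wage becomes a floating point number after a promotion, which is why the sketch formats it to two decimal places when printing.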
The other change that we may also want to implement is an increase in their years worked for the company on their anniversary. Can you figure out how to add a method called anniversary that adds a year to an employee’s work experience:
#create the employee class
class Employee:
    #add the init constructor in the same way we have done already
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    #add a promotion method
    #with assumed values
    def promotion(self, wage_increase=10, grade_increase=1):
        self.wage += wage_increase / 100 * self.wage
        self.grade += grade_increase

    #create an anniversary method
    def anniversary(self):
        (??)
Check to see that this has worked by creating a new object, Sabrina, with wage: £200,000, grade: 15, years experience: 25, and giving her a work anniversary:
Is there any other functionality that you can think of for an employee, or any other information you may want to add?
Play around by creating your own Employee class with this information and methods:
5) Class Inheritance
The good thing about using classes in your code is that a class can inherit attributes and methods from other classes. This is useful in many parts of programming and allows you to build on already existing functionality. An example of this is GeoPandas (a library for manipulating geographical datasets), whose GeoDataFrame class inherits from the pandas DataFrame, which allows it to use the methods and attributes that the pandas DataFrame has. This means that GeoPandas can build on pandas without having to reimplement everything that pandas does, and that changes to pandas can be integrated into future GeoPandas builds as well.
Essentially, this allows you to define a new class that gets all the functionality of the old class, to which you can add extra functionality without copying the code from the previous class. The way this is done is through the following notation:
class MyChild(MyParent): pass
Here MyParent is the class whose functionality is being extended/inherited, while MyChild is the class that will inherit the functionality. While pass is used here, you can add attributes and methods in the same way as before.
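A minimal sketch of this notation in action is given below. The greet method is a hypothetical addition, included only so that there is something for MyChild to inherit; issubclass() and isinstance() are built-in functions for checking the relationship.

```python
class MyParent:
    # greet is a hypothetical method used only for this illustration
    def greet(self):
        return "hello from the parent"

class MyChild(MyParent):
    pass

child = MyChild()

# MyChild defines nothing itself, yet it can use the parent's method
print(child.greet())

# The relationship can be checked with built-in functions
print(issubclass(MyChild, MyParent))
print(isinstance(child, MyParent))

# It only runs one way: a parent object is not an instance of the child
print(isinstance(MyParent(), MyChild))
```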
An example of this is specifying different types of Employees. Considering we are working for the ‘Data Sci’ company, we can add Data Scientists:
#create the Data Scientist class inheriting from the Employee class
class DataScientist(Employee):
    #add no new functionality for now
    pass

#create an instance called jessica
jessica = DataScientist(70_000, 6, 10)

#print Jessica's wage
print(f"Jessica's current wage is: {jessica.wage}")

#give her a promotion
jessica.promotion()
print(f"Jessica's new wage is: {jessica.wage}")
We can see the usefulness of this here in that we already have all the functionality of the parent without having to write anything new! This is very useful when creating new child classes, as it saves us writing more code.
However, an important part of creating a child class is being able to extend it with attributes or methods that objects of the parent class would not have. In our case, we can assume that Data Scientists know programming languages that the typical employee would not need to know about. Thus we can add programming languages as an attribute that we can edit for the Data Scientist:
#create the Data Scientist class inheriting from the employee class
class DataScientist(Employee):
    #child's initialisation
    #p_languages defaults to None rather than [] so that
    #instances never share one default list
    def __init__(self, wage=20_000,
                 grade=1, years_worked=0,
                 p_languages=None):
        #use the parent's initialisation
        Employee.__init__(self, wage,
                          grade, years_worked)
        #new characteristics to add
        self.languages = [] if p_languages is None else list(p_languages)

#Create a new DataScientist
jessica = DataScientist(70000, 6, 12, ["Python", "R", "SQL"])

#Output her languages
jessica.languages
The thing to note here is that when you add an __init__() method to the child class, it overrides the parent’s __init__() method, so the parent’s version is no longer called automatically.
In this case, to keep the parent’s __init__() method and the attributes associated with it, we have added a call to the Employee initialisation method as the first statement of the child’s __init__() method, Employee.__init__(self, wage, grade, years_worked), and hence retained all that comes with it. We have thus been able to extend the class by simply adding a new attribute, just as we did before.
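A commonly used alternative to naming the parent class explicitly is the built-in super() function, which looks up the parent for you. The sketch below is one way the same child class might be written with it; the None default for p_languages is a choice made for this sketch and not something taken from the workshop.

```python
class Employee:
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

class DataScientist(Employee):
    def __init__(self, wage=20_000, grade=1, years_worked=0,
                 p_languages=None):
        # super() finds the parent class, so Employee is not named here
        # and self is passed automatically
        super().__init__(wage, grade, years_worked)
        # assumption of this sketch: start from an empty list if no
        # languages are given
        self.languages = [] if p_languages is None else list(p_languages)

jessica = DataScientist(70_000, 6, 12, ["Python", "R", "SQL"])
print(jessica.wage, jessica.grade, jessica.exp)
print(jessica.languages)
```

The result is the same as calling Employee.__init__ directly; the main practical difference is that the parent’s name does not need updating if the class hierarchy changes.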
Once we’ve done that, we can start to build up our subclass by adding methods as well, just as we did for our parent class. For our purposes, we could send our Data Scientist off to learn a new language, potentially for a new project or just for personal development, so we can add the new language to their list and potentially also give them a wage increase:
#create the Data Scientist class inheriting from the employee class
class DataScientist(Employee):
    #child's initialisation
    #p_languages defaults to None rather than [] so that
    #instances never share one default list
    def __init__(self, wage=20_000,
                 grade=1, years_worked=0,
                 p_languages=None):
        #use the parent's initialisation
        Employee.__init__(self, wage,
                          grade, years_worked)
        #new characteristics to add
        self.languages = [] if p_languages is None else list(p_languages)

    #add language learning functionality
    def learn_lang(self, new_lang, promotion=False, wage_increase=10):
        #add the new language
        self.languages.append(new_lang)
        #if promotion is true
        if promotion:
            #add wage increase
            self.wage += wage_increase / 100 * self.wage
Once this is implemented, we can create a new DataScientist with the same languages and see that she has learnt a new language while working for the company.
#create new DataScientist
juliet = DataScientist(80000, 7, 15, ["Python", "R", "SQL"])
#Print her current languages and wage
print(f"Juliet's current languages are {juliet.languages}")
print(f"Juliet's current wage is £{juliet.wage}")
#She learns a language
#so we give her a promotion
juliet.learn_lang("JavaScript", promotion = True)
#check what languages she now knows
print(f"Juliet's new languages are {juliet.languages}")
print(f"Juliet's new wage is £{juliet.wage}")
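Because learn_lang changes the languages list in place with append, it matters where that list comes from. In Python a default argument value is created once, when the function is defined, so a list used as a default is shared by every call that relies on it. The sketch below uses two hypothetical stand-in classes, SharedDefault and SeparateDefault, to show the effect and the common None-based alternative.

```python
class SharedDefault:
    # the same list object is reused every time the default is used
    def __init__(self, p_languages=[]):
        self.languages = p_languages

class SeparateDefault:
    # None is used as a marker and a fresh list is built per instance
    def __init__(self, p_languages=None):
        self.languages = [] if p_languages is None else list(p_languages)

a = SharedDefault()
b = SharedDefault()
a.languages.append("Python")
# b was never given a language, but it shares a's list
print(b.languages)

c = SeparateDefault()
d = SeparateDefault()
c.languages.append("Python")
# d has its own list, so it is unaffected
print(d.languages)
```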
And we can also use the same attributes and methods from the parent class in the child class.
Why not try to give Juliet a work anniversary just as we did before:
This is especially useful for making subclasses that take information from the parent class. You can even try making other workers such as a software engineer who has characteristics such as programming languages and operating system they work with. Feel free to have a go adding any other child to Employee you can think of:
It is worth noting that when it comes to designing class inheritance there is what is known as the Liskov Substitution Principle, which says that an object of the base class should be replaceable with an object of any of its subclasses without altering any properties of the program. The rule that follows is that if the hierarchy of classes violates the Liskov Substitution Principle then you should not be using inheritance, as it is likely to make the code behave in unpredictable ways further down the line. More information can be found here and here
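As a rough illustration of the principle, the sketch below defines a function that works with any Employee and then passes it two subclasses. Contractor and promote_all are hypothetical names invented for this example, and the DataScientist here is a cut-down version of the one built above.

```python
class Employee:
    def __init__(self, wage=20_000, grade=1, years_worked=0):
        self.wage = wage
        self.grade = grade
        self.exp = years_worked

    def promotion(self, wage_increase=10, grade_increase=1):
        self.wage += wage_increase / 100 * self.wage
        self.grade += grade_increase

class DataScientist(Employee):
    # extends the parent but leaves promotion working as callers expect
    def __init__(self, wage=20_000, grade=1, years_worked=0,
                 p_languages=None):
        super().__init__(wage, grade, years_worked)
        self.languages = [] if p_languages is None else list(p_languages)

class Contractor(Employee):
    # hypothetical subclass that breaks what callers expect of promotion
    def promotion(self, wage_increase=10, grade_increase=1):
        raise NotImplementedError("contractors cannot be promoted")

def promote_all(staff):
    # written against Employee: assumes every member can be promoted
    for person in staff:
        person.promotion()

# Substituting a DataScientist for an Employee works as expected
promote_all([Employee(), DataScientist()])

# Substituting a Contractor does not, so this hierarchy violates
# the principle
try:
    promote_all([Employee(), Contractor()])
    substitution_ok = True
except NotImplementedError:
    substitution_ok = False

print(substitution_ok)
```

In a case like Contractor, a design that does not inherit from Employee, or a parent class that does not promise a promotion method, would likely be a better fit.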