A dataframe is a data structure made up of rows and columns, similar to a database table or an Excel spreadsheet. It can be thought of as a dictionary of lists in which each list has its own identifier or key, such as “last name” or “food group.”
To create a dataframe this way, first create a dictionary. A dictionary is a collection of values linked to keys. In Python, a dictionary is written inside curly braces, with a colon separating each key from its value. In this case, the dictionary keys will become the column names for the DataFrame. For example, the key could be “Grades” and its values the list “A, B, C, D, F”.
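As a minimal sketch of the dictionary described above (the variable name `grades` is chosen here for illustration), the key “Grades” maps to a list of letter grades:

```python
# A dictionary maps each key to a value; here the value is a list.
# Curly braces enclose the dictionary and a colon separates key from value.
grades = {"Grades": ["A", "B", "C", "D", "F"]}

# Look up the list stored under the "Grades" key.
print(grades["Grades"])       # ['A', 'B', 'C', 'D', 'F']

# List the keys; these would become the column names of a DataFrame.
print(list(grades.keys()))    # ['Grades']
```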
These are the dictionary methods and what they do. We won’t go into too much detail on dictionaries, but they may become important in the future if you’re working with data structures and algorithms.
| Method | Usage |
| --- | --- |
| values() | Returns a view of all values in the dictionary |
| update() | Updates the dictionary with the specified key-value pairs |
| setdefault() | Returns the value of the specified key. If the key does not exist, inserts the key with the specified value |
| clear() | Removes all the elements from the dictionary |
| keys() | Returns a view of the keys of the dictionary |
| pop() | Removes the element with the specified key and returns its value |
| popitem() | Removes and returns the last inserted key-value pair |
| get() | Returns the value of the specified key, or a default if the key does not exist |
| items() | Returns a view containing a tuple for each key-value pair |
| copy() | Returns a shallow copy of the dictionary |
| fromkeys() | Returns a new dictionary with the specified keys and value |
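A few of these methods can be tried on a small dictionary. This is only a sketch: the `student` record and its values are made up for illustration.

```python
# Hypothetical record used only to demonstrate the methods.
student = {"last name": "Smith", "gpa": 3.5}

# get() returns the value for a key, or a default if the key is missing.
gpa = student.get("gpa")                    # 3.5
major = student.get("major", "unknown")     # 'unknown'

# update() adds or overwrites key-value pairs.
student.update({"major": "Biology"})

# setdefault() inserts the key because it does not exist yet, then returns its value.
year = student.setdefault("year", 1)        # 1

# keys() returns a view; wrap it in list() to get a plain list.
keys = list(student.keys())                 # ['last name', 'gpa', 'major', 'year']

# pop() removes a key and returns its value.
removed = student.pop("year")               # 1

# popitem() removes and returns the last inserted pair that remains.
last = student.popitem()                    # ('major', 'Biology')
```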
To begin, we pass a dictionary of lists to the DataFrame() constructor.
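A minimal sketch of this step follows. The student names and GPAs are hypothetical sample data, not taken from the original example.

```python
import pandas as pd

# Hypothetical student records; each key becomes a column name.
students = {
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "gpa": [3.9, 3.2, 3.6, 2.8],
}

# Pass the dictionary of lists to the DataFrame constructor.
df = pd.DataFrame(students)
print(df)
```

Each list must have the same length, since every list supplies one value per row.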
By default, a DataFrame is indexed from 0 to n - 1, with n being the number of rows (the length of each list in the dictionary). We can override this indexing by passing the index= parameter after our dictionary to manually set the row labels for our data.
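The following sketch sets the row labels with `index=`. The labels "Student 1" through "Student 4" and the data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical student records.
students = {
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "gpa": [3.9, 3.2, 3.6, 2.8],
}

# The index list needs one label per row.
df = pd.DataFrame(
    students,
    index=["Student 1", "Student 2", "Student 3", "Student 4"],
)
print(df)
```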
Note: Most times you won’t specify an index and pandas will create one automatically.
Now for some useful commands for dealing with pandas dataframes.
When you want to see the top of a data frame, the .head() method returns the first rows, starting from the first indexed row. By default it returns 5 rows; pass a number, as in .head(4), to get the first 4 rows. The .tail() method does the same but starts from the last indexed row.
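As a short sketch with hypothetical data, `head()` and `tail()` accept the number of rows to return:

```python
import pandas as pd

# Hypothetical student records.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "gpa": [3.9, 3.2, 3.6, 2.8],
})

top = df.head(2)       # first two rows
bottom = df.tail(2)    # last two rows
default = df.head()    # up to 5 rows; this frame has only 4

print(top)
print(bottom)
```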
Because we specified an index when creating the data frame, that index also shows up when we select certain columns. In this example we just want the gpa column.
The output gives us our student index and their GPAs. To get one column, the syntax is df['<column>']. For multiple columns you have to pass a list, so it looks like df[['<column>', '<column>']]. Two pairs of brackets are required.
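A sketch of both forms, using hypothetical student data and row labels:

```python
import pandas as pd

# Hypothetical student records with a custom index.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "gpa": [3.9, 3.2, 3.6, 2.8],
    },
    index=["Student 1", "Student 2", "Student 3", "Student 4"],
)

# One column: single brackets return a Series that keeps the row index.
gpa = df["gpa"]

# Several columns: a list inside the brackets returns a DataFrame.
subset = df[["name", "gpa"]]

print(gpa)
print(subset)
```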
Selecting parts of columns and rows instead of all of the values is a tad more complex. Let’s start by selecting only the first three of the four student records. You do so using the following syntax: df[:3]
Choosing the middle columns:
Remember, indexing starts at 0.
Slicing does not include the end position. In the first subset, df[:3], this means the rows at positions 0, 1, and 2 are returned, up to and not including the 4th row (position 3, since the index starts at 0).
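The slicing rules above can be sketched as follows. The data, the extra `major` and `year` columns, and the use of `iloc` for column positions are assumptions made for illustration, not necessarily what the original example used.

```python
import pandas as pd

# Hypothetical student records with four columns.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "major": ["Biology", "History", "Physics", "Art"],
        "year": [1, 2, 3, 4],
        "gpa": [3.9, 3.2, 3.6, 2.8],
    },
    index=["Student 1", "Student 2", "Student 3", "Student 4"],
)

# Rows at positions 0, 1, 2; position 3 is excluded.
first_three = df[:3]

# Rows at positions 1 and 2 (the middle two records).
middle_rows = df[1:3]

# iloc slices by position on both axes: all rows, columns 1 and 2.
middle_cols = df.iloc[:, 1:3]
```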
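To get a single value from a dataframe, the syntax is df.<column>[<index label>], where the label is the row number when the default index is used. To select by position regardless of the index, use df.<column>.iloc[<position>].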
This gives us the gpa of the third student.
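A sketch of single-value access with hypothetical data. The `at` accessor is shown as one more option that was not part of the original text.

```python
import pandas as pd

# Hypothetical student records with a custom index.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "gpa": [3.9, 3.2, 3.6, 2.8],
    },
    index=["Student 1", "Student 2", "Student 3", "Student 4"],
)

# By row label.
by_label = df.gpa["Student 3"]

# By position: the third student is at position 2.
by_position = df.gpa.iloc[2]

# at[] takes a row label and a column name.
by_at = df.at["Student 3", "gpa"]

print(by_label, by_position, by_at)
```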