Pandas Dataframes | Python | CADS

Topic 2: Creating a Dataframe

A common way to create a DataFrame is to start from a dictionary. A dictionary is a collection of values, each linked to a key. Each key is separated from its value by a colon, pairs are separated by commas, and the whole dictionary is enclosed in curly braces. When a dictionary is passed to pandas, its keys become the column names of the DataFrame. For example, the key could be “Grades” and its values “A, B, C, D, F”.
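As a minimal sketch of the dictionary described above (the variable name grades is our own choice for illustration), one key maps to a list of values:

```python
# One key ("Grades") linked to a list of values.
# The variable name `grades` is a hypothetical choice for this sketch.
grades = {"Grades": ["A", "B", "C", "D", "F"]}

# Look up the values stored under the key.
print(grades["Grades"])

# The keys are what pandas would later use as column names.
print(list(grades.keys()))
```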

The table below lists the dictionary methods and what they do. We won’t go into much detail on dictionaries, but they may become important later if you work with data structures and algorithms.

Method	Usage
values()	Returns a view of all values in the dictionary
update()	Updates the dictionary with the specified key-value pairs
setdefault()	Returns the value of the specified key. If the key does not exist, inserts the key with the specified value
clear()	Removes all the elements from the dictionary
keys()	Returns a view of the keys of the dictionary
pop()	Removes the element with the specified key and returns its value
popitem()	Removes and returns the last inserted key-value pair
get()	Returns the value of the specified key, or None (or a given default) if the key does not exist
items()	Returns a view containing a (key, value) tuple for each key-value pair
copy()	Returns a shallow copy of the dictionary
fromkeys()	Returns a new dictionary with the specified keys, all set to the same value
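A few of these methods in action may help. This is only a sketch: the student record and its field names are made up for illustration.

```python
# Hypothetical record used only to demonstrate the methods in the table.
student = {"name": "Ana", "gpa": 3.9}

keys = list(student.keys())        # wrap the view in list() to get a plain list
values = list(student.values())
missing = student.get("major")     # key is absent, so get() returns None

student.setdefault("major", "Statistics")  # inserts the key because it was missing
student.update({"gpa": 4.0})               # overwrites the existing gpa
removed = student.pop("name")              # removes "name" and returns its value

snapshot = student.copy()          # shallow copy, unaffected by clear() below
student.clear()                    # the original dictionary is now empty

blank = dict.fromkeys(["a", "b"], 0)  # new dictionary, every key set to 0
```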

To begin, we pass a dictionary of lists as the first argument to DataFrame().
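A minimal sketch of that step, assuming pandas is imported under the conventional alias pd and reusing the “Grades” example from earlier:

```python
import pandas as pd

# Dictionary of lists: each key becomes a column name.
data = {"Grades": ["A", "B", "C", "D", "F"]}

df = pd.DataFrame(data)
print(df)
```

The printed frame has a single column, Grades, and one row per value in the list.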

By default, the rows of a DataFrame are indexed 0 to n - 1, with n being the number of rows (the length of each list in the dictionary). We can override this indexing by passing the “index=” parameter after our dictionary to manually set the row labels for our data.
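The sketch below shows the “index=” parameter. The student names and values are hypothetical stand-ins, since the original example data is not reproduced here.

```python
import pandas as pd

# Hypothetical student data for illustration only.
data = {"gpa": [3.9, 3.2, 2.8, 3.5], "year": [1, 2, 3, 1]}

# Row labels replace the default 0, 1, 2, 3 index.
df = pd.DataFrame(data, index=["Ana", "Ben", "Cam", "Dee"])
print(df)
```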

Note: Most times you won’t specify an index and pandas will create one automatically.

Topic 3: Looking at the Data

Now for some useful commands for working with pandas DataFrames.

When you want to see the top of a DataFrame, the .head() method returns the first rows, starting from the first indexed row. By default it returns 5 rows; pass a number to change that, for example df.head(4) for the first 4 rows. The .tail() method does the same but counts back from the last indexed row.
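To make the defaults concrete, here is a small sketch using a made-up eight-row frame:

```python
import pandas as pd

# Hypothetical data: eight rows so the default of 5 is visible.
df = pd.DataFrame({"gpa": [3.9, 3.2, 2.8, 3.5, 3.0, 3.7, 2.5, 3.3]})

top_default = df.head()    # first 5 rows
top_four = df.head(4)      # first 4 rows
bottom_two = df.tail(2)    # last 2 rows

print(top_four)
print(bottom_two)
```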

Because we specified an index when creating the DataFrame, that index also shows up whenever we select columns. In this example we just want the gpa column.

The output gives us our student index and their GPAs. To get one column, the syntax is df['<column>']. For multiple columns, you have to pass a list of column names, so it looks like df[['<column>', '<column>']]. Two sets of brackets are required: the outer pair does the selecting and the inner pair is the list.
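Both forms can be sketched as follows. The student names and the year column are assumptions made for illustration; only the gpa column comes from the text.

```python
import pandas as pd

# Hypothetical student data with a custom index.
df = pd.DataFrame(
    {"gpa": [3.9, 3.2, 2.8, 3.5], "year": [1, 2, 3, 1]},
    index=["Ana", "Ben", "Cam", "Dee"],
)

gpa = df["gpa"]               # one column: a Series that keeps the student index
subset = df[["gpa", "year"]]  # list of columns: a DataFrame

print(gpa)
print(subset)
```

Selecting a single column returns a Series, while selecting with a list returns a DataFrame, even if the list holds only one name.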

Selecting only some rows or columns, rather than all of the values, is a tad more complex. Let’s start by selecting only the first three of the four student records. You do so with a slice: df[:3]

Choosing the middle columns:

Remember, indexing starts at 0.

Slices do not include the end position. In the first subset, df[:3] means the rows at positions 0, 1, and 2: up to, but not including, the 4th row (position 3).
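The two subsets discussed above might look like the sketch below. The data is hypothetical, and using .iloc for the middle columns is our assumption of one way to do it; the original example is not reproduced here.

```python
import pandas as pd

# Hypothetical four-student, four-column frame.
df = pd.DataFrame(
    {
        "major": ["Math", "Stats", "CS", "Econ"],
        "gpa": [3.9, 3.2, 2.8, 3.5],
        "credits": [30, 45, 60, 15],
        "year": [1, 2, 3, 1],
    },
    index=["Ana", "Ben", "Cam", "Dee"],
)

first_three = df[:3]           # rows at positions 0, 1, 2
middle_cols = df.iloc[:, 1:3]  # every row, columns at positions 1 and 2

print(first_three)
print(middle_cols)
```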

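To get a single value from a DataFrame, the syntax is: df.<column>[<index>], where <index> is the row’s index label (its number, if you kept the default index). To look up a row by position when the index holds custom labels, use df.<column>.iloc[<position>].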

This gives us the gpa of the third student.
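As a sketch with hypothetical student names, the third student’s gpa can be reached by label or by position:

```python
import pandas as pd

# Hypothetical student data with a custom index.
df = pd.DataFrame(
    {"gpa": [3.9, 3.2, 2.8, 3.5]},
    index=["Ana", "Ben", "Cam", "Dee"],
)

by_label = df.gpa["Cam"]        # look up by index label
by_position = df.gpa.iloc[2]    # look up by position (third row)
with_loc = df.loc["Cam", "gpa"] # row label and column name in one call

print(by_label, by_position, with_loc)
```

All three expressions return the same value; .loc and .iloc make it explicit whether a label or a position is meant.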

Center for Analytics and Data Science

165 McVey Data Science Building

105 Tallawanda Rd

Oxford, OH 45056

513-529-2279 cads@MiamiOH.edu

Part 4: Pandas Dataframes

Topic 1: What is a Dataframe

Topic 2: Creating a Dataframe

Topic 3: Looking at the Data

Got it Down? Click here for Part 5!
