When working with data in Python (here, with pandas), it's good practice to convert date columns to datetime and, where needed, to convert columns of values stored as strings into numeric types. To see the data types: print(<df>.dtypes)
To change data types: <df>[<column>] = <df>[<column>].astype('<datatype>')
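As a rough sketch of the two steps above, the snippet below checks and then converts the types of a small made-up dataframe. The column names and values are hypothetical, and pd.to_datetime is used for the date column as one common alternative to astype.

```python
import pandas as pd

# Hypothetical data: both "enrolled" and "score" start out as strings
df = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "enrolled": ["2021-08-23", "2022-01-10"],
    "score": ["88", "92"],
})
print(df.dtypes)  # inspect the types before converting

# Convert the date strings to datetime and the score strings to integers
df["enrolled"] = pd.to_datetime(df["enrolled"])
df["score"] = df["score"].astype("int64")
print(df.dtypes)  # inspect the types after converting
```

Once the columns are converted, date arithmetic and numeric summaries such as df["score"].mean() behave as expected.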
Concatenating two dataframes with pandas combines them either vertically or horizontally. If the datasets have the same column headers, the data from each set is placed in its respective column. Below we combine the records for four students.
If you do not specify how you would like to concatenate the data, pandas will stack the datasets vertically by default (axis=0), placing the rows of the second dataset below the rows of the first.
However, if you would like the data to be placed side by side, specifying "axis=1" in the concatenation function will put the second dataset to the right of the first.
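A minimal sketch of both orientations, using two made-up dataframes of two students each (the names and grades are hypothetical):

```python
import pandas as pd

# Two hypothetical datasets with the same column headers
first = pd.DataFrame({"name": ["Ana", "Ben"], "grade": [88, 92]})
second = pd.DataFrame({"name": ["Cal", "Dee"], "grade": [75, 81]})

# Default (axis=0): rows of `second` go below the rows of `first`
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)

# axis=1: columns of `second` go to the right of the columns of `first`
side_by_side = pd.concat([first, second], axis=1)
print(side_by_side)
```

Here ignore_index=True renumbers the stacked rows from 0; without it, each dataset keeps its original row labels, so the labels 0 and 1 would each appear twice.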
The pandas documentation for the concat function lists the other options available when concatenating, such as sorting the columns and choosing the axis to concatenate along.
Suppose you're given datasets such as the ones below, where each series is a full record. This differs from the previous datasets, which had a separate series for each variable that we then combined.
The join method combines datasets that share the same index labels (row headers). In the example below, both datasets have the same four index labels, so they can be combined into one set, resulting in vertically oriented student records.
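A minimal sketch of such a join, assuming each column holds one student's full record and the four shared index labels are the record fields. The field names, student labels, and values are all hypothetical.

```python
import pandas as pd

# Hypothetical shared index: the four fields of a student record
fields = ["name", "major", "year", "gpa"]

# Each column is one student's full record, read top to bottom
left = pd.DataFrame(
    {"student_1": ["Ana", "Math", 2, 3.6], "student_2": ["Ben", "Art", 1, 3.1]},
    index=fields,
)
right = pd.DataFrame(
    {"student_3": ["Cal", "Physics", 3, 3.8], "student_4": ["Dee", "History", 4, 3.4]},
    index=fields,
)

# join aligns the two dataframes on their matching index labels
records = left.join(right)
print(records)
```

Because join matches rows by index label, the result has one row per field and one column per student. If the vertical layout is inconvenient, records.T would flip it so that each student becomes a row.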