From Python dictionary: pd.DataFrame(data)
(A Python dictionary is not an array of objects: each key becomes a column)
From CSV: pd.read_csv('data.csv')
From JSON: pd.read_json('data.json')
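A minimal sketch of the three constructors above. The sample data is hypothetical, and in-memory text buffers stand in for 'data.csv' and 'data.json' so the example runs without files on disk.

```python
import io

import pandas as pd

# Hypothetical sample data: each key becomes a column, each list holds its values.
data = {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]}
df = pd.DataFrame(data)

# read_csv accepts a path or a file-like object; StringIO stands in for 'data.csv'.
csv_text = "name,age\nAnn,31\nBob,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_json also accepts a file-like object; this JSON is a list of records.
json_text = '[{"name": "Ann", "age": 31}, {"name": "Bob", "age": 25}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df)
```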
The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type.
Can be accessed or modified using: df.index (an attribute, not a method)
df.columns.values
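As an illustration, the index and column labels can be read and replaced like this. The frame and its labels are made up for the example.

```python
import pandas as pd

# Hypothetical one-column frame with the default integer index.
df = pd.DataFrame({"col0": [10, 20, 30]})

labels = list(df.index)                 # default RangeIndex labels
df.index = ["a", "b", "c"]              # replace the row labels by assignment
column_names = list(df.columns.values)  # column labels as a plain list

# Rows can now be looked up by their new labels.
value_b = df.loc["b", "col0"]
```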
df.rename(columns={'col0':'new col0','col1':'new col1','col2':'new col2'}, inplace=True)
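A short sketch of renaming without inplace=True, assuming a hypothetical three-column frame. rename then returns a new DataFrame and leaves the original untouched, and columns missing from the mapping keep their names.

```python
import pandas as pd

# Hypothetical frame using the column names from the line above.
df = pd.DataFrame({"col0": [1], "col1": [2], "col2": [3]})

# Only col0 is listed in the mapping, so col1 and col2 are kept as-is.
renamed = df.rename(columns={"col0": "new col0"})
```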
df['merged'] = df['col1'].astype(str) + df['col2'].astype(str)
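A sketch of why the choice of operation matters when combining two columns. By default pd.concat stacks Series end to end, so the result has twice as many rows and repeated index labels, and it cannot be assigned back as a single column. Element-wise string concatenation keeps one value per row. The data and the space separator are assumptions for the example.

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({"col1": ["a", "b"], "col2": ["x", "y"]})

# pd.concat stacks the Series vertically: 4 values, index labels 0, 1, 0, 1.
stacked = pd.concat([df["col1"], df["col2"]])

# Row-wise merge: join the two values of each row into one string.
df["merged"] = df["col1"].astype(str) + " " + df["col2"].astype(str)
```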
info()
to display the structure (columns...)
head()
to print the first 5 rows
tail()
to print the last 5 rows
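A quick sketch of the three inspection calls on a hypothetical ten-row frame. head() and tail() default to 5 rows and accept a different count as an argument.

```python
import pandas as pd

# Hypothetical frame with ten rows.
df = pd.DataFrame({"n": range(10)})

df.info()          # prints column names, non-null counts and dtypes
first = df.head()  # first 5 rows by default
last = df.tail(3)  # last 3 rows
```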
Returns an array of the unique values in a column
df['column_name'].unique()
df['column_name'].nunique() # Just the number of unique values
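A small sketch with made-up data. One detail that is easy to miss: unique() keeps a missing value as one of the results, while nunique() skips missing values unless dropna=False is passed.

```python
import pandas as pd

# Hypothetical column with a repeated value and one missing value.
df = pd.DataFrame({"column_name": ["red", "blue", "red", None]})

values = df["column_name"].unique()                        # order of first appearance
count = df["column_name"].nunique()                        # missing values excluded
count_with_na = df["column_name"].nunique(dropna=False)    # missing value counted
```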
Count the rows for each unique value
df['column_name'].value_counts()
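As a sketch, value_counts() returns a Series indexed by the unique values and sorted by count, largest first. The city names are invented for the example, and groupby(...).size() is shown as an equivalent way to get the same counts.

```python
import pandas as pd

# Hypothetical data with one value repeated three times.
df = pd.DataFrame({"column_name": ["Oslo", "Rome", "Oslo", "Oslo"]})

counts = df["column_name"].value_counts()      # sorted by count, descending
grouped = df.groupby("column_name").size()     # same counts, sorted by label
```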
df.drop_duplicates()
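A minimal sketch of drop_duplicates() on invented data. By default a row counts as a duplicate only when all of its columns match an earlier row; subset and keep narrow or change that behaviour.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are identical.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

deduped = df.drop_duplicates()                         # keeps the first of each duplicate set
by_a = df.drop_duplicates(subset=["a"], keep="last")   # compare column 'a' only, keep last
```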
Remove rows with missing values (NaN)
df.dropna(how='any', subset=['column_name'], inplace=False)
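A sketch of how subset changes which rows dropna() removes. The frame and the 'other' column are hypothetical. With inplace=False (the default) the original frame is left unchanged and a new one is returned.

```python
import pandas as pd

nan = float("nan")

# Hypothetical frame with missing values in different columns.
df = pd.DataFrame({"column_name": [1.0, nan, 3.0], "other": [None, "b", "c"]})

# Only rows missing a value in 'column_name' are dropped.
cleaned = df.dropna(how="any", subset=["column_name"])

# Without subset, a missing value in any column drops the row.
strict = df.dropna(how="any")
```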
Filling NaN values by interpolating between the previous and next values (linear by default):
df['column_name'].interpolate()
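A small worked example with invented numbers. interpolate() uses linear interpolation by default, so gaps are filled with evenly spaced values between their neighbours; ffill() is shown for comparison as the variant that simply repeats the previous value.

```python
import pandas as pd

nan = float("nan")

# Hypothetical column with a one-value gap and a two-value gap.
df = pd.DataFrame({"column_name": [1.0, nan, 3.0, nan, nan, 6.0]})

filled = df["column_name"].interpolate()   # linear: evenly spaced between neighbours
forward = df["column_name"].ffill()        # repeats the previous known value
```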
df['column_name'].str.replace('search','replaceby')
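A sketch of str.replace with made-up strings. The call returns a new Series, so the result has to be assigned back to keep it. Passing regex=False is an explicit choice here so the search text is treated literally regardless of the installed pandas version.

```python
import pandas as pd

# Hypothetical text column.
df = pd.DataFrame({"column_name": ["search me", "no match"]})

# Assign the result back; the original column is not modified in place.
df["column_name"] = df["column_name"].str.replace("search", "replaceby", regex=False)
```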
Columns:
newdf = df.drop("age", axis='columns')
Rows:
newdf = df.drop(99, axis='index')
Also takes lists as input
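A sketch of dropping columns and rows, including the list form. The frame, its 'city' column and the index labels 97 to 99 are assumptions made for the example. Note that the row argument is an index label, not a position.

```python
import pandas as pd

# Hypothetical frame whose index labels are 97, 98, 99.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40], "city": ["Oslo", "Rome", "Bern"]},
    index=[97, 98, 99],
)

no_age = df.drop("age", axis="columns")               # one column
no_row = df.drop(99, axis="index")                    # one row, by label
name_only = df.drop(["age", "city"], axis="columns")  # list of columns
```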
To numeric:
pd.to_numeric(df['col_name'], errors='coerce')
Advantage: with errors='coerce', non-numeric values become NaN instead of raising an error
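A minimal sketch of the coercion behaviour on invented strings. Values that cannot be parsed become NaN, and the result is a float column.

```python
import pandas as pd

# Hypothetical column of strings, one of which is not a number.
df = pd.DataFrame({"col_name": ["1", "2.5", "abc"]})

converted = pd.to_numeric(df["col_name"], errors="coerce")
n_failed = int(converted.isna().sum())  # how many values could not be parsed
```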
astype:
df.astype({'col_name': 'float'}, errors='ignore').dtypes
With errors='ignore', returns the data unchanged if non-numeric values are present; with the default errors='raise', raises an error
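A sketch contrasting the two error modes of astype. The 'bad' column is hypothetical and holds one value that cannot be converted. The exception types caught here are an assumption about what pandas raises for an unparseable string.

```python
import pandas as pd

# Hypothetical frame: 'col_name' converts cleanly, 'bad' does not.
df = pd.DataFrame({"col_name": ["1", "2"], "bad": ["1", "x"]})

converted = df.astype({"col_name": "float"})

# Default errors='raise': the failed conversion raises an exception.
try:
    df.astype({"bad": "float"})
    raised = False
except (ValueError, TypeError):
    raised = True

# errors='ignore': the original data comes back with its dtype unchanged.
unchanged = df.astype({"bad": "float"}, errors="ignore")
```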
Exercises https://github.com/guipsamora/pandas_exercises
Introduction to Pandas Data Analysis https://www.youtube.com/watch?v=DkjCaAMBGWM
Exploratory Data Analysis https://www.youtube.com/watch?v=xi0vhXFPegw