get the first row from a dataframe:
df_head = df.iloc[0]
get the shape/size information:
df.shape
get data types of columns:
df_cat9.dtypes
drop the first row:
df = df.drop(0)
drop columns:
X_df = X_df.drop(columns=['Cluster'])
get the unique values of a column:
df_2012Prod.UNIT_DESC.unique()
Drop duplicates:
df_2012Prod_weight_clean_unique = df_2012Prod_weight_clean.drop_duplicates(['COMMODITY_DESC', 'CLASS_DESC'])
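A minimal runnable sketch of subset-based deduplication, using a toy dataframe (the column names are borrowed from the snippet above; the data itself is invented):

```python
import pandas as pd

# Toy data: the first two rows share the same (COMMODITY_DESC, CLASS_DESC) pair
toy = pd.DataFrame({
    'COMMODITY_DESC': ['APPLES', 'APPLES', 'PEARS'],
    'CLASS_DESC': ['ALL CLASSES', 'ALL CLASSES', 'ALL CLASSES'],
    'VALUE': [1, 2, 3],
})

# drop_duplicates with a column subset keeps the first row of each pair;
# other columns (VALUE) are ignored when deciding what counts as a duplicate
toy_unique = toy.drop_duplicates(['COMMODITY_DESC', 'CLASS_DESC'])
```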
rename column names:
Y_df.columns = ['Cluster', 'SpeechNumber']
df_2015_1 = df_2015.rename(columns={"Attribute": "Definition", "Column Tag": "2015"})
reorder columns:
df_2015_1 = df_2015_1[['Definition', '2015']]
remove rows if…:
df_2012Prod_weight = df_2012Prod[~df_2012Prod["UNIT_DESC"].str.contains(r"\$")]
Convert strings to lower case:
df_2012Prod_weight_clean_unique['COMMODITY_DESC_low'] = df_2012Prod_weight_clean_unique['COMMODITY_DESC'].str.lower()
Aggregate data by columns:
df_test['SumState'] = df_test['BillionGallon'].groupby(df_test['State']).transform('sum')
df_test2 = pd.pivot_table(df_test, index=['State', 'Year', 'Category'], values='BillionGallon', aggfunc='sum').reset_index()
df_import_month = df_import.groupby(['ACCOUNT','category','stat_year' ,'stat_month'])['qty_1000lbs'].agg('sum').reset_index()
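The three aggregation patterns above differ in output shape: `transform('sum')` keeps every row and broadcasts the group total back onto it, while `pivot_table` / `groupby().agg()` collapse to one row per group. A toy illustration (invented data, same column names as the first two snippets):

```python
import pandas as pd

toy = pd.DataFrame({
    'State': ['IA', 'IA', 'NE'],
    'BillionGallon': [1.0, 2.0, 4.0],
})

# transform('sum'): same length as the input, group total repeated on each row
toy['SumState'] = toy['BillionGallon'].groupby(toy['State']).transform('sum')

# groupby + agg: one row per State, then reset_index to get a flat frame
totals = toy.groupby('State')['BillionGallon'].agg('sum').reset_index()
```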
Create a new column using row number/index:
E_df['speech'] = E_df.reset_index().index
change values based on other columns:
df_price_final['convert'] = np.where(df_price_final.UNIT_DESC.str.contains("TON"), 2000,
                            np.where(df_price_final.UNIT_DESC.str.contains("CWT"), 100,
                            np.where(df_price_final.UNIT_DESC.str.contains("BOX"), 80, 1)))
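`np.select` is a flatter alternative to nesting `np.where` calls when there are several conditions. A sketch with made-up `UNIT_DESC` values (the real frame is `df_price_final`; `toy` here is hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'UNIT_DESC': ['$ / TON', 'CWT', 'BOX, 80 LB', 'LB']})

# Conditions are checked in order; the first match wins, like the nested np.where
conditions = [
    toy['UNIT_DESC'].str.contains('TON'),
    toy['UNIT_DESC'].str.contains('CWT'),
    toy['UNIT_DESC'].str.contains('BOX'),
]
toy['convert'] = np.select(conditions, [2000, 100, 80], default=1)
```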
df_2012Prod_weight_clean_unique_naics_merge_fix.loc[df_2012Prod_weight_clean_unique_naics_merge_fix.FIX_COMM == 'blueberries wild', 'FIX_COMM'] = 'blueberries'
Concordance_Census_NAICS8_9Category.loc[Concordance_Census_NAICS8_9Category.NAICS8_DESC == 'Pineapples', ["COMMODITY_DESC", "CLASS_DESC"]] = ["Pineapples", "ALL CLASSES"]
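A quick check of assigning two columns at once through `.loc` with a boolean mask (toy frame with invented data; the real frame names come from the line above):

```python
import pandas as pd

toy = pd.DataFrame({
    'NAICS8_DESC': ['Pineapples', 'Apples'],
    'COMMODITY_DESC': ['PINEAPPLE', 'APPLES'],
    'CLASS_DESC': ['FRESH', 'ALL CLASSES'],
})

# One mask, two target columns: the two-element list on the right is
# broadcast across the selected rows, one value per column
mask = toy['NAICS8_DESC'] == 'Pineapples'
toy.loc[mask, ['COMMODITY_DESC', 'CLASS_DESC']] = ['Pineapples', 'ALL CLASSES']
```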