get the first row from a dataframe:
df_head = df.iloc[0]
get the shape/size information:
df.shape
get data types of columns:
df_cat9.dtypes
drop the first row:
df = df.drop(0)
drop columns:
X_df = X_df.drop(columns=['Cluster'])
get the unique values of a column:
df_2012Prod.UNIT_DESC.unique()
Drop duplicates:
df_2012Prod_weight_clean_unique = df_2012Prod_weight_clean.drop_duplicates(['COMMODITY_DESC', 'CLASS_DESC'])
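A minimal runnable sketch of subset-based deduplication, using a toy dataframe (the column names are borrowed from the snippet above; the data itself is invented):

```python
import pandas as pd

# Toy data: the first two rows share the same (COMMODITY_DESC, CLASS_DESC) pair
toy = pd.DataFrame({
    'COMMODITY_DESC': ['APPLES', 'APPLES', 'PEARS'],
    'CLASS_DESC': ['ALL CLASSES', 'ALL CLASSES', 'ALL CLASSES'],
    'VALUE': [1, 2, 3],
})

# drop_duplicates with a column subset keeps the first row of each pair;
# other columns (VALUE) are ignored when deciding what counts as a duplicate
toy_unique = toy.drop_duplicates(['COMMODITY_DESC', 'CLASS_DESC'])
```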
rename column names:
Y_df.columns = ['Cluster', 'SpeechNumber']
df_2015_1 = df_2015.rename(columns={"Attribute": "Definition", "Column Tag": "2015"})
reorder columns:
df_2015_1 = df_2015_1[['Definition', '2015']]
remove rows if…:
df_2012Prod_weight = df_2012Prod[~df_2012Prod["UNIT_DESC"].str.contains(r"\$")]
Convert strings to lower case:
df_2012Prod_weight_clean_unique['COMMODITY_DESC_low'] = df_2012Prod_weight_clean_unique['COMMODITY_DESC'].str.lower()
Aggregate data by columns:
df_test['SumState'] = df_test['BillionGallon'].groupby(df_test['State']).transform('sum')
df_test2 = pd.pivot_table(df_test, index=['State', 'Year', 'Category'], values='BillionGallon', aggfunc='sum').reset_index()
df_import_month = df_import.groupby(['ACCOUNT','category','stat_year' ,'stat_month'])['qty_1000lbs'].agg('sum').reset_index()
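The three aggregation patterns above differ in output shape: `transform('sum')` keeps every row and broadcasts the group total back onto it, while `pivot_table` / `groupby().agg()` collapse to one row per group. A toy illustration (invented data, same column names as the first two snippets):

```python
import pandas as pd

toy = pd.DataFrame({
    'State': ['IA', 'IA', 'NE'],
    'BillionGallon': [1.0, 2.0, 4.0],
})

# transform('sum'): same length as the input, group total repeated on each row
toy['SumState'] = toy['BillionGallon'].groupby(toy['State']).transform('sum')

# groupby + agg: one row per State, then reset_index to get a flat frame
totals = toy.groupby('State')['BillionGallon'].agg('sum').reset_index()
```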
Create a new column using row number/index:
E_df['speech'] = E_df.reset_index().index
change values based on other columns:
df_price_final['convert'] = np.where(df_price_final.UNIT_DESC.str.contains("TON"), 2000,
                            np.where(df_price_final.UNIT_DESC.str.contains("CWT"), 100,
                            np.where(df_price_final.UNIT_DESC.str.contains("BOX"), 80, 1)))
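`np.select` is a flatter alternative to nesting `np.where` calls when there are several conditions. A sketch with made-up `UNIT_DESC` values (the real frame is `df_price_final`; `toy` here is hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'UNIT_DESC': ['$ / TON', 'CWT', 'BOX, 80 LB', 'LB']})

# Conditions are checked in order; the first match wins, like the nested np.where
conditions = [
    toy['UNIT_DESC'].str.contains('TON'),
    toy['UNIT_DESC'].str.contains('CWT'),
    toy['UNIT_DESC'].str.contains('BOX'),
]
toy['convert'] = np.select(conditions, [2000, 100, 80], default=1)
```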
df_2012Prod_weight_clean_unique_naics_merge_fix.loc[df_2012Prod_weight_clean_unique_naics_merge_fix.FIX_COMM == 'blueberries wild', 'FIX_COMM'] = 'blueberries'
Concordance_Census_NAICS8_9Category.loc[Concordance_Census_NAICS8_9Category.NAICS8_DESC == 'Pineapples', ["COMMODITY_DESC", "CLASS_DESC"]] = ["Pineapples", "ALL CLASSES"]
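A quick check of assigning two columns at once through `.loc` with a boolean mask (toy frame with invented data; the real frame names come from the line above):

```python
import pandas as pd

toy = pd.DataFrame({
    'NAICS8_DESC': ['Pineapples', 'Apples'],
    'COMMODITY_DESC': ['PINEAPPLE', 'APPLES'],
    'CLASS_DESC': ['FRESH', 'ALL CLASSES'],
})

# One mask, two target columns: the two-element list on the right is
# broadcast across the selected rows, one value per column
mask = toy['NAICS8_DESC'] == 'Pineapples'
toy.loc[mask, ['COMMODITY_DESC', 'CLASS_DESC']] = ['Pineapples', 'ALL CLASSES']
```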