Python: read sql script and pull data from SQL servers

q_2017_2019 = workingDir +”/2019ProductionUpdate5.sql”
q_2017_2019= open(q_2017_2019, ‘r’)
sql_2019Prod = ” “.join(q_2017_2019.readlines())
q_2017_2019.close()
df_2019Prod =pd.read_sql_query(sql_2019Prod,conn)

Python: Filter Data

idx = df_2017Prod_weight[‘YEAR’]==’2017′
df_2017Prod_weight_1=df_2017Prod_weight[idx]

Python: Find the Most Recent Year’s Data

df_2019Prod_weight[‘Year_max’]=df_2019Prod_weight.groupby([‘COMMODITY_DESC’, ‘CLASS_DESC’,’UNIT_DESC’])[‘YEAR’].transform(max)
idx = df_2019Prod_weight[‘Year_max’]==df_2019Prod_weight[‘YEAR’]
df_2019Prod_weight_1=df_2019Prod_weight[idx]

sql list all table and schema names

SELECT *
FROM INFORMATION_SCHEMA.TABLES
order by TABLE_TYPE, TABLE_SCHEMA

Visualizing High Dimensional Clusters

https://www.kaggle.com/minc33/visualizing-high-dimensional-clusters#Introduction:

Python Frequently Used Commands

There are some Python commands I almost apply in each of my script to adjust the default settings, to set up directories.

pd.set_option(‘display.max_columns’, 1000)
pd.set_option(‘display.max_row’, 1000)
pd.set_option(‘display.width’, None)
pd.set_option(‘display.max_colwidth’, -1)

WorkingDir = r”C:\Users\jing\***\Competition”
DataDir = WorkingDir+”\\data”

Python Plot

Scatter plot:

python scatter plot

import matplotlib.pyplot as plt

x_plot=Y[:,0]
y_plot=Y[:,1]
# y_plot=pd.DataFrame(y_plot)
plt.plot(y_plot, x_plot, 'o', color='black');

2nd example:


import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go
from plotly.offline import *

trace3d = go.Scatter3d(
x = G_array[:,100],
y = G_array[:,101],
z = G_array[:,201],
mode = "markers",
name = "Cluster 0",
marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
text = None)

fig = dict(data = trace3d)
iplot(fig)
plotly.offline.plot(fig, filename='3d_G.html')

Other resources:

3D plotting with matplotlib

Python Pandas Create a Dataframe

I usually create dataframes by reading data from csv, excel and sql. You can find some examples here:

python reads data from SQL directly; python executes sql query

Python Pandas read csv common problems

Python Pandas read excel files (xlsx)

But sometimes, you may want to create a small dataframe in the script. So here are some examples:

df_pop = pd.DataFrame({'Year':['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019'],
'Population':['309321666', '311556874', '313830990', '315993715', '318301008', '320635163', '322941311', '324985539', '326687501', '328239523']})

You can skip the quotes to have them as int.

df_pop = pd.DataFrame({'Year':[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
'Population':[309321666, 311556874, 313830990, 315993715, 318301008, 320635163, 322941311, 324985539, 326687501, 328239523]})

You can also convert an array to a dataframe:


Y_df=pd.DataFrame(Y)

Other Pandas related blogs:

Python dataframe manipulation using Pandas