HomeArchive by Category "Data Science"

Python: read sql script and pull data from SQL servers

q_2017_2019 = workingDir +”/2019ProductionUpdate5.sql”
q_2017_2019= open(q_2017_2019, ‘r’)
sql_2019Prod = ” “.join(q_2017_2019.readlines())
q_2017_2019.close()
df_2019Prod =pd.read_sql_query(sql_2019Prod,conn)

Python: Filter Data

idx = df_2017Prod_weight[‘YEAR’]==’2017′
df_2017Prod_weight_1=df_2017Prod_weight[idx]

sql list all table and schema names

SELECT *
FROM INFORMATION_SCHEMA.TABLES
order by TABLE_TYPE, TABLE_SCHEMA

Find optimal number of clusters

SPECTRAL CLUSTERING: A QUICK OVERVIEW

Visualizing High Dimensional Clusters

https://www.kaggle.com/minc33/visualizing-high-dimensional-clusters#Introduction:

Python Frequently Used Commands

There are some Python commands I almost apply in each of my script to adjust the default settings, to set up directories.

pd.set_option(‘display.max_columns’, 1000)
pd.set_option(‘display.max_row’, 1000)
pd.set_option(‘display.width’, None)
pd.set_option(‘display.max_colwidth’, -1)

WorkingDir = r”C:\Users\jing\***\Competition”
DataDir = WorkingDir+”\\data”

Python Plot

Scatter plot:

python scatter plot
import matplotlib.pyplot as plt


x_plot=Y[:,0]

y_plot=Y[:,1]

# y_plot=pd.DataFrame(y_plot)

plt.plot(y_plot, x_plot, 'o', color='black');

2nd example:

import matplotlib.pyplot as plt import plotly import plotly.graph_objs as go from plotly.offline import *


trace3d = go.Scatter3d(

                    x = G_array[:,100],

                    y = G_array[:,101],

                    z = G_array[:,201],

                    mode = "markers",

                    name = "Cluster 0",

                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),

                    text = None)

fig = dict(data = trace3d) iplot(fig) plotly.offline.plot(fig, filename='3d_G.html')

Other resources:

3D plotting with matplotlib

Python Pandas Create a Dataframe

I usually create dataframes by reading data from csv, excel and sql. You can find some examples here:

python reads data from SQL directly; python executes sql query

Python Pandas read csv common problems

Python Pandas read excel files (xlsx)

But sometimes, you may want to create a small dataframe in the script. So here are some examples:

df_pop = pd.DataFrame({'Year':['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019'], 'Population':['309321666', '311556874', '313830990', '315993715', '318301008', '320635163', '322941311', '324985539', '326687501', '328239523']})

You can skip the quotes to have them as int.

df_pop = pd.DataFrame({'Year':[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], 'Population':[309321666, 311556874, 313830990, 315993715, 318301008, 320635163, 322941311, 324985539, 326687501, 328239523]})

You can also convert an array to a dataframe:

Y_df=pd.DataFrame(Y)

Other Pandas related blogs:

Python dataframe manipulation using Pandas