Data analytics with Python
19.04.2021
Data analysis seeks to understand data by placing it in a visual context, using programming tools to analyze complex data from many different real-world scenarios.
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000008-bb01cbb01e/python-programming-language.png?ph=5105a242fd)
In this article, we will use the following Python libraries to create plots and analyze data:
Matplotlib (https://matplotlib.org/)
Pandas (https://pandas.pydata.org/docs/)
Plotly (https://plotly.com/python/)
Using pandas read_csv
- import pandas as pd
- # iris.data has no header row, so the column names are passed explicitly
- iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
- # show the first five rows
- print(iris.head())
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000010-8f4528f454/9a175619-6956-4b66-bf3c-6c66967bf405.jpg?ph=5105a242fd)
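The UCI `iris.data` file starts directly with data, which is why the call above passes `names=`. A minimal sketch of what that parameter changes, reading three rows written in the same comma-separated format from an in-memory string so it runs without a network connection (the rows are just illustrative input, not a copy of the file):

```python
import io

import pandas as pd

# Three rows in the same format as iris.data: four measurements and a class label, no header line
text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# With names=, every line is treated as data
with_names = pd.read_csv(io.StringIO(text), names=columns)

# Without names=, pandas uses the first line as the header and loses one row of data
without_names = pd.read_csv(io.StringIO(text))

print(with_names.shape)     # (3, 5)
print(without_names.shape)  # (2, 5)
```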
Scatter plot in Matplotlib
- import matplotlib.pyplot as plt
- fig, ax = plt.subplots()
- # scatter the sepal_length against the sepal_width
- ax.scatter(iris['sepal_length'], iris['sepal_width'])
- # set a title and labels
- ax.set_title('Iris Dataset')
- ax.set_xlabel('sepal_length')
- ax.set_ylabel('sepal_width')
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000011-8b5438b546/e7ca64d4-d228-4496-b980-4815af4fad42.jpg?ph=5105a242fd)
Scatter plot colored by class
- # create color dictionary
- colors = {'Iris-setosa':'r', 'Iris-versicolor':'g', 'Iris-virginica':'b'}
- # create a figure and axis
- fig, ax = plt.subplots()
- # plot each data-point in the color of its class
- for i in range(len(iris['sepal_length'])):
-     ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i], color=colors[iris['class'][i]])
- # set a title and labels
- ax.set_title('Iris Dataset')
- ax.set_xlabel('sepal_length')
- ax.set_ylabel('sepal_width')
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000012-5d6d45d6d6/WhatsApp%20Image%202021-04-19%20at%2000.15.04.jpeg?ph=5105a242fd)
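The loop above calls `ax.scatter` once per row (150 times for the full dataset) and produces no legend. A minimal sketch of one alternative: group the rows with `groupby('class')` so each class is drawn by a single labelled call that `ax.legend()` can pick up. The six-row DataFrame is a hypothetical stand-in so the block runs on its own; with the real data, reuse the `iris` loaded earlier.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris DataFrame loaded earlier
iris = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
              'Iris-versicolor', 'Iris-virginica', 'Iris-virginica'],
})
colors = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}

fig, ax = plt.subplots()
# One scatter call per class; each call creates one labelled collection of points
for name, group in iris.groupby('class'):
    ax.scatter(group['sepal_length'], group['sepal_width'],
               color=colors[name], label=name)

ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
ax.legend()
```

If no legend is needed, a single vectorized call such as `ax.scatter(iris['sepal_length'], iris['sepal_width'], c=iris['class'].map(colors))` also colors each point by its class.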
Line chart of each column
- # get columns to plot
- columns = iris.columns.drop(['class'])
- # create x data
- x_data = range(0, iris.shape[0])
- # create figure and axis
- fig, ax = plt.subplots()
- # plot each column
- for column in columns:
-     ax.plot(x_data, iris[column], label=column)
- # set title and legend
- ax.set_title('Iris Dataset')
- ax.legend()
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000013-bb156bb159/WhatsApp%20Image%202021-04-19%20at%2000.35.25.jpeg?ph=5105a242fd)
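pandas can draw the same chart directly: `DataFrame.plot()` plots one line per column against the row index (0, 1, 2, ..., which is what `x_data` reproduces above) and adds a legend automatically. A minimal sketch, again on a hypothetical six-row stand-in for `iris`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris DataFrame loaded earlier
iris = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
              'Iris-versicolor', 'Iris-virginica', 'Iris-virginica'],
})

fig, ax = plt.subplots()
# Drop the text column, then draw every remaining column against the row index
iris.drop(columns=['class']).plot(ax=ax, title='Iris Dataset')
```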
Multiple histograms
- # one histogram per numeric column, arranged in a 2x2 grid
- iris.plot.hist(subplots=True, layout=(2,2), figsize=(10, 10), bins=20)
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000014-d07edd07f0/WhatsApp%20Image%202021-04-19%20at%2000.39.28.jpeg?ph=5105a242fd)
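With `subplots=True`, `DataFrame.plot.hist` gives each numeric column its own axes, `layout=(2, 2)` arranges those axes in a 2 by 2 grid, and `bins=20` splits each column's range into 20 equal-width intervals. The call returns a NumPy array of `Axes` shaped like the layout, which is handy for adjusting individual panels afterwards. The sketch below uses randomly generated stand-in values, not the iris measurements:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
# Hypothetical stand-in data: 150 rows of random values for the four measurement columns
data = pd.DataFrame(rng.normal(loc=5.0, scale=1.0, size=(150, 4)), columns=columns)

# Returns one Axes per column, reshaped to the requested 2x2 layout
axes = data.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
print(axes.shape)  # (2, 2)
```

For example, `axes[0, 0].set_xlabel('cm')` would label only the top-left panel.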
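Correlation heatmap
- import numpy as np
- # get the correlation matrix of the numeric columns
- # (drop the text 'class' column first; pandas 2.0 and later raise an error on it)
- corr = iris.drop(columns=['class']).corr()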
- fig, ax = plt.subplots()
- # create heatmap
- im = ax.imshow(corr.values)
- # set labels
- ax.set_xticks(np.arange(len(corr.columns)))
- ax.set_yticks(np.arange(len(corr.columns)))
- ax.set_xticklabels(corr.columns)
- ax.set_yticklabels(corr.columns)
- # Rotate the tick labels and set their alignment.
- plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
- rotation_mode="anchor")
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000015-ab730ab733/WhatsApp%20Image%202021-04-19%20at%2000.49.48.jpeg?ph=5105a242fd)
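By default, `DataFrame.corr()` computes the Pearson correlation coefficient for every pair of columns: the sum of the products of the two columns' deviations from their means, divided by the square root of the product of their sums of squared deviations. Values near 1 or -1 indicate a strong linear relationship, values near 0 a weak one. A small sketch that recomputes one entry by hand and compares it with pandas and `np.corrcoef`, on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values for two measurement columns
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

x = df['sepal_length'].to_numpy()
y = df['petal_length'].to_numpy()

# Pearson r: co-deviations summed, over the root of the product of squared deviations
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

corr = df.corr()  # method='pearson' is the default
# The two values should agree up to floating-point rounding
print(corr.loc['sepal_length', 'petal_length'], r_manual)
```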
Annotated correlation heatmap
- # get the correlation matrix of the numeric columns
- corr = iris.drop(columns=['class']).corr()
- fig, ax = plt.subplots()
- # create heatmap
- im = ax.imshow(corr.values)
- # set labels
- ax.set_xticks(np.arange(len(corr.columns)))
- ax.set_yticks(np.arange(len(corr.columns)))
- ax.set_xticklabels(corr.columns)
- ax.set_yticklabels(corr.columns)
- # Rotate the tick labels and set their alignment.
- plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
- rotation_mode="anchor")
- # Loop over data dimensions and create text annotations.
- for i in range(len(corr.columns)):
-     for j in range(len(corr.columns)):
-         text = ax.text(j, i, np.around(corr.iloc[i, j], decimals=2),
-                        ha="center", va="center", color="black")
![](https://5105a242fd.cbaul-cdnwnd.com/f75d2489ac7b24a028ed40f8b358a1bf/200000016-daa82daa84/WhatsApp%20Image%202021-04-19%20at%2000.54.25.jpeg?ph=5105a242fd)
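By default, `imshow` stretches the color map between the smallest and largest value in the matrix, so the same color can mean different correlations in different plots. Since a correlation always lies between -1 and 1, one option is to pin the scale with `vmin`/`vmax`, use a diverging color map and add a color bar. A sketch under those choices; the `'coolwarm'` color map is picked here for illustration and is not from the article, and the small DataFrame is hypothetical stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the numeric iris columns
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})
corr = df.corr()

fig, ax = plt.subplots()
# Fix the color range to the full [-1, 1] span of a correlation coefficient
im = ax.imshow(corr.values, cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label='Pearson r')

ticks = np.arange(len(corr.columns))
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(corr.columns, rotation=45, ha='right', rotation_mode='anchor')
ax.set_yticklabels(corr.columns)

# Write each coefficient, rounded to two decimals, in the center of its cell
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha='center', va='center', color='black')
```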
In this article, we looked at data visualization in Python with Matplotlib and pandas. The code is available on GitHub:
https://github.com/Tomas10000/data_analytics_python/blob/master/data_analytics_in_python.ipynb