Buscar

Data Analysis with Python - FreeCodeCamp

Prévia do material em texto

Data Analysis
with Python
Full tutorial for beginners
1. What is Data Analysis
2. Real example Data Analysis with Python
3. How to use Jupyter Notebooks
4. Intro to NumPy (exercises included)
5. Intro to Pandas (exercises included)
6. Data Cleaning
7. Reading Data SQL, CSVs, APIs, etc
8. Python in Under 10 Minutes
About this tutorial
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/freecodecamp-pandas-real-life-example&sa=D&source=editors&ust=1666536252597231&usg=AOvVaw1m0pXX05VHt85oi2K2OwJR
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/interactive-jupyterlab-tutorial-ac5fa63f/lab&sa=D&source=editors&ust=1666536252597618&usg=AOvVaw2LyosF2-1z1odcQSWVy3wP
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/freecodecamp-intro-to-numpy-6c285b74&sa=D&source=editors&ust=1666536252597836&usg=AOvVaw2ZADKLXtOl7edRGGOOZCvy
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/freecodecamp-intro-to-pandas-902ae59b&sa=D&source=editors&ust=1666536252598015&usg=AOvVaw2vjIUMrerV2SUkQfzjcWM8
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/data-cleaning-rmotr-freecodecamp&sa=D&source=editors&ust=1666536252598233&usg=AOvVaw2HYrE55dNc5P4OAAHb37wG
https://www.google.com/url?q=https://notebooks.ai/rmotr-curriculum/python-under-10-minutes-15addcb2&sa=D&source=editors&ust=1666536252598467&usg=AOvVaw0BxqYbsDtHOyn3gzytl4eZ
What is Data Analysis?
> A process of inspecting, cleansing, transforming and modeling data with 
the goal of discovering useful information, informing conclusion and 
supporting decision-making.
Definition by Wikipedia.
What is Data Analysis
https://www.google.com/url?q=https://en.wikipedia.org/wiki/Data_analysis&sa=D&source=editors&ust=1666536252972308&usg=AOvVaw0MPXID0_Fn0AXuns357pPl
> A process of inspecting, cleansing, transforming and modeling data with 
the goal of discovering useful information, informing conclusion and 
supporting decision-making.
Definition by Wikipedia.
What is Data Analysis
https://www.google.com/url?q=https://en.wikipedia.org/wiki/Data_analysis&sa=D&source=editors&ust=1666536253119409&usg=AOvVaw38UvgnoEc8jezTPtpwNrYG
> A process of inspecting, cleansing, transforming and modeling data with 
the goal of discovering useful information, informing conclusion and 
supporting decision-making.
Definition by Wikipedia.
What is Data Analysis
https://www.google.com/url?q=https://en.wikipedia.org/wiki/Data_analysis&sa=D&source=editors&ust=1666536253133323&usg=AOvVaw291rxiUVpBU8ZK-pzScyoo
> A process of inspecting, cleansing, transforming and modeling data with 
the goal of discovering useful information, informing conclusion and 
supporting decision-making.
Definition by Wikipedia.
What is Data Analysis
https://www.google.com/url?q=https://en.wikipedia.org/wiki/Data_analysis&sa=D&source=editors&ust=1666536253147212&usg=AOvVaw3m2wxZVsaKpvRABCfiYZh0
> A process of inspecting, cleansing, transforming and modeling data with 
the goal of discovering useful information, informing conclusion and 
supporting decision-making.
Definition by Wikipedia.
What is Data Analysis
https://www.google.com/url?q=https://en.wikipedia.org/wiki/Data_analysis&sa=D&source=editors&ust=1666536253161034&usg=AOvVaw0ol6-QkqlKiATV-pgN1X2I
Data Analysis Tools
Auto-managed closed tools Programming Languages
👎 Closed Source 󰢃
👎 Expensive 💸
👎 Limited 😩
👍 Easy to learn 󰠁
👍 Open Source 🤩 
👍 Free (or very cheap) 🤑
👍 Extremely Powerful 💪
👎 Steep learning curve 󰠁
Auto-managed closed tools Programming Languages
Why Python
for Data Analysis?
Why Python for Data Analysis?
Why would we choose Python over R or Julia?
👍 very simple and intuitive to learn
👍 “correct” language
👍 powerful libraries (not just for Data Analysis)
👍 free and open source
👍 amazing community, docs and conferences
Python, sadly, is not always the answer
● When R Studio is needed
● When dealing with advanced statistical methods
● When extreme performance is needed
When to choose R?
The Data Analysis
Process
● Building Machine 
Learning Models
● Feature Engineering
● Moving ML into 
production
● Building ETL 
pipelines
● Live dashboard and 
reporting
● Decision making 
and real-life tests
● Exploration
● Building statistical 
models
● Visualization and 
representations
● Correlation vs 
Causation analysis
● Hypothesis testing
● Statistical analysis
● Reporting
● Hierarchical Data
● Handling categorical 
data
● Reshaping and 
transforming 
structures
● Indexing data for 
quick access
● Merging, combining 
and joining data
● Missing values and 
empty data
● Data imputation
● Incorrect types
● Incorrect or invalid 
values
● Outliers and non 
relevant data
● Statistical sanitization
Data Extraction Data Cleaning Data Wrangling Analysis
● SQL
● Scrapping
● File Formats
○ CSV
○ JSON
○ XML
● Consulting APIs
● Buying Data
● Distributed 
Databases
Action
Data Analysis
Vs
Data Science
DATA ANALYSIS VS DATA SCIENCE
The traditional view
Python & PyData Ecosystem
PYTHON ECOSYSTEM:
The libraries we use...
● pandas: The cornerstone of our Data Analysis job with Python
● matplotlib: The foundational library for visualizations. Other libraries we’ll use will be 
built on top of matplotlib.
● numpy: The numeric library that serves as the foundation of all calculations in Python.
● seaborn: A statistical visualization tool built on top of matplotlib.
● statsmodels: A library with many advanced statistical functions.
● scipy: Advanced scientific computing, including functions for optimization, linear 
algebra, image processing and much more.
● scikit-learn: The most popular machine learning library for Python (not deep learning)
https://www.google.com/url?q=https://pandas.pydata.org/&sa=D&source=editors&ust=1666536255988796&usg=AOvVaw2FE7_zfchepdg9qYnoGPMd
https://www.google.com/url?q=https://matplotlib.org/&sa=D&source=editors&ust=1666536255989016&usg=AOvVaw3YSx_kc5wxkwLM1ntpGfZT
https://www.google.com/url?q=https://numpy.org/&sa=D&source=editors&ust=1666536255989165&usg=AOvVaw3pYaktN_CaCET1i2ClpPg7
https://www.google.com/url?q=https://seaborn.pydata.org/&sa=D&source=editors&ust=1666536255989339&usg=AOvVaw1FafvplaxcemAdoMZ1XaTu
https://www.google.com/url?q=https://www.statsmodels.org/stable/index.html&sa=D&source=editors&ust=1666536255989520&usg=AOvVaw2Yz_nTmjf7UTHasYN5ZH2D
https://www.google.com/url?q=https://scipy.org/&sa=D&source=editors&ust=1666536255989659&usg=AOvVaw3XJCo8swV5bvO9QWzb9dLZ
https://www.google.com/url?q=http://scikit-learn.org/stable/&sa=D&source=editors&ust=1666536255989829&usg=AOvVaw2yf7NVuoTxgz4nrm6aYnKo
How Python Data
Analysts Think
EXCEL, TABLEAU, ETC. 
They’re all visual tools...
Thinking like a
Python Data Analyst
And finally,
why Python?
1. What is Data Analysis
2. Real Example Data Analysis with Python
3. How to use Jupyter Notebooks
4. Intro to NumPy (exercises included)
5. Intro to Pandas (exercises included)
6. Data Cleaning
7. Reading Data SQL, CSVs, APIs, etc
8. Python in Under 10 Minutes
About this tutorial

Continue navegando