The best data science cheat sheets

Whatever your area of development, knowing how to use the most useful functions of the library you're working with is going to make your life a lot easier.

We've put together a collection of cheat sheets to help you get to grips with the main libraries used in data science.

They are grouped into the fields for which each library is designed: Basics, Databases, Data Manipulation, Data Visualization, Analysis, Machine Learning, Deep Learning and Natural Language Processing (NLP).

Basics

If you're just starting out in the world of data science, it's important to understand at least two of the basics: the Python language itself and the NumPy library. Both are used throughout the entire development process. A third tool, SciPy, builds on NumPy and adds more advanced mathematical routines, such as optimisation, integration and statistics.

Python basics

NumPy basics

SciPy

Databases

Data can be stored in standalone data sets or, sometimes, in relational or non-relational databases, from which it is imported into the working environment.

SQL

  • Level: Beginner - Intermediate
  • Area: Relational databases
  • Description: Relational databases store data in separate tables and link those tables to one another using keys, which reduces duplication. SQL is the standard language for querying the data stored in these tables.
  • Source: sqltutorial
  • Cheat sheet: https://www.sqltutorial.org/sql-cheat-sheet/
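
To make the idea of tables related through keys more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and values are invented for illustration and do not come from the cheat sheet.

```python
import sqlite3

# In-memory database; every table, column and value here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, amount)
        VALUES (1, 20.0), (1, 5.5), (2, 12.0);
""")

# Join the two tables on the key and total the orders per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(rows)
conn.close()
```

Each customer's name is stored once in the customers table, and the orders table refers to it through the customer_id key instead of repeating it.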

MongoDB

Data Manipulation

Before getting started with data analytics, it's essential to organise the data set's information so that it's easier to perform the necessary analytical operations. This process is known as data manipulation.

Pandas

Data Wrangling

  • Level: Beginner - Intermediate
  • Area: Data manipulation
  • Description: Before conducting an analysis, it's important to clean and organise the DataFrame, since data sets often contain duplicate, missing or invalid records. Preparing the DataFrame so that it can be used for analysis is known as data cleaning or data wrangling.
  • Source: pandas
  • Cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
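
As a rough illustration of the cleaning steps described above, the sketch below removes duplicate, missing and invalid records from a small DataFrame. The column names, the sample values and the rule that a negative age is invalid are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy data with one duplicate row, one missing value and one invalid value.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cleo", "Dan"],
    "age": [34, 29, 29, np.nan, -5],
})

clean = (
    df.drop_duplicates()        # remove the repeated "Ben" row
      .dropna(subset=["age"])   # drop rows where the age is missing
      .query("age >= 0")        # drop rows with an impossible age
      .reset_index(drop=True)   # renumber the remaining rows from 0
)

print(clean)
```

In a real project, what counts as invalid depends on the data set, so the filter in query would be replaced with rules that suit your own columns.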

Data Visualization

Data visualization is the graphic representation of data and is particularly important for conducting analyses or portraying analysis results, which can help us discover trends, outliers and patterns in the data.

Matplotlib

Seaborn

Folium

  • Level: Intermediate
  • Area: Data visualization
  • Description: Within the field of visualization, maps are a very useful form of representation for depicting geospatial positions and distances. Folium is a library that makes it easy to generate maps from a data set: it renders a base map such as Mapbox or OpenStreetMap and lets us add layers of visual data such as clustered points or a heatmap.
  • Source: AndrewChallis
  • Cheat sheet: https://andrewchallis.co.uk/wp-content/uploads/2017/12/Folium.pdf

Machine Learning

Machine learning algorithms allow us to make predictions based on available data. Predictive algorithms are known as regression or classification algorithms, depending on whether the value being predicted is continuous or categorical. Learning can be supervised or unsupervised, depending on whether or not the model is trained on labelled data; those labels are known as the 'ground truth'.
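
The difference between supervised and unsupervised learning can be sketched with scikit-learn, the library covered in this section. The data below is a toy example invented for illustration, and the choice of LogisticRegression and KMeans is an assumption; many other estimators would serve equally well.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data: two well-separated groups of points along a single feature.
X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
y = [0, 0, 0, 1, 1, 1]  # labels (ground truth), used only by the supervised model

# Supervised: the classifier learns from both the features and the labels.
clf = LogisticRegression().fit(X, y)
predicted = clf.predict([[0.1], [5.1]])

# Unsupervised: the clustering model sees only the features, never the labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_

print(predicted)
print(clusters)
```

The classifier returns the labels it was taught, while the clustering model only groups similar points, so its cluster numbers are arbitrary and need not match the original labels.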

Scikit-Learn

Deep Learning

Within the field of machine learning, there is a more specific field known as deep learning, which uses artificial neural networks to make predictions.

Keras

TensorFlow

  • Level: Advanced
  • Area: Deep learning
  • Description: This is a second-generation deep learning library developed by Google. It lets users build models through either a low-level or a high-level API, defining individual mathematical operations or whole neural networks, depending on the user's preference.
  • Source: Altoros
  • Cheat sheet: https://cdn-images-1.medium.com/max/2000/1*dtOZSuYDonyyBvEULpJALw.png

PyTorch

  • Level: Advanced
  • Area: Deep learning
  • Description: PyTorch is a deep learning library developed by Facebook. It is one of the newer libraries in the field and offers an interface for working with tensors that many find more approachable than that of TensorFlow or Keras, for example.
  • Source: PyTorch
  • Cheat sheet: https://pytorch.org/tutorials/beginner/ptcheat.html

Natural Language Processing (NLP)

Within the field of data science, language analysis is an area that is rapidly gaining ground, with algorithms developed to help us analyse text.

NLTK

spaCy

These cheat sheets contain each library's most useful functions and working methods to help you in your day-to-day development tasks. Happy Coding!
