My Favorite Publicly Available Datasets

I’ve been working with data for decades: searching it for insights, converting it, managing it, and now analyzing it. We have access to remarkable troves of public data. Many of my blog posts are based on these datasets, as I don’t have access to large computing systems. Here is a list of my favorite publicly available datasets. Enjoy!

  1. PJM Interconnection Data Dictionary for electrical grids, distribution and transmission.
  2. The University of California, Irvine (UCI) has a huge machine learning repository for practicing techniques.  This repository can be accessed at
  3. Amazon Web Services datasets are available to the public.
  4. Kaggle is a data science competition website that awards prizes to teams for the best ML models. Datasets are located at
  5. University of Michigan Sentiment Data.
  6. The time series data repositories are located at
  7. Canadian Institute for Cybersecurity.
  8. Datasets for “The Elements of Statistical Learning”.
  9. Government Open Data Portal.
