My Favorite Publicly Available Datasets

I’ve been working with data for decades: searching it for insights, converting it, managing it, and now analyzing it. We have access to unbelievable treasure troves of public data. Many of the blogs I write are based on these datasets, as I don’t have access to large computing systems. Here is a list of my favorite publicly available datasets. Enjoy!

  1. PJM Interconnection Data Dictionary for electrical grids, distribution and transmission.  https://www.pjm.com/markets-and-operations/data-dictionary.aspx
  2. University of California, Irvine (UCI) hosts a huge machine learning repository for practicing techniques.  The repository can be accessed at archive.ics.uci.edu/ml/index.php
  3. Amazon Web Services datasets are available to the public.  https://aws.amazon.com/datasets/
  4. Kaggle is a data science competition website that awards prizes to teams for the best ML models.  Datasets are located at https://www.kaggle.com/datasets
  5. University of Michigan Sentiment Data.
  6. Federal Reserve Economic Data (FRED) time series repositories from the Federal Reserve Bank of St. Louis.  https://fred.stlouisfed.org/categories
  7. Canadian Institute for Cybersecurity.  https://www.unb.ca/cic/datasets/nsl.html
  8. Datasets for “The Elements of Statistical Learning”.  https://web.stanford.edu/~hastie/ElemStatLearn/
  9. Government Open Data Portal.  https://data.gov
