I’ve been working with data for decades: searching it for insights, converting it, managing it, and now analyzing it. We have access to unbelievable treasure troves of public data to analyze. Many of my blog posts are based on these datasets, as I don’t have access to large computing systems. Here is a list of my favorite publicly available datasets. Enjoy!
- PJM Interconnection Data Dictionary for electrical grids, distribution and transmission. https://www.pjm.com/markets-and-operations/data-dictionary.aspx
- University of California Irvine (UCI) hosts a large machine learning repository for practicing techniques. This repository can be accessed at archive.ics.uci.edu/ml/index.php
- Amazon Web Services datasets are available to the public. https://aws.amazon.com/datasets/.
- Kaggle is a data science competition website that awards prizes to teams for the best ML models. Datasets are located at https://www.kaggle.com/datasets
- University of Michigan Sentiment Data.
- Federal Reserve Economic Data (FRED), from the Federal Reserve Bank of St. Louis, hosts economic time series data. https://fred.stlouisfed.org/categories.
- Canadian Institute for Cybersecurity (University of New Brunswick), NSL-KDD dataset. https://www.unb.ca/cic/datasets/nsl.html.
- Datasets for “The Elements of Statistical Learning”. https://web.stanford.edu/~hastie/ElemStatLearn/.
- U.S. Government Open Data Portal. https://data.gov