DataSciCon.Tech 2017 Review

Saturday, December 2nd, 2017

DataSciCon.Tech is a data science conference held in Atlanta, Georgia, from Wednesday, November 29th to Friday, December 1st, and it includes both workshops and conference lectures. It took place at the Global Learning Center on the campus of Georgia Tech. This was the first year of the conference, and I attended to get a sense of the data science scene in Atlanta. Overall, the experience was enlightening and introduced me to the dynamic and intensive work being done in the field of data science.


Keynote speaker Rob High, CTO of IBM Watson, discussing IBM Watson and Artificial Intelligence (DataSciCon.Tech 2017).

DataSciCon.Tech Workshops

Four workshop tracks were held on Wednesday: Introduction to Machine Learning with Python and TensorFlow, Tableau Hands-on Workshop, Data Science for Discovery, Innovation and Value Creation, and Data Science with R Workshop. I elected to attend the Machine Learning with Python and TensorFlow track. TensorFlow is an open source software library for numerical computation using data flow graphs, and it is widely used for Machine Learning.

To prepare for the conference, I installed the TensorFlow module from https://www.tensorflow.org/install. In addition to TensorFlow, I downloaded Anaconda (https://www.anaconda.com/), a great Python development environment for those practicing data science programming. It includes many of the Python data science packages, such as NumPy and scikit-learn.

The predictive and classification modeling techniques discussed in the workshop included:

  • Neural Networks
  • Naive Bayes
  • Linear Regression
  • k-nearest neighbor (kNN) analysis

These modeling techniques are popular for classifying data and for predictive analysis. Few training sessions on Python, scikit-learn or NumPy go into these algorithms in detail, because audience members come in with varying levels of math education. For the course, we used Jupyter Notebook, a web-based Python development environment that lets you share and present your code and results through a web browser. Jupyter Notebook can also be hosted in Microsoft Azure, as well as on other cloud platforms such as Anaconda Cloud and AWS. To host a Python Jupyter Notebook in Azure, sign in at https://notebooks.azure.com.
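To give a feel for one of the techniques listed above, here is a minimal k-nearest neighbor sketch in plain Python. It is not code from the workshop: the function name `knn_predict` and the toy data are my own assumptions for illustration, and a real project would more likely reach for scikit-learn.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy data: two clusters in a 2-D feature space.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.8, 6.2)))
```

The model here is just the stored training data. All of the work happens at prediction time, which is why kNN is easy to explain but can get slow on large datasets.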

TensorFlow

TensorFlow has a series of functions that use neural networks and machine learning to train, test and score models. The advantage of TensorFlow, as presented in the workshop, is its ability to train models faster than other modules. That is a big advantage, since splitting data and training models on it is a processing-intensive operation. It is particularly powerful on the Graphics Processing Unit (GPU) architecture popular for Machine Learning and Deep Learning.
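The split, train and score cycle mentioned above can be sketched without TensorFlow at all. The example below is plain Python, with a simple least-squares line standing in for a neural network. The helper names (`split_rows`, `fit_line`, `r_squared`) and the synthetic data are made up for illustration and are not TensorFlow APIs.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def r_squared(rows, slope, intercept):
    """Score the fitted line on the given rows with the R^2 metric."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot

# Synthetic data (an assumption for this sketch): y is roughly 2x + 1 plus noise.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(20)]

train, test = split_rows(data)              # hold out 25% of the rows
slope, intercept = fit_line(train)          # train on the remaining rows only
score = r_squared(test, slope, intercept)   # score on data the model never saw
print(f"slope={slope:.2f} intercept={intercept:.2f} test R^2={score:.3f}")
```

Scoring on held-out rows is the point of the split: a model that only memorized its training data would look good on `train` and poor on `test`.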

Download TensorFlow from http://tensorflow.org. The website also includes a neural network sandbox, the TensorFlow Playground, at http://playground.tensorflow.org.


Source: http://playground.tensorflow.org, tensorflow.org (DataSciCon.Tech)

DataSciCon.Tech Sessions

I’m going to break down the sessions I attended into the main topics that were covered, so this is a very high-level overview of the conference. My plan is to write a few more blog posts on the topic that go into my work as an aspiring data scientist/data architect. All the information in this post is based on material presented at the DataSciCon.Tech 2017 conference.

Machine Learning and Artificial Intelligence

The conference placed a heavy emphasis on Artificial Intelligence and Machine Learning. Artificial Intelligence was discussed more in terms of theory and direct applications than design and development. There were a few demonstrations of the popular IBM Watson Artificial Intelligence system, but I want to focus this post primarily on Machine Learning, as it’s something that interests me and other data architects. Artificial Intelligence and Machine Learning are both based on computerized learning algorithms. Machine Learning uses past data to learn, predict events or identify anomalies.

Another key point made at the conference was the sheer number of open source projects and companies that have produced software modules, libraries and packages devoted to the use and implementation of Machine Learning in business applications. I strongly recommend that anyone interested in learning more research the software solutions discussed in this post and how they can be implemented.

For those who are new to the concept of Machine Learning (like me), it can essentially be defined as follows:

Machine Learning is a subset of Artificial Intelligence that focuses on creating models that learn and predict events based on past data, without a human programmer having to change code to adapt to new events. An example would be a spam filter that learns new exploits and then blocks them.
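The spam filter example can be made concrete with a toy Naive Bayes classifier in plain Python. This is only a sketch under my own assumptions: the function names `train_filter` and `classify` and all of the messages are invented for illustration, and a production filter would learn from far more data.

```python
import math
from collections import Counter, defaultdict

def train_filter(messages):
    """Count words per label from (text, label) pairs of past messages."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up past data for the filter to learn from.
past_messages = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
    ("project report due friday", "ham"),
]
before = classify("crypto offer", *train_filter(past_messages))

# A new exploit shows up: add one labeled example and retrain. No code changes.
new_example = ("urgent crypto offer act now", "spam")
after = classify("crypto offer", *train_filter(past_messages + [new_example]))

print(before, "->", after)
```

The classification of "crypto offer" changes only because the training data changed, which is the "without a human programmer having to change code" part of the definition.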
