I’ve been a Database Administrator for over 20 years. Throughout the 1990s and 2000s, database administration became a lucrative, in-demand job for many people working in Information Technology. Even today, the role of the Database Administrator (DBA) is critical for meeting daily operational goals and maintaining customer applications. Recently, however, there has been a major shift in what employers look for in candidates for IT positions. Fewer companies are hosting their own databases, and the need for big data systems in the cloud has created more opportunities for people with skills in cloud architecture, data pipeline architecture and data science tools.
That being said, I feel this shift has put a lot of DBAs in a precarious position. Being a dedicated DBA is challenging and time-consuming, and it requires a very broad set of skills. It is a full-time job in and of itself, and database administration does not easily translate to data science or data engineering. If you want to work toward a role as a data engineer or data scientist, you will probably have to take that initiative on your own and do off-hours work to acquire those skills. Data science is the ability to create meaningful business actions from sometimes messy, uncoordinated data. Data engineering is the ability to take very large volumes of data and make them readily available to business stakeholders regardless of the type of data, where it is stored, or how it is stored. Most DBAs spend their time making sure that data on bare-metal (local or NAS) storage or provisioned storage is consistent, available, and secured behind an “engine” that can easily query or perform transactions on it. Doing all of this quickly, reliably and efficiently, with no data loss, is the challenge most DBAs face on a daily basis.
This is a very high-level comparison between the fields, but there are some important nuances to take into consideration if you want to change roles. For one, being a DBA doesn’t necessarily mean that you understand how to work with data. Data is messy, and one of the strengths of a data scientist is the ability to take data and clean it: transforming it, removing duplicates, removing anomalies, and so on. You then need to be able to sample and partition data, create models and score those models. Many data scientists have a background in mathematics and statistics that allows them to perform deep learning or complex machine learning and data analysis tasks.
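To make that workflow a little more concrete, here is a minimal sketch in plain Python of the steps just described: cleaning, partitioning, modeling and scoring. Everything in it is an assumption for illustration. The records are made up, the helper names (`clean`, `split`, `fit_line`, `r_squared`) are hypothetical, and the median-based anomaly filter and straight-line model are only stand-ins for whatever techniques a real project would call for.

```python
import random
from statistics import median

# Made-up (x, y) records containing a duplicate, an anomaly and a missing value.
raw = [
    (1.0, 2.1), (2.0, 4.2), (2.0, 4.2),    # exact duplicate
    (3.0, 5.9), (4.0, 8.1), (5.0, 9.8),
    (6.0, 12.2), (7.0, 13.9), (8.0, 16.1),
    (9.0, 500.0),                           # obvious anomaly
    (10.0, 20.0), (None, 7.0),              # missing value
]

def clean(rows, k=5.0):
    """Drop incomplete rows, exact duplicates, and y values far from the median."""
    complete = [r for r in rows if None not in r]
    deduped = list(dict.fromkeys(complete))  # keeps first occurrence, preserves order
    ys = [y for _, y in deduped]
    mid = median(ys)
    mad = median([abs(y - mid) for y in ys])  # median absolute deviation
    return [(x, y) for x, y in deduped if abs(y - mid) <= k * mad]

def split(rows, train_fraction=0.7, seed=42):
    """Shuffle reproducibly, then partition into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(rows, slope, intercept):
    """Score the model: share of variance in y explained on the given rows."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot

cleaned = clean(raw)
train, test = split(cleaned)
slope, intercept = fit_line(train)
score = r_squared(test, slope, intercept)
print(f"{len(cleaned)} clean rows, test R^2 = {score:.3f}")
```

In practice, libraries such as pandas and scikit-learn take care of most of these steps, but writing them out once by hand shows what those libraries are doing for you.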
One common bridge from database administration to data science and data engineering is SQL. SQL is a very powerful language for querying data in relational databases, and it is also considered one of the most popular languages for data science. Many SQL functions, such as aggregates and window functions, support data science work directly in the database. SQL is by far the most popular way to extract data from a database and deliver it to the business.
Most DBAs have had some exposure to SQL, and many have also had training in procedural extensions such as T-SQL, PL/SQL and PL/pgSQL, among others. Therefore, transitioning to languages typically used in data science, such as Python and R, is less of a journey than starting with little or no programming experience. Both languages have libraries that work with SQL and database commands.
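As a small illustration of how familiar this can feel, the sketch below uses Python’s built-in `sqlite3` module to run ordinary SQL and pull the results into Python. The `orders` table, its columns and its values are hypothetical and exist only for this example; an in-memory SQLite database stands in for whatever engine you administer day to day.

```python
import sqlite3
from statistics import mean

# An in-memory database stands in for a real server; the table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [
        ("east", 120.0), ("east", 80.0),
        ("west", 200.0), ("west", 100.0), ("west", 60.0),
    ],
)

# Let the database do the aggregation with plain SQL.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

# Or pull raw rows into Python and analyze them there.
west_amounts = [
    row[0]
    for row in conn.execute(
        "SELECT amount FROM orders WHERE region = ?", ("west",)
    )
]
west_mean = mean(west_amounts)

for region, n, avg_amount in summary:
    print(region, n, avg_amount)

conn.close()
```

The same pattern of connecting, executing SQL and fetching rows carries over to the Python drivers for most relational databases, which is why existing SQL skills go a long way here.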
Along with learning Python and R, it helps to learn popular data science and mathematics libraries such as scikit-learn and NumPy. R is also a great language for practicing data science techniques. Look for the many online resources for learning data science. Visit my articles on data science conferences and data science resources. Take online classes on LinkedIn Learning, Udemy, DataCamp and Coursera, all of which have introductory tracks for data science. A lot of success in moving into a new role involves self-learning, particularly if you are in a position that doesn’t offer data science work to build skills on.
For data engineering, it’s strongly recommended that you open a cloud account with a provider such as Google Cloud, AWS or Azure. They offer “pay-as-you-go” options and bill based on the resources, such as compute time, that you use. With the many open data sets available free to the public, you can easily build test data pipelines in the cloud in your free time. You can also build pipelines in the cloud to help with your current DBA role. Many companies are transitioning to the cloud and offer their employees cloud access.
Post-graduate education is another path DBAs can take. There are many post-graduate and certificate programs in data science and big data engineering, with many more coming online, and these programs are often flexible enough that you can study outside your normal work hours.