The following is from a posting for a Data Science Intern position at Lenovo. It is a good example of what to expect from a typical data science position.

The Data Scientist Intern will be expected to perform data mining, statistical learning, predictive modeling, mathematical and simulation modeling, forecasting, and data visualization in support of key strategic projects. This position is dedicated to performing best-in-class data management and business analytics. Key responsibilities include:

  • Participate in projects, tasks, and activities related to data integration, data cleaning, descriptive analyses, exploratory analyses, predictive modeling, data mining, text analytics, rapid prototyping, and data visualization
  • Collaborate with colleagues throughout the business to collect, store, access, and analyze data from a variety of sources.
  • Assist in developing static and interactive data visualizations
  • Develop predictive models and simulations using a variety of software and tools
  • Leverage data / big data to discover patterns and solve strategic analytic business problems using both structured and unstructured data sets across many environments
  • Develop analytic capabilities that drive better outcomes for both customers and the company, informing business decisions across a broad range of functions.

Position Requirements

The Data Scientist Intern should have the experience and skills needed to successfully execute the key position objectives. Requirements include:

  • Motivated self-starter with a desire to develop solutions for the data analytics space using cutting edge computing technology
  • Experience with analytic projects and programs
  • Strong organizational and communication skills, the ability to work in a collaborative environment, and a desire to improve skills are essential
  • Ability to extract, merge and analyze data from a wide variety of sources (e.g., relational databases, text and unstructured files, sensor data, image and video files).
  • Ability to quickly and easily learn new open source software
  • Experience with static and/or interactive data visualization methods such as Qlik or Tableau
  • Experience with SQL and NoSQL databases, including any of these: MySQL, PostgreSQL, SQLite, MongoDB, and Neo4j.
  • Experience with techniques and technologies for accessing and analyzing “big data” using Hadoop, Kafka, Spark, Cassandra, Splunk, etc.
  • Programming experience in Java, R, Python, Hive, Pig, etc.
  • Experience with machine learning or AI tools such as Mahout, Weka, etc.