Poster Session 2, 11:30 AM - 12:15 PM: Room 163 [C18]

Data Science Career Database

Presenter: Karol Harasim

Faculty Sponsor: Nada Al Sallami

School: Worcester State University

Research Area: Computer Science

ABSTRACT

The aim of this research is to construct a relational database system that supports in-depth, precise analysis of individual job postings in the data science field. Understanding trends in data science careers can be tedious when each posting carries several job-related attributes. A CSV file can assist in analysis but is inefficient for large-scale comparisons. Constructing a relational database from an imported CSV file provides the structure and organization needed to make precise analyses and draw conclusions about data science careers. Relational databases store normalized data in separate, related tables, which makes it easier to clean the data and reduce redundant, missing, and incorrect values. The project began with a single table containing all of the data from the CSV file; processing that data in MySQL produced four tables, including an associative table between the data jobs and skills tables. Giving each company, job, and skill a unique identifier decreased the redundancy of companies and skills while keeping each one linked to its respective job. Queries executed on the new database quickly answer questions such as which skill is most sought after in data science job postings. These queries give data science job candidates the ability to analyze job market patterns. The database demonstrates how CSV file data can be transformed into a structured format to make meaningful observations about the data science job market.
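The normalization described above can be illustrated with a minimal sketch. It is not the project's actual schema or code: it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and the table names, column names, semicolon-separated skills field, and sample rows are all hypothetical, since the abstract does not list the CSV's columns. It shows one flat CSV being split into four tables (companies, jobs, skills, and an associative job_skills table) and then queried for the most frequently requested skill.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real CSV's columns are not given in the abstract.
SAMPLE_CSV = """job_title,company,skills
Data Scientist,Acme,python;sql
Data Analyst,Acme,sql;excel
ML Engineer,Globex,python;sql
"""

# Assumed four-table layout: companies, jobs, skills, and an associative table.
SCHEMA = """
CREATE TABLE companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL
);
CREATE TABLE jobs (
    job_id     INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    company_id INTEGER NOT NULL REFERENCES companies(company_id)
);
CREATE TABLE skills (
    skill_id INTEGER PRIMARY KEY,
    name     TEXT UNIQUE NOT NULL
);
CREATE TABLE job_skills (
    job_id   INTEGER NOT NULL REFERENCES jobs(job_id),
    skill_id INTEGER NOT NULL REFERENCES skills(skill_id),
    PRIMARY KEY (job_id, skill_id)
);
"""


def get_or_create(conn, table, id_column, name):
    """Return the identifier for `name`, inserting a row only if it is new."""
    # Table and column names come from this script, never from user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {id_column} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def build_database(csv_text):
    """Load flat CSV text into the normalized four-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for row in csv.DictReader(io.StringIO(csv_text)):
        company_id = get_or_create(conn, "companies", "company_id", row["company"])
        job_id = conn.execute(
            "INSERT INTO jobs (title, company_id) VALUES (?, ?)",
            (row["job_title"], company_id),
        ).lastrowid
        for raw_skill in row["skills"].split(";"):
            skill = raw_skill.strip().lower()  # light cleaning to avoid duplicates
            if not skill:
                continue  # skip missing values
            skill_id = get_or_create(conn, "skills", "skill_id", skill)
            conn.execute(
                "INSERT OR IGNORE INTO job_skills (job_id, skill_id) VALUES (?, ?)",
                (job_id, skill_id),
            )
    conn.commit()
    return conn


def most_sought_after_skill(conn):
    """Return (skill name, number of postings) for the most requested skill."""
    return conn.execute(
        """
        SELECT s.name, COUNT(*) AS postings
        FROM job_skills AS js
        JOIN skills AS s ON s.skill_id = js.skill_id
        GROUP BY s.skill_id
        ORDER BY postings DESC, s.name ASC
        LIMIT 1
        """
    ).fetchone()
```

Calling most_sought_after_skill(build_database(SAMPLE_CSV)) on the sample rows returns the skill shared by the most postings along with its count. Each company and skill is stored once and referenced by identifier, which is the redundancy reduction the abstract describes.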

