A Fortune 100 company in NYC has an immediate need for a Data Engineer at the Associate level. We work with data ranging from demographic, credit and geo data to detailed medical data (medical test results, diagnoses, prescriptions) and social media information. We have a modern computing environment with a solid suite of data science/modeling tools and packages, and a large (but manageable) group of well-trained professionals at various levels to support you.
You will be part of the Data & Platform sub-function team under the Center for Data Science and Analytics. The Data & Platform team serves internal Data Scientists who focus on statistical analysis.
You will be part of a fast-paced, high-impact team that works with an entrepreneurial mindset, using best-of-breed tools such as R, Spark and Python on our Enterprise Data Hub (Hadoop).
You will apply your data engineering skills to build pipelines and workflows that gather, cleanse, test and curate datasets from Oracle, MSSQL Server and 3rd party data, and create datasets in the Enterprise Data Lake (Hadoop) that will be used by several teams of predictive modelers.
You will run proofs of concept and test new software tools under the umbrella of Data Science but geared more towards data engineering.
- Ingests, merges, prepares, tests and documents curated datasets from various novel external and internal data sources for a variety of advanced analytics involving multivariate models
- Utilizes data wrangling/data matching/ETL techniques to explore a variety of data sources, gain data expertise, perform summary analyses and curate datasets
- Functions as data expert, contributes to analytics/solutions design and productizing decisions
- Collaborates with business leaders to understand business challenges and devise solutions by using business acumen and mining vast amounts of data to draw insights
- Works independently with some supervision and as part of a collaborative team
- Works with Project Managers and Scrum Masters to provide milestones and stories
- Proactively and effectively communicates in various verbal and written formats with senior-level members of the team and partners
- Actively participates in proof of concept tests of new data, software and technologies. Shares knowledge within the team
- Follows industry trends and related data/analytics processes and businesses. Attends conferences, events, and vendor meetings as needed
- Graduate-level degree in computer science or engineering, or relevant experience in the field of Business Intelligence, Data Mining, Database Engineering or Programming
- 3-5 years of overall experience working in the field of data wrangling and programming, with a minimum of 1 year of experience ingesting, cleaning, merging and applying necessary data wrangling logic in Hadoop
- 1+ years writing complex SQL queries against any of the following or similar databases: Oracle, SQL Server, DB2, MySQL
- Proficiency using Python for data-related work, including libraries such as NumPy, pandas and PySpark
- Experience working with the Linux operating system
- Experience working with data visualization tools or packages
- Experience building Exploratory Data Analysis reports with charts such as histograms, box plots, Pareto charts and scatter plots, using R, Python or a data visualization tool such as Tableau or Spotfire
- Understanding of statistical modeling concepts, designs and analytics-based products
- Any experience using ETL tools such as Ab Initio, Talend, Informatica, Pentaho
- Any experience working with Data Warehouses and/or Data Marts
- Any experience in the life insurance business