Job Description
About this role
Enhance BlackRock’s retail sales distribution capabilities and services suite by creating, expanding, and optimizing our data and data pipeline architecture.
You will create and operationalize data pipelines that enable squads to deliver high-quality, data-driven products.
You will be accountable for managing high-quality datasets exposed for internal and external consumption by downstream users and applications.
Top technical/programming skills: Python, Java, and Scala, with the ability to work across big-data frameworks such as Spark, the Hadoop suite, PySpark, Hive, and cloud data platforms (preferably Snowflake), as well as SQL. Experience ingesting and transforming data from flat files (e.g., CSV, TSV, Excel) and database/API sources is a must.
Given the highly execution-focused nature of the work, the ideal candidate will roll up their sleeves to ensure that their projects meet deadlines and will always look for ways to optimize processes in future cycles.
The successful candidate will be highly motivated to create, optimize, or redesign data pipelines to support our next generation of products and data initiatives.
You will be a builder and an owner of your work product.
Responsibilities:
Lead the creation and maintenance of optimized data pipeline architectures for large, complex data sets.
Assemble large, complex data sets that meet business requirements.
Act as a lead to identify, design, and implement internal process improvements, and relay them to the relevant technology organization.
Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs.
Automate manual ingest processes and optimize data delivery subject to service-level agreements; work with infrastructure teams on re-designs for greater scalability.
Keep data separated and segregated according to relevant data policies.
Join a complex global team, collaborate cross-functionally (with data scientists, platform engineers, and business stakeholders), take ownership of major components of the data platform ecosystem, and develop data-ready tools to support their work.
Stay up to date with the latest trends in the big-data space and recommend new technologies as needed.
Identify, investigate, and resolve data discrepancies by finding the root cause of issues; work with partners across various cross-functional teams to prevent future occurrences.
Qualifications:
Overall, 4 years of hands-on experience in computer/software engineering, with the majority in big-data engineering.
4 years of strong Python or Scala programming skills (core Python and PySpark), including hands-on experience creating and supporting UDFs and testing with frameworks such as pytest.
4 years of experience with building and optimizing ‘big data’ pipelines, architectures, and data sets. Familiarity with data pipeline and workflow management tools (e.g., Airflow, DBT, Kafka).
4 years of hands-on experience developing on Spark in a production environment. Expertise in parallel execution, resource allocation, and the different modes of executing jobs is required.
4 years of experience using Hive (on Spark), YARN (logs, DAG flow diagrams), and Sqoop. Proficiency in bucketing, partitioning, tuning, and handling different file formats (ORC, Parquet, and Avro).
4 years of experience using SQL (e.g., MS SQL Server, MySQL), NoSQL, and GraphQL.
Strong experience implementing solutions on Snowflake.
Experience with data quality and validation frameworks, especially Great Expectations for automated testing.
Strong understanding and use of Swagger/OpenAPI for designing, documenting, and testing RESTful APIs.
Experience in deployment, maintenance, and administration tasks related to cloud platforms (AWS; Azure preferred), OpenStack, Docker, Kafka, and Kubernetes. Familiarity with CI/CD pipelines for data pipeline automation and deployment (Jenkins, GitLab CI, Azure DevOps).
Experience with data governance, metadata management, and data lineage
usin ... (truncated, view full listing at source)