- San Francisco
- Date Posted: Aug. 10, 2021
- Data Science
Data Engineers at Discord are responsible for supporting the data architecture that moves and translates data used to inform our most critical strategic and real-time decisions. In addition to extracting and transforming data, you will be expected to use your expertise to build extensible data models and provide meaningful recommendations regarding best practices and performance enhancements to our partners in analytics, machine learning, and product engineering. The ideal candidate will have demonstrated success working with ambiguity and creating impact in a fast-paced environment.
Our work is foundational to company and product strategy — to learn more about Discord Engineering, read our engineering blog!
What you’ll be doing
- Work with a team of high-performing data science and analytics professionals and cross-functional teams to identify business opportunities and build scalable data solutions.
- Ensure best practices and standards in our data ecosystem are shared across teams.
- Develop subject-matter expertise in relevant business domains.
- Intelligently design data models for optimal storage and retrieval.
- Build and maintain efficient & reliable data pipelines to move and transform data.
- Understand and influence product telemetry practices to support product, analytics, and machine learning needs.
Who you are
- 4+ years of relevant industry or academic experience working with large amounts of data.
- Experience with software engineering disciplines, systems design, Python, ETL, and data modeling.
- Deep SQL knowledge, including performance optimization, window functions, joins, pivots, and UDFs.
- Experience with manipulating massive-scale structured and unstructured data.
- Experience auditing and refactoring existing ETL to improve efficiency while maintaining ease of use.
- Experience setting up automated systems to monitor data quality and using the information to improve the robustness of pipelines.
- Experience ingesting data from disparate internal and external sources and creating cohesive, easy-to-use data models for downstream use.
- You thrive in ambiguous environments and get excited about devising solutions to complex problems, then executing on them.
- You are a first-principles thinker who can work with others to develop pragmatic solutions — and then evolve and generalize them.
- Experience developing data pipelines using Spark, Dataflow, Airflow, BigQuery, and Google Cloud Platform.
- Understanding of the data lifecycle and concepts such as lineage, governance, privacy, retention, and anonymization.
- Excellent communication, organizational, and analytical skills.