Python Crash Course for Data Engineers (with projects)
-
Go from zero to data hero in 10 days with Python Crash Course! Mastering data requires the right tools, and Python is your superpower. This no-nonsense crash course unlocks your analytical potential, fast. Build pipelines, wrangle data, and crack complex algorithms like a pro. Gumroad exclusive bonus: Access to my secret data engineering cheat sheet!
[NEW] Sample Project 1 - Data Modeling and Optimization in Redshift
- A large e-commerce company wants to analyze their sales data stored in Redshift to gain insights into customer behavior, product performance, and sales trends. The data includes several fact and dimension tables, such as orders, order_items, customers, products, and date dimensions. The company is experiencing performance issues when running complex analytical queries, particularly those involving joins across multiple tables. Your task is to:
- Evaluate the existing data model and identify potential bottlenecks or areas for improvement.
Current Fact Tables: orders, order_items
Current Dimension Tables: customers, products, date
- Propose an optimized data model design, considering factors such as distribution keys, sort keys, and data partitioning strategies.
- Discuss how you would implement your proposed data model changes and the expected performance improvements.
- Provide examples of optimized queries that could be used for common analytical use cases (see the sketch below).
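To give a feel for the deliverables, here is a minimal sketch assuming a psycopg2 connection to the cluster; the table, column, and connection names are illustrative and not part of the project brief.

```python
# A minimal sketch, assuming a psycopg2 connection to the Redshift cluster.
# Table, column, endpoint, and credential values are illustrative placeholders.
import psycopg2

DDL_ORDERS = """
CREATE TABLE orders_optimized (
    order_id     BIGINT         NOT NULL,
    customer_id  BIGINT         NOT NULL,
    order_date   DATE           NOT NULL,
    total_amount DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- co-locates rows with a customers table distributed the same way
SORTKEY (order_date);   -- lets date-filtered scans skip blocks
"""

# An analytical query that benefits from those keys: the join column matches the
# DISTKEY and the WHERE clause filters on the SORTKEY column.
MONTHLY_REVENUE = """
SELECT d.year, d.month, SUM(o.total_amount) AS revenue
FROM orders_optimized o
JOIN customers c ON c.customer_id = o.customer_id
JOIN "date" d    ON d.date_key    = o.order_date
WHERE o.order_date >= '2024-01-01'
GROUP BY d.year, d.month
ORDER BY d.year, d.month;
"""

def run(sql: str) -> None:
    # Hypothetical endpoint and credentials; replace with your own cluster details.
    with psycopg2.connect(host="example-cluster.redshift.amazonaws.com", port=5439,
                          dbname="analytics", user="admin", password="***") as conn:
        with conn.cursor() as cur:
            cur.execute(sql)

if __name__ == "__main__":
    run(DDL_ORDERS)
```

Picking the most common join column as the DISTKEY and the most common filter column as the SORTKEY is the usual starting point; small dimension tables can instead use DISTSTYLE ALL.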
[NEW] Sample Project 2 - Real-Time Social Media Analytics ETL Pipeline
- Build a robust ETL pipeline that processes streaming social media analytics data from multiple platforms. You'll create a system that ingests data from simulated social media APIs, transforms engagement metrics, and loads them into a data warehouse for analysis. A minimal ingestion sketch is shown after the technical stack below.
Learning Objectives
- Design and implement streaming data pipelines
- Handle real-time data processing with Apache Kafka
- Implement data quality checks and error handling
- Create scalable transformation logic
- Build monitoring and alerting systems
- Practice CI/CD for data pipelines
Prerequisites
- Python programming knowledge
- Basic understanding of APIs
- Docker installed locally
- Git for version control
Technical Stack
- Apache Kafka for streaming
- Python 3.9+
- PostgreSQL for final data storage
- Docker for containerization
- Required Python packages:
- kafka-python
- pandas
- psycopg2
- pydantic
- fastapi (for API endpoints)
- pytest (for testing)
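As a starting point, here is a minimal sketch of the ingest-validate-load path using the packages above; the broker address, topic, table, and field names are assumptions for illustration, not part of the course material.

```python
# Minimal ingest -> validate -> load sketch, assuming a local Kafka broker,
# a topic named "social_engagement", and a PostgreSQL table engagement_metrics.
import json

import psycopg2
from kafka import KafkaConsumer
from pydantic import BaseModel, ValidationError

class EngagementEvent(BaseModel):
    platform: str
    post_id: str
    likes: int
    shares: int
    comments: int

def main() -> None:
    consumer = KafkaConsumer(
        "social_engagement",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    conn = psycopg2.connect(dbname="warehouse", user="etl", password="***", host="localhost")
    with conn, conn.cursor() as cur:
        for message in consumer:
            try:
                event = EngagementEvent(**message.value)  # data-quality gate
            except ValidationError as exc:
                print(f"Skipping bad record: {exc}")
                continue
            cur.execute(
                "INSERT INTO engagement_metrics (platform, post_id, likes, shares, comments) "
                "VALUES (%s, %s, %s, %s, %s)",
                (event.platform, event.post_id, event.likes, event.shares, event.comments),
            )
            conn.commit()

if __name__ == "__main__":
    main()
```

In practice you would likely batch the inserts and route invalid records to a dead-letter topic rather than just logging them, but the shape of the pipeline stays the same.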
The projects below are available only with the second purchase option.
Sample Project 3 - Data Visualization and Reporting in AWS
A financial institution needs to create interactive visualizations and reports for their management team to monitor key performance indicators (KPIs) related to loan applications, approvals, and portfolio performance. The data is stored in Redshift, and the institution wants to leverage AWS services for data visualization and reporting.
Your task is to:
- Propose an AWS-based architecture for building visualizations and reports, including the services you would use and how they would interact.
- Discuss the process of extracting data from Redshift, transforming it if necessary, and loading it into the chosen visualization/reporting tool (see the sketch after this list).
- Provide examples of the types of visualizations and reports you would create for the given use case, including how you would handle drill-down capabilities and interactive filtering.
- Explain how you would ensure data security and access control for the visualizations and reports.
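One possible shape for the extract step, assuming the BI layer (for example Amazon QuickSight) reads a CSV staged in S3, is sketched below; the KPI table, query, bucket, and credentials are illustrative assumptions.

```python
# Minimal sketch: pull KPI aggregates from Redshift and stage them in S3 for a BI tool.
# Endpoint, credentials, table, bucket, and key names are illustrative placeholders.
import boto3
import pandas as pd
import psycopg2

KPI_SQL = """
SELECT application_month,
       COUNT(*)                                              AS applications,
       SUM(CASE WHEN status = 'APPROVED' THEN 1 ELSE 0 END)  AS approvals,
       AVG(loan_amount)                                      AS avg_loan_amount
FROM loan_applications
GROUP BY application_month
ORDER BY application_month;
"""

def export_kpis() -> None:
    conn = psycopg2.connect(host="redshift-endpoint.example.com", port=5439,
                            dbname="lending", user="reporting", password="***")
    df = pd.read_sql(KPI_SQL, conn)  # extract, with light transformation in pandas if needed
    conn.close()
    df.to_csv("/tmp/loan_kpis.csv", index=False)
    # Stage the extract in S3, where the visualization/reporting layer picks it up.
    boto3.client("s3").upload_file("/tmp/loan_kpis.csv", "reporting-bucket", "kpis/loan_kpis.csv")

if __name__ == "__main__":
    export_kpis()
```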
Sample Project 4 - Cloud Development and Infrastructure Optimization
- A media streaming company is experiencing rapid growth in their user base and content catalog. Their current data infrastructure, hosted on AWS, is struggling to keep up with the increasing demands. The company needs to optimize their cloud infrastructure to ensure scalability, performance, and cost-effectiveness.
Your task is to:
- Analyze the current AWS infrastructure, including services like Redshift, S3, EC2, and any other relevant components.
- Identify potential bottlenecks or areas for optimization, considering factors such as resource utilization, data transfer costs, and scalability (see the sketch after this list).
- Propose an optimized AWS infrastructure design, leveraging services like Auto Scaling, Elastic Load Balancing, and other AWS services as needed.
- Discuss strategies for monitoring, logging, and troubleshooting the optimized infrastructure.
- Provide a high-level cost estimation and justification for the proposed changes.
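As an example of the utilization analysis, the sketch below uses boto3 and CloudWatch to flag EC2 instances whose average CPU stays low over the past week; the threshold and instance IDs are assumptions for illustration.

```python
# Minimal sketch, assuming default AWS credentials; flags EC2 instances whose
# average CPU utilization over the last 7 days falls below a threshold.
from datetime import datetime, timedelta, timezone

import boto3

THRESHOLD_PCT = 20.0  # below this average CPU, an instance looks oversized

def underutilized_instances(instance_ids):
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=7)
    flagged = []
    for instance_id in instance_ids:
        stats = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=3600,
            Statistics=["Average"],
        )
        points = [p["Average"] for p in stats["Datapoints"]]
        if points and sum(points) / len(points) < THRESHOLD_PCT:
            flagged.append(instance_id)
    return flagged

if __name__ == "__main__":
    print(underutilized_instances(["i-0123456789abcdef0"]))  # hypothetical instance ID
```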
This fast-paced course cuts through the jargon and equips you with essential Python skills. Automate tasks, analyze data, and build amazing things, all in just 10 days.
Pages: 40
Redshift Project with code: 1
Social Media Ingestion Project with code: 1
Ratings: 5.0 out of 5 (4 ratings, 100% 5-star)