How do you analyze a petabyte of data?
The Spark Python API, PySpark, exposes the Spark programming model to Python. Apache® Spark™ is open source and one of the most popular Big Data frameworks for scaling tasks across a cluster. It was developed to use distributed, in-memory data structures to speed up the processing of massive amounts of data.
We’ll also look into Spark SQL – Apache Spark’s module for working with structured data – and MLlib, Apache Spark’s scalable machine learning library.
🎓 What will you learn?
- Perform Big Data analysis with PySpark
- Run SQL queries against DataFrames using the Spark SQL module
- Apply machine learning with the MLlib library
👩💻 Who should attend?
- Developers and those interested in Python
- Data and AI enthusiasts
- Register for the live stream or to watch the replay: https://www.crowdcast.io/e/python-pyspark
- Create your free IBM Cloud account at: https://ibm.biz/BdfPQ5
- Anam Mahmood, IBM Developer Advocate, https://www.linkedin.com/in/anam-mahmood-sheikh/
- Hashim Noor, Client Technical Specialist, https://www.linkedin.com/in/hashim-noor/
Ready to put your new skills to good use?
Participate in the 2021 Call for Code Global Challenge by helping us fight climate change, for a chance to win $200,000 and get support from IBM and our technology partners, like The Linux Foundation, to deploy your solution around the world.
This isn’t your average hackathon. If you want the chance to build a solution that makes a true impact in the field, Call for Code can turn your idea into action.
Visit https://callforcode.org for more details and FAQs
Check out the resources and starter kits to kick off your solution: https://developer.ibm.com/callforcode