IBM Developer Advocacy

Developer Advocate: 

David Taieb

For the last 4 years, David has been the lead architect for the Watson Core UI & Tooling team based in Littleton, Massachusetts. During that time, he led the design and development of a Unified Tooling Platform that supports all the Watson Tools, including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team, responsible for integrating the eXpeditor J2EE Web Container into Domino and building first-class APIs for the developer community. He joined IBM in 1996 and worked on various globalization technologies and products, including Domino Global Workbench (used to develop multilingual Notes/Domino NSF applications) and a multilingual content management system for WebSphere Application Server. David enjoys sharing his experience by speaking at conferences. You’ll find him at events like the Unicode Conference, EclipseCon, and Lotusphere. He’s also passionate about building tools that improve developer productivity and overall experience.

All Posts

PixieDust: Magic for Your Python Notebook

As any data scientist knows, Python notebooks are a powerful tool for fast and flexible data analysis. But the learning curve is steep, and it’s easy to get blank page syndrome when you’re starting from scratch. Thankfully, it's easy to save and share notebooks. However, even for seasoned data scientists or developers, modifying an existing notebook can be daunting.

Got Syntax?

Data science notebooks were first popularized in academia, and there are some formalities to work through before you can get to your analysis. For example, in a Python interactive notebook, a mundane task like creating a simple chart or…

Predict Flight Delays with Apache Spark MLLib, FlightStats, and Weather Data

Flight delays are an inconvenience. Wouldn't it be great to predict how likely a flight is to be delayed? You could remove uncertainty and let travelers plan ahead. Usually, the weather is to blame for delays. So I've crafted an analytics solution based on weather data and past flight performance. This solution takes weather info from Weather Company Data for IBM Bluemix and combines it with flight history from flightstats.com to build a predictive model that can forecast delays. To load and combine all this data, we use our Simple Data Pipe open source tool to move it into a…

Getting started with GraphFrames in Apache Spark

Introduction to Spark and graphs

GraphX is one of the 4 foundational components of Spark, along with Spark SQL, Spark Streaming, and MLlib. It provides general-purpose graph APIs, including graph-parallel computation. GraphX APIs are great but have a few limitations. First, they only work with Scala, so if you want to use GraphX with Python in a Jupyter Notebook, you are out of luck. Second, they only work at the RDD (Resilient Distributed Dataset) level, which means they can't benefit from the performance improvements provided by DataFrames and the Catalyst query optimizer.…
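GraphFrames works around both limitations by representing a graph as two DataFrames: a vertices table with an `id` column and an edges table with `src` and `dst` columns. The following is a minimal pure-Python sketch of that tabular model, with lists of dicts standing in for DataFrames. It computes in-degrees the way grouping the edges table by `dst` would (GraphFrames exposes this as `inDegrees`, with columns `id` and `inDegree`). The people and relationships are hypothetical, and this is an illustration of the data model, not the GraphFrames API.

```python
from collections import Counter

# Hypothetical sample graph. Lists of dicts stand in for the two DataFrames
# GraphFrames expects: vertices need an "id" column, edges need "src" and "dst".
vertices = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Charlie"},
]
edges = [
    {"src": "a", "dst": "b", "relationship": "follows"},
    {"src": "b", "dst": "c", "relationship": "follows"},
    {"src": "c", "dst": "b", "relationship": "follows"},
]


def check_edges(vertex_rows, edge_rows):
    """Raise ValueError if an edge points at a vertex id that does not exist."""
    known = {v["id"] for v in vertex_rows}
    for e in edge_rows:
        if e["src"] not in known or e["dst"] not in known:
            raise ValueError(f"edge {e['src']}->{e['dst']} references an unknown vertex")


def in_degrees(edge_rows):
    """Count incoming edges per vertex, like grouping the edges table by "dst".

    Vertices with no incoming edges do not appear, mirroring a group-by
    over the edges table.
    """
    counts = Counter(e["dst"] for e in edge_rows)
    return [{"id": vid, "inDegree": n} for vid, n in sorted(counts.items())]


def out_neighbors(edge_rows, vertex_id):
    """Return the ids one hop away from vertex_id along outgoing edges."""
    return sorted(e["dst"] for e in edge_rows if e["src"] == vertex_id)


if __name__ == "__main__":
    check_edges(vertices, edges)
    print(in_degrees(edges))
    print(out_neighbors(edges, "b"))
```

In GraphFrames itself, the same two tables would be Spark DataFrames passed to `GraphFrame(vertices, edges)`, so queries like this run through the DataFrame engine and Catalyst rather than at the RDD level.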

What I learned at PyCon and Spark Summit

I just got back from a 2-week road trip and am finally catching my breath to write about all the cool things I learned. My first stop was the all-important PyCon conference in Portland, Oregon, where I ran a 3-hour workshop session: Developing Analytic Applications using Apache Spark™ and Python. After a quick stop back home, I turned around and headed to San Francisco for the much-anticipated Spark Summit, where I showed off Marvin and his now world-famous Rock-Paper-Scissors skills.

PyCon 2016, Portland, Oregon

It was my first time at PyCon, which brings together thousands of Python developers every…

Analyze Market Trends in Twitter Using Apache Spark, Python, and dashDB

Getting insight into market trends is easier than ever before. Here, I'll show you how to use a few cloud-based data services to understand the worldwide automotive market, its brands, and its customers. This tutorial walks you through:

- setup of Apache Spark, dashDB data warehouse, and IBM Insights for Twitter
- importing data from Twitter
- creation of a Python notebook
- data shaping and prep
- use of the Natural Language Toolkit for text processing
- sophisticated analyses and visualizations via Python notebook

Save Time!

If you don't feel like walking through installation but want to understand the analytics, skip ahead to the Analyze…

Launch a Spark job using spark-submit

In my previous Spark tutorials, I used notebooks to run and interact with the code. In this tutorial, I show how to run Spark batch jobs programmatically using the spark-submit script on IBM Analytics for Apache Spark. We'll look at 2 examples that launch a Hello World Spark job via spark-submit: one written in Scala and one in Python.

Architecture

The following diagram shows the architecture for both modes: So, why do we need 2 modes to run a Spark job? And which one should you use? Notebooks are great if you need to interact with the Spark cluster.…

Real-time Sentiment Analysis of Twitter Hashtags with Spark

Since this tutorial was published, IBM's Message Connect service has been discontinued. This tutorial contains a workaround that lets you bypass that service, so you can still complete the steps here. But for the absolute latest, try the new version of this tutorial. In my Sentiment Analysis of Twitter Hashtags tutorial, we explored how to build a Spark Streaming app that uses Watson Tone Analyzer to perform sentiment analysis on a set of Tweets. In that tutorial, Spark Streaming collects the Twitter data for a finite period, but it doesn't run streaming analytics in real time. It just accumulates the data…
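To make the difference between accumulating data and analyzing it in real time concrete, here is a minimal pure-Python sketch of the general idea: results are updated after every micro-batch instead of only once the collection period ends. Hashtag counting stands in for the Watson Tone Analyzer scoring, and the tweet batches are hypothetical. This is an illustration of incremental per-batch analytics, not the tutorial's Spark Streaming code.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in one tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def running_top_hashtags(batches, top_n=2):
    """Update running hashtag counts after every micro-batch.

    Yields the current top_n (hashtag, count) pairs after each batch, so a
    dashboard could refresh while tweets are still arriving instead of
    waiting until the whole collection period ends.
    """
    running = Counter()
    for batch in batches:
        for tweet in batch:
            running.update(extract_hashtags(tweet))
        yield running.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical micro-batches of tweet text.
    batches = [
        ["Loving #Spark today", "#spark and #python"],
        ["#Python notebooks rock", "more #python"],
    ]
    for snapshot in running_top_hashtags(batches):
        print(snapshot)
```

In Spark Streaming, carrying state across batches like this is typically done with a stateful operation such as `updateStateByKey`.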

Apache Spark Courses Now Online

Finish up your coursework before the holiday break! Spark 101 is now in session. Take our 2 quick online courses to start developing with Spark. Not only are they fun and interesting, they're also FREE. Begin with our first course: Getting Started with Spark and Notebooks. Then follow up with Analyzing Sentiment in Twitter Hashtags. If you're new to developerWorks courses, you just need to register. After that, it should take you only a couple of hours to run through the material. We'd love to know what you think. To give feedback or ask questions, visit each course's Discussion tab. Enjoy!

Humans vs. Apache Spark: Building our Rock-Paper-Scissors game

IBM has a tradition of producing technology that defeats the smartest humans. For example:

- IBM Deep Blue bested world chess champion Garry Kasparov.
- IBM Watson defeated Jeopardy! TV game show champion Ken Jennings.

The Cloud Data Services Developer Advocacy team takes up the banner with a solution that challenges human dominance in child's play, namely Rock-Paper-Scissors. For IBM Insight 2015 in Las Vegas, we built a Rock-Paper-Scissors game powered by the IBM Analytics for Apache Spark service available in IBM Bluemix. People play the classic childhood game against Spark, and the first to win 3 rounds wins the game. Try it…
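The excerpt doesn't describe how Spark chooses its moves, so this minimal Python sketch leaves both players' moves as input and only encodes the match rules stated above: standard Rock-Paper-Scissors, with the first player to win 3 rounds taking the game. Treating ties as rounds that count for neither player is an assumption, as are the function names.

```python
# Which move each move defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def round_winner(human, machine):
    """Return "human", "machine", or None for a tie."""
    for move in (human, machine):
        if move not in BEATS:
            raise ValueError(f"unknown move: {move!r}")
    if human == machine:
        return None
    return "human" if BEATS[human] == machine else "machine"


def play_match(rounds, wins_needed=3):
    """Play (human, machine) move pairs until someone wins wins_needed rounds.

    Ties count for neither player (an assumption). Returns (winner, score);
    winner is None if the moves run out before anyone reaches wins_needed.
    """
    score = {"human": 0, "machine": 0}
    for human, machine in rounds:
        winner = round_winner(human, machine)
        if winner is None:
            continue
        score[winner] += 1
        if score[winner] == wins_needed:
            return winner, score
    return None, score


if __name__ == "__main__":
    moves = [
        ("rock", "scissors"),
        ("paper", "paper"),
        ("paper", "scissors"),
        ("scissors", "paper"),
        ("rock", "scissors"),
    ]
    print(play_match(moves))
```

Keeping the rules separate from move selection means a prediction model for the machine's next move could be plugged in without touching the scoring logic.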

Start Developing with Spark and Notebooks

New to developing applications with Apache® Spark™? This is the tutorial for you. It provides the end-to-end steps needed to build a simple custom library for Apache® Spark™ (written in Scala) and shows how to deploy it in a notebook, giving you the foundation you need to build real-life production applications. In this tutorial, you'll learn how to:

- Create a new Scala project using sbt and package it as a deployable jar.
- Deploy the jar into a Jupyter Notebook.
- Call the helper functions from a Notebook cell.
- Optional: Import your project into Scala IDE for Eclipse, then test and debug it there.…

Sentiment Analysis of Twitter Hashtags

Since this tutorial was published, we've made some strides in notebook technology. To save yourself some work and learn more, try an updated version of my Real-time Sentiment Analysis of Twitter Hashtags tutorial. How's your relationship with your customers? You can track how consumers feel about you, your products, or your company based on their tweets. Gauge positive or negative emotions measured across multiple tone dimensions, like anger, cheerfulness, openness, and more. To get real-time sentiment analysis, set up Spark Streaming with Twitter and Watson on Bluemix and use its Notebook to analyze public opinion. This tutorial covers how to…

How to analyze your pipe runs with Bunyan

Introduction

In this post, I'll discuss how our Simple Data Pipe sample app uses the Bunyan Node.js logging framework to capture detailed logging information about a pipe run. Then I'll show you how to analyze the report using the Bunyan viewer tool. If you've explored our Simple Data Pipe tutorial on Bluemix, you know that metadata about your pipe runs is stored in Cloudant as JSON. Cloudant's support for binary attachments on JSON documents lets you attach the logs from Bunyan right alongside their associated JSON document, which you can then access for further analysis.

A word about Bunyan

Bunyan is…
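Bunyan writes one JSON object per line, with core fields such as `name`, `hostname`, `pid`, `level`, `msg`, `time`, and `v`, and numeric levels from 10 (trace) to 60 (fatal). As a rough illustration of the kind of analysis the viewer supports, here is a minimal Python sketch that parses such a log and keeps only records at warn level or above. The sample records, including the `pipe-run` logger name, are hypothetical and not taken from Simple Data Pipe.

```python
import json

# Bunyan's standard numeric log levels.
LEVELS = {10: "TRACE", 20: "DEBUG", 30: "INFO", 40: "WARN", 50: "ERROR", 60: "FATAL"}

# Hypothetical log lines in Bunyan's one-JSON-object-per-line format.
SAMPLE_LOG = [
    '{"name":"pipe-run","hostname":"host1","pid":42,"level":30,'
    '"msg":"Starting pipe run","time":"2016-01-01T10:00:00.000Z","v":0}',
    '{"name":"pipe-run","hostname":"host1","pid":42,"level":50,'
    '"msg":"Failed to copy documents","time":"2016-01-01T10:00:05.000Z","v":0}',
    "not a json line",
]


def parse_records(lines):
    """Parse Bunyan JSON lines, skipping blank and non-JSON lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # this sketch simply drops lines that are not JSON
        if isinstance(rec, dict) and "level" in rec and "msg" in rec:
            records.append(rec)
    return records


def filter_level(records, min_level=40):
    """Keep records at or above min_level (40 = WARN)."""
    return [r for r in records if r["level"] >= min_level]


def format_record(rec):
    """Render one record as a short, human-readable line."""
    level = LEVELS.get(rec["level"], str(rec["level"]))
    return f"[{rec.get('time', '?')}] {level}: {rec.get('name', '?')}: {rec['msg']}"


if __name__ == "__main__":
    for rec in filter_level(parse_records(SAMPLE_LOG)):
        print(format_record(rec))
```

The `bunyan` CLI applies the same kind of filter with its `-l`/`--level` option (for example, `bunyan -l warn app.log`, where `app.log` is a placeholder) and pretty-prints each matching record.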

Create a Secure Gateway to Access Data from the Cloud

First steps to securely access on-premises data from the Cloud. Say you want to show your client some reports that highlight your company's sky-high customer success numbers. Unfortunately, that information lives on a server back at your office behind a firewall, and you can only access the data if you're sitting at your desk connected to the local network. It doesn't have to be that way. Free your data for use where you need it. If you have data on-premises that your team wants to access over the web from somewhere outside your office, read my new tutorial: Hybrid Cloud…