
Using Python for Sentiment Analysis in Tableau

This week's Makeover Monday data set was the lyrics of the Top 100 songs. Having just returned from Tableau's annual conference and being eager to try their new feature, TabPy, I saw this as the perfect opportunity to test it out. In this blog post, I'm going to offer a step-by-step guide to how I did it. If you haven't used Python before, have no fear - this is definitely achievable for novices - read on!

For some context before I begin, I have limited experience with Python. I recently completed a challenging but great course through edX that I'd highly recommend if you are looking for foundational knowledge - Introduction to Computer Science and Programming Using Python. The syllabus covered more advanced topics such as classes and algorithmic complexity. However, to run the analysis I did, you only need a high-level understanding of:


  • basic for loops
  • lists
  • dictionaries
  • importing libraries

The libraries I used for this, should you want to look up additional documentation, are:

  • pandas
  • nltk
  • time (this one isn't really necessary - I just used it to test computation time differences between TabPy and local processing.)

I have a Mac so if you're trying to reproduce with a PC, you'll find install instructions here as well.


Part 1 - Setting Up Your Environment

  1. Make sure you are using Tableau v10.1
  2. Open the TDE (Tableau Data Extract) with the Top 100 Songs data
  3. Install TabPy

Read through the install directions. Here's my simplified version for those not comfortable with GitHub or command line:

  • Click the green "Clone or Download" button.
  • Select Download
  • Unzip the file and save locally (I moved mine to my desktop)
  • Open your Terminal and navigate to your TabPy folder. Run these commands:

If you see this after your install finished, you're all set:


If, like me, you aren't successful on your first attempt, it may be because you have Python 3 and not the required Python 2.7. Or you may have both versions with Python 3 as your primary - this is what happened to me, as I already had Anaconda installed (it's part of the TabPy download) and had been using Python 3 for the class I took.


You can manually create a Python 2.7 environment (courtesy of Bora Beran). In your terminal, run:

conda create --name Tableau-Python-Server python=2.7 anaconda

Then activate it and do the pip install from the local folders:

pip install -r ./tabpy-server/requirements.txt
pip install ./tabpy-client
pip install ./tabpy-server

Part 2 - Connecting to TabPy in Tableau

Now it's time to set up TabPy in Tableau. In Tableau 10.1, go to:

Help -> Settings and Performance -> Manage External Service Connection

and enter localhost, since you're running TabPy on your own computer. The default port is 9004, so unless you changed it manually you should leave it at that.
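If Tableau can't connect, it can help to first confirm that something is actually listening on the TabPy port. The snippet below is a minimal sketch of such a check using only Python's standard library. The function name is_listening is my own, and the check only tells you that a TCP connection succeeds, not that the service answering is TabPy.

```python
import socket


def is_listening(host="localhost", port=9004, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass if nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening() with no arguments checks localhost on port 9004, the default mentioned above.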


Part 3 - Creating your TabPy Calculation
The TabPy GitHub page has extensive documentation on using Python in Tableau calculations, which you should review. I simply repurposed one of the calcs they demoed during the TabPy session at #data16 - catch the replay here.

Using the Top 100 songs data set, create the following calculated field:

Everything following # is a comment just to help make sense of what the code is doing. Feel free to remove that text.
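For readers who want to try the Python logic on its own first, here is a rough sketch of what the script inside such a calculated field does: take a column of words and return one score per word, in the same order. The helper names (score_words, make_vader_scorer, toy_scorer) and the tiny lexicon are hypothetical and only there so the sketch runs without nltk. make_vader_scorer assumes nltk and its vader_lexicon resource are installed.

```python
def score_words(words, scorer):
    """Return one sentiment score per input word, in the same order."""
    return [scorer(word) for word in words]


def make_vader_scorer():
    """Build a scorer backed by nltk's VADER analyzer.

    Assumes nltk is installed and nltk.download('vader_lexicon') has been run.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    sid = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict; 'compound' is the overall score
    return lambda word: sid.polarity_scores(word)["compound"]


# Hypothetical stand-in lexicon so the sketch runs without nltk.
TOY_LEXICON = {"love": 0.6, "hate": -0.6}


def toy_scorer(word):
    """Look the word up in the toy lexicon; unknown words score 0.0."""
    return TOY_LEXICON.get(word.lower(), 0.0)


scores = score_words(["Love", "the", "hate"], toy_scorer)
```

Swapping toy_scorer for make_vader_scorer() gives real VADER compound scores, at the cost of needing nltk.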

Now you can use this calculated field in views with [Word] to process the sentiment score! The downside is that since this is a table calculation and also uses ATTR, you cannot use it within a level of detail (LOD) calculation. So unfortunately, you cannot sum the sentiment at the song level of detail using this example and data structure. With some data manipulation it is possible, but I won't be diving into that.

TabPy vs. Pre-Processing Data for Tableau

Unfortunately, you cannot publish vizzes using TabPy to Tableau Public. If you want to download the .twbx version I made using TabPy, you can do so here.

However, you could run this analysis outside of Tableau and simply import the output and create your viz that way. I did this, which also gave me more flexibility with LODs since I was no longer using TabPy.

TabPy definitely took me less time and required less code. However, it took ~2.5 minutes*** to process 8,668 words, whereas when I ran my code (below) outside of Tableau it took under 1 second to get the scores and write them back to a CSV.

***11/17 Update: Bora Beran made a great point; be mindful of how you're addressing your TabPy Table Calc - "If you have all your dimensions in addressing we will make a single call to Python and pass all the data at once which will be much faster. Otherwise we make one call per partition. If e.g. song title is on partitioning we would send a separate request for each song. If word is on partitioning we will send a separate request per word." 

At the time of posting this blog, I was addressing all dimensions in the view, and on a few occasions when working with this data I experienced the very slow return time stated above. However, when I ran this calc today it took the same time in Tableau as it did outside of Tableau. I don't have a clear idea why, but I was running the query on my local machine and think there may simply have been too few resources available to process the analysis at the time.
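To make the addressing vs. partitioning point concrete, here is a toy simulation of how the number of requests grows with the partitioning dimensions. It is not how Tableau is implemented. It just groups rows by whichever dimensions are left on partitioning and counts the groups, following Bora's description. The function name and sample rows are made up.

```python
from collections import defaultdict


def count_python_calls(rows, partition_dims):
    """Group rows by the partitioning dimensions; each group stands for one request."""
    partitions = defaultdict(list)
    for row in rows:
        # With no partitioning dimensions, every row gets the same empty key
        key = tuple(row[dim] for dim in partition_dims)
        partitions[key].append(row)
    return len(partitions)


rows = [
    {"song": "Song A", "word": "love"},
    {"song": "Song A", "word": "night"},
    {"song": "Song B", "word": "love"},
    {"song": "Song B", "word": "dance"},
]

all_on_addressing = count_python_calls(rows, [])      # everything sent in one call
song_on_partitioning = count_python_calls(rows, ["song"])  # one call per song
word_on_partitioning = count_python_calls(rows, ["word"])  # one call per distinct word
```

With thousands of distinct words, putting [Word] on partitioning would mean thousands of separate round trips, which is consistent with the slowdown described in the quote.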

Below is what the code would look like outside of TabPy. You can run this code in a Jupyter notebook or another IDE - I used Spyder only because I used that for my class.

You can download my Tableau Public viz which uses the output of the below code to inspect further!
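The original script isn't reproduced in this text, so what follows is only a sketch of the general pre-processing idea: read the words, score each distinct word once, time the run, and write word/score pairs to a CSV that Tableau can import. It is not the original code. To stay dependency-free it uses the standard csv module instead of pandas and a hypothetical toy_scorer instead of nltk, and the column name "Word" and the in-memory files are assumptions for illustration.

```python
import csv
import io
import time


def score_csv(in_file, out_file, scorer, word_column="Word"):
    """Score each distinct word in in_file and write word/score rows to out_file."""
    word_score_dict = {}
    for row in csv.DictReader(in_file):
        word = row[word_column]
        if word not in word_score_dict:  # score each distinct word only once
            word_score_dict[word] = scorer(word)
    writer = csv.writer(out_file)
    writer.writerow(["word", "score"])
    writer.writerows(word_score_dict.items())
    return word_score_dict


def toy_scorer(word):
    """Hypothetical stand-in for a real sentiment scorer such as nltk's VADER."""
    return {"love": 0.6, "hate": -0.6}.get(word.lower(), 0.0)


# In-memory files keep the sketch runnable; real use would open CSV files on disk.
source = io.StringIO("Word\nlove\nbaby\nhate\nlove\n")
output = io.StringIO()

start = time.time()
result = score_csv(source, output, toy_scorer)
elapsed = time.time() - start  # seconds spent scoring and writing
```

For real files, the same function can be called with files opened via open(path, newline="").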


Here's the final viz - half of it is cut off so be sure to view it in Tableau Public:

Comments

  1. Hi Brit,
    I think for this type of analysis, as you also said, it is a good idea to preprocess since it looks like data is not dynamic. But I was wondering.. What were you using for your addressing table calc setting on the Python calculated field?

    If you have all your dimensions in addressing we will make a single call to Python and pass all the data at once which will be much faster. Otherwise we make one call per partition. If e.g. song title is on partitioning we would send a separate request for each song. If word is on partitioning we will send a separate request per word.

    In the GIF it looked like we're sending a large number of requests. Do you mind trying with everything on addressing? This should log only one entry in your console and I would expect it to be noticeably faster.

    For the TC demo if I recall correctly we were running sentiment analysis on the fly on 18K tweets and it was less than 1.5 seconds.

    Thanks,

    Bora

    Replies
    1. Hi Bora,

      Thanks for the comment - great point! I just double checked and when I clocked the 2.5 minutes it was addressing all the dimensions. However, the extremely odd thing is that it's now working within Tableau at the same speed it did outside of Tableau. This is different from the behavior I observed earlier this week and I'm not sure I understand why... perhaps it was just chance that other applications were using so much of my machine's resources that it was slow to process that query? I'm not sure - I'll update this post though, so as not to deter others!

      Brit

    2. Hi Bora,

      Could you please elaborate a bit more on this? What does that mean "add everything on addressing"?

      Does that mean to add all the dimensions that are relevant for scoring to the second part of SCRIPT_REAL. i.e SCRIPT_REAL("",ADDRESSING?)

    3. SCRIPT_ calculations are table calculations. If you click on the pill, you should see an option to edit table calculation. In the Table Calculation dialog you will see a list of all the dimensions in your current sheet. If you check all the boxes next to the names of dimensions you will be adding everything to addressing. TabPy GitHub page has an example of this (the second Tableau screenshot on the page).

      https://github.com/tableau/TabPy/blob/master/TableauConfiguration.md

      In this example, you will see that CustomerID is the only item checked hence being used as addressing. Category and Segment are not checked which means they are being used for partitioning. Because of this Tableau will make a separate request to Python for every Category-Segment combination such that you get the correlation coefficient for each pane e.g. Technology-Consumer, Technology-Corporate, Office Supplies-Corporate and so on.

      I hope this helps.

      Bora

  2. Hi Brit,

    Very interesting blog. I am new to python. Could you please explain the below lines of code

    1) word_score_dict[words[i]] = scores[i]

    2) Why are you using list and .iteritems while creating the below dataframe. Can't we just pass the word_score_dict as is
    df = pd.DataFrame(list(word_score_dict.iteritems()), columns=['word','score'])

    Floyd

    Replies


    1. Hi Floyd - thanks for the questions! This was my first time using pandas so I did have to do some Googling to figure out how to create the data frame, and I welcome any feedback to improve! With that said, here are my responses:

      1. At this point in the code I have two lists - one that contains my words and one that contains the scores. Since Python lists are ordered, I know that the score for the first word in my word list can be found by accessing the first score in my score list, and so on. So that line of code is essentially iterating through those two lists and creating a Python dictionary of key:value pairs. I'm going to put a link at the bottom of this comment where you can see this visually!

But - to be honest, what I did wasn't that elegant. It works, but a better, more concise way would be to build the dictionary from the get-go instead of building two lists and then creating the dictionary from them. That code would instead be:

text = top_100['Word']
sid = SentimentIntensityAnalyzer()
word_score_dict = {}

for word in text:
    ss = sid.polarity_scores(word)
    word_score_dict[word] = ss['compound']

      2. The issue I had with passing word_score_dict directly was that it raised "ValueError: If using all scalar values, you must pass an index". When I did some searching I came across this:

      http://stackoverflow.com/questions/17839973/construct-pandas-dataframe-from-values-in-variables


      http://pythontutor.com/visualize.html#code=words%20%3D%20%5B%22happy%22,%20%22sad%22%5D%0Ascores%20%3D%20%5B0.57,-0.48%5D%0Aword_score_dict%20%3D%20%7B%7D%0A%0Afor%20i%20in%20range(len(words%29%29%3A%0A%20%20%20%20word_score_dict%5Bwords%5Bi%5D%5D%20%3D%20scores%5Bi%5D%0A%20%20%20%20%0Aprint(word_score_dict%29&cumulative=false&heapPrimitives=false&mode=edit&origin=opt-frontend.js&py=2&rawInputLstJSON=%5B%5D&textReferences=false
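As a small illustration of the two points in this reply, the sketch below pairs two lists into a dictionary and then turns the dictionary into (word, score) rows. The sample words and scores are the ones used in the visualization link above. Note that .iteritems() exists only in Python 2; .items() is the Python 3 equivalent.

```python
words = ["happy", "sad"]
scores = [0.57, -0.48]

# Index-based loop, as in the line of code being asked about.
word_score_dict = {}
for i in range(len(words)):
    word_score_dict[words[i]] = scores[i]

# Equivalent one-liner: zip pairs the two lists position by position.
zipped_dict = dict(zip(words, scores))

# A list of (word, score) tuples gives one row per word, which is the
# shape a two-column table needs. In Python 2 this was list(d.iteritems()).
rows = list(word_score_dict.items())
```

Passing rows together with column names to a DataFrame constructor avoids the scalar-values error, because each tuple is treated as a row rather than each dictionary value as a whole column.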

  5. Hi Brit, which one is your calculated field? I couldn't find it in your workbook. Where did you store the following?

    #SCRIPT_REAL is a function in Tableau which returns a result from an external service script. It's in this function we pass the python code.

    SCRIPT_REAL("from nltk.sentiment import SentimentIntensityAnalyzer

    text = _arg1 #you have to use _arg1 to reference the data column you're analyzing, in this case [Word]. It gets word further down after the ,
    scores = [] #this is a python list where the scores will get stored
    sid = SentimentIntensityAnalyzer() #this is a class from the nltk (Natural Language Toolkit) library. We'll pass our words through this to return the score

    for word in text: # this loops through each row in the column you pass via _arg1; in this case [Word]
        ss = sid.polarity_scores(word) #passes the word through the sentiment analyzer to get the score
        scores.append(ss['compound']) #appends the score to the list of scores

    return scores #returns the scores
    "
    ,ATTR([Word]))

