
Using Python for Sentiment Analysis in Tableau

This week's Makeover Monday data set was the Top 100 Songs' lyrics. Having just returned from Tableau's annual conference and eager to try their new feature, TabPy, this seemed like the perfect opportunity to test it out. In this blog post, I'm going to offer a step-by-step guide on how I did this. If you haven't used Python before, have no fear - this is definitely achievable for novices - read on!

For some context before I begin: I have limited experience with Python. I recently completed a challenging but great course through edX that I'd highly recommend if you're looking for foundational knowledge - Introduction to Computer Science and Programming Using Python. The syllabus included more advanced Python, including classes and thinking about algorithmic complexity. However, to run the analysis I did, it would be helpful to look up and understand at a high level:


  • basic for loops
  • lists
  • dictionaries
  • importing libraries
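If those bullets are unfamiliar, here's a tiny made-up example that uses all of them at once - importing a library, then a for loop over a list that builds a dictionary:

```python
import time  # importing a library (here only to show the syntax)

words = ["happy", "sad", "song"]  # a list
word_lengths = {}                 # an empty dictionary

for word in words:                    # a basic for loop
    word_lengths[word] = len(word)    # store key:value pairs, e.g. "happy" -> 5

print(word_lengths)  # {'happy': 5, 'sad': 3, 'song': 4}
```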
The libraries I used for this, should you want to look up additional documentation, are:
  • pandas
  • nltk
  • time (this one isn't really necessary - I just used it to test computation time differences between TabPy and local processing.)
I have a Mac, so if you're trying to reproduce this on a PC, you'll find install instructions here as well.

Part 1 - Setting Up Your Environment
  1. Make sure you are using Tableau v10.1
  2. Open the TDE (Tableau Data Extract) with the Top 100 Songs data
  3. Install TabPy
Read through the install directions. Here's my simplified version for those not comfortable with GitHub or the command line:
  • Click the green "Clone or Download" button.
  • Select Download
  • Unzip the file and save locally (I moved mine to my desktop)
  • Open your Terminal and navigate to your TabPy folder. Run these commands:
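The exact commands appeared as a screenshot in the original post. For the 2016 TabPy release, the Mac setup boiled down to running the bundled setup script (the folder path below is an assumption based on unzipping to your desktop):

```
# Navigate to the unzipped TabPy folder
cd ~/Desktop/TabPy-master

# Run the setup script, which installs dependencies and starts the TabPy server
./setup.sh
```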

If you see this after your install finishes, you're all set!


Part 2 - Connecting to TabPy in Tableau

Now it's time to set up TabPy in Tableau. In Tableau 10.1, go to:

Help > Settings and Performance > Manage External Service Connection

and enter localhost, since you're running TabPy on your own computer. The default port is 9004, so unless you manually changed it, you should leave it at that.


Part 3 - Creating your TabPy Calculation
The TabPy GitHub page has extensive documentation you should review on using Python in Tableau calculations. I simply repurposed one of the calcs they demoed during the TabPy session at #data16 - catch the replay here.

Using the Top 100 Songs data set, create this calculated field.
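The field itself appeared as a screenshot in the original post; based on the version quoted back in the comments, it looks like the following (SCRIPT_REAL passes the Python code, as a string, to TabPy, with [Word] supplied as _arg1):

```
SCRIPT_REAL("
from nltk.sentiment import SentimentIntensityAnalyzer

text = _arg1                        # the column passed in below, here [Word]
scores = []                         # list to collect one score per row
sid = SentimentIntensityAnalyzer()  # nltk's sentiment analyzer

for word in text:
    ss = sid.polarity_scores(word)
    scores.append(ss['compound'])   # VADER's overall score, between -1 and 1

return scores
", ATTR([Word]))
```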


Everything following # is a comment just to help make sense of what the code is doing. Feel free to remove that text.

Now you can use this calculated field in views with [Word] to compute the sentiment score! The downside is that since this is a table calculation and also uses ATTR, you cannot use it within a level of detail (LOD) calculation. So, unfortunately, you cannot sum the sentiment at the song level of detail using this example and data structure. With some data manipulation it is possible, but I won't be diving into that here.

TabPy vs. Pre-Processing Data for Tableau

Unfortunately, you cannot publish vizzes using TabPy to Tableau Public. If you want to download the .twbx version I made using TabPy, you can do so here.

However, you could run this analysis outside of Tableau and simply import the output and create your viz that way. I did this, which also gave me more flexibility with LODs since I was no longer using TabPy.

TabPy definitely took me less time and required less code. However, it did take ~2.5 minutes*** to process 8,668 words, whereas when I ran my code (below) outside of Tableau it took under 1 second to get the scores and write them back to a CSV.

***11/17 Update: Bora Beran made a great point; be mindful of how you're addressing your TabPy Table Calc - "If you have all your dimensions in addressing we will make a single call to Python and pass all the data at once which will be much faster. Otherwise we make one call per partition. If e.g. song title is on partitioning we would send a separate request for each song. If word is on partitioning we will send a separate request per word." 

At the time of posting this blog, I was addressing all dimensions in the view, and on a few occasions when working with this data I experienced the very slow result return time stated above. However, today when running this calc it took the same time in Tableau as it did outside of Tableau. I don't have a clear explanation, but I was running that query on my local machine and think it might simply have been due to limited resources to process the analysis at the time.

This is what the code would look like outside of TabPy. You can run it in a Jupyter notebook or another IDE - I used Spyder only because I used that for my class.


You can download my Tableau Public viz, which uses the output of the code below, to inspect further!


Here's the final viz - half of it is cut off so be sure to view it in Tableau Public:

Comments

  1. Hi Brit,
    I think for this type of analysis, as you also said, it is a good idea to preprocess since it looks like data is not dynamic. But I was wondering.. What were you using for your addressing table calc setting on the Python calculated field?

    If you have all your dimensions in addressing we will make a single call to Python and pass all the data at once which will be much faster. Otherwise we make one call per partition. If e.g. song title is on partitioning we would send a separate request for each song. If word is on partitioning we will send a separate request per word.

    In the GIF it looked like we're sending a large number of requests. Do you mind trying with everything on addressing? This should log only one entry in your console and I would expect it to be noticeably faster.

    For the TC demo if I recall correctly we were running sentiment analysis on the fly on 18K tweets and it was less than 1.5 seconds.

    Thanks,

    Bora

    1. Hi Bora,

      Thanks for the comment - great point! I just double-checked, and when I clocked the 2.5 minutes it was addressing all the dimensions. However, the extremely odd thing is that it's now working within Tableau at the same speed it did outside of Tableau. This is different from the behavior I observed earlier this week and I'm not sure I understand why...perhaps it was just chance that I had other applications using a lot of my machine's resources and it was slow to process that query? I'm not sure - I'll update this post though so it doesn't deter others!

      Brit

    2. Hi Bora,

      Could you please elaborate a bit more on this? What does that mean "add everything on addressing"?

      Does that mean to add all the dimensions that are relevant for scoring to the second part of SCRIPT_REAL. i.e SCRIPT_REAL("",ADDRESSING?)

    3. SCRIPT_ calculations are table calculations. If you click on the pill, you should see an option to edit table calculation. In the Table Calculation dialog you will see a list of all the dimensions in your current sheet. If you check all the boxes next to the names of dimensions you will be adding everything to addressing. TabPy GitHub page has an example of this (the second Tableau screenshot on the page).

      https://github.com/tableau/TabPy/blob/master/TableauConfiguration.md

      In this example, you will see that CustomerID is the only item checked hence being used as addressing. Category and Segment are not checked which means they are being used for partitioning. Because of this Tableau will make a separate request to Python for every Category-Segment combination such that you get the correlation coefficient for each pane e.g. Technology-Consumer, Technology-Corporate, Office Supplies-Corporate and so on.

      I hope this helps.

      Bora

  2. Hi Brit,

    Very interesting blog. I am new to python. Could you please explain the below lines of code

    1) word_score_dict[words[i]] = scores[i]

    2) Why are you using list and .iteritems while creating the below dataframe. Can't we just pass the word_score_dict as is
    df = pd.DataFrame(list(word_score_dict.iteritems()), columns=['word','score'])

    Floyd



    1. Hi Floyd - thanks for the questions! This was my first time using pandas, so I did have to do some Googling to figure out how to create the data frame, and I welcome any feedback to improve! With that said, here are my responses:

      1. At this point in the code I have two lists - one that contains my words and one that contains the scores. Since Python lists are ordered, I know the first word in my Word list's score can be found by accessing the first score in my Score list, and so on. So that line of code is essentially iterating through those two lists and creating a Python dictionary of key:value pairs. I'm going to put a link at the bottom of this comment where you can see this visually!

But, to be honest, what I did wasn't that elegant. It works, but a better, more concise way would be to make a dictionary from the get-go rather than two lists that I then build the dictionary from. That code would instead be:

text = top_100['Word']
sid = SentimentIntensityAnalyzer()
word_score_dict = {}

for word in text:
    ss = sid.polarity_scores(word)
    word_score_dict[word] = ss['compound']

      2. The issue I had with passing word_score_dict directly was that it caused "ValueError: If using all scalar values, you must pass an index". When I did some searching I came across this:

      http://stackoverflow.com/questions/17839973/construct-pandas-dataframe-from-values-in-variables


      http://pythontutor.com/visualize.html#code=words%20%3D%20%5B%22happy%22,%20%22sad%22%5D%0Ascores%20%3D%20%5B0.57,-0.48%5D%0Aword_score_dict%20%3D%20%7B%7D%0A%0Afor%20i%20in%20range(len(words%29%29%3A%0A%20%20%20%20word_score_dict%5Bwords%5Bi%5D%5D%20%3D%20scores%5Bi%5D%0A%20%20%20%20%0Aprint(word_score_dict%29&cumulative=false&heapPrimitives=false&mode=edit&origin=opt-frontend.js&py=2&rawInputLstJSON=%5B%5D&textReferences=false
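      A minimal sketch of the fix described above (using .items(), the Python 3 name for .iteritems()):

```python
import pandas as pd

word_score_dict = {"happy": 0.57, "sad": -0.48}

# pd.DataFrame(word_score_dict) raises
# "ValueError: If using all scalar values, you must pass an index",
# because every value in the dict is a single scalar.

# A list of (key, value) tuples gives pandas one row per word instead:
df = pd.DataFrame(list(word_score_dict.items()), columns=["word", "score"])
print(df)
```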

  5. Hi Brit, which one is your calculated field? I couldn't find it in your workbook. Where did you store the following?

    #SCRIPT_REAL is a function in Tableau which returns a result from an external service script. It's in this function that we pass the Python code.

    SCRIPT_REAL("from nltk.sentiment import SentimentIntensityAnalyzer

    text = _arg1  # use _arg1 to reference the data column you're analyzing, in this case [Word]; it gets passed in after the comma below
    scores = []  # a Python list where the scores will be stored
    sid = SentimentIntensityAnalyzer()  # a class from the nltk (Natural Language Toolkit) library; we pass our words through this to get the score

    for word in text:  # loops through each row of the column passed via _arg1
        ss = sid.polarity_scores(word)  # passes the word through the sentiment analyzer to get the score
        scores.append(ss['compound'])  # appends the compound score to the list of scores

    return scores  # returns the scores
    "
    ,ATTR([Word]))

