Friday, December 8, 2017

Data science with Python: Turn your conditional loops to Numpy vectors

By Tirthajyoti Sarkar, published on Medium on December 5, 2017


Python is fast emerging as the de facto programming language of choice for data scientists. But unlike R or Julia, it is a general-purpose language and does not have a functional syntax to start analyzing and transforming numerical data right out of the box. So it needs a specialized library.
Numpy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis in the Python ecosystem. It is the foundation on which nearly all of the higher-level tools such as Pandas and scikit-learn are built. TensorFlow uses NumPy arrays as the fundamental building block on top of which it builds its Tensor objects and graph flow for deep learning tasks (which make heavy use of linear algebra operations on long lists, vectors, and matrices of numbers).
Many Numpy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you’re performing. For data science and modern machine learning tasks, this is an invaluable advantage.
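To make the point about avoiding Python-level loops concrete, here is a minimal sketch. The array contents and the variable names (values, result_loop, result_vec) are illustrative assumptions, not taken from the original benchmark.

```python
import numpy as np

# Illustrative input; any numeric array would do
values = np.arange(5, dtype=float)

# Pure-Python loop: the interpreter handles every element individually
result_loop = [v * v + 1.0 for v in values]

# Vectorized expression: the same arithmetic is applied to the whole
# array at once inside Numpy's compiled routines
result_vec = values * values + 1.0

print(result_loop)
print(result_vec)
```

Both forms compute the same numbers; the difference lies only in where the per-element work is done.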
My recent story demonstrating the advantage of Numpy-based vectorization for a simple data transformation task caught readers' fancy and was well received. There was some interesting discussion on the utility of vectorization versus code simplicity.
Now, mathematical transformations based on some predefined condition are fairly common in data science tasks. It turns out one can easily vectorize simple blocks of conditional loops by first turning them into functions and then using the numpy.vectorize method. In my previous article I showed an order-of-magnitude speed boost from Numpy vectorization of a simple mathematical transformation. For the present case, the speedup is less dramatic, because numpy.vectorize still calls the Python function once per element, so the conditional logic itself runs in the interpreter. However, there is roughly a 20–50% improvement in execution time over plain vanilla Python code.
Here is the simple code to demonstrate it:
import numpy as np
from math import sin as sn
import matplotlib.pyplot as plt
import time
# Number of test points
N_point = 1000
# Define a custom function with some if-else branches
def myfunc(x,y):
    if (x>0.5*y and y<0.3):
        return (sn(x-y))
    elif (x<0.5*y):
        return 0
    elif (x>0.2*y):
        return (2*sn(x+2*y))
    else:
        return (sn(y+x))
# List of stored elements, generated from a Normal distribution
lst_x = np.random.randn(N_point)
lst_y = np.random.randn(N_point)
lst_result = []
# Optional plots of the data
plt.hist(lst_x,bins=20)
plt.show()
plt.hist(lst_y,bins=20)
plt.show()
# First, plain vanilla for-loop
t1=time.time()
for i in range(len(lst_x)):
    x = lst_x[i]
    y = lst_y[i]
    if (x>0.5*y and y<0.3):
        lst_result.append(sn(x-y))
    elif (x<0.5*y):
        lst_result.append(0)
    elif (x>0.2*y):
        lst_result.append(2*sn(x+2*y))
    else:
        lst_result.append(sn(y+x))
t2=time.time()
print("\nTime taken by the plain vanilla for-loop\n----------------------------------------------\n{} us".format(1000000*(t2-t1)))
# List comprehension
print("\nTime taken by list comprehension and zip\n"+'-'*40)
%timeit lst_result = [myfunc(x,y) for x,y in zip(lst_x,lst_y)]
# Map() function
print("\nTime taken by map function\n"+'-'*40)
%timeit list(map(myfunc,lst_x,lst_y))
# Numpy.vectorize method
print("\nTime taken by numpy.vectorize method\n"+'-'*40)
# np.float was removed in NumPy 1.24; the builtin float is equivalent
vectfunc = np.vectorize(myfunc,otypes=[float],cache=False)
%timeit list(vectfunc(lst_x,lst_y))
# Results
Time taken by the plain vanilla for-loop
----------------------------------------------
2000.0934600830078 us
Time taken by list comprehension and zip
----------------------------------------
1000 loops, best of 3: 810 µs per loop
Time taken by map function
----------------------------------------
1000 loops, best of 3: 726 µs per loop
Time taken by numpy.vectorize method
----------------------------------------
1000 loops, best of 3: 516 µs per loop
Notice that I have used the %timeit Jupyter magic command wherever I could write the evaluated expression in one line. That way I am effectively running 1000 loops of the same expression several times and reporting the best per-loop time, which reduces random effects. Consequently, if you run this whole script in a Jupyter notebook, you may get a slightly different result for the first case, i.e. the plain vanilla for-loop, but the next three should show a very consistent trend (depending on your computer hardware).
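Outside a notebook, where the %timeit magic is not available, a similar measurement could be sketched with the standard-library timeit module. The repeat and number values, the random seed, and the variable names below are illustrative assumptions, not the settings behind the results above.

```python
import timeit
from math import sin

import numpy as np


def myfunc(x, y):
    # Same conditional transformation as in the article
    if x > 0.5 * y and y < 0.3:
        return sin(x - y)
    elif x < 0.5 * y:
        return 0
    elif x > 0.2 * y:
        return 2 * sin(x + 2 * y)
    else:
        return sin(y + x)


# Seeded generator so repeated runs use the same test data (assumed seed)
rng = np.random.default_rng(0)
lst_x = rng.standard_normal(1000)
lst_y = rng.standard_normal(1000)

# Run the statement 50 times per measurement, take 3 measurements,
# and report the best one, similar in spirit to %timeit's "best of 3"
n_loops = 50
timings = timeit.repeat(
    lambda: [myfunc(x, y) for x, y in zip(lst_x, lst_y)],
    repeat=3,
    number=n_loops,
)
best_us = min(timings) / n_loops * 1e6
print("List comprehension: {:.1f} us per loop (best of 3)".format(best_us))
```

Taking the minimum rather than the mean is the usual recommendation for timeit results, since slower runs mostly reflect interference from other processes.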
We see evidence that, for this data transformation task based on a series of conditional checks, the numpy.vectorize approach routinely gives some 20–50% speedup compared to general Python methods.
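As a point of comparison that is not part of the original benchmark, the same branching logic can also be written with whole-array operations, so that no Python function is called per element. The sketch below uses numpy.select, which picks, for each element, the choice belonging to the first condition that is true. The name myfunc_select and the seeded test data are assumptions made for illustration, and no timing is claimed for it here.

```python
from math import sin

import numpy as np


def myfunc(x, y):
    # Scalar version from the article, kept for cross-checking
    if x > 0.5 * y and y < 0.3:
        return sin(x - y)
    elif x < 0.5 * y:
        return 0
    elif x > 0.2 * y:
        return 2 * sin(x + 2 * y)
    else:
        return sin(y + x)


def myfunc_select(x, y):
    # Hypothetical array version of myfunc built on np.select
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    conditions = [
        (x > 0.5 * y) & (y < 0.3),
        x < 0.5 * y,
        x > 0.2 * y,
        np.ones_like(x, dtype=bool),  # catch-all, plays the role of "else"
    ]
    choices = [
        np.sin(x - y),
        np.zeros_like(x),
        2 * np.sin(x + 2 * y),
        np.sin(y + x),
    ]
    # The first true condition wins, mirroring the if/elif/else order
    return np.select(conditions, choices)


rng = np.random.default_rng(0)
arr_x = rng.standard_normal(1000)
arr_y = rng.standard_normal(1000)

expected = np.vectorize(myfunc, otypes=[float])(arr_x, arr_y)
result = myfunc_select(arr_x, arr_y)
print(np.allclose(result, expected))
```

One trade-off of this style is that every branch expression is evaluated for every element, even where it is not selected, so it suits cheap expressions like these better than costly ones.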
It may not seem a dramatic improvement, but every bit of time saved adds up in a data science pipeline and pays back in the long run! Going by the timings above, if a data science job requires this transformation to happen a million times, that is the difference between roughly 13.5 minutes (list comprehension) and 8.6 minutes (numpy.vectorize).
In short, wherever you have a long list of data and need to perform some mathematical transformation over it, strongly consider turning those Python data structures (lists, tuples, or dictionaries) into numpy.ndarray objects and using their inherent vectorization capabilities.
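A minimal sketch of that conversion step is shown here. The sample data and the names (readings_list, readings_tuple, readings_dict) are hypothetical; np.array and np.fromiter are standard Numpy constructors.

```python
import numpy as np

# Hypothetical sample data in plain Python containers
readings_list = [0.5, 1.5, 2.5]
readings_tuple = (10.0, 20.0, 30.0)
readings_dict = {"a": 1.0, "b": 2.0, "c": 3.0}

# Lists and tuples convert directly
arr_from_list = np.array(readings_list)
arr_from_tuple = np.array(readings_tuple)

# For a dictionary, decide which part to keep; here only the values
arr_from_dict = np.fromiter(readings_dict.values(), dtype=float)

# Once the data is an ndarray, arithmetic applies to every element at once
combined = arr_from_list * 2.0 + arr_from_tuple - arr_from_dict
print(combined)
```

The dictionary keys are dropped in this sketch; if they matter, they would need to be stored separately in the same order.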
Numpy provides a C-API for even faster code execution but it takes away the simplicity of Python programming. This Scipy lecture note shows all the related options you have in this regard.
There is an entire open-source, online book on this topic by a French neuroscience researcher. Check it out here.

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com. You can also check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources. If you are, like me, passionate about machine learning/data science/semiconductors, please feel free to add me on LinkedIn.
