Article From:https://www.cnblogs.com/yunqishequ/p/9968183.html

 


Python is an amazing language. In fact, it is one of the fastest-growing programming languages in the world in recent years. It has repeatedly proved its usefulness in both software development and data science. The whole ecosystem of Python and its libraries makes it an apt choice for users around the world, beginners and advanced alike. One of the reasons for its success and popularity is its set of powerful libraries, which make it dynamic and fast to work with.

In this article, we will look at some Python libraries for data science tasks other than the commonly used ones such as pandas, scikit-learn, and matplotlib. Although libraries like pandas and scikit-learn are the first that come to mind for machine learning tasks, it is always helpful to know about the other Python libraries in this area.

1、Wget

Extracting data, especially from the web, is one of the important tasks of a data scientist. Wget is a free utility for non-interactive download of files from the Internet. It supports the HTTP, HTTPS and FTP protocols, as well as retrieval through HTTP proxies. Because it is non-interactive, it can work in the background even when the user is not logged in. So the next time you want to download a website or the images from a page, wget can help you.

Installation:

$ pip install wget

 

Example:

import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
# 100% [................................................] 3841532 / 3841532
filename
# 'razorback.mp3'

 

2、Pendulum

Pendulum is for those who get frustrated when working with dates and times in Python. It is a Python package that eases datetime manipulation and is a drop-in replacement for Python's native datetime class. For further information, please refer to the Pendulum documentation.

 

Installation:

$ pip install pendulum

Example:

import pendulum
dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours())
# 3
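To see where the 3 comes from, the same arithmetic can be sketched with the standard library alone. This is only an illustration, not how Pendulum works internally. The fixed offsets (UTC-5 for Toronto and UTC-8 for Vancouver, their standard-time offsets in January) are hard-coded assumptions here, whereas Pendulum looks them up in the time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for 1 January (standard time, no DST in effect).
TORONTO = timezone(timedelta(hours=-5), "EST")
VANCOUVER = timezone(timedelta(hours=-8), "PST")

dt_toronto = datetime(2012, 1, 1, tzinfo=TORONTO)
dt_vancouver = datetime(2012, 1, 1, tzinfo=VANCOUVER)

# Subtracting two aware datetimes compares the underlying instants in UTC.
hours_apart = (dt_vancouver - dt_toronto) / timedelta(hours=1)
print(hours_apart)  # 3.0
```

Midnight in Vancouver happens three hours after midnight in Toronto, which is the difference Pendulum reports.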

 

3、imbalanced-learn

 

Most classification algorithms work best when the number of samples in each class is almost the same, i.e. balanced. But in real life, most data sets are imbalanced, which can affect the learning phase and the subsequent predictions of machine learning algorithms. Fortunately, the imbalanced-learn library was created to solve this problem. It is compatible with scikit-learn and is part of the scikit-learn-contrib project. Next time you encounter an imbalanced data set, you can try using this library.

 

Installation:

pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn

 

Example:

 

Refer to the documentation for usage and examples.
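As a rough illustration of the idea behind one of the simplest rebalancing strategies, random oversampling, here is a minimal standard-library sketch. The helper `random_oversample` and the toy data are hypothetical and are not the imbalanced-learn API; the library itself ships resamplers such as `RandomOverSampler` and `SMOTE`, which should be preferred in practice.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw (with replacement) as many extra samples as the class is missing.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data set: five samples of class "a", one of class "b".
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = ["a", "a", "a", "a", "a", "b"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 5 samples
```

Plain duplication is the crudest option; it balances class counts but adds no new information, which is why synthetic approaches exist as well.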

4、FlashText

Cleaning up text data in NLP tasks often requires replacing keywords in sentences or extracting keywords from sentences. Normally, these operations can be done with regular expressions, but this becomes cumbersome when the number of terms to search for runs into the thousands. Python's FlashText module, which is based on the FlashText algorithm, provides an apt alternative for such situations. The best part of FlashText is that the run time is independent of the number of search terms. You can learn more in the project's documentation.

Installation:

$ pip install flashtext

 

Example:

Extracting keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
# keyword_processor.add_keyword(<unclean name>, <standardised name>)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
keywords_found
# ['New York', 'Bay Area']

Substitute keywords

keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
new_sentence
# 'I love New York and NCR region.'
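Why the run time does not grow with the number of search terms can be sketched with a small trie. The class below is a simplified, word-level stand-in written for illustration (the name `KeywordTrie` is hypothetical, and FlashText's own implementation works character by character). Each word of the sentence costs one dictionary lookup, no matter how many keywords were registered.

```python
import re

_END = ""  # marker key for "a keyword ends here"; real tokens are never empty


class KeywordTrie:
    """Word-level keyword trie; a simplified illustration, not FlashText itself."""

    def __init__(self):
        self._root = {}

    def add_keyword(self, keyword, clean_name=None):
        node = self._root
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        node[_END] = clean_name or keyword

    def extract_keywords(self, sentence):
        words = re.findall(r"\w+", sentence.lower())
        found, i = [], 0
        while i < len(words):
            node, j, match, match_end = self._root, i, None, i
            # Follow the trie as far as the sentence allows, keeping the longest match.
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if _END in node:
                    match, match_end = node[_END], j
            if match is None:
                i += 1
            else:
                found.append(match)
                i = match_end
        return found


trie = KeywordTrie()
trie.add_keyword("Big Apple", "New York")
trie.add_keyword("Bay Area")
print(trie.extract_keywords("I love Big Apple and Bay Area."))
# ['New York', 'Bay Area']
```

A regular-expression approach, by contrast, typically has to try each pattern against the text, so its cost grows with the number of terms.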

5、Fuzzywuzzy

 

The name sounds strange, but fuzzywuzzy is a very useful library when it comes to string matching. It makes it easy to implement operations such as string comparison ratios, token ratios, etc. It is also handy for matching records that are stored in different databases.

 

Installation:

$ pip install fuzzywuzzy

 

Example:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Simple Ratio
fuzz.ratio("this is a test", "this is a test!")  # 97
# Partial Ratio
fuzz.partial_ratio("this is a test", "this is a test!")  # 100
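The two scores above can be approximated with the standard library, which helps show what they measure. fuzzywuzzy falls back to Python's `difflib.SequenceMatcher` when the optional python-Levenshtein package is not installed; the helpers below are a simplified sketch under that assumption. In particular, the sliding-window `partial_ratio` here is a plain approximation and not the library's exact algorithm.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of the two full strings on a 0-100 scale."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio(s1, s2):
    """Best score of the shorter string against any same-length slice of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0
    starts = range(len(longer) - len(shorter) + 1)
    return max(simple_ratio(shorter, longer[i:i + len(shorter)]) for i in starts)


print(simple_ratio("this is a test", "this is a test!"))   # 97
print(partial_ratio("this is a test", "this is a test!"))  # 100
```

The simple ratio is penalised by the extra "!" (28 matching characters out of 29 in total), while the partial ratio finds the shorter string intact inside the longer one.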

 

More interesting examples can be found in the GitHub repo.

6、PyFlux

Time series analysis is one of the most common problems in machine learning. PyFlux is an open source Python library built for working with time series problems. The library has a good array of modern time series models, including but not limited to ARIMA, GARCH and VAR models. In short, PyFlux offers a probabilistic approach to time series modeling and is worth trying.

 

Installation:

pip install pyflux

 

Example:

Refer to the relevant documentation for examples of usage.

7、Ipyvolume

 

Visualization of results is an important aspect of data science. Being able to visualize results has great advantages. IPyvolume is a Python library for visualizing 3D volumes and glyphs (for example, 3D scatter plots) in the Jupyter notebook with minimal configuration and effort. However, it is currently in the pre-1.0 stage. A good analogy is that IPyvolume's volshow is to 3D arrays what matplotlib's imshow is to 2D arrays. You can read more about it in the IPyvolume documentation.

Installation:

Using pip
$ pip install ipyvolume
Conda/Anaconda
$ conda install -c conda-forge ipyvolume

 

Example:

[Figure: animation]

[Figure: volume rendering]

8、Dash

Dash is a productive Python framework for building web applications. It is written on top of Flask, Plotly.js and React.js, and ties UI elements (such as drop-down lists, sliders and graphs) to your analytical Python code without the need to use JavaScript. Dash is ideal for building data visualization applications that can then be rendered in a web browser. See the Dash user guide for details.

 

Installation:

pip install "dash==0.29.*"  # The core dash backend
pip install "dash-html-components==0.13.*"  # HTML components
pip install "dash-core-components==0.36.*"  # Supercharged components
pip install "dash-table==3.1.*"  # Interactive DataTable component (new!)

 

Example:

The example shows a highly interactive graph tied to a drop-down list. When the user selects a value in the drop-down list, the application code dynamically exports data from Google Finance into a pandas DataFrame.

 

[Figure: source code]

9、Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It is compatible with any numerical computation library, such as TensorFlow or Theano. The gym library is a collection of test problems, also called environments, that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, which allows you to write general algorithms.

 

Installation:

pip install gym

 

Example:

The following example runs an instance of the environment CartPole-v0 for 1000 timesteps, rendering the environment at each step.
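A minimal sketch of such a loop is given here, written against the classic Gym API (releases before 0.26, in line with the era of this article). Newer Gym and Gymnasium releases changed the `reset`, `step` and rendering conventions, so treat this as an assumption-laden illustration rather than version-proof code. The wrapper function `run_cartpole` is a hypothetical name introduced only to keep the example self-contained.

```python
def run_cartpole(steps=1000):
    """Take random actions in CartPole-v0, rendering every step (classic Gym API)."""
    import gym  # imported lazily so the function can be defined without Gym installed

    env = gym.make("CartPole-v0")
    env.reset()
    try:
        for _ in range(steps):
            env.render()
            # Classic API: step() returns (observation, reward, done, info).
            done = env.step(env.action_space.sample())[2]
            if done:
                env.reset()  # start a new episode once the pole has fallen
    finally:
        env.close()
```

Calling `run_cartpole()` should open a window showing the cart and pole moving under random actions; a learning algorithm would replace `env.action_space.sample()` with its own policy.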

You can learn more about the environment here.

Conclusion

These are the Python libraries that I have found useful for data science, beyond the common ones such as numpy, pandas, etc. If you know of other libraries that could be added to the list, please mention them in the comments below. Don't forget to try them out.

This article is original content of the Yunqi Community and may not be reproduced without permission.
