Installing Python Packages in Jupyter

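Summary: For anyone learning a programming language, the choice of tools matters a great deal. Getting those tools to work together with Python libraries has long been a headache for many people.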

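If you use the Jupyter notebook, you will often run into the following question:

I installed package X, and now I can't import it in the notebook. Help!

This is one of the first stumbling blocks for nearly every beginner, in any language. Today we look at how to resolve it in the Jupyter notebook.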

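Fundamentally, the root of the problem is that the Jupyter kernel is disconnected from Jupyter's shell. In other words, the installer points to a different Python version than the one used by default in the notebook. In the simplest setups this issue does not arise, but when it does, debugging it requires an understanding of the intricacies of the operating system, of Python package installation, and of Jupyter itself.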

After following several online discussions of this topic (A, B), I decided to treat the issue in depth here. This post addresses a few things:

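·First, I give a quick, simple answer to the general question: how do I install a Python package with pip or conda so that it works with my Jupyter notebook?

·Second, I dig into what the Jupyter notebook abstraction is actually doing and how it interacts with the complexities of the operating system.

·Third, I discuss some ideas for the community, including changes that the Jupyter, pip, and conda developers might consider in order to ease the cognitive load on users.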

This post focuses on two ways of installing Python packages: pip and conda.

1. How to install packages in Jupyter

pip and conda

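For many users, the choice between pip and conda is a confusing one. The essential difference between the two is this:

pip installs Python packages in any environment.

conda installs any package in conda environments.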

·If you installed Python with Anaconda, use conda to install Python packages. If conda tells you that the package you want does not exist, then use pip.

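Even if a workaround (such as forcing an install with administrator privileges) solves the problem in the short term, it can cause trouble in the long term. For example, if pip install gives you a permission error, it probably means you are trying to install or update packages in a system Python, such as /usr/bin/python. Doing so can have bad consequences, because the operating system itself often depends on particular package versions within that Python installation. For day-to-day Python use, isolate your packages from the system Python with virtual environments or Anaconda.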

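1.1: How to use conda in Jupyter

If you are using Jupyter and want to install a package with conda, you might be tempted to use the ! notation to run conda directly as a shell command from the notebook: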

    # DON'T DO THIS!
    !conda install --yes numpy

    Fetching package metadata ...........
    Solving package specifications: .

    # All requested packages already installed.
    # packages in environment at /Users/jakevdp/anaconda/envs/python3.6:
    #
    numpy                     1.13.3           py36h2cdce51_0

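For reasons that I outline more fully below, this will not generally work if you want to use the installed packages from the current notebook. Note that the output above reports the environment /Users/jakevdp/anaconda/envs/python3.6, which is not necessarily the one the kernel is running in.

Here is a short snippet that should work in general: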

    # Install a conda package in the current Jupyter kernel
    import sys
    !conda install --yes --prefix {sys.prefix} numpy

    Fetching package metadata ...........
    Solving package specifications: .

    # All requested packages already installed.
    # packages in environment at /Users/jakevdp/anaconda:
    #
    numpy                     1.13.3           py36h2cdce51_0

The --prefix argument makes conda install the package into the environment of the currently running Jupyter kernel.
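To see what the {sys.prefix} substitution expands to, the same command can be assembled in plain Python. This is only a minimal sketch: the helper name conda_install_command is hypothetical, and it builds the argument list without actually running conda.

```python
import sys


def conda_install_command(package):
    """Build (but do not run) a conda command that targets the kernel's environment.

    Hypothetical helper for illustration; it mirrors the notebook line
    `!conda install --yes --prefix {sys.prefix} numpy`.
    """
    # sys.prefix is the root directory of the Python installation running this code
    return ["conda", "install", "--yes", "--prefix", sys.prefix, package]


print(" ".join(conda_install_command("numpy")))
```

Printing the command in a notebook cell is a cheap way to check which environment conda would be asked to modify before anything is installed.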

1.2: How to use pip in Jupyter

If you are using Jupyter and want to install a package with pip, you might similarly be tempted to run pip directly:

    # DON'T DO THIS
    !pip install numpy

    Requirement already satisfied: numpy in /Users/jakevdp/anaconda/envs/python3.6/lib/python3.6/site-packages

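For the same reasons as with conda, this will not generally work if you want to use the installed packages from the current notebook. Here is a snippet that should work in general: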

    # Install a pip package in the current Jupyter kernel
    import sys
    !{sys.executable} -m pip install numpy

    Requirement already satisfied: numpy in /Users/jakevdp/anaconda/lib/python3.6/s

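This is related to a more general point. Even outside of Jupyter, the preferred way to install a package is:

$ python -m pip install

rather than:

$ pip install

because the former is more explicit about where the package will be installed (more on this below).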
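The same idea can be expressed in plain Python: start pip as a module of whichever interpreter is running the code. A minimal sketch, assuming only the standard library; pip_command is a hypothetical helper name, and the command is printed rather than executed.

```python
import sys


def pip_command(*pip_args):
    """Return a pip invocation bound to the interpreter running this code.

    Hypothetical helper; it is the argument-list form of the notebook line
    `!{sys.executable} -m pip ...`.
    """
    # sys.executable is the absolute path of the running Python binary
    return [sys.executable, "-m", "pip", *pip_args]


print(pip_command("install", "numpy"))
```

Because the first element is an absolute path rather than the bare name pip, the result does not depend on what happens to come first on $PATH.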

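2. Why is installation from Jupyter so messy?

The solutions above should work in all cases, but why is the extra step needed at all? The reason is that in Jupyter the shell environment and the Python executable are disconnected. To understand why in depth, you need to be familiar with the following concepts: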

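How your operating system locates executable programs.

How Python installs and locates packages.

How Jupyter decides which Python executable to use.

Note: the discussion below assumes a Linux, Unix, or Mac OS X operating system.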

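2.1: How your operating system locates executables

When you type a command such as python, jupyter, ipython, pip, or conda in a terminal, your operating system uses a well-defined mechanism to find the executable that the name refers to.

On Linux and Mac systems, the system first checks for an alias matching the command; if that fails, it consults the $PATH environment variable: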

    !echo $PATH

    /Users/jakevdp/anaconda/envs/python3.6/bin:/Users/jakevdp/anaconda/envs/python3.

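$PATH lists the directories that are searched, in order, for any executable. For example, if I type python on my system with the $PATH above, it first looks for /Users/jakevdp/anaconda/envs/python3.6/bin/python; if that does not exist, it looks for /Users/jakevdp/anaconda/bin/python, and so on.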
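The lookup just described can be sketched in a few lines of Python. This is a simplified illustration rather than what the shell literally runs: the function name find_on_path is made up here, and the sketch ignores aliases and shell built-ins.

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return the first executable named `command` found along a PATH-style string.

    Simplified sketch of the lookup described above; it ignores shell aliases
    and Windows file extensions.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        # A match must be a regular file that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# shutil.which is the standard-library function that performs this kind of search
print(find_on_path("python"), shutil.which("python"))
```

The first match wins, which is why reordering the directories in $PATH changes which python a bare command name refers to.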

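2.2: How Python locates packages

Python uses a similar mechanism to locate imported packages. The list of paths that Python searches on import is stored in sys.path.

By default, the first place Python looks for a module is an empty path, meaning the current working directory. If the module is not found there, Python works down the list of locations until the module is found. You can find out which location was used from the __path__ attribute of an imported module: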

    import numpy
    numpy.__path__

    ['/Users/jakevdp/anaconda/lib/python3.6/site-packages/numpy']
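The search order itself can be inspected without importing anything heavy. The sketch below prints sys.path and asks the import system where a module would be loaded from; module_origin is a hypothetical helper name, and json is used only because it ships with every Python installation.

```python
import importlib.util
import sys

# sys.path is the ordered list of locations searched by `import`
for location in sys.path:
    print(repr(location))


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# json is part of the standard library, so it resolves on any installation
print(module_origin("json"))
print(module_origin("surely_not_an_installed_module"))
```

A None result here corresponds to the familiar failure mode: the package may exist on disk, but not in any directory that this particular interpreter searches.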

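In most cases, the Python packages you install with pip or conda are placed in a directory called site-packages. The important thing to realize is that each Python executable has its own site-packages. This means that when you install a package, it is tied to a particular Python executable and, by default, can only be used with that Python installation.

We can see this by printing the sys.path variable for each available python executable, using Jupyter's delightful ability to mix Python and bash commands in a single code block: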

    paths = !type -a python
    for path in set(paths):
        path = path.split()[-1]
        print(path)
        !{path} -c "import sys; print(sys.path)"
        print()

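The full details here are not particularly important, but it is worth emphasizing that each Python executable has its own distinct search paths. Unless you modify sys.path, you cannot import packages installed in a different Python environment.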

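I will say it again for emphasis: the shell environment in Jupyter does not necessarily match the Python executable that the kernel runs, and a package is importable in the notebook only if it was installed for that executable.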
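One way to check for this mismatch from inside a notebook is to ask the python found on $PATH for its installation prefix and compare it with the prefix of the running interpreter. A rough sketch, assuming the shell command of interest is literally named python; both function names are hypothetical.

```python
import shutil
import subprocess
import sys


def prefix_of(python_executable):
    """Ask another Python executable for its sys.prefix (empty string if it fails)."""
    result = subprocess.run(
        [python_executable, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def shell_matches_kernel(command="python"):
    """Return True if `command` on PATH belongs to the same installation as this code."""
    shell_python = shutil.which(command)
    if shell_python is None:
        return False
    return prefix_of(shell_python) == sys.prefix


print("kernel python:", sys.executable)
print("shell python: ", shutil.which("python"))
print("same installation:", shell_matches_kernel())
```

Comparing prefixes rather than file paths is a deliberate choice: a virtual environment's python is often a link to the system binary, so two different environments can share one underlying file.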

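2.3: How Jupyter executes code: Jupyter kernels

The next relevant question is how Jupyter chooses to execute Python code, which brings us to the concept of a Jupyter kernel.

A Jupyter kernel is a set of files that tell Jupyter how to execute code within the notebook. For Python kernels, these point to a particular Python version, but Jupyter is designed to be much more general: it has dozens of available kernels, including ones for Python 2, Python 3, Julia, R, Ruby, Haskell, and even C++ and Fortran.

If you are using Jupyter, you can change the kernel at any time with the Kernel → Choose Kernel menu item.

To see which kernels are available on your system, run the following command in the shell: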

    !jupyter kernelspec list

    Available kernels:
      python3       /Users/jakevdp/anaconda/envs/python3.6/lib/python3.6/site-packages/ipykernel/resources
      conda-root    /Users/jakevdp/Library/Jupyter/kernels/conda-root
      python2.7     /Users/jakevdp/Library/Jupyter/kernels/python2.7
      python3.5     /Users/jakevdp/Library/Jupyter/kernels/python3.5
      python3.6     /Users/jakevdp/Library/Jupyter/kernels/python3.6

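Each of the listed kernels is a directory containing a file called kernel.json, which specifies which language and which executable the kernel should use. For example: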

    !cat /Users/jakevdp/Library/Jupyter/kernels/conda-root/kernel.json

    {
     "argv": [
      "/Users/jakevdp/anaconda/bin/python",
      "-m",
      "ipykernel_launcher",
      "-f",
      "{connection_file}"
     ],
     "display_name": "python (conda-root)",
     "language": "python"
    }
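Since kernel.json is ordinary JSON, the executable behind a kernel can be read programmatically. The sketch below inlines the file content shown above so that it runs anywhere; kernel_executable is a hypothetical helper name.

```python
import json

# Same content as the kernel.json shown above, inlined so the example is self-contained
KERNEL_JSON = """
{
 "argv": [
  "/Users/jakevdp/anaconda/bin/python",
  "-m", "ipykernel_launcher",
  "-f", "{connection_file}"
 ],
 "display_name": "python (conda-root)",
 "language": "python"
}
"""


def kernel_executable(kernel_json_text):
    """Return the program a kernel spec launches: the first entry of its argv list."""
    spec = json.loads(kernel_json_text)
    return spec["argv"][0]


print(kernel_executable(KERNEL_JSON))  # prints /Users/jakevdp/anaconda/bin/python
```

That first argv entry is the Python whose site-packages the notebook imports from, regardless of which python appears first on $PATH.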

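If you want to create a new kernel, you can do so with the install command of the ipykernel package. For example, I created the kernels above for my conda environments using the following as a template: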

    $ source activate myenv
    $ python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

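3. Some suggestions

So, in summary, installing packages from Jupyter is fraught because, fundamentally, Jupyter's shell environment and its Python kernel are mismatched. That means you have to do more than a simple pip install or conda install to make things work.

I have a few ideas, some of which might be useful:

3.1: Potential changes to Jupyter

As I mentioned, the fundamental issue is the mismatch between Jupyter's shell environment and its compute kernel. So could we adjust the kernel specification so that the two are forced to match?

Perhaps: this GitHub issue shows an approach that modifies shell variables as part of kernel startup.

Basically, in your kernel directory you can add a script kernel-startup.sh that looks something like this (and make sure you change its permissions so that it is executable):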

    #!/usr/bin/env bash

    # activate anaconda env
    source activate myenv

    # this is the critical part, and should be at the end of your script:
    # "$@" is quoted so that arguments containing spaces are passed through intact
    exec python -m ipykernel "$@"
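The text above does not show how the script gets hooked into the kernel. One plausible wiring, stated here as an assumption rather than as the method from the linked issue, is to make the script the first entry of argv in kernel.json so that Jupyter launches it in place of the Python executable. The helper name and the script path below are placeholders.

```python
import json


def kernel_spec_for_script(script_path, display_name):
    """Build a kernel.json payload whose argv launches a startup script.

    Assumption: Jupyter runs argv[0] with the remaining entries as arguments,
    and the script forwards them to `python -m ipykernel` through "$@".
    """
    return {
        "argv": [script_path, "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


# The path below is a placeholder, not a real location
spec = kernel_spec_for_script("/path/to/kernels/myenv/kernel-startup.sh", "Python (myenv)")
print(json.dumps(spec, indent=1))
```

With a spec like this, the environment is activated before the kernel process starts, so the kernel and anything it spawns with ! would see the same $PATH.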

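3.2: New Jupyter magic functions

We could simplify the user experience by introducing %pip and %conda magic functions in Jupyter that detect the current kernel and make sure packages are installed in the correct location.

pip magic

For example, here is how you can define a %pip magic function that works in the current kernel: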

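    import subprocess
    import sys
    from IPython.core.magic import register_line_magic

    @register_line_magic
    def pip(args):
        """Use pip from the current kernel"""
        # pip 10 and later no longer expose pip.main, so run pip as a module
        # of the kernel's own interpreter and echo its output into the notebook
        result = subprocess.run([sys.executable, "-m", "pip"] + args.split(),
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                text=True)
        print(result.stdout)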

Note that Jupyter developer Matthias Bussonnier has published essentially this in his pip_magic repository, so you can do:

$ python -m pip install pip_magic

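conda magic

Similarly, we can define a conda magic that does the right thing when you type %conda install XXX. This is a bit more involved than the pip magic, because it must first confirm that the environment is conda-compatible, and then (because there is no python -m conda install) it must call a subprocess to execute the appropriate shell command: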

    from IPython.core.magic import register_line_magic
    import sys
    import os
    from subprocess import Popen, PIPE


    def is_conda_environment():
        """Return True if the current Python executable is in a conda env"""
        # TODO: make this work with Conda.exe in Windows
        conda_exec = os.path.join(os.path.dirname(sys.executable), 'conda')
        conda_history = os.path.join(sys.prefix, 'conda-meta', 'history')
        return os.path.exists(conda_exec) and os.path.exists(conda_history)


    @register_line_magic
    def conda(args):
        """Use conda from the current kernel"""
        # TODO: make this work with Conda.exe in Windows
        # TODO: fix string encoding to work with Python 2
        if not is_conda_environment():
            raise ValueError("The python kernel does not appear to be a conda environment.  "
                             "Please use ``%pip install`` instead.")

        conda_executable = os.path.join(os.path.dirname(sys.executable), 'conda')
        args = [conda_executable] + args.split()

        # Add --prefix to point conda installation to the current environment
        if args[1] in ['install', 'update', 'upgrade', 'remove', 'uninstall', 'list']:
            if '-p' not in args and '--prefix' not in args:
                args.insert(2, '--prefix')
                args.insert(3, sys.prefix)

        # Because the notebook does not allow us to respond "yes" during the
        # installation, we need to insert --yes in the argument list for some commands
        if args[1] in ['install', 'update', 'upgrade', 'remove', 'uninstall', 'create']:
            if '-y' not in args and '--yes' not in args:
                args.insert(2, '--yes')

        # Call conda from command line with subprocess & send results to stdout & stderr
        with Popen(args, stdout=PIPE, stderr=PIPE) as process:
            # Read stdout character by character, as it includes real-time progress updates
            for c in iter(lambda: process.stdout.read(1), b''):
                sys.stdout.write(c.decode(sys.stdout.encoding))
            # Read stderr line by line, because real-time does not matter
            for line in iter(process.stderr.readline, b''):
                sys.stderr.write(line.decode(sys.stderr.encoding))

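Having proposed some simple solutions that can be used today, I went on to explain in detail why these solutions are necessary. It comes down to the fact that, in Jupyter, the kernel is disconnected from the shell: the kernel environment can be changed at runtime, while the shell environment is fixed when the notebook is launched.

Finally, heartfelt thanks to the developers of Jupyter, conda, pip, and the related tools that form the foundation of the Python data science ecosystem. This post was written entirely in a Jupyter notebook. You can view a static version here or download the full notebook here.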

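Translation organized by the Alibaba Cloud Yunqi Community.

Original article title: "installing-python-packages-from-jupyter", by Jake VanderPlas.

Personal blog: http://jakevdp.github.io/pages/about.html . He is the author of the Python Data Science Handbook.

The book can be read for free at his blog address.

Translator: 虎說八道. Reviewer: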