Setting up your Python environment, especially when you're new to coding, can be overwhelming. There are various options that aim to make the process easier. Anaconda, for example, is a free distribution that includes over "300 of the most popular Python packages." For many people, this is a great solution. For individuals who want control over which packages are installed, Miniconda might be a good option.
When I originally set up my environment, I installed all packages to the global Python—the one that came bundled with OS X—
site-packages directory. Toward the end of my first semester of graduate school, an installation of a new package broke an existing one. I had to restore (erase and reinstall OS X) my system.
The second time around, I used virtual environments, which isolate their associated Python packages into their own directories. If something goes wrong, the directory can simply be deleted. For more information on using
virtualenvwrapper check out this article from The Hitchhiker's Guide to Python.
In this post, I'll explain how to set up your Python environment in Mac OS X (currently 10.10.5) and how to use
There are a few things which must be installed prior to getting into Python. We'll need to install Xcode's command line tools, which installs the GNU Compiler Collection, as well as
homebrew, a package manager for OS X. Depending on the version of your OS, you might also need to install Xquartz. You'll find out when you try installing Python.
First, if you don't already have Xcode, you'll want to install it. For our purposes, we can simply install the command line tools, a much smaller download.
$ xcode-select --install
Next, we'll want to install
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
To ensure that it installed successfully, run the following.
$ brew doctor $ brew update
Also, make sure that
/usr/local/bin is at the top of your
PATH environment variable.
Now, we can install Python. If you're on OS X, you'll already have Python installed, but installing from
homebrew has many benefits. For a nice list, read this Hacker Codex post. Let's install both Python2 and Python3.
$ brew install python $ brew install python3
Let's check that our newly installed versions of Python point to the correct locations. The expected output is shown below each command. We'll do the same for
pip, which we'll be using for installing Python packages.
$ which python /usr/local/bin/python $ which python3 /usr/local/bin/python3 $ which pip /usr/local/bin/pip
$ pip install virtualenv $ pip install virtualenvwrapper
You'll also need to add a directory that will be the home for the virtual environments you decide to create. I created the
.virtualenvs, as shown below.
$ mkdir ~/.virtualenvs
You can name the directory whatever you like, but note that you'll have to replace
.virtualenvs with the name of your directory in the code that follows.
If you use
bash, which is the default shell in OS X, you'll need to add the following to your
~/.bashrc file. This let's
virtualenv know where to put the virtual environments.
export WORKON_HOME=$HOME/.virtualenvs source /usr/local/bin/virtualenvwrapper.sh
That's actually it for the set up! It seems like a lot, but this work up front will pay itself off down the line.
Recap. We installed: command line tools,
homebrew, Python, and
virtualenvwrapper. We also modified our
Creating Virtual Environments
Let's discuss how to create and use our virtual environments. To create a new one, use the following.
$ mkvirtualenv datascience
This has created a new folder inside of
datascience. It has also activated the virtual environment. This means that you have access to any Python packages associated with that environment—to see what's currently installed use
pip list—and that any package installs will be put in
$HOME is just a variable that contains the address of your home directory. In the terminal, run
echo $HOME to see your home directory path.)
Since we're interested in data science, let's install the following packages.
$ pip install numpy scipy matplotlib pandas scikit-learn statsmodels $ pip install jupyter $ pip install seaborn
If you then
pip list, you'll see all the above packages listed (along with some default ones). To exit the virtual environment, simply type
deactivate at the terminal.
Below, I show the process of creating a virtual environment, installing a Python package to it (and listing installed packages), and deactivating it.
Notice the active virtual environment name in the parentheses. (Note that your prompt might look different based on any customizations you may have done.) In this example,
pip list shows us that
wheel are installed—the last three are installed with each virtual environment instance by default.
You can create more than one virtual environment. In fact, that's the point. Virtual environments were created to help "keep the dependencies required by different projects in separate places." This is a common practice for developers. Data scientists might want to create virtual environments for specific tasks. For example, one for machine learning, one for web scraping, one for natural language processing, etc.
To keep track of the virtual environments you've created, use
lsvirtualenv -b. To activate an existing virtual environment, assuming you're not already in one, use
workon <name>. If you ever want to delete a virtual environment, just use
Let's say you're ready to start using Python 3, which is highly recommended. (If you're not convinced, watch Jake VanderPlas's SciPy 2015 keynote, "The State of the Stack.") If you want to create a virtual environment that uses Python 3, do the following.
$ mkvirtualenv -p python3 <name>
This new virtual environment will now use Python 3 by default. Everything else works as explained above.
That's it! Your Python environment is now set up such that you can more effectively control packages and how they interact.