Thursday, March 14, 2019

Understanding LSTM Networks and Their Diagrams



Recurrent Neural Networks

Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

Recurrent Neural Networks have loops.

In the above diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

An unrolled recurrent neural network.
This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning. The list goes on. I’ll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It’s these LSTMs that this essay will explore.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.



But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in India… I speak fluent Hindi.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of India, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
Neural networks struggle with long term dependencies.

In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.

Thankfully, LSTMs don’t have this problem!

LSTM Networks

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
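
As a rough illustration of that single tanh layer, here is a minimal NumPy sketch of one step of a standard RNN, unrolled over a short sequence. The names rnn_step, W, and b, the sizes, and the random weights are assumptions made for this example only; they do not come from the diagrams.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One step of a standard RNN: a single tanh layer over [h_prev, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    return np.tanh(W @ concat + b)

# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
b = np.zeros(hidden_size)

# Unrolling the loop: the same weights are reused at every time step,
# and the hidden state h is the message passed to the successor.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h, W, b)
```

The loop is the "unrolled" chain: each iteration is one copy of the same module.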

The repeating module in a standard RNN contains a single layer.
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

An LSTM neural network.
The repeating module in an LSTM contains four interacting layers.
Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step by step later. For now, let’s just try to get comfortable with the notation we’ll be using.


In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.

The Core Idea Behind LSTMs

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.

The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.


The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.

Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.


The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
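
To make the gating idea concrete, here is a small sketch with hand-picked numbers. In a real LSTM the gate's inputs come from a learned layer; the function name gate and the values below are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash each entry into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gate(values, gate_logits):
    """Let each component of `values` through in proportion to sigmoid(logit)."""
    return sigmoid(gate_logits) * values

values = np.array([2.0, -3.0, 5.0])
# A very negative logit closes the gate, zero lets half through,
# and a very positive logit opens it almost completely.
logits = np.array([-20.0, 0.0, 20.0])
gated = gate(values, logits)  # approximately [0.0, -1.5, 5.0]
```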

An LSTM has three of these gates, to protect and control the cell state.

Step-by-Step LSTM Walk Through

The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t-1}. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”

Let’s go back to our example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.


The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.

In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.


It’s now time to update the old cell state, C_{t-1}, into the new cell state C_t. The previous steps already decided what to do; we just need to actually do it.

We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C̃_t. These are the new candidate values, scaled by how much we decided to update each state value.

In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.



Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.
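
The four steps of the walk through can be put together in a short NumPy sketch of a single LSTM step. This is only an illustration under assumptions: the parameter names (W_f, W_i, W_c, W_o and their biases), the helper init_params, the sizes, and the random initialisation are all invented for this example, and each layer is assumed to act on the concatenation of h_{t-1} and x_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: forget, decide what to store, update the cell, output."""
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                         # new cell state
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                   # filtered output
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params[f"W_{name}"] = rng.normal(
            scale=0.1, size=(hidden_size, hidden_size + input_size))
        params[f"b_{name}"] = np.zeros(hidden_size)
    return params

params = init_params(input_size=3, hidden_size=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.random.default_rng(1).normal(size=(6, 3)):
    h, c = lstm_step(x_t, h, c, params)
```

Setting the forget gate bias very high and the input gate bias very low makes the cell state pass through a step essentially unchanged, which is the "conveyor belt" behaviour described earlier.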



Variants on Long Short Term Memory

What I’ve described so far is a pretty normal LSTM. But not all LSTMs are the same as the above. In fact, it seems like almost every paper involving LSTMs uses a slightly different version. The differences are minor, but it’s worth mentioning some of them.

One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding “peephole connections.” This means that we let the gate layers look at the cell state.



The above diagram adds peepholes to all the gates, but many papers will give some peepholes and not others.
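
One simple way to sketch a peephole gate is to concatenate the cell state with the usual gate inputs. This is an assumption about the form: the original peephole formulation uses separate per-cell peephole weights, and the name peephole_gate and the sizes below are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_gate(c, h_prev, x_t, W, b):
    """A gate layer that also 'peeks' at the cell state c."""
    return sigmoid(W @ np.concatenate([c, h_prev, x_t]) + b)

# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_f = rng.normal(scale=0.1, size=(hidden_size, 2 * hidden_size + input_size))
b_f = np.zeros(hidden_size)

c_prev = rng.normal(size=hidden_size)
h_prev = np.zeros(hidden_size)
x_t = rng.normal(size=input_size)
f_t = peephole_gate(c_prev, h_prev, x_t, W_f, b_f)
```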

Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we should add new information to, we make those decisions together. We only forget when we’re going to input something in its place. We only input new values to the state when we forget something older.
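
With coupled gates, the input gate is simply one minus the forget gate. The sketch below shows only the cell update, with hand-picked numbers; the function name coupled_cell_update is an assumption for this example.

```python
import numpy as np

def coupled_cell_update(c_prev, f_t, c_tilde):
    """Cell update where the input gate is tied to the forget gate: i_t = 1 - f_t."""
    return f_t * c_prev + (1.0 - f_t) * c_tilde

c_prev = np.array([1.0, 1.0, 1.0])      # old cell state
c_tilde = np.array([-1.0, -1.0, -1.0])  # candidate values
f_t = np.array([1.0, 0.5, 0.0])         # keep all, keep half, keep nothing
c_t = coupled_cell_update(c_prev, f_t, c_tilde)  # [1.0, 0.0, -1.0]
```

Whatever fraction of the old value is forgotten is exactly the fraction of the new candidate that gets written.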



A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al. (2014). It combines the forget and input gates into a single “update gate.” It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than standard LSTM models, and has been growing increasingly popular.
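
For comparison with the LSTM, here is a minimal sketch of one GRU step in a commonly used formulation with an update gate z_t and a reset gate r_t. Biases are omitted, the weight names and sizes are assumptions for this example, and some references swap the roles of z_t and 1 - z_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: a single state h, blended by a single update gate."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)  # update gate: how much to replace
    r_t = sigmoid(W_r @ concat)  # reset gate: how much past state to use
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # Forgetting and writing are tied together through z_t.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, W_z, W_r, W_h)
```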

A gated recurrent unit neural network.

These are only a few of the most notable LSTM variants. There are lots of others, like Depth Gated RNNs by Yao, et al. (2015). There are also completely different approaches to tackling long-term dependencies, like Clockwork RNNs by Koutnik, et al. (2014).

Which of these variants is best? Do the differences matter? Greff, et al. (2015) do a nice comparison of popular variants, finding that they’re all about the same. Jozefowicz, et al. (2015) tested more than ten thousand RNN architectures, finding some that worked better than LSTMs on certain tasks.



Sunday, February 10, 2019

Boot Up for Jupyter Notebooks

In this one, we are gonna take a look at some quite useful add-ons on top of the ones discussed earlier. These have been accumulated from various other sources I have been going through.

More Extensions

Notebook extensions let you move beyond the general vanilla way of using the Jupyter Notebooks. Notebook extensions (or nbextensions) are JavaScript modules that you can load on most of the views in your Notebook’s frontend. These extensions modify the user experience and interface.
Let us discuss some of the useful extensions that have been missed out in the last article.
1. Hinterland
Hinterland enables the code autocompletion menu on every keypress in a code cell, instead of only calling it with the Tab key. This makes Jupyter notebook’s autocompletion behave like other popular IDEs such as PyCharm and VS Code.
AWESOME!

 
2. Split Cells Notebook
This extension splits the cells of the notebook and places them adjacent to each other. This is of great use for comparing things in a presentation.
3. Snippets
With this you get a drop-down menu to the Notebook toolbar that allows easy insertion of code snippet cells from many popular libraries into the current notebook.
4. Collapsible Headings
Collapsible Headings allows the notebook to have collapsible sections, separated by headings. So in case you have a lot of dirty code in your notebook, you can simply collapse it to avoid scrolling it again and again.
DOPE!!


Slideshow

Notebooks are an effective tool for teaching and writing explainable code. Jupyter Notebooks can be easily converted to slides, and we can choose what to show and what to hide from the notebooks.
NO MORE POWERPOINT!!
This can be done in two ways:
1. Jupyter’s inbuilt Slide
In your notebook navigate to View → Cell Toolbar → Slideshow. A light grey bar appears on top of each cell, and you can customize the slides.
Now go to the directory where the notebook is present and enter the following code:
jupyter nbconvert *.ipynb --to slides --post serve
# insert your notebook name instead of *.ipynb
The slides get displayed at port 8000. Also, a .html file will be generated in the directory, and you can also access the slides from there.
Here, you can see the code but cannot edit it. The RISE plugin offers a solution.
2. RISE plugin
I already mentioned this one in my last article. RISE is an acronym for Reveal.js - Jupyter/IPython Slideshow Extension. It uses reveal.js to run the slideshow. This is super useful since it also gives the ability to run the code without having to exit the slideshow. Install with:
#Using conda
conda install -c conda-forge rise
#Using pip
pip install RISE
Install the JS and CSS in the proper places:
jupyter-nbextension install rise --py --sys-prefix
#enable the nbextension:
jupyter-nbextension enable rise --py --sys-prefix
Now we notice a new extension that says “Enter/Exit RISE Slideshow.”
Click on it. You get your interactive slides. Awesome!

Jupyter Widgets

Jupyter Widgets are eventful Python objects that have a representation in the browser, often as a control like a slider, textbox, etc. Widgets can be used to build interactive GUIs for the notebooks. Install them with:
# pip
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
# Conda
conda install -c conda-forge ipywidgets
#Installing ipywidgets with conda automatically enables the extension
For complete details, you can visit their Github repository. For now, let’s have a look at some of the widgets.
  1. Interact
It is the easiest way to get started using IPython’s widgets. The interact function (ipywidgets.interact) automatically creates user interface (UI) controls for exploring code and data interactively.
There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. The ideal solution to this issue would be interactive controls to change inputs without needing to rewrite or rerun code. We’ll see how to get started with IPython widgets (ipywidgets), interactive controls you can build with one line of code. This library allows us to turn Jupyter Notebooks from static documents into interactive dashboards, perfect for exploring and visualizing data.
IPython widgets, unfortunately, do not render on GitHub or nbviewer but you can still access the notebook and run locally.
# Start with some imports!
from ipywidgets import interact, interact_manual
import ipywidgets as widgets
def f(x):
    return x
# Generate a slider 
interact(f, x=10);
# Booleans generate check-boxes
interact(f, x=True);
# Strings generate text areas
interact(f, x='Hi there!');
We can use this @interact decorator to quickly turn any ordinary function into an interactive widget.
# df is assumed to be a pandas DataFrame loaded earlier, with a numeric 'claps' column
@interact
def show_articles_more_than(column='claps', x=5000):
    return df.loc[df[column] > x]
Source: https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6
import os
from IPython.display import Image, display
# Directory that holds the images (with trailing slash)
fdir = 'images/'
@interact
def show_images(file=os.listdir(fdir)):
    display(Image(fdir + file))
Source: https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6
Now we can quickly cycle through all the images without re-running the cell. This might actually be useful if you were building a convolutional neural network and wanted to examine the images your network had misclassified.
The uses of widgets for data exploration are boundless. Another simple example is finding correlations between two columns:
Widget for correlation between two columns.
2. Play
The Play widget is useful to perform animations by iterating on a sequence of integers at a certain speed. The value of the slider below is linked to the player.
play = widgets.Play(
    # interval=10,
    value=50,
    min=0,
    max=100,
    step=1,
    description="Press play",
    disabled=False
)
slider = widgets.IntSlider()
widgets.jslink((play, 'value'), (slider, 'value'))
widgets.HBox([play, slider])
3. Date picker
The date picker widget works in Chrome and IE Edge but does not currently work in Firefox or Safari because they do not support the HTML date input field.
widgets.DatePicker(
    description='Pick a Date',
    disabled=False
)

4. Color picker
widgets.ColorPicker(
    concise=False,
    description='Pick a color',
    value='blue',
    disabled=False
)


5. Tabs
tab_contents = ['P0', 'P1', 'P2', 'P3', 'P4']
children = [widgets.Text(description=name) for name in tab_contents]
tab = widgets.Tab()
tab.children = children
for i in range(len(children)):
    tab.set_title(i, str(i))
tab


6. Widgets for Plots
Interactive widgets are especially helpful for selecting data to plot. We can use the same @interact decorator with functions that visualize our data:
import cufflinks as cf
@interact
def scatter_plot(x=list(df.select_dtypes('number').columns),
                 y=list(df.select_dtypes('number').columns)[1:],
                 theme=list(cf.themes.THEMES.keys()),
                 colorscale=list(cf.colors._scales_names.keys())):
    df.iplot(kind='scatter',
             x=x,
             y=y,
             mode='markers',
             xTitle=x.title(),
             yTitle=y.title(),
             text='title',
             title=f'{y.title()} vs {x.title()}',
             theme=theme,
             colorscale=colorscale)

Source: https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6

Here, cufflinks+plotly combination is used to make an interactive plot with interactive IPython widget controls.
If the plot is a little slow to update, we can use @interact_manual instead, which adds a button that must be clicked before the plot is redrawn.


7. Qgrid
Qgrid is a Jupyter notebook widget focused mainly on DataFrames. It uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting and filtering controls, as well as edit your DataFrames by double-clicking cells. The GitHub repository contains more details and examples. Install:
#with pip
pip install qgrid
jupyter nbextension enable --py --sys-prefix qgrid
# only required if you have not enabled the ipywidgets nbextension yet
jupyter nbextension enable --py --sys-prefix widgetsnbextension
#with conda
# only required if you have not added conda-forge to your channels yet
conda config --add channels conda-forge
conda install qgrid

Qgrid

Embedding URL/PDF/YouTube

Using IPython’s display module, you can easily embed URLs, PDFs, and videos into your Jupyter Notebooks.
URLs
#Note that http urls will not be displayed. Only https are allowed inside the Iframe
from IPython.display import IFrame
IFrame('https://en.wikipedia.org/wiki/HTTPS', width=800, height=450)


PDFs
from IPython.display import IFrame
IFrame('https://arxiv.org/pdf/1406.2661.pdf', width=800, height=450)


YouTube Videos
from IPython.display import YouTubeVideo
YouTubeVideo('9SA7FaKxZVI', width=800, height=300)


Jupyter Themes

Just like your IDE’s color scheme or theme setting, Jupyter also has themes that give it a similar look and feel. These themes also enhance the presentation of the code. For installation:
pip install jupyterthemes
List out the available themes with:
jt -l
Though there are not many of them, these are still pretty good, and more are gonna come in the future. Choose your favourite with:
jt -t <name of the theme>
Reset to original with:
jt -r
Chesterish and solarizedl

Wrapping Things Up

I urge you to read the previous article as well, since this one adds more content to it. Install these Jupyter Notebook add-ons, spend some time figuring out which ones are useful to you, and improve your productivity. While none of these is life-changing, they all add just enough benefit to be worthwhile, cumulatively saving you hours of valuable development time.