Building Web Applications From Python Scripts with Streamlit

Apr 26, 2021

Reverie’s machine learning engineers are constantly building and improving our computational models to develop novel therapeutics. We often create Python tools to perform valuable computations such as filtering molecules for a particular characteristic or searching our internal database for specific PDB files to download.

To make these scripts available to our chemists, we wrap them in a web interface, which offers a pleasant user experience and accelerates our drug discovery process. We achieve this using the magic of Streamlit, which lets us quickly turn our scripts into interactive web applications.

What is Streamlit

Streamlit is an open-source Python library that turns your scripts into shareable, interactive web applications.

Instead of writing a web application from scratch, complete with frontend interaction and backend communication, you can simply add a couple of Streamlit functions and deploy an interactive web application in minutes.

How it works

As a Python library, Streamlit is installed with pip and imported into your script:


import streamlit as st

You can use the st module anywhere you would like to have interactive web components. For example, st.slider produces a slider that the user can drag to change the value of a variable in the Python script.

Behind the scenes, Streamlit automatically creates a React component in the browser for each Streamlit function call, turning it into a piece of user interaction.

This image is from https://streamlit-components-tutorial.netlify.app/introduction/streamlit-react-python/, which is an excellent resource for understanding how Streamlit renders each component.

Demo

Time for a demonstration! The following is an application we might build in the drug discovery space. This Streamlit application takes a dataset of molecules and allows a user to visualize them and apply filters.

Chemical filters are crucial in drug discovery and can help remove non-drug-like compounds from our datasets. Here we filter by molecular weight, one of the criteria in Lipinski’s rule of five. Many filters are much more complicated and use criteria such as substructures, but molecular weight is an excellent example for our demonstration.
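
For context, the full rule of five can be sketched as a simple check on four precomputed descriptors. The thresholds below are the commonly quoted ones; the function names, the descriptor arguments, and the "at most one violation" convention are illustrative assumptions and are not part of the demo script.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 daltons
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Treat a compound as drug-like if it breaks at most `max_violations` criteria."""
    violations = lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)
    return violations <= max_violations


# Made-up descriptor values, used only to exercise the functions.
print(passes_rule_of_five(350.0, 2.1, 2, 5))
print(passes_rule_of_five(900.0, 6.5, 3, 8))
```

The demo in this post only uses the first criterion, with the cutoff chosen by the user instead of being fixed at 500.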

Below is the entire script that produces the web application. I will describe the core parts in detail:

Inside a Docker container with the Anaconda distribution of Python, I use Conda to install RDKit, an open-source cheminformatics toolkit (see gist for links).

After initializing the RDKit Conda environment, run the following command to install Streamlit and mols2grid, a molecular visualization library.


pip install streamlit mols2grid

We will be using a dataset of FDA-approved drugs (credits to Eric Vallabh Minikel for curation). This dataset contains the SMILES, a text-based description of a compound’s chemical structure, for each drug.

CureFFI provides the dataset as a tab-separated text file. Using Pandas, we can download the file into a DataFrame. We apply preprocessing to remove any missing values:

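import pandas as pd
import streamlit as st

# Newer Streamlit releases deprecate st.cache in favor of st.cache_data.
@st.cache(allow_output_mutation=True)
def download_dataset():
    """Loads once then cached for subsequent runs"""
    df = pd.read_csv(
        "https://www.cureffi.org/wp-content/uploads/2013/10/drugs.txt", sep="\t"
    ).dropna()
    return df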

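Streamlit operates by re-running the script from top to bottom whenever a user interaction changes an input. We want to avoid repeating long processes, like downloading files, on every rerun during the session. To prevent this, Streamlit provides built-in caching through the st.cache decorator. Because allow_output_mutation=True tells Streamlit not to check whether the cached object has been changed, we take a copy so that later changes, such as adding columns, do not alter the cached DataFrame: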

df = download_dataset().copy()
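
The interplay between reruns, caching, and the copy can be sketched in plain Python. This is only an analogy: it uses functools.lru_cache from the standard library rather than Streamlit's own cache, and load_rows, rerun_script, and the sample row are hypothetical names and data made up for the illustration.

```python
import copy
from functools import lru_cache

call_count = 0  # counts how often the slow function body actually runs


@lru_cache(maxsize=None)
def load_rows():
    """Stand-in for a slow download; the body runs only on the first call."""
    global call_count
    call_count += 1
    return [{"name": "compound_a", "smiles": "CCO"}]


def rerun_script():
    """Mimics one top-to-bottom rerun triggered by a user interaction."""
    rows = copy.deepcopy(load_rows())  # work on a copy of the cached object
    rows[0]["selected"] = True         # per-run change that must not leak into the cache
    return rows


# Three simulated user interactions, each causing a full rerun.
for _ in range(3):
    result = rerun_script()

print(call_count)                     # how many times the loader body ran
print("selected" in load_rows()[0])   # whether the cached row was modified
```

Without the copy, the change made in one rerun would still be present in the cached object on the next rerun, which is the situation the .copy() call in the script avoids.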

RDKit reads the SMILES into a Mol object to calculate the molecular weight.

from rdkit import Chem
from rdkit.Chem.Descriptors import ExactMolWt

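def calc_mw(smiles_string):
    """Given a SMILES string (e.g. C1CCCCC1), calculate and return the molecular weight"""
    mol = Chem.MolFromSmiles(smiles_string)
    if mol is None:
        # MolFromSmiles returns None for a SMILES string it cannot parse
        return float("nan")
    return ExactMolWt(mol)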

We then create a new dataset column of molecular weights by applying the calc_mw function to the SMILES of each row.

df["mol_weight"] = df["smiles"].apply(calc_mw)

Streamlit can create a slider that allows the user to choose what the weight cutoff should be.

weight_cutoff = st.slider(
    label="Show compounds that weigh below:",
    min_value=0,
    max_value=500,
    value=150,
    step=10,
)

The variable weight_cutoff holds the value set by the user. We use it to filter the dataset:

df_result = df[df["mol_weight"] < weight_cutoff]

Finally, we display the results in the browser. We can show the filtered DataFrame by using the write function from Streamlit:

st.write(df_result)

For a more interactive presentation, we can draw the compounds using the SMILES. With the mols2grid library, which uses RDKit, we illustrate the drugs with functionality such as pagination and search.

The mols2grid library returns an object intended for display inside a Jupyter Notebook or Google Colab environment. To make it work within Streamlit, we call the object's _repr_html_ method, which returns its HTML representation as a string that we can display in Streamlit using the components library:

import mols2grid
import streamlit.components.v1 as components
raw_html = mols2grid.display(df_result, mapping={"smiles": "SMILES"})._repr_html_()
components.html(raw_html, width=900, height=900, scrolling=True)
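
The _repr_html_ convention itself is simple: any object can define the method, and notebook frontends call it to obtain an HTML string. The sketch below shows the idea with a toy class; MoleculeTable is a hypothetical stand-in and does not reflect how mols2grid builds its output.

```python
import html


class MoleculeTable:
    """Toy object that follows IPython's rich display convention."""

    def __init__(self, smiles_list):
        self.smiles_list = smiles_list

    def _repr_html_(self):
        # Notebook frontends call this method to get an HTML rendering.
        rows = "".join(
            f"<tr><td>{html.escape(smiles)}</td></tr>" for smiles in self.smiles_list
        )
        return f"<table>{rows}</table>"


# Calling the method directly yields a plain string, which is the kind of
# value passed to components.html in the snippet above.
raw_html = MoleculeTable(["CCO", "C1CCCCC1"])._repr_html_()
print(raw_html)
```

Since the result is an ordinary string, any library that supports this convention can in principle be embedded in the same way, subject to whatever scripts or styles its HTML depends on.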

Now run the script and interact with the Streamlit application!

streamlit run streamlit_filter.py

That’s it! By adding a couple of Streamlit functions to the Python script, you now have a functioning web application.

Streamlit does make some tradeoffs to achieve a seamless web development experience. For example, the amount of frontend customization is limited to the functions Streamlit provides. Also, there aren’t extensive design options to customize the look and feel. Regardless, this is a small price to pay as Streamlit provides enough functionality to cover some of our simple use cases.

Deployment and Testing

For security, we launch Streamlit in a Kubernetes pod accessible through our VPN. Anytime a developer pushes a change to our Streamlit git repository, we rebuild and pull a fresh container. The refresh can also update our Kubernetes pod with the newest version of Streamlit. Being up to date is crucial since Streamlit is constantly being improved with new beta and experimental features.

The routine iteration of our scripts and the Streamlit versions creates a need for testing on each pull request. The challenge is that Streamlit catches most errors in order to display them to the user, which prevents testing frameworks like pytest from seeing them.

We work around this by running Docker containers from a Selenium image that includes the Chrome web browser and ChromeDriver. This allows for automated interaction with our Streamlit instance. We can then run a test with SeleniumBase to assert that no errors are shown on the page of our Streamlit app. This solution is based on a similar Streamlit protocol.
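
The core of such a check can be sketched without a browser: given the rendered page source as a string, look for the element Streamlit uses to show an exception. The class name stException is an assumption about Streamlit's markup and should be verified against the page source of the Streamlit version in use; page_has_error and ErrorFinder are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumption: Streamlit marks rendered exceptions with this CSS class.
ERROR_CLASS = "stException"


class ErrorFinder(HTMLParser):
    """Records whether any element on the page carries the error class."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # An element may have several classes, so split before comparing.
        classes = (dict(attrs).get("class") or "").split()
        if ERROR_CLASS in classes:
            self.found = True


def page_has_error(page_source):
    """Return True if the page source contains a rendered exception element."""
    finder = ErrorFinder()
    finder.feed(page_source)
    return finder.found


# Hand-written page fragments, used only to exercise the helper.
print(page_has_error('<div class="element-container stException">Traceback</div>'))
print(page_has_error('<div class="stMarkdown">All good</div>'))
```

In a Selenium-driven test, the string would come from the browser session, for example the driver's page_source attribute, and the test would assert that page_has_error returns False after each interaction.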

Extending Streamlit with Components

We are just scratching the surface of what's possible with Streamlit. The recently introduced components functionality enables the rendering of custom JavaScript within a Streamlit application. There are already a number of exciting components used in the cheminformatics space, such as streamlit_3dmol for visualizing 3-D molecules.

In addition, there is a great tutorial on how to embed the popular JSME molecule editor (by B. Bienfait and P. Ertl) into a Streamlit app with React: https://iwatobipen.wordpress.com/2020/12/30/embed-molecular-editor-into-streamlit-app-streamlit-chemoinformatics-rdkit/. This custom component lets users draw compounds as input to a Streamlit app.

Reverie is Hiring

If you enjoyed this demonstration you may also enjoy working with Reverie! We’re actively hiring engineers across our tech stack and chemistry team, including Full Stack Engineers, Machine Learning Engineers, and Senior Data Scientists, to work on exciting challenges critical to our approach to developing life-saving cancer drugs. You will work with a profoundly technical YC-backed team that is growing in size and scope. You can read more about us at www.reverielabs.com, and please reach out if you’re interested in learning more.