I have some Python code that uses Hugging Face Transformers to run an NLP task on a PDF document. When I run this code in a Jupyter Notebook, it takes more than 1.5 hours to complete. I then set up the same code to run via a locally hosted Streamlit web app. To my surprise, it ran in under 5 minutes!
I believe I am comparing apples to apples because:
- I am analyzing the same PDF document in each case
- Since the Streamlit app is locally hosted, all computation is running on my laptop CPU. I am not using any Hugging Face virtual resources. The HF models are being downloaded to my computer.
- The Jupyter Notebook is also running locally on my computer
- The .py file is generated from the Jupyter Notebook using 'streamlit-jupyter', which just takes the Python code in the notebook and adds a few Streamlit statements
So, essentially, the same code is running on the same data using the same hardware.
The only differences I can think of which may explain this are:
- Streamlit is running a .py Python file from the command line instead of a .ipynb notebook
- Streamlit is running inside a virtual environment instead of my main Python installation
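One way to test the virtual-environment difference would be to run the same small check in both the notebook and the Streamlit script and compare the output. The following is only a minimal sketch: the function name `describe_environment` is hypothetical, and the package names `transformers` and `torch` are assumptions about what the code depends on (swap in whichever backend is actually installed).

```python
import os
import sys
from importlib import metadata


def describe_environment(packages=("transformers", "torch")):
    """Collect interpreter and package details to compare across environments.

    The default package names are assumptions; pass the ones your code uses.
    """
    info = {
        # Path of the interpreter actually executing this code.
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        # Number of logical CPUs visible to this process.
        "cpu_count": os.cpu_count(),
        # True inside a venv-style environment; conda environments
        # are not detected by this check.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two runs report different interpreters or different package versions, then the comparison is not quite "same code, same setup", and the version mismatch would be the first thing to look into.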
Has anyone ever experienced something like this? Can running the same Python code from the command line really make it 20x faster?