# Accelerate FastAI inference with the open-source library [nebullvm](https://github.com/nebuly-ai/nebullvm)

In this workbook we test `nebullvm` on FastAI.

`Nebullvm` is an open-source library that accelerates AI inference in just a few lines of code. Nebullvm tests different deep learning compilers to identify the best possible way to run your model on your specific hardware, without impacting the accuracy of your model.

## Nebullvm installation

If you have already installed `nebullvm`, skip to the next block. 

If not, first install the library on your machine. Installation may take several minutes, because it also sets up the deep learning compilers that the library relies on.

As explained in the GitHub README, the recommended way to install `nebullvm` is with pip:

In [None]:
! pip install nebullvm

Now let's import `nebullvm` and wait for the installation to complete. You can ignore any import warnings or errors raised during the installation.

In [None]:
import nebullvm

Next, **restart the notebook's kernel** and you'll be ready to go!

# Fine-tune a FastAI model

This section is based primarily on the FastAI notebooks for beginners. This workbook is not meant to be an in-depth guide to the FastAI libraries; its purpose is to show how to use `nebullvm` to speed up FastAI models at inference time.

In [None]:
from fastai.vision.all import *

In [None]:
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
dls.show_batch()

The model will simply classify whether the image contains a cat (label `True`) or a dog (label `False`). Since our purpose in this workbook is only to show how to speed up the model, we are not really interested in the meaningfulness or usefulness of the task itself.
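
The labelling rule works because of the file-naming convention of the Oxford-IIIT Pet dataset: cat images have filenames that start with an uppercase letter, dog images with a lowercase one. A minimal sketch of the rule on two hypothetical filenames (the names are illustrative and not taken from the dataset listing):

```python
def label_func(f):
    # True (cat) when the filename starts with an uppercase letter
    return f[0].isupper()

# Hypothetical filenames that follow the dataset's naming convention
examples = ["Bombay_12.jpg", "beagle_3.jpg"]
labels = {name: label_func(name) for name in examples}
print(labels)
```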

In [None]:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

Now that we have fine-tuned the model, let's measure how long a single prediction takes, averaged over 100 runs.

In [None]:
import time

In [None]:
times = []
for _ in range(100):
    st = time.time()
    preds = learn.predict(files[0])
    times.append((time.time()-st)*1000)
fastai_vanilla_time = sum(times)/len(times)
print(f"Prediction time: {fastai_vanilla_time} ms,\nPrediction: {preds}")
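
The loop above averages every run, including the first ones, which are often slower because of one-off setup costs. As a possible refinement, the sketch below discards a few warm-up runs, uses `time.perf_counter()` (a monotonic, high-resolution clock), and reports the median. The helper name `benchmark` and its parameters are assumptions made for illustration; they are not part of FastAI or `nebullvm`.

```python
import statistics
import time

def benchmark(fn, n_warmup=10, n_runs=100):
    """Return the median latency of fn() in milliseconds."""
    # Warm-up runs are executed but not timed
    for _ in range(n_warmup):
        fn()
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)

# Stand-in workload so the sketch runs on its own
latency_ms = benchmark(lambda: sum(range(10_000)), n_warmup=2, n_runs=20)
print(f"Median latency: {latency_ms:.3f} ms")
```

In this workbook the stand-in workload would be replaced by `lambda: learn.predict(files[0])`.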

In [None]:
#learn.save(".")

# Optimize the model with nebullvm

In [None]:
from nebullvm import optimize_torch_model

Let's start with model optimization. Using nebullvm is straightforward: you specify the model, the batch size, the input sizes (one per input, excluding the batch dimension), and the directory where the optimized model should be saved.

In the example we chose the same directory where the model is stored.

As you can see, we also pass an additional parameter, `use_torch_api`, which is a boolean flag that enables more of nebullvm's optimization capabilities.

In [None]:
optimized_model = optimize_torch_model(
    model=learn.model,
    batch_size=1,
    input_sizes=[(3, 224, 224)],
    save_dir=".",
    use_torch_api=True
)
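
Since `input_sizes` excludes the batch dimension, the shape of the tensor the model actually receives is the batch size followed by the per-input size. The sketch below only illustrates that relationship; it assumes the batch dimension comes first, as is conventional for PyTorch image models, and does not call `nebullvm`.

```python
batch_size = 1
input_sizes = [(3, 224, 224)]  # (channels, height, width) for each model input

# Prepend the batch dimension to each per-input size
full_shapes = [(batch_size, *size) for size in input_sizes]
print(full_shapes)  # [(1, 3, 224, 224)]
```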

In [None]:
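class ModelWrapper(torch.nn.Module):
    """Wrap the optimized model so it can replace `learn.model`."""
    def __init__(self, core):
        super().__init__()
        # Use the model passed in, not the global `optimized_model`
        self.core = core
    
    def forward(self, *args, **kwargs):
        # Delegate inference to the wrapped model
        return self.core(*args, **kwargs)
    
    def parameters(self, *args, **kwargs):
        # Yield a placeholder tensor instead of the wrapped model's parameters
        yield torch.zeros(100)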

In [None]:
core_model = ModelWrapper(optimized_model)

In [None]:
learn.model = core_model

In [None]:
times = []
for _ in range(100):
    st = time.time()
    preds = learn.predict(files[0])
    times.append((time.time()-st)*1000)
optimized_time = sum(times) / len(times)
print(f"Prediction time: {optimized_time} ms,\nPrediction: {preds}")
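
Besides latency, it is worth checking that the optimized model still produces (almost) the same output as the original one. The sketch below compares two probability vectors element-wise within a tolerance. The helper `predictions_match`, the tolerance, and the numbers are assumptions for illustration only; they are not measured results.

```python
import math

def predictions_match(probs_a, probs_b, abs_tol=1e-3):
    """True when two probability vectors agree element-wise within abs_tol."""
    if len(probs_a) != len(probs_b):
        return False
    return all(
        math.isclose(a, b, abs_tol=abs_tol) for a, b in zip(probs_a, probs_b)
    )

# Illustrative values only, not measured results
vanilla_probs = [0.9981, 0.0019]
optimized_probs = [0.9979, 0.0021]
same = predictions_match(vanilla_probs, optimized_probs)
print(same)
```

To apply this here, the class probabilities returned by `learn.predict` (the last element of the returned tuple) would have to be stored as a list before `learn.model` is replaced, and then compared with the probabilities from the optimized model.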

## Summary

In [None]:
# Enter a username here. The cells below print a summary of your model's performance
your_username = "anonymous"

In [None]:
# Uncomment the following line to install gputil (if you are using an NVIDIA GPU)
#!pip install gputil

In [None]:
import cpuinfo
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
cpu_info = cpuinfo.get_cpu_info()['brand_raw']
gpu_info = "no"
if torch.cuda.is_available():
    import GPUtil
    gpus = GPUtil.getGPUs()
    gpu_info = list(gpus)[0].name

In [None]:
message = f"""
Hello, I'm {your_username}!
I've tested nebullvm on the following setup:
Hardware: {cpu_info} CPU and {gpu_info} GPU.
Model: {learn.arch.__name__} - FastAI for image classification
Vanilla performance: {round(fastai_vanilla_time, 2)}ms
Optimized performance: {round(optimized_time, 2)}ms
Acceleration: {round(fastai_vanilla_time/optimized_time, 1)}x
"""
print(message)

Amazing :) Share your acceleration with the community in the comments of the main thread [here](https://medium.com/p/7ac3596fd718) and compare your results with others!