detecto.core

class detecto.core.DataLoader(dataset, **kwargs)
__init__(dataset, **kwargs)

Accepts a detecto.core.Dataset object and creates an iterable over the data, which can then be fed into a detecto.core.Model for training and validation. Extends PyTorch’s DataLoader class with a custom collate_fn function.

Parameters:
  • dataset (detecto.core.Dataset) – The dataset to iterate over.
  • kwargs (Any) – (Optional) Additional arguments to customize the DataLoader, such as batch_size or shuffle. See the PyTorch DataLoader documentation for more details.

Example:

>>> from detecto.core import Dataset, DataLoader

>>> dataset = Dataset('labels.csv', 'images/')
>>> loader = DataLoader(dataset, batch_size=2, shuffle=True)
>>> for images, targets in loader:
>>>     print(images[0].shape)
>>>     print(targets[0])
torch.Size([3, 1080, 1720])
{'boxes': tensor([[884, 387, 937, 784]]), 'labels': ['person']}
torch.Size([3, 1080, 1720])
{'boxes': tensor([[   1,  410, 1657, 1079]]), 'labels': ['car']}
...
class detecto.core.Dataset(label_data, image_folder=None, transform=None)
__init__(label_data, image_folder=None, transform=None)

Takes in the path to the label data and images and creates an indexable dataset over all of the data. Applies optional transforms over the data. Extends PyTorch’s Dataset.

Parameters:
  • label_data (str) – Can either contain the path to a folder storing the XML label files or a CSV file containing the label data. If a CSV file, the file should have the following columns in order: filename, width, height, class, xmin, ymin, xmax, ymax and image_id. See detecto.utils.xml_to_csv() to generate CSV files in this format from XML label files.
  • image_folder (str) – (Optional) The path to the folder containing the images. If not specified, it is assumed that the images and XML files are in the same directory as given by label_data. Defaults to None.
  • transform (torchvision.transforms.Compose or None) – (Optional) A torchvision transforms.Compose object containing transformations to apply to every element in the dataset. See the torchvision docs for a list of possible transforms. When using transforms.Resize and transforms.RandomHorizontalFlip, all box coordinates are automatically adjusted to match the modified image. If None, defaults to the transforms returned by detecto.utils.default_transforms(). A sketch of passing custom transforms is shown after the example below.

Indexing:

A Dataset object can be indexed like any other Python sequence. Doing so returns a tuple of length 2. The first element is the image and the second element is a dict containing 'boxes' and 'labels' keys. dict['boxes'] is a torch.Tensor of size (N, 4) containing the xmin, ymin, xmax, and ymax coordinates of N boxes, where N is the number of labeled objects in the image. dict['labels'] is a list of size N containing the string labels for each of the objects in the image being indexed.

Example:

>>> from detecto.core import Dataset

>>> # Create dataset from separate XML and image folders
>>> dataset = Dataset('xml_labels/', 'images/')
>>> # Create dataset from a combined XML and image folder
>>> dataset1 = Dataset('images_and_labels/')
>>> # Create dataset from a CSV file and image folder
>>> dataset2 = Dataset('labels.csv', 'images/')

>>> print(len(dataset))
>>> image, target = dataset[0]
>>> print(image.shape)
>>> print(target)
4
torch.Size([3, 720, 1280])
{'boxes': tensor([[564, 43, 736, 349]]), 'labels': ['balloon']}
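As a minimal sketch of the transform parameter described above, the example below builds a Dataset with a custom transforms.Compose pipeline. It assumes detecto.utils.normalize_transform() is available for the normalization step; the specific transforms and values are illustrative only.

>>> from torchvision import transforms
>>> from detecto.core import Dataset
>>> from detecto.utils import normalize_transform

>>> # Illustrative pipeline: resize and random flip (box coordinates are
>>> # adjusted automatically for these two), then convert and normalize
>>> custom_transforms = transforms.Compose([
>>>     transforms.ToPILImage(),
>>>     transforms.Resize(800),
>>>     transforms.RandomHorizontalFlip(0.5),
>>>     transforms.ToTensor(),
>>>     normalize_transform(),
>>> ])
>>> dataset = Dataset('labels.csv', 'images/', transform=custom_transforms)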
class detecto.core.Model(classes=None, device=None, pretrained=True, model_name='fasterrcnn_resnet50_fpn')
__init__(classes=None, device=None, pretrained=True, model_name='fasterrcnn_resnet50_fpn')

Initializes a machine learning model for object detection. Models are built on top of PyTorch’s pre-trained models, specifically the Faster R-CNN architectures, but allow for fine-tuning to predict on custom classes/labels.

Parameters:
  • classes (list or None) – (Optional) A list of classes/labels for the model to predict. If none given, uses the default classes of the underlying pretrained torchvision model. Defaults to None.
  • device (torch.device or None) – (Optional) The device on which to run the model, such as the CPU or GPU. See the PyTorch documentation for details on specifying the device. Defaults to the GPU if available and the CPU if not.
  • pretrained (bool) – (Optional) Whether to load pretrained weights or not. Defaults to True.
  • model_name (str) – (Optional) The name of the Faster R-CNN model to use. Valid choices are "fasterrcnn_resnet50_fpn" (Model.DEFAULT), "fasterrcnn_mobilenet_v3_large_fpn" (Model.MOBILENET), and "fasterrcnn_mobilenet_v3_large_320_fpn" (Model.MOBILENET_320). Defaults to "fasterrcnn_resnet50_fpn".

Example:

>>> from detecto.core import Model

>>> model = Model(['dog', 'cat', 'bunny'])
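The constructor arguments above can also be customized; as a hedged sketch (assuming a CUDA-capable GPU is present), the following selects the MobileNet backbone via the Model.MOBILENET constant and pins the model to an explicit device:

>>> import torch
>>> from detecto.core import Model

>>> # Lighter, faster backbone on an explicitly chosen device
>>> model = Model(['dog', 'cat', 'bunny'],
>>>               device=torch.device('cuda'),
>>>               model_name=Model.MOBILENET)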
fit(dataset, val_dataset=None, epochs=10, learning_rate=0.005, momentum=0.9, weight_decay=0.0005, gamma=0.1, lr_step_size=3, verbose=True)

Train the model on the given dataset. If given a validation dataset, returns a list of loss scores at each epoch.

Parameters:
  • dataset (detecto.core.Dataset or detecto.core.DataLoader) – A Dataset or DataLoader containing the dataset to train on. If given a Dataset, this method automatically wraps it in a DataLoader with shuffle set to True.
  • val_dataset (detecto.core.Dataset or detecto.core.DataLoader) – (Optional) A Dataset or DataLoader containing the dataset to validate on. Defaults to None, in which case no validation occurs.
  • epochs (int) – (Optional) The number of epochs, i.e. full passes over the data in dataset, to train for. Defaults to 10.
  • learning_rate (float) – (Optional) How fast to update the model weights at each step of training. Defaults to 0.005.
  • momentum (float) – (Optional) The momentum used to reduce the fluctuations of gradients at each step. Defaults to 0.9.
  • weight_decay (float) – (Optional) The amount of L2 regularization to apply on model parameters. Defaults to 0.0005.
  • gamma (float) – (Optional) The decay factor that learning_rate is multiplied by every lr_step_size epochs. Defaults to 0.1.
  • lr_step_size (int) – (Optional) The number of epochs between each decay of learning_rate by gamma. Defaults to 3.
  • verbose (bool) – (Optional) Whether to print the current epoch, progress, and loss (if given a validation dataset) at each step, along with some additional warnings if using a CPU. Defaults to True.
Returns: If val_dataset is not None and epochs is greater than 0, returns a list of the validation losses at each epoch. Otherwise, returns nothing.
Return type: list or None

Example:

>>> from detecto.core import Model, Dataset, DataLoader

>>> dataset = Dataset('training_data/')
>>> val_dataset = Dataset('validation_data/')
>>> model = Model(['rose', 'tulip'])

>>> losses = model.fit(dataset, val_dataset, epochs=5)

>>> # Alternatively, provide a custom DataLoader over your dataset
>>> loader = DataLoader(dataset, batch_size=2, shuffle=True)
>>> losses = model.fit(loader, val_dataset, epochs=5)

>>> losses
[0.11191498369799327, 0.09899920264606253, 0.08454859235434461,
    0.06825731012780788, 0.06236840748117637]
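To make the gamma and lr_step_size interaction concrete, the hedged sketch below (hyperparameter values are illustrative only) halves the learning rate every 2 epochs over a 6-epoch run:

>>> # learning_rate starts at 0.001 and is multiplied by gamma=0.5
>>> # after every lr_step_size=2 epochs
>>> losses = model.fit(loader, val_dataset, epochs=6,
>>>                    learning_rate=0.001, lr_step_size=2, gamma=0.5)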
get_internal_model()

Returns the internal torchvision model that this class wraps, allowing for more advanced fine-tuning and full use of the features provided by the PyTorch and torchvision libraries.

Returns: The torchvision model.
Return type: torchvision.models.detection.faster_rcnn.FasterRCNN

Example:

>>> from detecto.core import Model

>>> model = Model.load('model_weights.pth', ['tick', 'gate'])
>>> torch_model = model.get_internal_model()
>>> type(torch_model)
<class 'torchvision.models.detection.faster_rcnn.FasterRCNN'>
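Continuing from the example above, one instance of the more advanced fine-tuning mentioned here is freezing part of the network. The returned object is a standard torchvision FasterRCNN, so its submodules can be manipulated directly; the hedged sketch below freezes the backbone so that only the detection heads receive gradient updates (the backbone attribute comes from torchvision's FasterRCNN, not from detecto itself):

>>> # Freeze the feature-extraction backbone; the RPN and box heads
>>> # remain trainable
>>> for param in torch_model.backbone.parameters():
>>>     param.requires_grad = False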
static load(file, classes)

Loads a model from a .pth file containing the model weights.

Parameters:
  • file (str) – The path to the .pth file containing the saved model.
  • classes (list) – The list of classes/labels this model was trained to predict. Must be in the same order as initially passed to detecto.core.Model.__init__() for accurate results.
Returns: The model loaded from the file.
Return type: detecto.core.Model

Example:

>>> from detecto.core import Model

>>> model = Model.load('model_weights.pth', ['ant', 'bee'])
predict(images)

Takes in an image or list of images and returns predictions for object locations.

Parameters: images (list or numpy.ndarray or torch.Tensor) – An image or list of images to predict on. If the images have not already been transformed into torch.Tensor objects, the default transformations contained in detecto.utils.default_transforms() will be applied.
Returns: If given a single image, returns a tuple of size three. The first element is a list of string labels of size N, the number of detected objects. The second element is a torch.Tensor of size (N, 4), giving the xmin, ymin, xmax, and ymax coordinates of the boxes around each object. The third element is a torch.Tensor of size N containing the scores of each predicted object (ranges from 0.0 to 1.0). If given a list of images, returns a list of the tuples described above, each tuple corresponding to a single image.
Return type: tuple or list of tuple

Example:

>>> from detecto.core import Model
>>> from detecto.utils import read_image

>>> model = Model.load('model_weights.pth', ['horse', 'zebra'])
>>> image = read_image('image.jpg')
>>> labels, boxes, scores = model.predict(image)
>>> print(labels[0])
>>> print(boxes[0])
>>> print(scores[0])
horse
tensor([   0.0000,  428.0744, 1617.1860, 1076.3607])
tensor(0.9397)
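predict() also accepts a list of images and returns one (labels, boxes, scores) tuple per image; a minimal sketch follows (the file names are placeholders):

>>> from detecto.core import Model
>>> from detecto.utils import read_image

>>> model = Model.load('model_weights.pth', ['horse', 'zebra'])
>>> images = [read_image('image1.jpg'), read_image('image2.jpg')]
>>> predictions = model.predict(images)
>>> for labels, boxes, scores in predictions:
>>>     print(labels, boxes.shape, scores.shape)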
predict_top(images)

Takes in an image or list of images and returns the top scoring predictions for each detected label in each image. Equivalent to running detecto.core.Model.predict() and then detecto.utils.filter_top_predictions() together.

Parameters: images (list or numpy.ndarray or torch.Tensor) – An image or list of images to predict on. If the images have not already been transformed into torch.Tensor objects, the default transformations contained in detecto.utils.default_transforms() will be applied.
Returns: If given a single image, returns a tuple of size three. The first element is a list of string labels of size K, the number of uniquely detected objects. The second element is a torch.Tensor of size (K, 4), giving the xmin, ymin, xmax, and ymax coordinates of the top-scoring boxes around each unique object. The third element is a torch.Tensor of size K containing the scores of each uniquely predicted object (ranges from 0.0 to 1.0). If given a list of images, returns a list of the tuples described above, each tuple corresponding to a single image.
Return type: tuple or list of tuple

Example:

>>> from detecto.core import Model
>>> from detecto.utils import read_image

>>> model = Model.load('model_weights.pth', ['label1', 'label2'])
>>> image = read_image('image.jpg')
>>> top_preds = model.predict_top(image)
>>> top_preds
(['label2', 'label1'], tensor([[   0.0000,  428.0744, 1617.1860, 1076.3607],
[ 875.3470,  412.1762,  949.5915,  793.3424]]), tensor([0.9397, 0.8686]))
save(file)

Saves the internal model weights to a file.

Parameters: file (str) – The name of the file. Should have a .pth file extension.

Example:

>>> from detecto.core import Model

>>> model = Model(['tree', 'bush', 'leaf'])
>>> model.save('model_weights.pth')