detecto.core

class detecto.core.DataLoader(dataset, **kwargs)

__init__(dataset, **kwargs)

Accepts a detecto.core.Dataset object and creates an iterable over the data, which can then be fed into a detecto.core.Model for training and validation. Extends PyTorch’s DataLoader class with a custom collate_fn function.

Parameters:
- dataset (detecto.core.Dataset) – The dataset to iterate over.
- kwargs (Any) – (Optional) Additional arguments to customize the DataLoader, such as batch_size or shuffle. See the PyTorch DataLoader docs for more details.

Example:
>>> from detecto.core import Dataset, DataLoader
>>> dataset = Dataset('labels.csv', 'images/')
>>> loader = DataLoader(dataset, batch_size=2, shuffle=True)
>>> for images, targets in loader:
...     print(images[0].shape)
...     print(targets[0])
torch.Size([3, 1080, 1720])
{'boxes': tensor([[884, 387, 937, 784]]), 'labels': ['person']}
torch.Size([3, 1080, 1720])
{'boxes': tensor([[   1,  410, 1657, 1079]]), 'labels': ['car']}
...
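
A custom collate_fn is useful for detection data because images can differ in size and each image can carry a different number of boxes, so samples cannot simply be stacked into one tensor. The following is a minimal pure-Python sketch of that idea, not detecto's actual implementation: the function name `collate_pairs` is hypothetical, and strings stand in for image tensors.

```python
def collate_pairs(batch):
    """Turn a list of (image, target) samples into (images, targets) lists."""
    images, targets = zip(*batch)
    return list(images), list(targets)


# Stand-in samples: strings replace image tensors, dicts mimic detection targets.
batch = [
    ("image_a", {"boxes": [[884, 387, 937, 784]], "labels": ["person"]}),
    ("image_b", {"boxes": [[1, 410, 1657, 1079], [5, 6, 7, 8]], "labels": ["car", "car"]}),
]

images, targets = collate_pairs(batch)
print(images)                # the two images, kept as separate list items
print(targets[1]["labels"])  # targets keep their own, possibly different, lengths
```

Keeping the batch as parallel lists means the second target can hold two boxes while the first holds one, which matches how the loop in the example above reads `images[0]` and `targets[0]`.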

class detecto.core.Dataset(label_data, image_folder=None, transform=None)

__init__(label_data, image_folder=None, transform=None)

Takes in the path to the label data and images and creates an indexable dataset over all of the data. Applies optional transforms over the data. Extends PyTorch’s Dataset.

Parameters:
- label_data (str) – Either the path to a folder storing the XML label files or the path to a CSV file containing the label data. If a CSV file, the file should have the following columns in order: filename, width, height, class, xmin, ymin, xmax, ymax, and image_id. See detecto.utils.xml_to_csv() to generate CSV files in this format from XML label files.
- image_folder (str) – (Optional) The path to the folder containing the images. If not specified, it is assumed that the images and XML files are in the same directory as given by label_data. Defaults to None.
- transform (torchvision.transforms.Compose or None) – (Optional) A torchvision transforms.Compose object containing transformations to apply on all elements in the dataset. See PyTorch docs for a list of possible transforms. When using transforms.Resize and transforms.RandomHorizontalFlip, all box coordinates are automatically adjusted to match the modified image. If None, defaults to the transforms returned by detecto.utils.default_transforms().

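
To make the CSV column order described for label_data concrete, here is a small sketch that builds such a file in memory with the standard library. The file names and rows are hypothetical, and treating image_id as an integer shared by all rows of the same image is an assumption rather than something this page states.

```python
import csv
import io

# Column order as listed for the label_data parameter.
COLUMNS = ["filename", "width", "height", "class",
           "xmin", "ymin", "xmax", "ymax", "image_id"]

# Hypothetical rows: one labeled box per row.
rows = [
    ["balloon1.jpg", 1280, 720, "balloon", 564, 43, 736, 349, 0],
    ["street1.jpg", 1720, 1080, "person", 884, 387, 937, 784, 1],
    ["street1.jpg", 1720, 1080, "car", 1, 410, 1657, 1079, 1],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back and check that every box lies inside its image.
for record in csv.DictReader(io.StringIO(csv_text)):
    assert 0 <= int(record["xmin"]) < int(record["xmax"]) <= int(record["width"])
    assert 0 <= int(record["ymin"]) < int(record["ymax"]) <= int(record["height"])
```

In practice detecto.utils.xml_to_csv() is the documented way to produce this file; the sketch only shows what the result is expected to look like.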
Indexing:
A Dataset object can be indexed like any other Python sequence. Doing so returns a tuple of length 2. The first element is the image and the second element is a dict containing a 'boxes' and a 'labels' key. dict['boxes'] is a torch.Tensor of size (N, 4) containing xmin, ymin, xmax, and ymax of N boxes, where N is the number of labeled objects in the image. dict['labels'] is a list of size N containing the string labels for each of the objects in the image being indexed.

Example:
>>> from detecto.core import Dataset
>>> # Create dataset from separate XML and image folders
>>> dataset = Dataset('xml_labels/', 'images/')
>>> # Create dataset from a combined XML and image folder
>>> dataset1 = Dataset('images_and_labels/')
>>> # Create dataset from a CSV file and image folder
>>> dataset2 = Dataset('labels.csv', 'images/')
>>> print(len(dataset))
4
>>> image, target = dataset[0]
>>> print(image.shape)
torch.Size([3, 720, 1280])
>>> print(target)
{'boxes': tensor([[564, 43, 736, 349]]), 'labels': ['balloon']}
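
The transform parameter notes that box coordinates are adjusted when an image is resized or flipped horizontally. The arithmetic behind such an adjustment can be sketched in plain Python. Both helper names are hypothetical, and this illustrates the geometry only; it is not detecto's code.

```python
def flip_box_horizontally(box, image_width):
    """Mirror an (xmin, ymin, xmax, ymax) box across the vertical center line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - xmax, ymin, image_width - xmin, ymax)


def resize_box(box, old_size, new_size):
    """Scale a box from an image of old_size=(width, height) to new_size."""
    scale_x = new_size[0] / old_size[0]
    scale_y = new_size[1] / old_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * scale_x, ymin * scale_y, xmax * scale_x, ymax * scale_y)


box = (564, 43, 736, 349)  # the balloon box on a 1280x720 image
flipped = flip_box_horizontally(box, image_width=1280)
halved = resize_box(box, old_size=(1280, 720), new_size=(640, 360))
print(flipped)  # (544, 43, 716, 349)
print(halved)   # (282.0, 21.5, 368.0, 174.5)
```

Note that a flip swaps the roles of xmin and xmax, which is why the new xmin is computed from the old xmax; otherwise the box would end up with xmin greater than xmax.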

class detecto.core.Model(classes=None, device=None, pretrained=True, model_name='fasterrcnn_resnet50_fpn')

__init__(classes=None, device=None, pretrained=True, model_name='fasterrcnn_resnet50_fpn')

Initializes a machine learning model for object detection. Models are built on top of PyTorch’s pre-trained models, specifically the Faster R-CNN architectures, but allow for fine-tuning to predict on custom classes/labels.

Parameters:
- classes (list or None) – (Optional) A list of classes/labels for the model to predict. If None, uses the default classes of the pre-trained torchvision detection model. Defaults to None.
- device (torch.device or None) – (Optional) The device on which to run the model, such as the CPU or GPU. See the PyTorch torch.device docs for details on specifying the device. Defaults to the GPU if available and the CPU if not.
- pretrained (bool) – (Optional) Whether to load pretrained weights or not. Defaults to True.
- model_name (str) – (Optional) The name of the Faster R-CNN model to use. Valid choices are "fasterrcnn_resnet50_fpn" (Model.DEFAULT), "fasterrcnn_mobilenet_v3_large_fpn" (Model.MOBILENET), and "fasterrcnn_mobilenet_v3_large_320_fpn" (Model.MOBILENET_320). Defaults to "fasterrcnn_resnet50_fpn".

Example:
>>> from detecto.core import Model
>>> model = Model(['dog', 'cat', 'bunny'])

fit(dataset, val_dataset=None, epochs=10, learning_rate=0.005, momentum=0.9, weight_decay=0.0005, gamma=0.1, lr_step_size=3, verbose=True)

Trains the model on the given dataset. If given a validation dataset, returns a list of validation losses, one per epoch.

Parameters:
- dataset (detecto.core.Dataset or detecto.core.DataLoader) – A Dataset or DataLoader containing the dataset to train on. If given a Dataset, this method automatically wraps it in a DataLoader with shuffle set to True.
- val_dataset (detecto.core.Dataset or detecto.core.DataLoader) – (Optional) A Dataset or DataLoader containing the dataset to validate on. Defaults to None, in which case no validation occurs.
- epochs (int) – (Optional) The number of runs over the data in dataset to train for. Defaults to 10.
- learning_rate (float) – (Optional) How fast to update the model weights at each step of training. Defaults to 0.005.
- momentum (float) – (Optional) The momentum used to reduce the fluctuations of gradients at each step. Defaults to 0.9.
- weight_decay (float) – (Optional) The amount of L2 regularization to apply on model parameters. Defaults to 0.0005.
- gamma (float) – (Optional) The decay factor that learning_rate is multiplied by every lr_step_size epochs. Defaults to 0.1.
- lr_step_size (int) – (Optional) The number of epochs between each decay of learning_rate by gamma. Defaults to 3.
- verbose (bool) – (Optional) Whether to print the current epoch, progress, and loss (if given a validation dataset) at each step, along with some additional warnings if using a CPU. Defaults to True.

Returns: If val_dataset is not None and epochs is greater than 0, returns a list of the validation losses at each epoch. Otherwise, returns nothing.

Return type: list or None

Example:
>>> from detecto.core import Model, Dataset, DataLoader
>>> dataset = Dataset('training_data/')
>>> val_dataset = Dataset('validation_data/')
>>> model = Model(['rose', 'tulip'])
>>> losses = model.fit(dataset, val_dataset, epochs=5)
>>> # Alternatively, provide a custom DataLoader over your dataset
>>> loader = DataLoader(dataset, batch_size=2, shuffle=True)
>>> losses = model.fit(loader, val_dataset, epochs=5)
>>> losses
[0.11191498369799327, 0.09899920264606253, 0.08454859235434461, 0.06825731012780788, 0.06236840748117637]
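
The interaction between learning_rate, gamma, and lr_step_size can be worked out by hand. The sketch below assumes a plain step decay in which the rate is multiplied by gamma once after every lr_step_size completed epochs, with epochs counted from zero. The function name is hypothetical and the code is an illustration of the schedule, not detecto's training loop.

```python
def learning_rate_at(epoch, learning_rate=0.005, gamma=0.1, lr_step_size=3):
    """Learning rate in effect during a zero-indexed epoch under step decay."""
    completed_decays = epoch // lr_step_size
    return learning_rate * gamma ** completed_decays


# Print the schedule for the default 10 epochs.
for epoch in range(10):
    print(epoch, f"{learning_rate_at(epoch):.1e}")
```

Under these assumptions the default settings train at 0.005 for epochs 0 to 2, then at roughly 0.0005 for epochs 3 to 5, and so on, so by the final epoch of a 10-epoch run the rate has dropped by three orders of magnitude. Raising lr_step_size or gamma slows that decay.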

get_internal_model()

Returns the internal torchvision model that this class contains to allow for more advanced fine-tuning and the full use of features presented in the PyTorch library.

Returns: The torchvision model.

Return type: torchvision.models.detection.faster_rcnn.FasterRCNN

Example:
>>> from detecto.core import Model
>>> model = Model.load('model_weights.pth', ['tick', 'gate'])
>>> torch_model = model.get_internal_model()
>>> type(torch_model)
<class 'torchvision.models.detection.faster_rcnn.FasterRCNN'>

static load(file, classes)

Loads a model from a .pth file containing the model weights.

Parameters:
- file (str) – The path to the .pth file containing the saved model.
- classes (list) – The list of classes/labels this model was trained to predict. Must be in the same order as initially passed to detecto.core.Model.__init__() for accurate results.

Returns: The model loaded from the file.

Return type: detecto.core.Model

Example:
>>> from detecto.core import Model
>>> model = Model.load('model_weights.pth', ['ant', 'bee'])
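
The requirement that classes keep their original order makes sense if labels are stored as integer indices inside the saved weights. The sketch below assumes a position-based mapping with index 0 reserved for a background class, which is the usual Faster R-CNN convention; the helper name `build_label_maps` is hypothetical and this is not detecto's own code.

```python
def build_label_maps(classes):
    """Map labels to integer ids, reserving id 0 for the background class."""
    names = ["__background__"] + list(classes)
    to_index = {name: index for index, name in enumerate(names)}
    return names, to_index


trained_names, trained_index = build_label_maps(["ant", "bee"])
swapped_names, _ = build_label_maps(["bee", "ant"])

# Index a model trained with ['ant', 'bee'] would emit for an ant.
predicted_index = trained_index["ant"]
print(trained_names[predicted_index])  # decoded with the original order
print(swapped_names[predicted_index])  # decoded with the order swapped
```

With the swapped list the same index decodes to the wrong label, even though the weights and the predicted boxes are unchanged.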

predict(images)

Takes in an image or list of images and returns predictions for object locations.

Parameters:
- images (list or numpy.ndarray or torch.Tensor) – An image or list of images to predict on. If the images have not already been transformed into torch.Tensor objects, the default transformations contained in detecto.utils.default_transforms() will be applied.

Returns: If given a single image, returns a tuple of size three. The first element is a list of string labels of size N, the number of detected objects. The second element is a torch.Tensor of size (N, 4), giving the xmin, ymin, xmax, and ymax coordinates of the boxes around each object. The third element is a torch.Tensor of size N containing the scores of each predicted object (ranges from 0.0 to 1.0). If given a list of images, returns a list of the tuples described above, each tuple corresponding to a single image.

Return type: tuple or list of tuple

Example:
>>> from detecto.core import Model
>>> from detecto.utils import read_image
>>> model = Model.load('model_weights.pth', ['horse', 'zebra'])
>>> image = read_image('image.jpg')
>>> labels, boxes, scores = model.predict(image)
>>> print(labels[0])
horse
>>> print(boxes[0])
tensor([   0.0000,  428.0744, 1617.1860, 1076.3607])
>>> print(scores[0])
tensor(0.9397)
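
Because predict returns every detection along with its score, a common follow-up step is to discard low-confidence results. The sketch below shows one way to do that on the (labels, boxes, scores) triple, using plain Python lists as stand-ins for the tensors. The function name, the threshold value, and the sample data are all hypothetical.

```python
def filter_by_score(labels, boxes, scores, threshold=0.5):
    """Keep only the predictions whose score is at least threshold."""
    keep = [i for i, score in enumerate(scores) if score >= threshold]
    return ([labels[i] for i in keep],
            [boxes[i] for i in keep],
            [scores[i] for i in keep])


# Made-up predictions shaped like the output of predict for a single image.
labels = ["horse", "zebra", "horse"]
boxes = [[0.0, 428.1, 1617.2, 1076.4],
         [875.3, 412.2, 949.6, 793.3],
         [10.0, 20.0, 30.0, 40.0]]
scores = [0.94, 0.87, 0.12]

kept_labels, kept_boxes, kept_scores = filter_by_score(labels, boxes, scores)
print(kept_labels)
print(kept_scores)
```

The three sequences are filtered with the same index list so that labels, boxes, and scores stay aligned position by position.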

predict_top(images)

Takes in an image or list of images and returns the top scoring predictions for each detected label in each image. Equivalent to running detecto.core.Model.predict() and then detecto.utils.filter_top_predictions() together.

Parameters:
- images (list or numpy.ndarray or torch.Tensor) – An image or list of images to predict on. If the images have not already been transformed into torch.Tensor objects, the default transformations contained in detecto.utils.default_transforms() will be applied.

Returns: If given a single image, returns a tuple of size three. The first element is a list of string labels of size K, the number of uniquely detected objects. The second element is a torch.Tensor of size (K, 4), giving the xmin, ymin, xmax, and ymax coordinates of the top-scoring boxes around each unique object. The third element is a torch.Tensor of size K containing the scores of each uniquely predicted object (ranges from 0.0 to 1.0). If given a list of images, returns a list of the tuples described above, each tuple corresponding to a single image.

Return type: tuple or list of tuple

Example:
>>> from detecto.core import Model
>>> from detecto.utils import read_image
>>> model = Model.load('model_weights.pth', ['label1', 'label2'])
>>> image = read_image('image.jpg')
>>> top_preds = model.predict_top(image)
>>> top_preds
(['label2', 'label1'], tensor([[   0.0000,  428.0744, 1617.1860, 1076.3607],
        [ 875.3470,  412.1762,  949.5915,  793.3424]]), tensor([0.9397, 0.8686]))
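
The reduction from N detections to K unique labels can be sketched as a single pass that remembers the best-scoring detection per label. This is an illustration of the idea using plain lists, not the implementation of detecto.utils.filter_top_predictions(); the function name and the third, low-scoring detection are made up for the example.

```python
def top_prediction_per_label(labels, boxes, scores):
    """Keep the highest-scoring box for each distinct label."""
    best = {}  # label -> index of its best-scoring prediction so far
    for index, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = index
    keep = list(best.values())
    return ([labels[i] for i in keep],
            [boxes[i] for i in keep],
            [scores[i] for i in keep])


labels = ["label2", "label1", "label2"]
boxes = [[0.0, 428.0744, 1617.1860, 1076.3607],
         [875.3470, 412.1762, 949.5915, 793.3424],
         [100.0, 100.0, 200.0, 200.0]]
scores = [0.9397, 0.8686, 0.4]

top_labels, top_boxes, top_scores = top_prediction_per_label(labels, boxes, scores)
print(top_labels)
print(top_scores)
```

Here the second 'label2' detection is dropped because its score is lower than the first, leaving one entry per label as in the predict_top output above.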

save(file)

Saves the internal model weights to a file.

Parameters:
- file (str) – The name of the file. Should have a .pth file extension.

Example:
>>> from detecto.core import Model
>>> model = Model(['tree', 'bush', 'leaf'])
>>> model.save('model_weights.pth')