detecto.utils

detecto.utils.default_transforms()

Returns the default, bare-minimum transformations that should be applied to images passed to classes in the detecto.core module.

Returns: A torchvision transforms.Compose object containing a transforms.ToTensor object and the transforms.Normalize object returned by detecto.utils.normalize_transform().
Return type: torchvision.transforms.Compose

Example:

>>> from detecto.core import Dataset
>>> from detecto.utils import default_transforms

>>> # Note: if transform=None, the Dataset will automatically
>>> # apply these default transforms to images
>>> defaults = default_transforms()
>>> dataset = Dataset('labels.csv', 'images/', transform=defaults)

detecto.utils.filter_top_predictions(labels, boxes, scores)

Keeps only the top-scoring prediction of each class from the given data and discards the rest. Note: passing the predictions from detecto.core.Model.predict() to this function produces the same results as a direct call to detecto.core.Model.predict_top().

Parameters:
  • labels (list) – A list containing the string labels.
  • boxes (torch.Tensor) – A tensor of size [N, 4] containing the N box coordinates.
  • scores (torch.Tensor) – A tensor containing the score for each prediction.
Returns:

A tuple of the given labels, boxes, and scores in which only the top-scoring prediction for each unique label is kept; all other predictions are filtered out.

Return type:

tuple

Example:

>>> from detecto.core import Model
>>> from detecto.utils import read_image, filter_top_predictions

>>> model = Model.load('model_weights.pth', ['label1', 'label2'])
>>> image = read_image('image.jpg')
>>> labels, boxes, scores = model.predict(image)
>>> top_preds = filter_top_predictions(labels, boxes, scores)
>>> top_preds
(['label2', 'label1'], tensor([[   0.0000,  428.0744, 1617.1860, 1076.3607],
[ 875.3470,  412.1762,  949.5915,  793.3424]]), tensor([0.9397, 0.8686]))
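
The filtering rule described above can be sketched without PyTorch. This is a minimal illustration that uses plain Python lists in place of tensors; the helper name top_prediction_per_label is hypothetical, and ordering the result by descending score is an assumption of this sketch rather than documented behavior of filter_top_predictions.

```python
def top_prediction_per_label(labels, boxes, scores):
    """Keep only the highest-scoring prediction for each unique label."""
    best = {}  # maps each label to the index of its best prediction so far
    for i, label in enumerate(labels):
        if label not in best or scores[i] > scores[best[label]]:
            best[label] = i

    # Assumption of this sketch: return the survivors in descending score order.
    keep = sorted(best.values(), key=lambda i: scores[i], reverse=True)
    return (
        [labels[i] for i in keep],
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
    )


# Three predictions, two of which share the label 'label1'.
labels = ['label1', 'label2', 'label1']
boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 9, 9]]
scores = [0.60, 0.94, 0.87]

top_labels, top_boxes, top_scores = top_prediction_per_label(labels, boxes, scores)
print(top_labels, top_boxes, top_scores)
```

Only one prediction per label survives, so the lower-scoring 'label1' entry is dropped while its box and score are removed along with it.
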

detecto.utils.normalize_transform()

Returns a torchvision transforms.Normalize object with default mean and standard deviation values as required by PyTorch’s pre-trained models.

Returns: A transforms.Normalize object with pre-computed values.
Return type: torchvision.transforms.Normalize

Example:

>>> from detecto.core import Dataset
>>> from detecto.utils import normalize_transform
>>> from torchvision import transforms

>>> # Note: if transform=None, the Dataset will automatically
>>> # apply these default transforms to images
>>> defaults = transforms.Compose([
...     transforms.ToTensor(),
...     normalize_transform(),
... ])
>>> dataset = Dataset('labels.csv', 'images/', transform=defaults)

detecto.utils.read_image(path)

Helper function that reads in an image as a NumPy array. Equivalent to using OpenCV’s cv2.imread function and converting from BGR to RGB format.

Parameters: path (str) – The path to the image.
Returns: Image in NumPy array format
Return type: ndarray

Example:

>>> import matplotlib.pyplot as plt
>>> from detecto.utils import read_image

>>> image = read_image('image.jpg')
>>> plt.imshow(image)
>>> plt.show()
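
The BGR-to-RGB conversion mentioned above amounts to reversing the channel order of every pixel. The following is a minimal sketch on a tiny nested-list "image" so that it runs without OpenCV or NumPy; the helper name bgr_to_rgb is hypothetical and is not part of detecto.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of each pixel in a rows x cols x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 2 x 2 image stored in BGR order: blue, green, red, and a mixed pixel.
bgr_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [10, 20, 30]],
]

rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)
```

On a NumPy array of shape (height, width, 3), the slice image[..., ::-1] performs the same channel reversal.
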

detecto.utils.reverse_normalize(image)

Reverses the normalization applied to an image by the detecto.utils.normalize_transform() transformation. The image must be a torch.Tensor object.

Parameters: image (torch.Tensor) – A normalized image.
Returns: The image with the normalization undone.
Return type: torch.Tensor

Example:

>>> import matplotlib.pyplot as plt
>>> from torchvision import transforms
>>> from detecto.utils import read_image, \
...     default_transforms, reverse_normalize

>>> image = read_image('image.jpg')
>>> defaults = default_transforms()
>>> image = defaults(image)

>>> image = reverse_normalize(image)
>>> image = transforms.ToPILImage()(image)
>>> plt.imshow(image)
>>> plt.show()
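
The arithmetic behind normalizing and un-normalizing can be sketched per pixel in plain Python. The mean and standard deviation below are the ImageNet statistics commonly used with torchvision's pre-trained models; that detecto uses exactly these values is an assumption here, and the helper names are hypothetical.

```python
# Assumed per-channel (R, G, B) statistics, as commonly used with
# torchvision's pre-trained models.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(pixel):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, MEAN, STD)]


def reverse_normalize_pixel(pixel):
    """Undo the normalization: value * std + mean for each channel."""
    return [v * s + m for v, m, s in zip(pixel, MEAN, STD)]


original = [0.2, 0.5, 0.8]  # channel values already scaled to [0, 1]
normalized = normalize_pixel(original)
restored = reverse_normalize_pixel(normalized)
print(normalized, restored)
```

Up to floating-point rounding, restored matches original, which is why a reverse-normalized tensor can be converted back into a viewable image.
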

detecto.utils.split_video(video_file, output_folder, prefix='frame', step_size=1)

Splits a video into individual frames and saves the JPG images to the specified output folder.

Parameters:
  • video_file (str) – The path to the video file to split.
  • output_folder (str) – The directory in which to save the frames.
  • prefix (str) – (Optional) The prefix to each frame’s file name. For example, if prefix == 'image', each frame will be saved as image0.jpg, image1.jpg, etc. Defaults to 'frame'.
  • step_size (int) – (Optional) The interval, in frames, between saves. For example, if step_size == 3, every third frame is saved. Defaults to 1, which saves every frame.

Example:

>>> from detecto.utils import split_video

>>> split_video('video.mp4', 'frames/', step_size=4)
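
How prefix and step_size interact can be sketched without reading a real video. The helper below is hypothetical and only lists which frames would be kept and under which names; it assumes that the first frame is always saved and that output files are numbered consecutively, neither of which is stated explicitly above.

```python
def saved_frame_names(total_frames, prefix='frame', step_size=1):
    """List (source frame index, output file name) pairs for the kept frames."""
    names = []
    # Assumption: frame 0 is saved, then every step_size-th frame after it.
    for frame_index in range(0, total_frames, step_size):
        # Assumption: output files are numbered consecutively from 0.
        names.append((frame_index, '{}{}.jpg'.format(prefix, len(names))))
    return names


plan = saved_frame_names(10, prefix='image', step_size=3)
print(plan)
```

For a 10-frame video with step_size=3, this sketch keeps frames 0, 3, 6, and 9 and names them image0.jpg through image3.jpg.
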

detecto.utils.xml_to_csv(xml_folder, output_file=None)

Converts a folder of XML label files into a pandas DataFrame and/or CSV file, which can then be used to create a detecto.core.Dataset object. Each XML file should correspond to an image and contain the image name, image size, image_id and the names and bounding boxes of the objects in the image, if any. Extraneous data in the XML files will simply be ignored. See here for an example XML file. For an image labeling tool that produces XML files in this format, see LabelImg.

Parameters:
  • xml_folder (str) – The path to the folder containing the XML files.
  • output_file (str or None) – (Optional) If given, saves a CSV file containing the XML data in the file output_file. If None, does not save to any file. Defaults to None.
Returns:

A pandas DataFrame containing the XML data.

Return type:

pandas.DataFrame

Example:

>>> from detecto.utils import xml_to_csv

>>> # Saves data to a file called labels.csv
>>> xml_to_csv('xml_labels/', 'labels.csv')
>>> # Returns a pandas DataFrame of the data
>>> df = xml_to_csv('xml_labels/')
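
To make the expected XML layout concrete, the following sketch parses one LabelImg-style (Pascal VOC) annotation with the standard library and produces one row per labeled object. The sample annotation, the helper name, and the column names are assumptions made for illustration; they are not necessarily the columns that xml_to_csv writes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that LabelImg produces.
SAMPLE_XML = """<annotation>
  <filename>image0.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>cat</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>400</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""

# Assumed column names, chosen for this sketch.
COLUMNS = ['filename', 'width', 'height', 'class',
           'xmin', 'ymin', 'xmax', 'ymax', 'image_id']


def annotation_to_rows(xml_text, image_id=0):
    """Return one dict per <object> element in a single annotation."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append({
            'filename': root.findtext('filename'),
            'width': int(root.findtext('size/width')),
            'height': int(root.findtext('size/height')),
            'class': obj.findtext('name'),
            'xmin': int(box.findtext('xmin')),
            'ymin': int(box.findtext('ymin')),
            'xmax': int(box.findtext('xmax')),
            'ymax': int(box.findtext('ymax')),
            'image_id': image_id,
        })
    return rows


rows = annotation_to_rows(SAMPLE_XML)

# Write the rows as CSV text in memory instead of to a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Elements the sketch does not ask for, such as depth, are simply never read, which mirrors the note above that extraneous data in the XML files is ignored.
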