Computer Vision

Riemann provides comprehensive support for computer vision tasks through the riemann.vision module, including popular datasets, image transformations, and data loading utilities.

Overview

The riemann.vision module includes the following main components:

  • Datasets: Popular datasets like MNIST, CIFAR-10, Flowers102, OxfordIIITPet, LFWPeople, SVHN, ImageFolder, etc.

  • Transforms: Image preprocessing and data augmentation operations

  • Data Loading: Seamless integration with DataLoader, supporting batch loading and parallel processing

Quick Start

import riemann as rm
from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Iterate through data
for images, labels in train_loader:
    print(f"Image batch shape: {images.shape}")  # [64, 1, 28, 28]
    print(f"Label batch shape: {labels.shape}")  # [64]
    break

Datasets

Riemann provides various popular computer vision datasets. All datasets inherit from the Dataset class and can be used with DataLoader.

Dataset Overview

Supported Datasets

Dataset

Description

Size

Download Source

MNIST

Handwritten digit recognition (0-9), 28×28 grayscale images

60,000 train / 10,000 test

AWS S3 (ossci-datasets)

FashionMNIST

Fashion product images (10 categories), 28×28 grayscale

60,000 train / 10,000 test

Zalando Research

CIFAR-10

10-class object recognition, 32×32 color images

50,000 train / 10,000 test

University of Toronto

CIFAR-100

100-class object recognition with 20 superclasses, 32×32 color images

50,000 train / 10,000 test

University of Toronto

Flowers102

102 flower categories classification

1,020 train / 1,020 val / 6,149 test

Oxford VGG

OxfordIIITPet

37 pet breeds (cats and dogs) classification

~7,000 images (~200 per class)

Oxford VGG

LFWPeople

Face recognition dataset with multiple identities

13,233 images / 5,749 people

UMass Amherst

SVHN

Street View House Numbers, 32×32 color images

73,257 train / 26,032 test / 531,131 extra

Stanford University

ImageFolder

Generic folder-based dataset loader

User-defined

Local files

DatasetFolder

Generic folder dataset with custom loader

User-defined

Local files

MNIST Dataset

Handwritten digit recognition dataset containing 60,000 training images and 10,000 test images, with image size of 28×28 pixels.

Parameters:

  • root (str): Root directory for data storage

  • train (bool): True to load training set, False to load test set

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Usage Example:

from riemann.vision.datasets import MNIST
from riemann.utils.data import DataLoader

# Load training and test sets
train_dataset = MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

EasyMNIST (Preprocessed MNIST)

EasyMNIST is a preprocessed version of MNIST that applies normalization, standardization, and flattening during initialization. Labels can be converted to one-hot encoding. This saves preprocessing time during training as transformations are applied once at initialization rather than during each epoch.

Parameters:

  • root (str): Root directory for data storage

  • train (bool): True to load training set, False to load test set

  • onehot_label (bool): If True, convert labels to one-hot encoding (default: True)

  • download (bool, optional): If True, downloads the dataset from the internet

Usage Example:

from riemann.vision.datasets import EasyMNIST

# Load EasyMNIST with one-hot labels (default)
train_dataset = EasyMNIST(root='./data', train=True, onehot_label=True, download=True)

# Load with scalar labels
test_dataset = EasyMNIST(root='./data', train=False, onehot_label=False, download=True)

# Data is already preprocessed (normalized, flattened)
image, label = train_dataset[0]
print(f"Image shape: {image.shape}")  # [784] - flattened
print(f"Label shape: {label.shape}")  # [10] - one-hot if onehot_label=True

FashionMNIST Dataset

Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. It is designed to be a drop-in replacement for MNIST.

Classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot

Parameters:

  • root (str): Root directory for data storage

  • train (bool): True to load training set, False to load test set

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Usage Example:

from riemann.vision.datasets import FashionMNIST

# Load FashionMNIST dataset
train_dataset = FashionMNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

print(f"Classes: {train_dataset.classes}")

CIFAR-10 Dataset

Contains 60,000 32×32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).

Parameters:

  • root (str): Root directory for data storage

  • train (bool): True to load training set (50,000 images), False to load test set (10,000 images)

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Usage Example:

from riemann.vision.datasets import CIFAR10

# Load CIFAR-10 dataset
train_dataset = CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

CIFAR-100 Dataset

Contains 60,000 32×32 color images in 100 classes. Each class has 600 images (500 for training, 100 for testing). CIFAR-100 has 100 fine-grained classes and 20 superclasses.

Parameters:

  • root (str): Root directory for data storage

  • train (bool): True to load training set (50,000 images), False to load test set (10,000 images)

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

  • coarse (bool, optional): If True, uses 20 superclass labels; otherwise uses 100 fine-grained class labels (default: False)

Usage Example:

from riemann.vision.datasets import CIFAR100

# Load CIFAR-100 with fine-grained labels (100 classes)
train_dataset = CIFAR100(root='./data', train=True, download=True, coarse=False)
test_dataset = CIFAR100(root='./data', train=False, download=True, coarse=False)

# Load CIFAR-100 with superclass labels (20 classes)
train_dataset_coarse = CIFAR100(root='./data', train=True, download=True, coarse=True)

Flowers102 Dataset

Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers were chosen to be flowers commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations.

Note: This class requires scipy to load target files from .mat format.

Parameters:

  • root (str): Root directory of the dataset

  • split (str, optional): The dataset split, supports "train" (default), "val", or "test"

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Target transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Dataset Statistics:

  • Train: 1,020 images

  • Validation: 1,020 images

  • Test: 6,149 images

  • Total: 8,189 images across 102 classes

Usage Example:

from riemann.vision.datasets import Flowers102

# Load Flowers102 dataset
train_dataset = Flowers102(root='./data', split='train', download=True, transform=transforms.ToTensor())
val_dataset = Flowers102(root='./data', split='val', download=True, transform=transforms.ToTensor())
test_dataset = Flowers102(root='./data', split='test', download=True, transform=transforms.ToTensor())

print(f"Train samples: {len(train_dataset)}")  # 1020
print(f"Validation samples: {len(val_dataset)}")  # 1020
print(f"Test samples: {len(test_dataset)}")  # 6149

OxfordIIITPet Dataset

The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of species (cat or dog), breed, and pixel-level trimap segmentation.

Parameters:

  • root (str): Root directory of the dataset

  • split (str, optional): The dataset split, supports "trainval" (default) or "test"

  • target_types (str or list, optional): Types of target to use. Can be "category" (default), "binary-category", or "segmentation". Can also be a list to output a tuple with all specified target types.

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Target transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Target Types:

  • category (int): Label for one of the 37 pet categories

  • binary-category (int): Binary label for cat (0) or dog (1)

  • segmentation (PIL Image): Segmentation trimap of the image

Usage Example:

from riemann.vision.datasets import OxfordIIITPet

# Load with category labels
dataset = OxfordIIITPet(root='./data', split='trainval', target_types='category', download=True)

# Load with binary classification (cat vs dog)
dataset_bin = OxfordIIITPet(root='./data', split='trainval', target_types='binary-category', download=True)

# Load with segmentation masks
dataset_seg = OxfordIIITPet(root='./data', split='trainval', target_types='segmentation', download=True)

# Load with multiple target types
dataset_multi = OxfordIIITPet(root='./data', split='trainval',
                               target_types=['category', 'segmentation'], download=True)

LFWPeople Dataset

LFW (Labeled Faces in the Wild) People dataset contains 13,233 face images collected from the web. The images are organized into 5,749 different identities. This dataset is designed for face recognition research.

Parameters:

  • root (str): Root directory of the dataset

  • split (str, optional): The dataset split, supports "10fold" (default), "train", or "test"

  • image_set (str, optional): The image alignment type, supports "original", "funneled" (default), or "deepfunneled"

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Target transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Image Sets:

  • original: Original images without alignment

  • funneled: Geometrically normalized face images (default)

  • deepfunneled: Deep funneled images with better alignment

Usage Example:

from riemann.vision.datasets import LFWPeople

# Load LFWPeople dataset with funneled images
train_dataset = LFWPeople(root='./data', split='train', image_set='funneled', download=True)
test_dataset = LFWPeople(root='./data', split='test', image_set='funneled', download=True)

print(f"Number of classes (people): {len(train_dataset.classes)}")
print(f"Train samples: {len(train_dataset)}")

SVHN Dataset

SVHN (Street View House Numbers) dataset contains 32×32 color images of house numbers collected from Google Street View. The dataset includes 10 digit classes (0-9).

Note: This class requires scipy to load data from .mat format.

Parameters:

  • root (str): Root directory of the dataset

  • split (str): The dataset split, supports "train", "test", or "extra"

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Target transformation function

  • download (bool, optional): If True, downloads the dataset from the internet

Dataset Statistics:

  • Train: 73,257 images

  • Test: 26,032 images

  • Extra: 531,131 additional images (less difficult samples)

Usage Example:

from riemann.vision.datasets import SVHN

# Load SVHN dataset
train_dataset = SVHN(root='./data', split='train', download=True, transform=transforms.ToTensor())
test_dataset = SVHN(root='./data', split='test', download=True, transform=transforms.ToTensor())

# Also available: extra split with additional training data
extra_dataset = SVHN(root='./data', split='extra', download=True, transform=transforms.ToTensor())

ImageFolder Dataset

Load image datasets from local folders, suitable for custom datasets. Folder structure should be organized by class:

root/
├── class_a/
│   ├── img1.jpg
│   └── img2.png
├── class_b/
│   ├── img1.jpg
│   └── img2.jpg
└── class_c/
    └── img1.jpg

Parameters:

  • root (str): Root directory path of the dataset

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • loader (callable, optional): Image loading function, defaults to PIL Image loader

  • is_valid_file (callable, optional): Function to validate if a file is valid

Usage Example:

from riemann.vision.datasets import ImageFolder

# Load custom dataset from folder
dataset = ImageFolder(
    root='./custom_dataset',
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
)

print(f"Number of classes: {len(dataset.classes)}")  # ['class_a', 'class_b', 'class_c']
print(f"Class to index mapping: {dataset.class_to_idx}")  # {'class_a': 0, 'class_b': 1, 'class_c': 2}

DatasetFolder

Generic folder dataset class, similar to ImageFolder but allows custom image loaders.

Parameters:

  • root (str): Root directory path of the dataset

  • loader (callable): Image loading function

  • extensions (tuple, optional): Tuple of allowed file extensions

  • transform (callable, optional): Image transformation function

  • target_transform (callable, optional): Label transformation function

  • is_valid_file (callable, optional): Function to validate if a file is valid

  • allow_empty (bool): Whether to allow empty folders, default False

Usage Example:

from riemann.vision.datasets import DatasetFolder, default_loader

# Use custom loader
dataset = DatasetFolder(
    root='./custom_dataset',
    loader=default_loader,
    extensions=('.jpg', '.png'),
    transform=transforms.ToTensor()
)

default_loader

default_loader is the default image loading function used by ImageFolder and DatasetFolder. It automatically selects the appropriate loading method based on file extension:

  • Image formats supported by PIL (e.g., .jpg, .png, .bmp, etc.): Loaded using PIL.Image.open() and converted to RGB mode

  • Other formats: Attempts to load using PIL

Purpose:

default_loader is mainly used for the loader parameter of ImageFolder and DatasetFolder to specify the image loading method. When using these two dataset classes, if the loader parameter is not specified, default_loader will be used by default.

Usage Example:

from riemann.vision.datasets import DatasetFolder, default_loader

# Load image using default_loader
image = default_loader('path/to/image.jpg')

# Use in DatasetFolder
dataset = DatasetFolder(
    root='./custom_dataset',
    loader=default_loader,  # Specify to use default_loader
    extensions=('.jpg', '.png')
)

Transforms

riemann.vision.transforms provides rich image transformation operations for data preprocessing and data augmentation.

Transform Overview

Supported Transforms

Transform

Description

Category

Compose

Combine multiple transforms into one

Utility

PILToTensor

Convert PIL Image to tensor without scaling

Conversion

ToTensor

Convert PIL Image or numpy.ndarray to tensor (scales to [0, 1])

Conversion

ToPILImage

Convert tensor to PIL Image

Conversion

ConvertImageDtype

Convert image to specified data type

Conversion

Normalize

Normalize tensor with mean and std

Normalization

Resize

Resize image to specified size

Geometric

CenterCrop

Crop image from center

Geometric

RandomHorizontalFlip

Randomly flip image horizontally

Augmentation

RandomVerticalFlip

Randomly flip image vertically

Augmentation

RandomRotation

Randomly rotate image by angle

Augmentation

ColorJitter

Randomly change brightness, contrast, saturation, hue

Augmentation

Grayscale

Convert image to grayscale

Color

RandomGrayscale

Randomly convert image to grayscale

Augmentation

RandomCrop

Randomly crop image to specified size

Augmentation

RandomResizedCrop

Random crop and resize image

Augmentation

FiveCrop

Crop image into 5 regions (4 corners + center)

Geometric

TenCrop

Crop image into 10 regions (FiveCrop + flips)

Geometric

Pad

Pad image with specified value

Geometric

Lambda

Apply custom lambda function

Utility

GaussianBlur

Apply Gaussian blur to image

Filter

RandomAffine

Random affine transformation

Augmentation

RandomPerspective

Random perspective transformation

Augmentation

RandomErasing

Randomly erase rectangular regions

Augmentation

AutoAugment

AutoAugment data augmentation policy

Auto Augmentation

RandAugment

RandAugment data augmentation policy

Auto Augmentation

TrivialAugmentWide

TrivialAugmentWide data augmentation policy

Auto Augmentation

SanitizeBoundingBox

Sanitize and validate bounding boxes

Detection

Invert

Invert image colors

Color

Posterize

Reduce number of bits for each color channel

Color

Solarize

Invert pixels above threshold

Color

Equalize

Equalize image histogram

Color

AutoContrast

Maximize image contrast

Color

Sharpness

Adjust image sharpness

Color

Brightness

Adjust image brightness

Color

Contrast

Adjust image contrast

Color

Saturation

Adjust image saturation

Color

Hue

Adjust image hue

Color

Compose

Combine multiple transformations and apply them in sequence.

Parameters:

  • transforms (list): List of transform objects to compose

Usage Example:

from riemann.vision import transforms

# Define transformation pipeline
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])
])

PILToTensor

Convert PIL Image to tensor without scaling. Unlike ToTensor, PILToTensor does not scale values from [0, 255] to [0.0, 1.0].

Usage Example:

from riemann.vision import transforms

# Convert PIL Image to tensor (values in [0, 255])
pil_to_tensor = transforms.PILToTensor()
tensor_img = pil_to_tensor(pil_image)

ToTensor

Convert PIL Image or numpy.ndarray to tensor. Scales values from [0, 255] to [0.0, 1.0].

Usage Example:

from riemann.vision import transforms

# Convert PIL Image to tensor (values in [0, 1])
to_tensor = transforms.ToTensor()
tensor_img = to_tensor(pil_image)

ToPILImage

Convert tensor to PIL Image.

Parameters:

  • mode (str, optional): Color mode of the output image

Usage Example:

from riemann.vision import transforms

# Convert tensor to PIL Image
to_pil = transforms.ToPILImage()
pil_img = to_pil(tensor)

ConvertImageDtype

Convert image to specified data type.

Parameters:

  • dtype (dtype): Target data type

Usage Example:

from riemann.vision import transforms

# Convert to float32
convert_dtype = transforms.ConvertImageDtype(dtype='float32')
converted_img = convert_dtype(img)

Normalize

Normalize tensor with mean and standard deviation.

Parameters:

  • mean (sequence): Mean values for each channel

  • std (sequence): Standard deviation values for each channel

Usage Example:

from riemann.vision import transforms

# Normalize using ImageNet statistics
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)
normalized_img = normalize(tensor_img)

Resize

Resize image to specified size.

Parameters:

  • size (int or tuple): Target size. If int, smaller edge is resized to size. If tuple, (height, width).

Usage Example:

from riemann.vision import transforms

# Resize to specific size
resize = transforms.Resize((224, 224))
resized_img = resize(pil_image)

# Resize by shorter side
resize = transforms.Resize(256)
resized_img = resize(pil_image)

CenterCrop

Crop image from center.

Parameters:

  • size (int or tuple): Crop size

Usage Example:

from riemann.vision import transforms

# Center crop to 224x224
center_crop = transforms.CenterCrop(224)
cropped_img = center_crop(pil_image)

RandomHorizontalFlip

Randomly flip image horizontally.

Parameters:

  • p (float): Probability of flipping (default: 0.5)

Usage Example:

from riemann.vision import transforms

# Flip with 50% probability
hflip = transforms.RandomHorizontalFlip(p=0.5)
flipped_img = hflip(pil_image)

RandomVerticalFlip

Randomly flip image vertically.

Parameters:

  • p (float): Probability of flipping (default: 0.5)

Usage Example:

from riemann.vision import transforms

# Flip with 50% probability
vflip = transforms.RandomVerticalFlip(p=0.5)
flipped_img = vflip(pil_image)

RandomRotation

Randomly rotate image by angle.

Parameters:

  • degrees (sequence or float): Range of degrees (-degrees, +degrees)

Usage Example:

from riemann.vision import transforms

# Rotate between -15 and 15 degrees
rotation = transforms.RandomRotation(degrees=15)
rotated_img = rotation(pil_image)

ColorJitter

Randomly change brightness, contrast, saturation, and hue.

Parameters:

  • brightness (float): Brightness jitter factor

  • contrast (float): Contrast jitter factor

  • saturation (float): Saturation jitter factor

  • hue (float): Hue jitter factor

Usage Example:

from riemann.vision import transforms

# Randomly adjust color
jitter = transforms.ColorJitter(
    brightness=0.2,
    contrast=0.2,
    saturation=0.2,
    hue=0.1
)
jittered_img = jitter(pil_image)

Grayscale

Convert image to grayscale.

Parameters:

  • num_output_channels (int): Number of output channels (1 or 3)

Usage Example:

from riemann.vision import transforms

# Convert to grayscale (1 channel)
gray = transforms.Grayscale(num_output_channels=1)
gray_img = gray(pil_image)

RandomGrayscale

Randomly convert image to grayscale.

Parameters:

  • p (float): Probability of conversion (default: 0.1)

Usage Example:

from riemann.vision import transforms

# Convert to grayscale with 10% probability
gray = transforms.RandomGrayscale(p=0.1)
gray_img = gray(pil_image)

RandomCrop

Randomly crop image to specified size.

Parameters:

  • size (int or tuple): Crop size

  • padding (int, optional): Padding size

Usage Example:

from riemann.vision import transforms

# Random crop with padding
crop = transforms.RandomCrop(224, padding=4)
cropped_img = crop(pil_image)

RandomResizedCrop

Random crop and resize image.

Parameters:

  • size (int or tuple): Target size

  • scale (tuple): Scale range for cropping

  • ratio (tuple): Aspect ratio range

Usage Example:

from riemann.vision import transforms

# Random resized crop
crop = transforms.RandomResizedCrop(224, scale=(0.08, 1.0))
cropped_img = crop(pil_image)

FiveCrop

Crop image into 5 regions (4 corners + center).

Parameters:

  • size (int or tuple): Crop size

Usage Example:

import riemann as rm
from riemann.vision import transforms

# Five crop
five_crop = transforms.FiveCrop(224)
crops = five_crop(pil_image)  # Returns tuple of 5 images

# Stack into batch
tensor_crops = rm.stack([transforms.ToTensor()(crop) for crop in crops])

TenCrop

Crop image into 10 regions (FiveCrop + horizontal flips).

Parameters:

  • size (int or tuple): Crop size

  • vertical_flip (bool): Also apply vertical flip

Usage Example:

import riemann as rm
from riemann.vision import transforms

# Ten crop
ten_crop = transforms.TenCrop(224)
crops = ten_crop(pil_image)  # Returns tuple of 10 images

Pad

Pad image with specified value.

Parameters:

  • padding (int or tuple): Padding size

  • fill (int or tuple): Fill value

Usage Example:

from riemann.vision import transforms

# Pad image
pad = transforms.Pad(padding=4, fill=0)
padded_img = pad(pil_image)

Lambda

Apply custom lambda function.

Parameters:

  • lambd (function): Lambda function to apply

Usage Example:

from riemann.vision import transforms

# Custom lambda transform
lambd = transforms.Lambda(lambda x: x.rotate(45))
transformed_img = lambd(pil_image)

GaussianBlur

Apply Gaussian blur to image.

Parameters:

  • kernel_size (int): Gaussian kernel size

  • sigma (float or tuple): Standard deviation

Usage Example:

from riemann.vision import transforms

# Apply Gaussian blur
blur = transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))
blurred_img = blur(pil_image)

RandomAffine

Random affine transformation.

Parameters:

  • degrees (float or tuple): Rotation degrees

  • translate (tuple): Translation range

  • scale (tuple): Scale range

  • shear (float or tuple): Shear range

Usage Example:

from riemann.vision import transforms

# Random affine transformation
affine = transforms.RandomAffine(
    degrees=15,
    translate=(0.1, 0.1),
    scale=(0.9, 1.1)
)
transformed_img = affine(pil_image)

RandomPerspective

Random perspective transformation.

Parameters:

  • distortion_scale (float): Distortion scale

  • p (float): Probability of applying transform

Usage Example:

from riemann.vision import transforms

# Random perspective
perspective = transforms.RandomPerspective(distortion_scale=0.5, p=0.5)
transformed_img = perspective(pil_image)

RandomErasing

Randomly erase rectangular regions.

Parameters:

  • p (float): Probability of applying

  • scale (tuple): Erasing area range

  • ratio (tuple): Aspect ratio range

  • value (str or float): Erasing value

Usage Example:

from riemann.vision import transforms

# Random erasing (typically used on tensors)
erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.33))
erased_tensor = erasing(tensor_img)

AutoAugment

AutoAugment data augmentation policy.

Parameters:

  • policy (str): Policy to use (‘imagenet’, ‘cifar10’, ‘svhn’)

Usage Example:

from riemann.vision import transforms

# AutoAugment with ImageNet policy
auto_augment = transforms.AutoAugment(policy='imagenet')
augmented_img = auto_augment(pil_image)

RandAugment

RandAugment data augmentation policy.

Parameters:

  • num_ops (int): Number of operations

  • magnitude (int): Magnitude of operations

Usage Example:

from riemann.vision import transforms

# RandAugment
rand_augment = transforms.RandAugment(num_ops=2, magnitude=9)
augmented_img = rand_augment(pil_image)

TrivialAugmentWide

TrivialAugmentWide data augmentation policy.

Usage Example:

from riemann.vision import transforms

# TrivialAugmentWide
trivial_augment = transforms.TrivialAugmentWide()
augmented_img = trivial_augment(pil_image)

SanitizeBoundingBox

Sanitize and validate bounding boxes.

Usage Example:

from riemann.vision import transforms

# Sanitize bounding boxes
sanitize = transforms.SanitizeBoundingBox()
sanitized_boxes = sanitize(boxes, image_size)

Invert

Invert image colors.

Usage Example:

from riemann.vision import transforms

# Invert image
invert = transforms.Invert()
inverted_img = invert(pil_image)

Posterize

Reduce number of bits for each color channel.

Parameters:

  • bits (int): Number of bits to keep

Usage Example:

from riemann.vision import transforms

# Posterize image
posterize = transforms.Posterize(bits=4)
posterized_img = posterize(pil_image)

Solarize

Invert pixels above threshold.

Parameters:

  • threshold (int): Threshold value

Usage Example:

from riemann.vision import transforms

# Solarize image
solarize = transforms.Solarize(threshold=128)
solarized_img = solarize(pil_image)

Equalize

Equalize image histogram.

Usage Example:

from riemann.vision import transforms

# Equalize image
equalize = transforms.Equalize()
equalized_img = equalize(pil_image)

AutoContrast

Maximize image contrast.

Usage Example:

from riemann.vision import transforms

# Auto contrast
auto_contrast = transforms.AutoContrast()
contrasted_img = auto_contrast(pil_image)

Sharpness

Adjust image sharpness.

Parameters:

  • sharpness_factor (float): Sharpness factor

Usage Example:

from riemann.vision import transforms

# Adjust sharpness
sharpness = transforms.Sharpness(sharpness_factor=2.0)
sharpened_img = sharpness(pil_image)

Brightness

Adjust image brightness.

Parameters:

  • brightness_factor (float): Brightness factor

Usage Example:

from riemann.vision import transforms

# Adjust brightness
brightness = transforms.Brightness(brightness_factor=1.5)
brightened_img = brightness(pil_image)

Contrast

Adjust image contrast.

Parameters:

  • contrast_factor (float): Contrast factor

Usage Example:

from riemann.vision import transforms

# Adjust contrast
contrast = transforms.Contrast(contrast_factor=1.5)
contrasted_img = contrast(pil_image)

Saturation

Adjust image saturation.

Parameters:

  • saturation_factor (float): Saturation factor

Usage Example:

from riemann.vision import transforms

# Adjust saturation
saturation = transforms.Saturation(saturation_factor=1.5)
saturated_img = saturation(pil_image)

Hue

Adjust image hue.

Parameters:

  • hue_factor (float): Hue factor (-0.5 to 0.5)

Usage Example:

from riemann.vision import transforms

# Adjust hue
hue = transforms.Hue(hue_factor=0.1)
hue_adjusted_img = hue(pil_image)

Complete Examples

The following examples demonstrate how to use Riemann’s computer vision module for common deep learning tasks.

Image Classification Training Pipeline

This example demonstrates the complete workflow of image classification using the CIFAR-10 dataset, including data loading, data augmentation, model definition, training, and evaluation.

Pipeline Overview:

  1. Data Preprocessing: Use random cropping, horizontal flipping, and color jittering for data augmentation

  2. Normalization: Normalize using ImageNet statistics

  3. Model Definition: Simple convolutional neural network

  4. Training Loop: Standard training flow including forward propagation, loss calculation, backward propagation, and parameter updates

import riemann as rm
import riemann.nn as nn
import riemann.optim as optim
from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader

# Define training data transforms (with data augmentation)
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # Random crop and resize
    transforms.RandomHorizontalFlip(),       # Random horizontal flip
    transforms.ColorJitter(                  # Color jitter (data augmentation)
        brightness=0.2,
        contrast=0.2
    ),
    transforms.ToTensor(),                   # Convert to tensor
    transforms.Normalize(                    # Normalize
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Define test data transforms (without data augmentation)
test_transform = transforms.Compose([
    transforms.Resize(256),                  # Resize
    transforms.CenterCrop(224),              # Center crop
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)
test_dataset = datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=test_transform
)

# Create data loaders
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,           # Shuffle data during training
    num_workers=4           # Use 4 subprocesses to load data
)
test_loader = DataLoader(
    test_dataset,
    batch_size=32,
    shuffle=False
)

# Define convolutional neural network model
model = nn.Sequential(
    # First convolutional block
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Second convolutional block
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Fully connected layer
    nn.Flatten(),
    nn.Linear(128 * 8 * 8, 10)  # CIFAR-10 has 10 classes
)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9
)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    running_loss = 0.0

    for batch_idx, (images, labels) in enumerate(train_loader):
        # Forward propagation
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward propagation and optimization
        optimizer.zero_grad()   # Clear gradients
        loss.backward()         # Compute gradients
        optimizer.step()        # Update parameters

        running_loss += loss.item()

        # Print progress every 100 batches
        if (batch_idx + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], '
                  f'Batch [{batch_idx+1}/{len(train_loader)}], '
                  f'Loss: {running_loss/100:.4f}')
            running_loss = 0.0

    print(f'Epoch {epoch+1} completed')

print('Training completed!')

Loading Custom Dataset with ImageFolder

When you have your own image dataset, you can use ImageFolder for convenient loading. Just organize images by folders, with each folder representing a class.

Required Folder Structure:

custom_dataset/
├── class_a/           # Images for class A
│   ├── img1.jpg
│   └── img2.png
├── class_b/           # Images for class B
│   ├── img1.jpg
│   └── img2.jpg
└── class_c/           # Images for class C
    └── img1.jpg

Loading Example:

from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader

# Define data transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Load dataset using ImageFolder
dataset = datasets.ImageFolder(
    root='./custom_dataset',
    transform=transform
)

# View dataset information
print(f"Number of classes: {len(dataset.classes)}")
print(f"Class names: {dataset.classes}")
print(f"Class to index mapping: {dataset.class_to_idx}")
print(f"Total samples: {len(dataset)}")

# Create data loader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through data
for images, labels in loader:
    print(f"Image batch shape: {images.shape}")  # [32, 3, 224, 224]
    print(f"Label batch shape: {labels.shape}")  # [32]
    break

Creating Custom Dataset Class

When ImageFolder cannot meet your needs, you can inherit from the Dataset class to create a custom dataset. The following example shows how to create a custom dataset that loads images from folders.

Applicable Scenarios:

  • Need custom file organization

  • Need to load data from other sources (e.g., database, network)

  • Need complex preprocessing

from riemann.utils.data import Dataset
from PIL import Image
import os

class CustomImageDataset(Dataset):
    """
    Custom image dataset class

    Load images from folders with structure:
    root/
        label1/
            image1.jpg
            image2.jpg
        label2/
            image1.jpg
    """

    def __init__(self, root_dir, transform=None):
        """
        Parameters:
            root_dir (str): Root directory of dataset
            transform (callable, optional): Image transform function
        """
        self.root_dir = root_dir
        self.transform = transform
        self.images = []
        self.labels = []

        # Scan folders, collect all image paths and labels
        for label in sorted(os.listdir(root_dir)):
            label_dir = os.path.join(root_dir, label)
            if os.path.isdir(label_dir):
                for img_name in os.listdir(label_dir):
                    if img_name.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
                        self.images.append(os.path.join(label_dir, img_name))
                        self.labels.append(int(label))

        print(f"Loaded {len(self.images)} images, {len(set(self.labels))} classes")

    def __len__(self):
        """Return dataset size"""
        return len(self.images)

    def __getitem__(self, idx):
        """
        Get sample at specified index

        Parameters:
            idx (int): Sample index

        Returns:
            tuple: (image, label)
        """
        # Load image
        img_path = self.images[idx]
        image = Image.open(img_path).convert('RGB')
        label = self.labels[idx]

        # Apply transforms
        if self.transform:
            image = self.transform(image)

        return image, label

# Use custom dataset
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = CustomImageDataset(
    root_dir='./custom_data',
    transform=transform
)

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=2
)

# Test data loading
for images, labels in loader:
    print(f"Batch image shape: {images.shape}")
    print(f"Batch labels: {labels}")
    break