Computer Vision
Riemann provides comprehensive support for computer vision tasks through the riemann.vision module, including popular datasets, image transformations, and data loading utilities.
Overview
The riemann.vision module includes the following main components:
Datasets: Popular datasets like MNIST, CIFAR-10, Flowers102, OxfordIIITPet, LFWPeople, SVHN, ImageFolder, etc.
Transforms: Image preprocessing and data augmentation operations
Data Loading: Seamless integration with
DataLoader, supporting batch loading and parallel processing
Quick Start
import riemann as rm
from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader
# Define data transformations
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,))
])
# Load dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# Iterate through data
for images, labels in train_loader:
print(f"Image batch shape: {images.shape}") # [64, 1, 28, 28]
print(f"Label batch shape: {labels.shape}") # [64]
break
Datasets
Riemann provides various popular computer vision datasets. All datasets inherit from the Dataset class and can be used with DataLoader.
Dataset Overview
Dataset |
Description |
Size |
Download Source |
|---|---|---|---|
MNIST |
Handwritten digit recognition (0-9), 28×28 grayscale images |
60,000 train / 10,000 test |
AWS S3 (ossci-datasets) |
FashionMNIST |
Fashion product images (10 categories), 28×28 grayscale |
60,000 train / 10,000 test |
Zalando Research |
CIFAR-10 |
10-class object recognition, 32×32 color images |
50,000 train / 10,000 test |
University of Toronto |
CIFAR-100 |
100-class object recognition with 20 superclasses, 32×32 color images |
50,000 train / 10,000 test |
University of Toronto |
Flowers102 |
102 flower categories classification |
1,020 train / 1,020 val / 6,149 test |
Oxford VGG |
OxfordIIITPet |
37 pet breeds (cats and dogs) classification |
~7,000 images (~200 per class) |
Oxford VGG |
LFWPeople |
Face recognition dataset with multiple identities |
13,233 images / 5,749 people |
UMass Amherst |
SVHN |
Street View House Numbers, 32×32 color images |
73,257 train / 26,032 test / 531,131 extra |
Stanford University |
ImageFolder |
Generic folder-based dataset loader |
User-defined |
Local files |
DatasetFolder |
Generic folder dataset with custom loader |
User-defined |
Local files |
MNIST Dataset
Handwritten digit recognition dataset containing 60,000 training images and 10,000 test images, with image size of 28×28 pixels.
Parameters:
root(str): Root directory for data storagetrain(bool):Trueto load training set,Falseto load test settransform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Usage Example:
from riemann.vision.datasets import MNIST
from riemann.utils.data import DataLoader
# Load training and test sets
train_dataset = MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
EasyMNIST (Preprocessed MNIST)
EasyMNIST is a preprocessed version of MNIST that applies normalization, standardization, and flattening during initialization. Labels can be converted to one-hot encoding. This saves preprocessing time during training as transformations are applied once at initialization rather than during each epoch.
Parameters:
root(str): Root directory for data storagetrain(bool):Trueto load training set,Falseto load test setonehot_label(bool): If True, convert labels to one-hot encoding (default: True)download(bool, optional): If True, downloads the dataset from the internet
Usage Example:
from riemann.vision.datasets import EasyMNIST
# Load EasyMNIST with one-hot labels (default)
train_dataset = EasyMNIST(root='./data', train=True, onehot_label=True, download=True)
# Load with scalar labels
test_dataset = EasyMNIST(root='./data', train=False, onehot_label=False, download=True)
# Data is already preprocessed (normalized, flattened)
image, label = train_dataset[0]
print(f"Image shape: {image.shape}") # [784] - flattened
print(f"Label shape: {label.shape}") # [10] - one-hot if onehot_label=True
FashionMNIST Dataset
Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. It is designed to be a drop-in replacement for MNIST.
Classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
Parameters:
root(str): Root directory for data storagetrain(bool):Trueto load training set,Falseto load test settransform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Usage Example:
from riemann.vision.datasets import FashionMNIST
# Load FashionMNIST dataset
train_dataset = FashionMNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
print(f"Classes: {train_dataset.classes}")
CIFAR-10 Dataset
Contains 60,000 32×32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).
Parameters:
root(str): Root directory for data storagetrain(bool):Trueto load training set (50,000 images),Falseto load test set (10,000 images)transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Usage Example:
from riemann.vision.datasets import CIFAR10
# Load CIFAR-10 dataset
train_dataset = CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())
CIFAR-100 Dataset
Contains 60,000 32×32 color images in 100 classes. Each class has 600 images (500 for training, 100 for testing). CIFAR-100 has 100 fine-grained classes and 20 superclasses.
Parameters:
root(str): Root directory for data storagetrain(bool):Trueto load training set (50,000 images),Falseto load test set (10,000 images)transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functiondownload(bool, optional): If True, downloads the dataset from the internetcoarse(bool, optional): If True, uses 20 superclass labels; otherwise uses 100 fine-grained class labels (default: False)
Usage Example:
from riemann.vision.datasets import CIFAR100
# Load CIFAR-100 with fine-grained labels (100 classes)
train_dataset = CIFAR100(root='./data', train=True, download=True, coarse=False)
test_dataset = CIFAR100(root='./data', train=False, download=True, coarse=False)
# Load CIFAR-100 with superclass labels (20 classes)
train_dataset_coarse = CIFAR100(root='./data', train=True, download=True, coarse=True)
Flowers102 Dataset
Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers were chosen to be flowers commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations.
Note: This class requires scipy to load target files from .mat format.
Parameters:
root(str): Root directory of the datasetsplit(str, optional): The dataset split, supports"train"(default),"val", or"test"transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Target transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Dataset Statistics:
Train: 1,020 images
Validation: 1,020 images
Test: 6,149 images
Total: 8,189 images across 102 classes
Usage Example:
from riemann.vision.datasets import Flowers102
# Load Flowers102 dataset
train_dataset = Flowers102(root='./data', split='train', download=True, transform=transforms.ToTensor())
val_dataset = Flowers102(root='./data', split='val', download=True, transform=transforms.ToTensor())
test_dataset = Flowers102(root='./data', split='test', download=True, transform=transforms.ToTensor())
print(f"Train samples: {len(train_dataset)}") # 1020
print(f"Validation samples: {len(val_dataset)}") # 1020
print(f"Test samples: {len(test_dataset)}") # 6149
OxfordIIITPet Dataset
The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of species (cat or dog), breed, and pixel-level trimap segmentation.
Parameters:
root(str): Root directory of the datasetsplit(str, optional): The dataset split, supports"trainval"(default) or"test"target_types(str or list, optional): Types of target to use. Can be"category"(default),"binary-category", or"segmentation". Can also be a list to output a tuple with all specified target types.transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Target transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Target Types:
category(int): Label for one of the 37 pet categoriesbinary-category(int): Binary label for cat (0) or dog (1)segmentation(PIL Image): Segmentation trimap of the image
Usage Example:
from riemann.vision.datasets import OxfordIIITPet
# Load with category labels
dataset = OxfordIIITPet(root='./data', split='trainval', target_types='category', download=True)
# Load with binary classification (cat vs dog)
dataset_bin = OxfordIIITPet(root='./data', split='trainval', target_types='binary-category', download=True)
# Load with segmentation masks
dataset_seg = OxfordIIITPet(root='./data', split='trainval', target_types='segmentation', download=True)
# Load with multiple target types
dataset_multi = OxfordIIITPet(root='./data', split='trainval',
target_types=['category', 'segmentation'], download=True)
LFWPeople Dataset
LFW (Labeled Faces in the Wild) People dataset contains 13,233 face images collected from the web. The images are organized into 5,749 different identities. This dataset is designed for face recognition research.
Parameters:
root(str): Root directory of the datasetsplit(str, optional): The dataset split, supports"10fold"(default),"train", or"test"image_set(str, optional): The image alignment type, supports"original","funneled"(default), or"deepfunneled"transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Target transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Image Sets:
original: Original images without alignmentfunneled: Geometrically normalized face images (default)deepfunneled: Deep funneled images with better alignment
Usage Example:
from riemann.vision.datasets import LFWPeople
# Load LFWPeople dataset with funneled images
train_dataset = LFWPeople(root='./data', split='train', image_set='funneled', download=True)
test_dataset = LFWPeople(root='./data', split='test', image_set='funneled', download=True)
print(f"Number of classes (people): {len(train_dataset.classes)}")
print(f"Train samples: {len(train_dataset)}")
SVHN Dataset
SVHN (Street View House Numbers) dataset contains 32×32 color images of house numbers collected from Google Street View. The dataset includes 10 digit classes (0-9).
Note: This class requires scipy to load data from .mat format.
Parameters:
root(str): Root directory of the datasetsplit(str): The dataset split, supports"train","test", or"extra"transform(callable, optional): Image transformation functiontarget_transform(callable, optional): Target transformation functiondownload(bool, optional): If True, downloads the dataset from the internet
Dataset Statistics:
Train: 73,257 images
Test: 26,032 images
Extra: 531,131 additional images (less difficult samples)
Usage Example:
from riemann.vision.datasets import SVHN
# Load SVHN dataset
train_dataset = SVHN(root='./data', split='train', download=True, transform=transforms.ToTensor())
test_dataset = SVHN(root='./data', split='test', download=True, transform=transforms.ToTensor())
# Also available: extra split with additional training data
extra_dataset = SVHN(root='./data', split='extra', download=True, transform=transforms.ToTensor())
ImageFolder Dataset
Load image datasets from local folders, suitable for custom datasets. Folder structure should be organized by class:
root/
├── class_a/
│ ├── img1.jpg
│ └── img2.png
├── class_b/
│ ├── img1.jpg
│ └── img2.jpg
└── class_c/
└── img1.jpg
Parameters:
root(str): Root directory path of the datasettransform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functionloader(callable, optional): Image loading function, defaults to PIL Image loaderis_valid_file(callable, optional): Function to validate if a file is valid
Usage Example:
from riemann.vision.datasets import ImageFolder
# Load custom dataset from folder
dataset = ImageFolder(
root='./custom_dataset',
transform=transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
])
)
print(f"Number of classes: {len(dataset.classes)}") # ['class_a', 'class_b', 'class_c']
print(f"Class to index mapping: {dataset.class_to_idx}") # {'class_a': 0, 'class_b': 1, 'class_c': 2}
DatasetFolder
Generic folder dataset class, similar to ImageFolder but allows custom image loaders.
Parameters:
root(str): Root directory path of the datasetloader(callable): Image loading functionextensions(tuple, optional): Tuple of allowed file extensionstransform(callable, optional): Image transformation functiontarget_transform(callable, optional): Label transformation functionis_valid_file(callable, optional): Function to validate if a file is validallow_empty(bool): Whether to allow empty folders, defaultFalse
Usage Example:
from riemann.vision.datasets import DatasetFolder, default_loader
# Use custom loader
dataset = DatasetFolder(
root='./custom_dataset',
loader=default_loader,
extensions=('.jpg', '.png'),
transform=transforms.ToTensor()
)
default_loader
default_loader is the default image loading function used by ImageFolder and DatasetFolder. It automatically selects the appropriate loading method based on file extension:
Image formats supported by PIL (e.g., .jpg, .png, .bmp, etc.): Loaded using PIL.Image.open() and converted to RGB mode
Other formats: Attempts to load using PIL
Purpose:
default_loader is mainly used for the loader parameter of ImageFolder and DatasetFolder to specify the image loading method. When using these two dataset classes, if the loader parameter is not specified, default_loader will be used by default.
Usage Example:
from riemann.vision.datasets import DatasetFolder, default_loader
# Load image using default_loader
image = default_loader('path/to/image.jpg')
# Use in DatasetFolder
dataset = DatasetFolder(
root='./custom_dataset',
loader=default_loader, # Specify to use default_loader
extensions=('.jpg', '.png')
)
Transforms
riemann.vision.transforms provides rich image transformation operations for data preprocessing and data augmentation.
Transform Overview
Transform |
Description |
Category |
|---|---|---|
Compose |
Combine multiple transforms into one |
Utility |
PILToTensor |
Convert PIL Image to tensor without scaling |
Conversion |
ToTensor |
Convert PIL Image or numpy.ndarray to tensor (scales to [0, 1]) |
Conversion |
ToPILImage |
Convert tensor to PIL Image |
Conversion |
ConvertImageDtype |
Convert image to specified data type |
Conversion |
Normalize |
Normalize tensor with mean and std |
Normalization |
Resize |
Resize image to specified size |
Geometric |
CenterCrop |
Crop image from center |
Geometric |
RandomHorizontalFlip |
Randomly flip image horizontally |
Augmentation |
RandomVerticalFlip |
Randomly flip image vertically |
Augmentation |
RandomRotation |
Randomly rotate image by angle |
Augmentation |
ColorJitter |
Randomly change brightness, contrast, saturation, hue |
Augmentation |
Grayscale |
Convert image to grayscale |
Color |
RandomGrayscale |
Randomly convert image to grayscale |
Augmentation |
RandomCrop |
Randomly crop image to specified size |
Augmentation |
RandomResizedCrop |
Random crop and resize image |
Augmentation |
FiveCrop |
Crop image into 5 regions (4 corners + center) |
Geometric |
TenCrop |
Crop image into 10 regions (FiveCrop + flips) |
Geometric |
Pad |
Pad image with specified value |
Geometric |
Lambda |
Apply custom lambda function |
Utility |
GaussianBlur |
Apply Gaussian blur to image |
Filter |
RandomAffine |
Random affine transformation |
Augmentation |
RandomPerspective |
Random perspective transformation |
Augmentation |
RandomErasing |
Randomly erase rectangular regions |
Augmentation |
AutoAugment |
AutoAugment data augmentation policy |
Auto Augmentation |
RandAugment |
RandAugment data augmentation policy |
Auto Augmentation |
TrivialAugmentWide |
TrivialAugmentWide data augmentation policy |
Auto Augmentation |
SanitizeBoundingBox |
Sanitize and validate bounding boxes |
Detection |
Invert |
Invert image colors |
Color |
Posterize |
Reduce number of bits for each color channel |
Color |
Solarize |
Invert pixels above threshold |
Color |
Equalize |
Equalize image histogram |
Color |
AutoContrast |
Maximize image contrast |
Color |
Sharpness |
Adjust image sharpness |
Color |
Brightness |
Adjust image brightness |
Color |
Contrast |
Adjust image contrast |
Color |
Saturation |
Adjust image saturation |
Color |
Hue |
Adjust image hue |
Color |
Compose
Combine multiple transformations and apply them in sequence.
Parameters:
transforms(list): List of transform objects to compose
Usage Example:
from riemann.vision import transforms
# Define transformation pipeline
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
PILToTensor
Convert PIL Image to tensor without scaling. Unlike ToTensor, PILToTensor does not scale values from [0, 255] to [0.0, 1.0].
Usage Example:
from riemann.vision import transforms
# Convert PIL Image to tensor (values in [0, 255])
pil_to_tensor = transforms.PILToTensor()
tensor_img = pil_to_tensor(pil_image)
ToTensor
Convert PIL Image or numpy.ndarray to tensor. Scales values from [0, 255] to [0.0, 1.0].
Usage Example:
from riemann.vision import transforms
# Convert PIL Image to tensor (values in [0, 1])
to_tensor = transforms.ToTensor()
tensor_img = to_tensor(pil_image)
ToPILImage
Convert tensor to PIL Image.
Parameters:
mode(str, optional): Color mode of the output image
Usage Example:
from riemann.vision import transforms
# Convert tensor to PIL Image
to_pil = transforms.ToPILImage()
pil_img = to_pil(tensor)
ConvertImageDtype
Convert image to specified data type.
Parameters:
dtype(dtype): Target data type
Usage Example:
from riemann.vision import transforms
# Convert to float32
convert_dtype = transforms.ConvertImageDtype(dtype='float32')
converted_img = convert_dtype(img)
Normalize
Normalize tensor with mean and standard deviation.
Parameters:
mean(sequence): Mean values for each channelstd(sequence): Standard deviation values for each channel
Usage Example:
from riemann.vision import transforms
# Normalize using ImageNet statistics
normalize = transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
normalized_img = normalize(tensor_img)
Resize
Resize image to specified size.
Parameters:
size(int or tuple): Target size. If int, smaller edge is resized to size. If tuple, (height, width).
Usage Example:
from riemann.vision import transforms
# Resize to specific size
resize = transforms.Resize((224, 224))
resized_img = resize(pil_image)
# Resize by shorter side
resize = transforms.Resize(256)
resized_img = resize(pil_image)
CenterCrop
Crop image from center.
Parameters:
size(int or tuple): Crop size
Usage Example:
from riemann.vision import transforms
# Center crop to 224x224
center_crop = transforms.CenterCrop(224)
cropped_img = center_crop(pil_image)
RandomHorizontalFlip
Randomly flip image horizontally.
Parameters:
p(float): Probability of flipping (default: 0.5)
Usage Example:
from riemann.vision import transforms
# Flip with 50% probability
hflip = transforms.RandomHorizontalFlip(p=0.5)
flipped_img = hflip(pil_image)
RandomVerticalFlip
Randomly flip image vertically.
Parameters:
p(float): Probability of flipping (default: 0.5)
Usage Example:
from riemann.vision import transforms
# Flip with 50% probability
vflip = transforms.RandomVerticalFlip(p=0.5)
flipped_img = vflip(pil_image)
RandomRotation
Randomly rotate image by angle.
Parameters:
degrees(sequence or float): Range of degrees (-degrees, +degrees)
Usage Example:
from riemann.vision import transforms
# Rotate between -15 and 15 degrees
rotation = transforms.RandomRotation(degrees=15)
rotated_img = rotation(pil_image)
ColorJitter
Randomly change brightness, contrast, saturation, and hue.
Parameters:
brightness(float): Brightness jitter factorcontrast(float): Contrast jitter factorsaturation(float): Saturation jitter factorhue(float): Hue jitter factor
Usage Example:
from riemann.vision import transforms
# Randomly adjust color
jitter = transforms.ColorJitter(
brightness=0.2,
contrast=0.2,
saturation=0.2,
hue=0.1
)
jittered_img = jitter(pil_image)
Grayscale
Convert image to grayscale.
Parameters:
num_output_channels(int): Number of output channels (1 or 3)
Usage Example:
from riemann.vision import transforms
# Convert to grayscale (1 channel)
gray = transforms.Grayscale(num_output_channels=1)
gray_img = gray(pil_image)
RandomGrayscale
Randomly convert image to grayscale.
Parameters:
p(float): Probability of conversion (default: 0.1)
Usage Example:
from riemann.vision import transforms
# Convert to grayscale with 10% probability
gray = transforms.RandomGrayscale(p=0.1)
gray_img = gray(pil_image)
RandomCrop
Randomly crop image to specified size.
Parameters:
size(int or tuple): Crop sizepadding(int, optional): Padding size
Usage Example:
from riemann.vision import transforms
# Random crop with padding
crop = transforms.RandomCrop(224, padding=4)
cropped_img = crop(pil_image)
RandomResizedCrop
Random crop and resize image.
Parameters:
size(int or tuple): Target sizescale(tuple): Scale range for croppingratio(tuple): Aspect ratio range
Usage Example:
from riemann.vision import transforms
# Random resized crop
crop = transforms.RandomResizedCrop(224, scale=(0.08, 1.0))
cropped_img = crop(pil_image)
FiveCrop
Crop image into 5 regions (4 corners + center).
Parameters:
size(int or tuple): Crop size
Usage Example:
import riemann as rm
from riemann.vision import transforms
# Five crop
five_crop = transforms.FiveCrop(224)
crops = five_crop(pil_image) # Returns tuple of 5 images
# Stack into batch
tensor_crops = rm.stack([transforms.ToTensor()(crop) for crop in crops])
TenCrop
Crop image into 10 regions (FiveCrop + horizontal flips).
Parameters:
size(int or tuple): Crop sizevertical_flip(bool): Also apply vertical flip
Usage Example:
import riemann as rm
from riemann.vision import transforms
# Ten crop
ten_crop = transforms.TenCrop(224)
crops = ten_crop(pil_image) # Returns tuple of 10 images
Pad
Pad image with specified value.
Parameters:
padding(int or tuple): Padding sizefill(int or tuple): Fill value
Usage Example:
from riemann.vision import transforms
# Pad image
pad = transforms.Pad(padding=4, fill=0)
padded_img = pad(pil_image)
Lambda
Apply custom lambda function.
Parameters:
lambd(function): Lambda function to apply
Usage Example:
from riemann.vision import transforms
# Custom lambda transform
lambd = transforms.Lambda(lambda x: x.rotate(45))
transformed_img = lambd(pil_image)
GaussianBlur
Apply Gaussian blur to image.
Parameters:
kernel_size(int): Gaussian kernel sizesigma(float or tuple): Standard deviation
Usage Example:
from riemann.vision import transforms
# Apply Gaussian blur
blur = transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))
blurred_img = blur(pil_image)
RandomAffine
Random affine transformation.
Parameters:
degrees(float or tuple): Rotation degreestranslate(tuple): Translation rangescale(tuple): Scale rangeshear(float or tuple): Shear range
Usage Example:
from riemann.vision import transforms
# Random affine transformation
affine = transforms.RandomAffine(
degrees=15,
translate=(0.1, 0.1),
scale=(0.9, 1.1)
)
transformed_img = affine(pil_image)
RandomPerspective
Random perspective transformation.
Parameters:
distortion_scale(float): Distortion scalep(float): Probability of applying transform
Usage Example:
from riemann.vision import transforms
# Random perspective
perspective = transforms.RandomPerspective(distortion_scale=0.5, p=0.5)
transformed_img = perspective(pil_image)
RandomErasing
Randomly erase rectangular regions.
Parameters:
p(float): Probability of applyingscale(tuple): Erasing area rangeratio(tuple): Aspect ratio rangevalue(str or float): Erasing value
Usage Example:
from riemann.vision import transforms
# Random erasing (typically used on tensors)
erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.33))
erased_tensor = erasing(tensor_img)
AutoAugment
AutoAugment data augmentation policy.
Parameters:
policy(str): Policy to use (‘imagenet’, ‘cifar10’, ‘svhn’)
Usage Example:
from riemann.vision import transforms
# AutoAugment with ImageNet policy
auto_augment = transforms.AutoAugment(policy='imagenet')
augmented_img = auto_augment(pil_image)
RandAugment
RandAugment data augmentation policy.
Parameters:
num_ops(int): Number of operationsmagnitude(int): Magnitude of operations
Usage Example:
from riemann.vision import transforms
# RandAugment
rand_augment = transforms.RandAugment(num_ops=2, magnitude=9)
augmented_img = rand_augment(pil_image)
TrivialAugmentWide
TrivialAugmentWide data augmentation policy.
Usage Example:
from riemann.vision import transforms
# TrivialAugmentWide
trivial_augment = transforms.TrivialAugmentWide()
augmented_img = trivial_augment(pil_image)
SanitizeBoundingBox
Sanitize and validate bounding boxes.
Usage Example:
from riemann.vision import transforms
# Sanitize bounding boxes
sanitize = transforms.SanitizeBoundingBox()
sanitized_boxes = sanitize(boxes, image_size)
Invert
Invert image colors.
Usage Example:
from riemann.vision import transforms
# Invert image
invert = transforms.Invert()
inverted_img = invert(pil_image)
Posterize
Reduce number of bits for each color channel.
Parameters:
bits(int): Number of bits to keep
Usage Example:
from riemann.vision import transforms
# Posterize image
posterize = transforms.Posterize(bits=4)
posterized_img = posterize(pil_image)
Solarize
Invert pixels above threshold.
Parameters:
threshold(int): Threshold value
Usage Example:
from riemann.vision import transforms
# Solarize image
solarize = transforms.Solarize(threshold=128)
solarized_img = solarize(pil_image)
Equalize
Equalize image histogram.
Usage Example:
from riemann.vision import transforms
# Equalize image
equalize = transforms.Equalize()
equalized_img = equalize(pil_image)
AutoContrast
Maximize image contrast.
Usage Example:
from riemann.vision import transforms
# Auto contrast
auto_contrast = transforms.AutoContrast()
contrasted_img = auto_contrast(pil_image)
Brightness
Adjust image brightness.
Parameters:
brightness_factor(float): Brightness factor
Usage Example:
from riemann.vision import transforms
# Adjust brightness
brightness = transforms.Brightness(brightness_factor=1.5)
brightened_img = brightness(pil_image)
Contrast
Adjust image contrast.
Parameters:
contrast_factor(float): Contrast factor
Usage Example:
from riemann.vision import transforms
# Adjust contrast
contrast = transforms.Contrast(contrast_factor=1.5)
contrasted_img = contrast(pil_image)
Saturation
Adjust image saturation.
Parameters:
saturation_factor(float): Saturation factor
Usage Example:
from riemann.vision import transforms
# Adjust saturation
saturation = transforms.Saturation(saturation_factor=1.5)
saturated_img = saturation(pil_image)
Hue
Adjust image hue.
Parameters:
hue_factor(float): Hue factor (-0.5 to 0.5)
Usage Example:
from riemann.vision import transforms
# Adjust hue
hue = transforms.Hue(hue_factor=0.1)
hue_adjusted_img = hue(pil_image)
Complete Examples
The following examples demonstrate how to use Riemann’s computer vision module for common deep learning tasks.
Image Classification Training Pipeline
This example demonstrates the complete workflow of image classification using the CIFAR-10 dataset, including data loading, data augmentation, model definition, training, and evaluation.
Pipeline Overview:
Data Preprocessing: Use random cropping, horizontal flipping, and color jittering for data augmentation
Normalization: Normalize using ImageNet statistics
Model Definition: Simple convolutional neural network
Training Loop: Standard training flow including forward propagation, loss calculation, backward propagation, and parameter updates
import riemann as rm
import riemann.nn as nn
import riemann.optim as optim
from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader
# Define training data transforms (with data augmentation)
train_transform = transforms.Compose([
transforms.RandomResizedCrop(224), # Random crop and resize
transforms.RandomHorizontalFlip(), # Random horizontal flip
transforms.ColorJitter( # Color jitter (data augmentation)
brightness=0.2,
contrast=0.2
),
transforms.ToTensor(), # Convert to tensor
transforms.Normalize( # Normalize
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
# Define test data transforms (without data augmentation)
test_transform = transforms.Compose([
transforms.Resize(256), # Resize
transforms.CenterCrop(224), # Center crop
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=train_transform
)
test_dataset = datasets.CIFAR10(
root='./data',
train=False,
download=True,
transform=test_transform
)
# Create data loaders
train_loader = DataLoader(
train_dataset,
batch_size=32,
shuffle=True, # Shuffle data during training
num_workers=4 # Use 4 subprocesses to load data
)
test_loader = DataLoader(
test_dataset,
batch_size=32,
shuffle=False
)
# Define convolutional neural network model
model = nn.Sequential(
# First convolutional block
nn.Conv2d(3, 64, kernel_size=3, padding=1),
nn.ReLU(),
nn.MaxPool2d(2),
# Second convolutional block
nn.Conv2d(64, 128, kernel_size=3, padding=1),
nn.ReLU(),
nn.MaxPool2d(2),
# Fully connected layer
nn.Flatten(),
nn.Linear(128 * 8 * 8, 10) # CIFAR-10 has 10 classes
)
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(
model.parameters(),
lr=0.01,
momentum=0.9
)
# Training loop
num_epochs = 10
for epoch in range(num_epochs):
model.train() # Set model to training mode
running_loss = 0.0
for batch_idx, (images, labels) in enumerate(train_loader):
# Forward propagation
outputs = model(images)
loss = criterion(outputs, labels)
# Backward propagation and optimization
optimizer.zero_grad() # Clear gradients
loss.backward() # Compute gradients
optimizer.step() # Update parameters
running_loss += loss.item()
# Print progress every 100 batches
if (batch_idx + 1) % 100 == 0:
print(f'Epoch [{epoch+1}/{num_epochs}], '
f'Batch [{batch_idx+1}/{len(train_loader)}], '
f'Loss: {running_loss/100:.4f}')
running_loss = 0.0
print(f'Epoch {epoch+1} completed')
print('Training completed!')
Loading Custom Dataset with ImageFolder
When you have your own image dataset, you can use ImageFolder for convenient loading. Just organize images by folders, with each folder representing a class.
Required Folder Structure:
custom_dataset/
├── class_a/ # Images for class A
│ ├── img1.jpg
│ └── img2.png
├── class_b/ # Images for class B
│ ├── img1.jpg
│ └── img2.jpg
└── class_c/ # Images for class C
└── img1.jpg
Loading Example:
from riemann.vision import datasets, transforms
from riemann.utils.data import DataLoader
# Define data transforms
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
# Load dataset using ImageFolder
dataset = datasets.ImageFolder(
root='./custom_dataset',
transform=transform
)
# View dataset information
print(f"Number of classes: {len(dataset.classes)}")
print(f"Class names: {dataset.classes}")
print(f"Class to index mapping: {dataset.class_to_idx}")
print(f"Total samples: {len(dataset)}")
# Create data loader
loader = DataLoader(dataset, batch_size=32, shuffle=True)
# Iterate through data
for images, labels in loader:
print(f"Image batch shape: {images.shape}") # [32, 3, 224, 224]
print(f"Label batch shape: {labels.shape}") # [32]
break
Creating Custom Dataset Class
When ImageFolder cannot meet your needs, you can inherit from the Dataset class to create a custom dataset. The following example shows how to create a custom dataset that loads images from folders.
Applicable Scenarios:
Need custom file organization
Need to load data from other sources (e.g., database, network)
Need complex preprocessing
from riemann.utils.data import Dataset
from PIL import Image
import os
class CustomImageDataset(Dataset):
"""
Custom image dataset class
Load images from folders with structure:
root/
label1/
image1.jpg
image2.jpg
label2/
image1.jpg
"""
def __init__(self, root_dir, transform=None):
"""
Parameters:
root_dir (str): Root directory of dataset
transform (callable, optional): Image transform function
"""
self.root_dir = root_dir
self.transform = transform
self.images = []
self.labels = []
# Scan folders, collect all image paths and labels
for label in sorted(os.listdir(root_dir)):
label_dir = os.path.join(root_dir, label)
if os.path.isdir(label_dir):
for img_name in os.listdir(label_dir):
if img_name.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
self.images.append(os.path.join(label_dir, img_name))
self.labels.append(int(label))
print(f"Loaded {len(self.images)} images, {len(set(self.labels))} classes")
def __len__(self):
"""Return dataset size"""
return len(self.images)
def __getitem__(self, idx):
"""
Get sample at specified index
Parameters:
idx (int): Sample index
Returns:
tuple: (image, label)
"""
# Load image
img_path = self.images[idx]
image = Image.open(img_path).convert('RGB')
label = self.labels[idx]
# Apply transforms
if self.transform:
image = self.transform(image)
return image, label
# Use custom dataset
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
])
dataset = CustomImageDataset(
root_dir='./custom_data',
transform=transform
)
loader = DataLoader(
dataset,
batch_size=32,
shuffle=True,
num_workers=2
)
# Test data loading
for images, labels in loader:
print(f"Batch image shape: {images.shape}")
print(f"Batch labels: {labels}")
break