Computer Vision =============== Riemann provides comprehensive support for computer vision tasks through the ``riemann.vision`` module, including popular datasets, image transformations, and data loading utilities. .. contents:: Table of Contents :local: :depth: 2 Overview -------- The ``riemann.vision`` module includes the following main components: - **Datasets**: Popular datasets like MNIST, CIFAR-10, Flowers102, OxfordIIITPet, LFWPeople, SVHN, ImageFolder, etc. - **Transforms**: Image preprocessing and data augmentation operations - **Data Loading**: Seamless integration with ``DataLoader``, supporting batch loading and parallel processing Quick Start ----------- .. code-block:: python import riemann as rm from riemann.vision import datasets, transforms from riemann.utils.data import DataLoader # Define data transformations transform = transforms.Compose([ transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,)) ]) # Load dataset train_dataset = datasets.MNIST(root='./data', train=True, transform=transform) train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True) # Iterate through data for images, labels in train_loader: print(f"Image batch shape: {images.shape}") # [64, 1, 28, 28] print(f"Label batch shape: {labels.shape}") # [64] break Datasets -------- Riemann provides various popular computer vision datasets. All datasets inherit from the ``Dataset`` class and can be used with ``DataLoader``. Dataset Overview ~~~~~~~~~~~~~~~~ .. list-table:: Supported Datasets :header-rows: 1 :widths: 20 35 15 30 * - Dataset - Description - Size - Download Source * - MNIST - Handwritten digit recognition (0-9), 28×28 grayscale images - 60,000 train / 10,000 test - AWS S3 (ossci-datasets) * - FashionMNIST - Fashion product images (10 categories), 28×28 grayscale - 60,000 train / 10,000 test - Zalando Research * - CIFAR-10 - 10-class object recognition, 32×32 color images - 50,000 train / 10,000 test - University of Toronto * - CIFAR-100 - 100-class object recognition with 20 superclasses, 32×32 color images - 50,000 train / 10,000 test - University of Toronto * - Flowers102 - 102 flower categories classification - 1,020 train / 1,020 val / 6,149 test - Oxford VGG * - OxfordIIITPet - 37 pet breeds (cats and dogs) classification - ~7,000 images (~200 per class) - Oxford VGG * - LFWPeople - Face recognition dataset with multiple identities - 13,233 images / 5,749 people - UMass Amherst * - SVHN - Street View House Numbers, 32×32 color images - 73,257 train / 26,032 test / 531,131 extra - Stanford University * - ImageFolder - Generic folder-based dataset loader - User-defined - Local files * - DatasetFolder - Generic folder dataset with custom loader - User-defined - Local files MNIST Dataset ~~~~~~~~~~~~~ Handwritten digit recognition dataset containing 60,000 training images and 10,000 test images, with image size of 28×28 pixels. **Parameters**: - ``root`` (str): Root directory for data storage - ``train`` (bool): ``True`` to load training set, ``False`` to load test set - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Usage Example**: .. code-block:: python from riemann.vision.datasets import MNIST from riemann.utils.data import DataLoader # Load training and test sets train_dataset = MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor()) test_dataset = MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor()) # Create data loaders train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True) test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False) EasyMNIST (Preprocessed MNIST) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ EasyMNIST is a preprocessed version of MNIST that applies normalization, standardization, and flattening during initialization. Labels can be converted to one-hot encoding. This saves preprocessing time during training as transformations are applied once at initialization rather than during each epoch. **Parameters**: - ``root`` (str): Root directory for data storage - ``train`` (bool): ``True`` to load training set, ``False`` to load test set - ``onehot_label`` (bool): If True, convert labels to one-hot encoding (default: True) - ``download`` (bool, optional): If True, downloads the dataset from the internet **Usage Example**: .. code-block:: python from riemann.vision.datasets import EasyMNIST # Load EasyMNIST with one-hot labels (default) train_dataset = EasyMNIST(root='./data', train=True, onehot_label=True, download=True) # Load with scalar labels test_dataset = EasyMNIST(root='./data', train=False, onehot_label=False, download=True) # Data is already preprocessed (normalized, flattened) image, label = train_dataset[0] print(f"Image shape: {image.shape}") # [784] - flattened print(f"Label shape: {label.shape}") # [10] - one-hot if onehot_label=True FashionMNIST Dataset ~~~~~~~~~~~~~~~~~~~~ Fashion-MNIST is a dataset of Zalando's article images consisting of 60,000 training examples and 10,000 test examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. It is designed to be a drop-in replacement for MNIST. **Classes**: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot **Parameters**: - ``root`` (str): Root directory for data storage - ``train`` (bool): ``True`` to load training set, ``False`` to load test set - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Usage Example**: .. code-block:: python from riemann.vision.datasets import FashionMNIST # Load FashionMNIST dataset train_dataset = FashionMNIST(root='./data', train=True, download=True, transform=transforms.ToTensor()) test_dataset = FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor()) print(f"Classes: {train_dataset.classes}") CIFAR-10 Dataset ~~~~~~~~~~~~~~~~ Contains 60,000 32×32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). **Parameters**: - ``root`` (str): Root directory for data storage - ``train`` (bool): ``True`` to load training set (50,000 images), ``False`` to load test set (10,000 images) - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Usage Example**: .. code-block:: python from riemann.vision.datasets import CIFAR10 # Load CIFAR-10 dataset train_dataset = CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor()) test_dataset = CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor()) CIFAR-100 Dataset ~~~~~~~~~~~~~~~~~ Contains 60,000 32×32 color images in 100 classes. Each class has 600 images (500 for training, 100 for testing). CIFAR-100 has 100 fine-grained classes and 20 superclasses. **Parameters**: - ``root`` (str): Root directory for data storage - ``train`` (bool): ``True`` to load training set (50,000 images), ``False`` to load test set (10,000 images) - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet - ``coarse`` (bool, optional): If True, uses 20 superclass labels; otherwise uses 100 fine-grained class labels (default: False) **Usage Example**: .. code-block:: python from riemann.vision.datasets import CIFAR100 # Load CIFAR-100 with fine-grained labels (100 classes) train_dataset = CIFAR100(root='./data', train=True, download=True, coarse=False) test_dataset = CIFAR100(root='./data', train=False, download=True, coarse=False) # Load CIFAR-100 with superclass labels (20 classes) train_dataset_coarse = CIFAR100(root='./data', train=True, download=True, coarse=True) Flowers102 Dataset ~~~~~~~~~~~~~~~~~~ Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers were chosen to be flowers commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations. **Note**: This class requires ``scipy`` to load target files from ``.mat`` format. **Parameters**: - ``root`` (str): Root directory of the dataset - ``split`` (str, optional): The dataset split, supports ``"train"`` (default), ``"val"``, or ``"test"`` - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Target transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Dataset Statistics**: - Train: 1,020 images - Validation: 1,020 images - Test: 6,149 images - Total: 8,189 images across 102 classes **Usage Example**: .. code-block:: python from riemann.vision.datasets import Flowers102 # Load Flowers102 dataset train_dataset = Flowers102(root='./data', split='train', download=True, transform=transforms.ToTensor()) val_dataset = Flowers102(root='./data', split='val', download=True, transform=transforms.ToTensor()) test_dataset = Flowers102(root='./data', split='test', download=True, transform=transforms.ToTensor()) print(f"Train samples: {len(train_dataset)}") # 1020 print(f"Validation samples: {len(val_dataset)}") # 1020 print(f"Test samples: {len(test_dataset)}") # 6149 OxfordIIITPet Dataset ~~~~~~~~~~~~~~~~~~~~~ The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of species (cat or dog), breed, and pixel-level trimap segmentation. **Parameters**: - ``root`` (str): Root directory of the dataset - ``split`` (str, optional): The dataset split, supports ``"trainval"`` (default) or ``"test"`` - ``target_types`` (str or list, optional): Types of target to use. Can be ``"category"`` (default), ``"binary-category"``, or ``"segmentation"``. Can also be a list to output a tuple with all specified target types. - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Target transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Target Types**: - ``category`` (int): Label for one of the 37 pet categories - ``binary-category`` (int): Binary label for cat (0) or dog (1) - ``segmentation`` (PIL Image): Segmentation trimap of the image **Usage Example**: .. code-block:: python from riemann.vision.datasets import OxfordIIITPet # Load with category labels dataset = OxfordIIITPet(root='./data', split='trainval', target_types='category', download=True) # Load with binary classification (cat vs dog) dataset_bin = OxfordIIITPet(root='./data', split='trainval', target_types='binary-category', download=True) # Load with segmentation masks dataset_seg = OxfordIIITPet(root='./data', split='trainval', target_types='segmentation', download=True) # Load with multiple target types dataset_multi = OxfordIIITPet(root='./data', split='trainval', target_types=['category', 'segmentation'], download=True) LFWPeople Dataset ~~~~~~~~~~~~~~~~~ LFW (Labeled Faces in the Wild) People dataset contains 13,233 face images collected from the web. The images are organized into 5,749 different identities. This dataset is designed for face recognition research. **Parameters**: - ``root`` (str): Root directory of the dataset - ``split`` (str, optional): The dataset split, supports ``"10fold"`` (default), ``"train"``, or ``"test"`` - ``image_set`` (str, optional): The image alignment type, supports ``"original"``, ``"funneled"`` (default), or ``"deepfunneled"`` - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Target transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Image Sets**: - ``original``: Original images without alignment - ``funneled``: Geometrically normalized face images (default) - ``deepfunneled``: Deep funneled images with better alignment **Usage Example**: .. code-block:: python from riemann.vision.datasets import LFWPeople # Load LFWPeople dataset with funneled images train_dataset = LFWPeople(root='./data', split='train', image_set='funneled', download=True) test_dataset = LFWPeople(root='./data', split='test', image_set='funneled', download=True) print(f"Number of classes (people): {len(train_dataset.classes)}") print(f"Train samples: {len(train_dataset)}") SVHN Dataset ~~~~~~~~~~~~ SVHN (Street View House Numbers) dataset contains 32×32 color images of house numbers collected from Google Street View. The dataset includes 10 digit classes (0-9). **Note**: This class requires ``scipy`` to load data from ``.mat`` format. **Parameters**: - ``root`` (str): Root directory of the dataset - ``split`` (str): The dataset split, supports ``"train"``, ``"test"``, or ``"extra"`` - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Target transformation function - ``download`` (bool, optional): If True, downloads the dataset from the internet **Dataset Statistics**: - Train: 73,257 images - Test: 26,032 images - Extra: 531,131 additional images (less difficult samples) **Usage Example**: .. code-block:: python from riemann.vision.datasets import SVHN # Load SVHN dataset train_dataset = SVHN(root='./data', split='train', download=True, transform=transforms.ToTensor()) test_dataset = SVHN(root='./data', split='test', download=True, transform=transforms.ToTensor()) # Also available: extra split with additional training data extra_dataset = SVHN(root='./data', split='extra', download=True, transform=transforms.ToTensor()) ImageFolder Dataset ~~~~~~~~~~~~~~~~~~~ Load image datasets from local folders, suitable for custom datasets. Folder structure should be organized by class: .. code-block:: text root/ ├── class_a/ │ ├── img1.jpg │ └── img2.png ├── class_b/ │ ├── img1.jpg │ └── img2.jpg └── class_c/ └── img1.jpg **Parameters**: - ``root`` (str): Root directory path of the dataset - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``loader`` (callable, optional): Image loading function, defaults to PIL Image loader - ``is_valid_file`` (callable, optional): Function to validate if a file is valid **Usage Example**: .. code-block:: python from riemann.vision.datasets import ImageFolder # Load custom dataset from folder dataset = ImageFolder( root='./custom_dataset', transform=transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), ]) ) print(f"Number of classes: {len(dataset.classes)}") # ['class_a', 'class_b', 'class_c'] print(f"Class to index mapping: {dataset.class_to_idx}") # {'class_a': 0, 'class_b': 1, 'class_c': 2} DatasetFolder ~~~~~~~~~~~~~ Generic folder dataset class, similar to ``ImageFolder`` but allows custom image loaders. **Parameters**: - ``root`` (str): Root directory path of the dataset - ``loader`` (callable): Image loading function - ``extensions`` (tuple, optional): Tuple of allowed file extensions - ``transform`` (callable, optional): Image transformation function - ``target_transform`` (callable, optional): Label transformation function - ``is_valid_file`` (callable, optional): Function to validate if a file is valid - ``allow_empty`` (bool): Whether to allow empty folders, default ``False`` **Usage Example**: .. code-block:: python from riemann.vision.datasets import DatasetFolder, default_loader # Use custom loader dataset = DatasetFolder( root='./custom_dataset', loader=default_loader, extensions=('.jpg', '.png'), transform=transforms.ToTensor() ) default_loader ~~~~~~~~~~~~~~ ``default_loader`` is the default image loading function used by ``ImageFolder`` and ``DatasetFolder``. It automatically selects the appropriate loading method based on file extension: - Image formats supported by PIL (e.g., .jpg, .png, .bmp, etc.): Loaded using PIL.Image.open() and converted to RGB mode - Other formats: Attempts to load using PIL **Purpose**: ``default_loader`` is mainly used for the ``loader`` parameter of ``ImageFolder`` and ``DatasetFolder`` to specify the image loading method. When using these two dataset classes, if the ``loader`` parameter is not specified, ``default_loader`` will be used by default. **Usage Example**: .. code-block:: python from riemann.vision.datasets import DatasetFolder, default_loader # Load image using default_loader image = default_loader('path/to/image.jpg') # Use in DatasetFolder dataset = DatasetFolder( root='./custom_dataset', loader=default_loader, # Specify to use default_loader extensions=('.jpg', '.png') ) Transforms ---------- ``riemann.vision.transforms`` provides rich image transformation operations for data preprocessing and data augmentation. Transform Overview ~~~~~~~~~~~~~~~~~~ .. list-table:: Supported Transforms :header-rows: 1 :widths: 25 35 40 * - Transform - Description - Category * - Compose - Combine multiple transforms into one - Utility * - PILToTensor - Convert PIL Image to tensor without scaling - Conversion * - ToTensor - Convert PIL Image or numpy.ndarray to tensor (scales to [0, 1]) - Conversion * - ToPILImage - Convert tensor to PIL Image - Conversion * - ConvertImageDtype - Convert image to specified data type - Conversion * - Normalize - Normalize tensor with mean and std - Normalization * - Resize - Resize image to specified size - Geometric * - CenterCrop - Crop image from center - Geometric * - RandomHorizontalFlip - Randomly flip image horizontally - Augmentation * - RandomVerticalFlip - Randomly flip image vertically - Augmentation * - RandomRotation - Randomly rotate image by angle - Augmentation * - ColorJitter - Randomly change brightness, contrast, saturation, hue - Augmentation * - Grayscale - Convert image to grayscale - Color * - RandomGrayscale - Randomly convert image to grayscale - Augmentation * - RandomCrop - Randomly crop image to specified size - Augmentation * - RandomResizedCrop - Random crop and resize image - Augmentation * - FiveCrop - Crop image into 5 regions (4 corners + center) - Geometric * - TenCrop - Crop image into 10 regions (FiveCrop + flips) - Geometric * - Pad - Pad image with specified value - Geometric * - Lambda - Apply custom lambda function - Utility * - GaussianBlur - Apply Gaussian blur to image - Filter * - RandomAffine - Random affine transformation - Augmentation * - RandomPerspective - Random perspective transformation - Augmentation * - RandomErasing - Randomly erase rectangular regions - Augmentation * - AutoAugment - AutoAugment data augmentation policy - Auto Augmentation * - RandAugment - RandAugment data augmentation policy - Auto Augmentation * - TrivialAugmentWide - TrivialAugmentWide data augmentation policy - Auto Augmentation * - SanitizeBoundingBox - Sanitize and validate bounding boxes - Detection * - Invert - Invert image colors - Color * - Posterize - Reduce number of bits for each color channel - Color * - Solarize - Invert pixels above threshold - Color * - Equalize - Equalize image histogram - Color * - AutoContrast - Maximize image contrast - Color * - Sharpness - Adjust image sharpness - Color * - Brightness - Adjust image brightness - Color * - Contrast - Adjust image contrast - Color * - Saturation - Adjust image saturation - Color * - Hue - Adjust image hue - Color Compose ~~~~~~~ Combine multiple transformations and apply them in sequence. **Parameters**: - ``transforms`` (list): List of transform objects to compose **Usage Example**: .. code-block:: python from riemann.vision import transforms # Define transformation pipeline transform = transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) ]) PILToTensor ~~~~~~~~~~~ Convert PIL Image to tensor without scaling. Unlike ToTensor, PILToTensor does not scale values from [0, 255] to [0.0, 1.0]. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert PIL Image to tensor (values in [0, 255]) pil_to_tensor = transforms.PILToTensor() tensor_img = pil_to_tensor(pil_image) ToTensor ~~~~~~~~ Convert PIL Image or numpy.ndarray to tensor. Scales values from [0, 255] to [0.0, 1.0]. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert PIL Image to tensor (values in [0, 1]) to_tensor = transforms.ToTensor() tensor_img = to_tensor(pil_image) ToPILImage ~~~~~~~~~~ Convert tensor to PIL Image. **Parameters**: - ``mode`` (str, optional): Color mode of the output image **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert tensor to PIL Image to_pil = transforms.ToPILImage() pil_img = to_pil(tensor) ConvertImageDtype ~~~~~~~~~~~~~~~~~ Convert image to specified data type. **Parameters**: - ``dtype`` (dtype): Target data type **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert to float32 convert_dtype = transforms.ConvertImageDtype(dtype='float32') converted_img = convert_dtype(img) Normalize ~~~~~~~~~ Normalize tensor with mean and standard deviation. **Parameters**: - ``mean`` (sequence): Mean values for each channel - ``std`` (sequence): Standard deviation values for each channel **Usage Example**: .. code-block:: python from riemann.vision import transforms # Normalize using ImageNet statistics normalize = transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) normalized_img = normalize(tensor_img) Resize ~~~~~~ Resize image to specified size. **Parameters**: - ``size`` (int or tuple): Target size. If int, smaller edge is resized to size. If tuple, (height, width). **Usage Example**: .. code-block:: python from riemann.vision import transforms # Resize to specific size resize = transforms.Resize((224, 224)) resized_img = resize(pil_image) # Resize by shorter side resize = transforms.Resize(256) resized_img = resize(pil_image) CenterCrop ~~~~~~~~~~ Crop image from center. **Parameters**: - ``size`` (int or tuple): Crop size **Usage Example**: .. code-block:: python from riemann.vision import transforms # Center crop to 224x224 center_crop = transforms.CenterCrop(224) cropped_img = center_crop(pil_image) RandomHorizontalFlip ~~~~~~~~~~~~~~~~~~~~ Randomly flip image horizontally. **Parameters**: - ``p`` (float): Probability of flipping (default: 0.5) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Flip with 50% probability hflip = transforms.RandomHorizontalFlip(p=0.5) flipped_img = hflip(pil_image) RandomVerticalFlip ~~~~~~~~~~~~~~~~~~ Randomly flip image vertically. **Parameters**: - ``p`` (float): Probability of flipping (default: 0.5) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Flip with 50% probability vflip = transforms.RandomVerticalFlip(p=0.5) flipped_img = vflip(pil_image) RandomRotation ~~~~~~~~~~~~~~ Randomly rotate image by angle. **Parameters**: - ``degrees`` (sequence or float): Range of degrees (-degrees, +degrees) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Rotate between -15 and 15 degrees rotation = transforms.RandomRotation(degrees=15) rotated_img = rotation(pil_image) ColorJitter ~~~~~~~~~~~ Randomly change brightness, contrast, saturation, and hue. **Parameters**: - ``brightness`` (float): Brightness jitter factor - ``contrast`` (float): Contrast jitter factor - ``saturation`` (float): Saturation jitter factor - ``hue`` (float): Hue jitter factor **Usage Example**: .. code-block:: python from riemann.vision import transforms # Randomly adjust color jitter = transforms.ColorJitter( brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1 ) jittered_img = jitter(pil_image) Grayscale ~~~~~~~~~ Convert image to grayscale. **Parameters**: - ``num_output_channels`` (int): Number of output channels (1 or 3) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert to grayscale (1 channel) gray = transforms.Grayscale(num_output_channels=1) gray_img = gray(pil_image) RandomGrayscale ~~~~~~~~~~~~~~~ Randomly convert image to grayscale. **Parameters**: - ``p`` (float): Probability of conversion (default: 0.1) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Convert to grayscale with 10% probability gray = transforms.RandomGrayscale(p=0.1) gray_img = gray(pil_image) RandomCrop ~~~~~~~~~~ Randomly crop image to specified size. **Parameters**: - ``size`` (int or tuple): Crop size - ``padding`` (int, optional): Padding size **Usage Example**: .. code-block:: python from riemann.vision import transforms # Random crop with padding crop = transforms.RandomCrop(224, padding=4) cropped_img = crop(pil_image) RandomResizedCrop ~~~~~~~~~~~~~~~~~ Random crop and resize image. **Parameters**: - ``size`` (int or tuple): Target size - ``scale`` (tuple): Scale range for cropping - ``ratio`` (tuple): Aspect ratio range **Usage Example**: .. code-block:: python from riemann.vision import transforms # Random resized crop crop = transforms.RandomResizedCrop(224, scale=(0.08, 1.0)) cropped_img = crop(pil_image) FiveCrop ~~~~~~~~ Crop image into 5 regions (4 corners + center). **Parameters**: - ``size`` (int or tuple): Crop size **Usage Example**: .. code-block:: python import riemann as rm from riemann.vision import transforms # Five crop five_crop = transforms.FiveCrop(224) crops = five_crop(pil_image) # Returns tuple of 5 images # Stack into batch tensor_crops = rm.stack([transforms.ToTensor()(crop) for crop in crops]) TenCrop ~~~~~~~ Crop image into 10 regions (FiveCrop + horizontal flips). **Parameters**: - ``size`` (int or tuple): Crop size - ``vertical_flip`` (bool): Also apply vertical flip **Usage Example**: .. code-block:: python import riemann as rm from riemann.vision import transforms # Ten crop ten_crop = transforms.TenCrop(224) crops = ten_crop(pil_image) # Returns tuple of 10 images Pad ~~~ Pad image with specified value. **Parameters**: - ``padding`` (int or tuple): Padding size - ``fill`` (int or tuple): Fill value **Usage Example**: .. code-block:: python from riemann.vision import transforms # Pad image pad = transforms.Pad(padding=4, fill=0) padded_img = pad(pil_image) Lambda ~~~~~~ Apply custom lambda function. **Parameters**: - ``lambd`` (function): Lambda function to apply **Usage Example**: .. code-block:: python from riemann.vision import transforms # Custom lambda transform lambd = transforms.Lambda(lambda x: x.rotate(45)) transformed_img = lambd(pil_image) GaussianBlur ~~~~~~~~~~~~ Apply Gaussian blur to image. **Parameters**: - ``kernel_size`` (int): Gaussian kernel size - ``sigma`` (float or tuple): Standard deviation **Usage Example**: .. code-block:: python from riemann.vision import transforms # Apply Gaussian blur blur = transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)) blurred_img = blur(pil_image) RandomAffine ~~~~~~~~~~~~ Random affine transformation. **Parameters**: - ``degrees`` (float or tuple): Rotation degrees - ``translate`` (tuple): Translation range - ``scale`` (tuple): Scale range - ``shear`` (float or tuple): Shear range **Usage Example**: .. code-block:: python from riemann.vision import transforms # Random affine transformation affine = transforms.RandomAffine( degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1) ) transformed_img = affine(pil_image) RandomPerspective ~~~~~~~~~~~~~~~~~ Random perspective transformation. **Parameters**: - ``distortion_scale`` (float): Distortion scale - ``p`` (float): Probability of applying transform **Usage Example**: .. code-block:: python from riemann.vision import transforms # Random perspective perspective = transforms.RandomPerspective(distortion_scale=0.5, p=0.5) transformed_img = perspective(pil_image) RandomErasing ~~~~~~~~~~~~~ Randomly erase rectangular regions. **Parameters**: - ``p`` (float): Probability of applying - ``scale`` (tuple): Erasing area range - ``ratio`` (tuple): Aspect ratio range - ``value`` (str or float): Erasing value **Usage Example**: .. code-block:: python from riemann.vision import transforms # Random erasing (typically used on tensors) erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.33)) erased_tensor = erasing(tensor_img) AutoAugment ~~~~~~~~~~~ AutoAugment data augmentation policy. **Parameters**: - ``policy`` (str): Policy to use ('imagenet', 'cifar10', 'svhn') **Usage Example**: .. code-block:: python from riemann.vision import transforms # AutoAugment with ImageNet policy auto_augment = transforms.AutoAugment(policy='imagenet') augmented_img = auto_augment(pil_image) RandAugment ~~~~~~~~~~~ RandAugment data augmentation policy. **Parameters**: - ``num_ops`` (int): Number of operations - ``magnitude`` (int): Magnitude of operations **Usage Example**: .. code-block:: python from riemann.vision import transforms # RandAugment rand_augment = transforms.RandAugment(num_ops=2, magnitude=9) augmented_img = rand_augment(pil_image) TrivialAugmentWide ~~~~~~~~~~~~~~~~~~ TrivialAugmentWide data augmentation policy. **Usage Example**: .. code-block:: python from riemann.vision import transforms # TrivialAugmentWide trivial_augment = transforms.TrivialAugmentWide() augmented_img = trivial_augment(pil_image) SanitizeBoundingBox ~~~~~~~~~~~~~~~~~~~ Sanitize and validate bounding boxes. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Sanitize bounding boxes sanitize = transforms.SanitizeBoundingBox() sanitized_boxes = sanitize(boxes, image_size) Invert ~~~~~~ Invert image colors. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Invert image invert = transforms.Invert() inverted_img = invert(pil_image) Posterize ~~~~~~~~~ Reduce number of bits for each color channel. **Parameters**: - ``bits`` (int): Number of bits to keep **Usage Example**: .. code-block:: python from riemann.vision import transforms # Posterize image posterize = transforms.Posterize(bits=4) posterized_img = posterize(pil_image) Solarize ~~~~~~~~ Invert pixels above threshold. **Parameters**: - ``threshold`` (int): Threshold value **Usage Example**: .. code-block:: python from riemann.vision import transforms # Solarize image solarize = transforms.Solarize(threshold=128) solarized_img = solarize(pil_image) Equalize ~~~~~~~~ Equalize image histogram. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Equalize image equalize = transforms.Equalize() equalized_img = equalize(pil_image) AutoContrast ~~~~~~~~~~~~ Maximize image contrast. **Usage Example**: .. code-block:: python from riemann.vision import transforms # Auto contrast auto_contrast = transforms.AutoContrast() contrasted_img = auto_contrast(pil_image) Sharpness ~~~~~~~~~ Adjust image sharpness. **Parameters**: - ``sharpness_factor`` (float): Sharpness factor **Usage Example**: .. code-block:: python from riemann.vision import transforms # Adjust sharpness sharpness = transforms.Sharpness(sharpness_factor=2.0) sharpened_img = sharpness(pil_image) Brightness ~~~~~~~~~~ Adjust image brightness. **Parameters**: - ``brightness_factor`` (float): Brightness factor **Usage Example**: .. code-block:: python from riemann.vision import transforms # Adjust brightness brightness = transforms.Brightness(brightness_factor=1.5) brightened_img = brightness(pil_image) Contrast ~~~~~~~~ Adjust image contrast. **Parameters**: - ``contrast_factor`` (float): Contrast factor **Usage Example**: .. code-block:: python from riemann.vision import transforms # Adjust contrast contrast = transforms.Contrast(contrast_factor=1.5) contrasted_img = contrast(pil_image) Saturation ~~~~~~~~~~ Adjust image saturation. **Parameters**: - ``saturation_factor`` (float): Saturation factor **Usage Example**: .. code-block:: python from riemann.vision import transforms # Adjust saturation saturation = transforms.Saturation(saturation_factor=1.5) saturated_img = saturation(pil_image) Hue ~~~ Adjust image hue. **Parameters**: - ``hue_factor`` (float): Hue factor (-0.5 to 0.5) **Usage Example**: .. code-block:: python from riemann.vision import transforms # Adjust hue hue = transforms.Hue(hue_factor=0.1) hue_adjusted_img = hue(pil_image) Complete Examples ----------------- The following examples demonstrate how to use Riemann's computer vision module for common deep learning tasks. Image Classification Training Pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example demonstrates the complete workflow of image classification using the CIFAR-10 dataset, including data loading, data augmentation, model definition, training, and evaluation. **Pipeline Overview**: 1. **Data Preprocessing**: Use random cropping, horizontal flipping, and color jittering for data augmentation 2. **Normalization**: Normalize using ImageNet statistics 3. **Model Definition**: Simple convolutional neural network 4. **Training Loop**: Standard training flow including forward propagation, loss calculation, backward propagation, and parameter updates .. code-block:: python import riemann as rm import riemann.nn as nn import riemann.optim as optim from riemann.vision import datasets, transforms from riemann.utils.data import DataLoader # Define training data transforms (with data augmentation) train_transform = transforms.Compose([ transforms.RandomResizedCrop(224), # Random crop and resize transforms.RandomHorizontalFlip(), # Random horizontal flip transforms.ColorJitter( # Color jitter (data augmentation) brightness=0.2, contrast=0.2 ), transforms.ToTensor(), # Convert to tensor transforms.Normalize( # Normalize mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) ]) # Define test data transforms (without data augmentation) test_transform = transforms.Compose([ transforms.Resize(256), # Resize transforms.CenterCrop(224), # Center crop transforms.ToTensor(), transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) ]) # Load CIFAR-10 dataset train_dataset = datasets.CIFAR10( root='./data', train=True, download=True, transform=train_transform ) test_dataset = datasets.CIFAR10( root='./data', train=False, download=True, transform=test_transform ) # Create data loaders train_loader = DataLoader( train_dataset, batch_size=32, shuffle=True, # Shuffle data during training num_workers=4 # Use 4 subprocesses to load data ) test_loader = DataLoader( test_dataset, batch_size=32, shuffle=False ) # Define convolutional neural network model model = nn.Sequential( # First convolutional block nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # Second convolutional block nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # Fully connected layer nn.Flatten(), nn.Linear(128 * 8 * 8, 10) # CIFAR-10 has 10 classes ) # Define loss function and optimizer criterion = nn.CrossEntropyLoss() optimizer = optim.SGD( model.parameters(), lr=0.01, momentum=0.9 ) # Training loop num_epochs = 10 for epoch in range(num_epochs): model.train() # Set model to training mode running_loss = 0.0 for batch_idx, (images, labels) in enumerate(train_loader): # Forward propagation outputs = model(images) loss = criterion(outputs, labels) # Backward propagation and optimization optimizer.zero_grad() # Clear gradients loss.backward() # Compute gradients optimizer.step() # Update parameters running_loss += loss.item() # Print progress every 100 batches if (batch_idx + 1) % 100 == 0: print(f'Epoch [{epoch+1}/{num_epochs}], ' f'Batch [{batch_idx+1}/{len(train_loader)}], ' f'Loss: {running_loss/100:.4f}') running_loss = 0.0 print(f'Epoch {epoch+1} completed') print('Training completed!') Loading Custom Dataset with ImageFolder ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When you have your own image dataset, you can use ``ImageFolder`` for convenient loading. Just organize images by folders, with each folder representing a class. **Required Folder Structure**: .. code-block:: text custom_dataset/ ├── class_a/ # Images for class A │ ├── img1.jpg │ └── img2.png ├── class_b/ # Images for class B │ ├── img1.jpg │ └── img2.jpg └── class_c/ # Images for class C └── img1.jpg **Loading Example**: .. code-block:: python from riemann.vision import datasets, transforms from riemann.utils.data import DataLoader # Define data transforms transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) ]) # Load dataset using ImageFolder dataset = datasets.ImageFolder( root='./custom_dataset', transform=transform ) # View dataset information print(f"Number of classes: {len(dataset.classes)}") print(f"Class names: {dataset.classes}") print(f"Class to index mapping: {dataset.class_to_idx}") print(f"Total samples: {len(dataset)}") # Create data loader loader = DataLoader(dataset, batch_size=32, shuffle=True) # Iterate through data for images, labels in loader: print(f"Image batch shape: {images.shape}") # [32, 3, 224, 224] print(f"Label batch shape: {labels.shape}") # [32] break Creating Custom Dataset Class ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When ``ImageFolder`` cannot meet your needs, you can inherit from the ``Dataset`` class to create a custom dataset. The following example shows how to create a custom dataset that loads images from folders. **Applicable Scenarios**: - Need custom file organization - Need to load data from other sources (e.g., database, network) - Need complex preprocessing .. code-block:: python from riemann.utils.data import Dataset from PIL import Image import os class CustomImageDataset(Dataset): """ Custom image dataset class Load images from folders with structure: root/ label1/ image1.jpg image2.jpg label2/ image1.jpg """ def __init__(self, root_dir, transform=None): """ Parameters: root_dir (str): Root directory of dataset transform (callable, optional): Image transform function """ self.root_dir = root_dir self.transform = transform self.images = [] self.labels = [] # Scan folders, collect all image paths and labels for label in sorted(os.listdir(root_dir)): label_dir = os.path.join(root_dir, label) if os.path.isdir(label_dir): for img_name in os.listdir(label_dir): if img_name.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')): self.images.append(os.path.join(label_dir, img_name)) self.labels.append(int(label)) print(f"Loaded {len(self.images)} images, {len(set(self.labels))} classes") def __len__(self): """Return dataset size""" return len(self.images) def __getitem__(self, idx): """ Get sample at specified index Parameters: idx (int): Sample index Returns: tuple: (image, label) """ # Load image img_path = self.images[idx] image = Image.open(img_path).convert('RGB') label = self.labels[idx] # Apply transforms if self.transform: image = self.transform(image) return image, label # Use custom dataset transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), ]) dataset = CustomImageDataset( root_dir='./custom_data', transform=transform ) loader = DataLoader( dataset, batch_size=32, shuffle=True, num_workers=2 ) # Test data loading for images, labels in loader: print(f"Batch image shape: {images.shape}") print(f"Batch labels: {labels}") break