In machine learning, neural scaling laws empirically describe
predictable power-law relationships between model size, dataset size,
compute, and the network's overall test loss. These laws have been
tested extensively on large language models and transformers with
billions of parameters, but it is unclear whether such asymptotic laws
also describe smaller convolutional networks. This work investigates
the scaling behavior of small CNNs (18K–4M parameters) trained on
subsets of the CIFAR-10 dataset of varying size.
Model
width, depth, and dataset size were systematically varied, and test loss
was analyzed using power-law regression. Strong scaling relationships
were observed between loss and parameter count (R² up to 0.996), with
a width scaling exponent α_width ≈ 0.16 and a depth scaling exponent α_depth ≈ 0.26.
Dataset scaling exhibited a larger exponent β ≈ 0.43 (R² ≈ 0.99),
indicating greater sensitivity to data size than model size.
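As a concrete illustration of the regression used here, a power law L(N) = a·N^(−α) can be fit by ordinary least squares on log-transformed loss and parameter counts. The data below are synthetic and the prefactor a = 5 is an assumption, not a value from this work:

```python
import numpy as np

# Hypothetical sketch of power-law regression in log-log space:
# fit L(N) = a * N**(-alpha), so log L = log a - alpha * log N.
rng = np.random.default_rng(0)
n_params = np.array([1.8e4, 6e4, 2.5e5, 1e6, 4e6])  # parameter counts, 18K-4M
true_a, true_alpha = 5.0, 0.16                      # assumed prefactor/exponent
loss = true_a * n_params**-true_alpha * rng.lognormal(0.0, 0.01, n_params.size)

# The slope of the log-log least-squares fit estimates -alpha
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
alpha_hat = -slope
r2 = np.corrcoef(np.log(n_params), np.log(loss))[0, 1] ** 2
print(f"alpha_hat = {alpha_hat:.3f}, R2 = {r2:.4f}")
```

With low measurement noise, the recovered exponent and R² track the assumed values closely, mirroring the tight fits reported above.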
The strength of these scaling relationships enables evaluation of model–data allocation under fixed compute constraints. Using estimated values of α and β, we derive predicted optimal ratios for parameter count and dataset size under a fixed training budget and empirically test these predictions.
If validated, this approach would provide a practical framework for compute-efficient training, allowing researchers with limited resources or large experimental workloads to reduce reliance on exhaustive hyperparameter sweeps.
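The allocation argument can be sketched under the common additive ansatz L(N, D) = a·N^(−α) + b·D^(−β) with compute C ∝ N·D; this cost model and the unit prefactors are assumptions for illustration, not necessarily the paper's exact formulation. Minimizing L subject to N·D = C yields N_opt ∝ C^(β/(α+β)) and D_opt ∝ C^(α/(α+β)):

```python
import numpy as np

# Sketch of the fixed-budget allocation, assuming the additive power law
#   L(N, D) = a * N**(-alpha) + b * D**(-beta)   with compute C ~ N * D.
alpha, beta = 0.26, 0.43          # parameter and dataset exponents from above
a = b = 1.0                       # prefactors are assumptions for illustration

exp_N = beta / (alpha + beta)     # predicted compute exponent for N_opt
exp_D = alpha / (alpha + beta)    # predicted compute exponent for D_opt

# Numerical check: sweep N on a fixed budget C and locate the loss minimum.
C = 1e9
N = np.logspace(3, 8, 20001)      # candidate parameter counts
D = C / N                         # dataset size implied by the budget
loss = a * N**-alpha + b * D**-beta
N_star = N[np.argmin(loss)]

# Closed form from the stationarity condition alpha*a*N**-alpha = beta*b*D**-beta
N_opt = ((alpha * a) / (beta * b)) ** (1 / (alpha + beta)) * C ** exp_N
print(f"exp_N = {exp_N:.3f}, grid N* = {N_star:.3e}, closed form = {N_opt:.3e}")
```

Because β > α here, the predicted optimum grows data faster than parameters as compute increases, consistent with the greater data sensitivity noted above.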