In machine learning, neural scaling laws empirically describe
predictable power-law relationships between model size, dataset size,
compute, and the network's overall test loss. These laws have been
tested extensively on large language models and transformers with
billions of parameters, but it is unclear whether such asymptotic laws
also describe smaller convolutional networks. This work investigates
the scaling behavior of small CNNs (18K–4M parameters) trained on
subsets of the CIFAR-10 dataset of varying size.
Model
width, depth, and dataset size were systematically varied, and test loss
was analyzed using power-law regression. Strong scaling relationships
were observed between loss and parameter count (R² up to 0.996), with
a width scaling exponent α_width ≈ 0.16 and a depth scaling exponent α_depth ≈ 0.26.
Dataset scaling exhibited a larger exponent β ≈ 0.43 (R² ≈ 0.99),
indicating greater sensitivity to data size than model size.
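As a concrete illustration of the regression used here, a power law L(N) = a·N^(−α) can be fit by ordinary least squares on log-transformed loss and parameter counts. The data below are synthetic and the prefactor a = 5 is an assumption, not a value from this work:

```python
import numpy as np

# Hypothetical sketch of power-law regression in log-log space:
# fit L(N) = a * N**(-alpha), so log L = log a - alpha * log N.
rng = np.random.default_rng(0)
n_params = np.array([1.8e4, 6e4, 2.5e5, 1e6, 4e6])  # parameter counts, 18K-4M
true_a, true_alpha = 5.0, 0.16                      # assumed prefactor/exponent
loss = true_a * n_params**-true_alpha * rng.lognormal(0.0, 0.01, n_params.size)

# The slope of the log-log least-squares fit estimates -alpha
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
alpha_hat = -slope
r2 = np.corrcoef(np.log(n_params), np.log(loss))[0, 1] ** 2
print(f"alpha_hat = {alpha_hat:.3f}, R2 = {r2:.4f}")
```

With low measurement noise, the recovered exponent and R² track the assumed values closely, mirroring the tight fits reported above.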
The strength of these scaling relationships enables evaluation of model–data allocation under fixed compute constraints. Using estimated values of α and β, we derive predicted optimal ratios for parameter count and dataset size under a fixed training budget and empirically test these predictions.
If validated, this approach would provide a practical framework for compute-efficient training, allowing researchers with limited resources or large experimental workloads to reduce reliance on exhaustive hyperparameter sweeps.
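The allocation argument can be sketched under the common additive ansatz L(N, D) = a·N^(−α) + b·D^(−β) with compute C ∝ N·D; this cost model and the unit prefactors are assumptions for illustration, not necessarily the paper's exact formulation. Minimizing L subject to N·D = C yields N_opt ∝ C^(β/(α+β)) and D_opt ∝ C^(α/(α+β)):

```python
import numpy as np

# Sketch of the fixed-budget allocation, assuming the additive power law
#   L(N, D) = a * N**(-alpha) + b * D**(-beta)   with compute C ~ N * D.
alpha, beta = 0.26, 0.43          # parameter and dataset exponents from above
a = b = 1.0                       # prefactors are assumptions for illustration

exp_N = beta / (alpha + beta)     # predicted compute exponent for N_opt
exp_D = alpha / (alpha + beta)    # predicted compute exponent for D_opt

# Numerical check: sweep N on a fixed budget C and locate the loss minimum.
C = 1e9
N = np.logspace(3, 8, 20001)      # candidate parameter counts
D = C / N                         # dataset size implied by the budget
loss = a * N**-alpha + b * D**-beta
N_star = N[np.argmin(loss)]

# Closed form from the stationarity condition alpha*a*N**-alpha = beta*b*D**-beta
N_opt = ((alpha * a) / (beta * b)) ** (1 / (alpha + beta)) * C ** exp_N
print(f"exp_N = {exp_N:.3f}, grid N* = {N_star:.3e}, closed form = {N_opt:.3e}")
```

Because β > α here, the predicted optimum grows data faster than parameters as compute increases, consistent with the greater data sensitivity noted above.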