Efficient Pairwise Difference Learning: A Comparative Study of Encoding Strategies and Training Pair Selection
Pairwise Difference Learning for Classification (PDC) enhances supervised learning in low-data regimes by training a binary similarity classifier on pairs of instances rather than on individual samples, effectively inflating the training set from n to O(n²) examples. However, the PDC framework leaves critical implementation choices underspecified, in particular the input representation and the pair selection strategy. This work systematically investigates these design decisions through extensive empirical evaluation. We compare four feature encoders (paired features, triplet with relative difference, absolute difference, and a novel triplet with absolute difference) alongside four pair selection strategies ranging from exhaustive to kernel-thinning-based schemes. Experiments across nearly 200 benchmark classification datasets, using Decision Tree, Multilayer Perceptron, Gaussian Naïve Bayes, and Support Vector Classifier as base models, reveal that the proposed triplet with absolute difference encoder (ϕ3dabs) achieves the highest average F1 scores. More notably, restricting training to upper-triangle pairs reduces computational time by approximately 40% with negligible accuracy loss. Kernel-thinning-based selection provides no accuracy benefit while substantially increasing computational overhead. Overall, PDC configurations outperform the base models in 94% of tested cases, though no single configuration dominates universally, which suggests that dataset-specific optimization remains an open challenge.
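To make the four encoders and the upper-triangle restriction concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function name `encode_pairs`, the encoder keys, and the exact feature layouts are assumptions inferred from the encoder names above. The choice to drop self-pairs in the exhaustive case is likewise an assumption.

```python
import numpy as np


def encode_pairs(X, y, encoder="triplet_abs", upper_triangle=True):
    """Build a pairwise training set (Z, s) from instances X and labels y.

    Assumed layouts (inferred from the encoder names, not confirmed):
      "paired"      -> [x_i, x_j]
      "triplet_rel" -> [x_i, x_j, x_i - x_j]
      "abs_diff"    -> |x_i - x_j|
      "triplet_abs" -> [x_i, x_j, |x_i - x_j|]
    The target s is 1 when both instances share a class, else 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]

    if upper_triangle:
        # Only pairs with i < j: n * (n - 1) / 2 pairs.
        i, j = np.triu_indices(n, k=1)
    else:
        # All ordered pairs; self-pairs are dropped (an assumption).
        i, j = np.indices((n, n)).reshape(2, -1)
        keep = i != j
        i, j = i[keep], j[keep]

    a, b = X[i], X[j]
    diff = a - b
    if encoder == "paired":
        Z = np.hstack([a, b])
    elif encoder == "triplet_rel":
        Z = np.hstack([a, b, diff])
    elif encoder == "abs_diff":
        Z = np.abs(diff)
    elif encoder == "triplet_abs":
        Z = np.hstack([a, b, np.abs(diff)])
    else:
        raise ValueError(f"unknown encoder: {encoder!r}")

    s = (y[i] == y[j]).astype(int)
    return Z, s


# Toy data: 4 instances with 3 features each, two classes.
X_demo = np.arange(12, dtype=float).reshape(4, 3)
y_demo = np.array([0, 0, 1, 1])
Z_demo, s_demo = encode_pairs(X_demo, y_demo, "triplet_abs", upper_triangle=True)
```

Under these assumptions, the upper-triangle option yields exactly half as many pairs as the exhaustive ordered set, which is consistent in direction with the reported reduction in computation time. The resulting `(Z, s)` can be passed to any binary classifier.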
