God's Favourite.
ML coordinate prediction in visual data. A model that learns where to look.
The premise is simple: given an image, predict a coordinate. The execution is not. An EfficientNet-B0 backbone handles feature extraction, and a regression head outputs the (x, y) prediction. The training signal comes from an unexpected source: human gaze data, aggregated across thousands of responses, which teaches the model what humans actually look at rather than what a naive detector would select.
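How the gaze responses are aggregated and how predictions are scored is not spelled out here, so the following is only a minimal sketch under stated assumptions: responses arrive as pixel (x, y) pairs, the target is their per-axis median normalized to the image size, and error is the mean Euclidean distance. The names aggregate_gaze and mean_distance are hypothetical, chosen for illustration.

```python
from math import hypot
from statistics import median


def aggregate_gaze(points, width, height):
    """Collapse many (x, y) pixel responses into one normalized target.

    Assumption: a per-axis median is used so a few stray responses
    do not drag the target away from where most people looked.
    """
    if not points:
        raise ValueError("need at least one gaze response")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Normalize to [0, 1] so the target is independent of image size.
    return (median(xs) / width, median(ys) / height)


def mean_distance(preds, targets):
    """Mean Euclidean distance between predicted and target coordinates.

    Assumption: this stands in for whatever loss or score is really used.
    """
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    total = sum(
        hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(preds, targets)
    )
    return total / len(preds)


if __name__ == "__main__":
    # Three illustrative responses on an 800x400 image; one is an outlier.
    responses = [(100, 50), (110, 60), (700, 380)]
    target = aggregate_gaze(responses, width=800, height=400)
    print("target:", target)
    print("error:", mean_distance([(0.2, 0.2)], [target]))
```

The median is one reasonable choice when responses cluster with occasional outliers; a mean or a density peak would be equally plausible, and which one matches the real pipeline is not something this sketch can settle.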
The gaze overlay is the differentiator. Raw image features get you in the ballpark. The human attention signal, meaning where people actually focus when they look at the same image, narrows the prediction to a point the model could not reach on its own. The gap between “what the image contains” and “where humans converge” is the entire value of the system.
Won on the first entry. Six competitions completed, with active iteration on the model architecture, training pipeline, and feature engineering. The feedback loop is tight: compete, score, adjust, re-enter.