I find this topic quite interesting, and the current answers are unfortunately incomplete or partly misleading, despite the relevance and high popularity of this question.
By definition of the classical OLS framework there should be no relationship between $\hat{y}$ and $\hat{u}$, since the residuals are by construction uncorrelated with $\hat{y}$ when deriving the OLS estimator. The variance minimizing property under homoskedasticity ensures that the residual errors are randomly spread around the fitted values. This can be formally shown by:
$$\operatorname{Cov}(\hat{y},\hat{u}\mid X)=\operatorname{Cov}(Py,My\mid X)=\operatorname{Cov}(Py,(I-P)y\mid X)=P\operatorname{Cov}(y,y\mid X)(I-P)'$$
$$=\sigma^2P(I-P)=\sigma^2(P-P^2)=\sigma^2(P-P)=0$$
Where $M$ and $P$ are idempotent matrices defined as $P=X(X'X)^{-1}X'$ and $M=I-P$.
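This is easy to verify numerically. Below is a minimal sketch using simulated data (the design, coefficients, and seed are purely illustrative assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 3

# Simulated design with an intercept and a homoskedastic error term
X = np.column_stack([np.ones(N), rng.normal(size=(N, k))])
beta = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ beta + rng.normal(size=N)

P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection ("hat") matrix
M = np.eye(N) - P                     # residual maker

y_hat = P @ y   # fitted values
u_hat = M @ y   # residuals

print(np.allclose(P @ P, P), np.allclose(M @ M, M))  # idempotency: True True
print(np.corrcoef(y_hat, u_hat)[0, 1])               # ~0 up to floating point noise
```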
This result is based on strict exogeneity and homoskedasticity, and it practically holds in large samples. The intuition for their uncorrelatedness is the following: conditional on $X$, the residuals $\hat{u}$ are thought of as independently and identically distributed around the fitted values $\hat{y}$. However, any deviation from the strict exogeneity and homoskedasticity assumptions could cause the explanatory variables to be endogenous and induce a latent correlation between $\hat{u}$ and $\hat{y}$.
Now the correlation between the residuals $\hat{u}$ and the "original" $y$ is a completely different story:
$$\operatorname{Cov}(y,\hat{u}\mid X)=\operatorname{Cov}(y,My\mid X)=\operatorname{Cov}(y,(I-P)y\mid X)=\operatorname{Cov}(y,y\mid X)(I-P)'=\sigma^2M$$
A quick check against the theory confirms that this covariance matrix is identical to the covariance matrix of the residuals $\hat{u}$ themselves (proof omitted). We have:
$$\operatorname{Var}(\hat{u}\mid X)=\sigma^2M=\operatorname{Cov}(y,\hat{u}\mid X)$$
If we would like to calculate the (scalar) covariance between $y$ and $\hat{u}$ as requested by the OP, we obtain:
$$\Longrightarrow \operatorname{Cov}_{\text{scalar}}(y,\hat{u}\mid X)=\operatorname{Var}(\hat{u}\mid X)=\Big(\sum_i \hat{u}_i^2\Big)/N$$
(obtained by summing the diagonal entries of the covariance matrix and dividing by $N$)
The above formula indicates an interesting point. If we test the relationship by regressing $y$ on the residuals $\hat{u}$ (plus a constant), the slope coefficient is $\hat{\beta}_{\hat{u},y}=1$, which can be easily derived when we divide the above expression by $\operatorname{Var}(\hat{u}\mid X)$.
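Continuing the simulation sketch from above (reusing `y`, `u_hat`, and `N`), both the scalar covariance and the unit slope can be confirmed numerically:

```python
# Scalar covariance between y and the residuals, dividing by N as above
cov_scalar = np.mean((y - y.mean()) * (u_hat - u_hat.mean()))
print(cov_scalar, np.sum(u_hat**2) / N)   # the two agree

# OLS slope from regressing y on u_hat (plus a constant)
slope = np.cov(y, u_hat, ddof=0)[0, 1] / np.var(u_hat)
print(slope)                              # ~1.0
```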
On the other hand, the correlation is the covariance standardized by the respective standard deviations. Now, the variance matrix of the residuals is $\sigma^2M$, while the variance matrix of $y$ is $\sigma^2I$. The correlation $\operatorname{Corr}(y,\hat{u})$ therefore becomes:
$$\operatorname{Corr}(y,\hat{u})=\frac{\operatorname{Var}(\hat{u})}{\sqrt{\operatorname{Var}(\hat{u})\operatorname{Var}(y)}}=\sqrt{\frac{\operatorname{Var}(\hat{u})}{\operatorname{Var}(y)}}=\sqrt{\frac{\operatorname{Var}(\hat{u})}{\sigma^2}}$$
This is the core result which ought to hold in a linear regression. The intuition is that $\operatorname{Corr}(y,\hat{u})$ expresses the discrepancy between the true variance of the error term and a proxy for that variance based on the residuals. Notice that the variance of $y$ is equal to the variance of $\hat{y}$ plus the variance of the residuals $\hat{u}$, so it can be more intuitively rewritten as:
$$\operatorname{Corr}(y,\hat{u})=\frac{1}{\sqrt{1+\frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(\hat{u})}}}$$
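Again continuing the simulation, a quick sketch shows that the sample correlation matches both closed forms (note `np.var` uses the $1/N$ convention by default, consistent with the formulas above):

```python
corr = np.corrcoef(y, u_hat)[0, 1]
form1 = np.sqrt(np.var(u_hat) / np.var(y))                   # sqrt(Var(u)/Var(y))
form2 = 1.0 / np.sqrt(1.0 + np.var(y_hat) / np.var(u_hat))   # 1/sqrt(1 + Var(yhat)/Var(uhat))
print(corr, form1, form2)                                    # all three coincide
```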
There are two forces at work here. If we have a great fit of the regression line, the correlation is expected to be low because $\operatorname{Var}(\hat{u})\approx 0$. On the other hand, $\operatorname{Var}(\hat{y})$ is a bit of a fudge to estimate, as it is unconditional and describes a line in parameter space. Comparing unconditional and conditional variances within a ratio may not be an appropriate indicator after all. Perhaps that is why it is rarely done in practice.
An attempt to conclude the question: the correlation between $y$ and $\hat{u}$ is positive and relates to the ratio of the variance of the residuals to the variance of the true error term, proxied by the unconditional variance of $y$. Hence, it is a bit of a misleading indicator.
While this exercise may give us some intuition about the workings and inherent theoretical assumptions of an OLS regression, we rarely evaluate the correlation between $y$ and $\hat{u}$. There are certainly more established tests for checking the properties of the true error term. Secondly, keep in mind that the residuals are not the error term, and tests on the residuals $\hat{u}$ that make predictions about the characteristics of the true error term $u$ are limited, and their validity needs to be handled with utmost care.
For example, I would like to point out a statement made by a previous poster here, who says:
"If your residuals are correlated with your independent variables, then your model is heteroskedastic..."
I think that may not be entirely valid in this context. Believe it or not, the OLS residuals $\hat{u}$ are by construction uncorrelated with the independent variables $x_k$. To see this, consider:
$$X'\hat{u}=X'My=X'(I-P)y=X'y-X'Py$$
$$=X'y-X'X(X'X)^{-1}X'y=X'y-X'y=0$$
$$\Longrightarrow X'\hat{u}=0 \Longrightarrow \operatorname{Cov}(X',\hat{u}\mid X)=0 \Longrightarrow \operatorname{Cov}(x_{ki},\hat{u}_i\mid x_{ki})=0$$
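In the simulation above this orthogonality shows up as exact numerical zeros, by construction rather than as a testable hypothesis (a sketch, with `X` and `u_hat` as defined earlier):

```python
print(X.T @ u_hat)                    # entries are ~1e-12, i.e. numerical zeros
print(np.allclose(X.T @ u_hat, 0.0))  # True
```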
However, you may have heard claims that an explanatory variable is correlated with the error term. Notice that such claims are based on assumptions about the whole population with a true underlying regression model, which we do not observe first hand. Consequently, checking the correlation between $y$ and $\hat{u}$ seems pointless in a linear OLS framework. However, when testing for heteroskedasticity, we take into account the second conditional moment: for example, we regress the squared residuals on $X$ or a function of $X$, as is often the case with FGLS estimators. This is different from evaluating the plain correlation, as the sketch below illustrates. I hope this helps to make matters more clear.
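Here is a hand-rolled Breusch-Pagan-style auxiliary regression of the squared residuals on $X$, continuing the simulation above. This is only a sketch of the second-moment idea; the $\chi^2$ reference distribution is the usual asymptotic one, and in practice one would rather reach for a packaged test such as `het_breuschpagan` in statsmodels:

```python
from scipy import stats

# Auxiliary regression: squared residuals on X (Breusch-Pagan idea).
# Under homoskedasticity, N * R^2 is asymptotically chi-squared with
# degrees of freedom equal to the number of non-constant regressors (k).
g = u_hat**2
gamma, *_ = np.linalg.lstsq(X, g, rcond=None)
r2 = 1.0 - np.sum((g - X @ gamma)**2) / np.sum((g - g.mean())**2)
lm_stat = N * r2
p_value = stats.chi2.sf(lm_stat, df=k)
print(lm_stat, p_value)  # large p-value here, as the simulated errors are homoskedastic
```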