To understand why we use the $t$-distribution, you need to know the underlying distribution of $\hat{\beta}$ and of the residual sum of squares ($RSS$), as these two put together will give you the $t$-distribution.
The easier part is the distribution of $\hat{\beta}$, which is normal. To see this, note that $\hat{\beta} = (X^TX)^{-1}X^Ty$, so it is a linear function of $y$, where $y \sim \mathcal{N}(X\beta, \sigma^2 I_n)$. As a result it is also normally distributed: $\hat{\beta} \sim \mathcal{N}\left(\beta, \sigma^2 (X^TX)^{-1}\right)$ -- let me know if you need help deriving the distribution of $\hat{\beta}$.
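You can check this numerically with a quick Monte Carlo sketch (the design matrix, coefficients, and sample sizes below are just illustrative choices, not from the question): refit $\hat{\beta}$ on many simulated datasets from the same fixed design and compare its empirical mean and covariance against $\beta$ and $\sigma^2(X^TX)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 40, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design: intercept + one regressor
beta = np.array([1.0, -3.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Simulate many datasets y ~ N(X beta, sigma^2 I) and compute the OLS fit each time
betas = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas.append(XtX_inv @ X.T @ y)
betas = np.array(betas)

print(betas.mean(axis=0))       # should be close to beta (unbiasedness)
print(np.cov(betas.T))          # should be close to sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```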
Additionally, $\frac{RSS}{\sigma^2} \sim \chi^2_{n-p}$, where $n$ is the number of observations and $p$ is the number of parameters used in your regression. The proof of this is a bit more involved, but straightforward to derive (see the proof here: Why is RSS distributed chi square times n-p?).
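As a sanity check on this claim (again with arbitrary illustrative parameters), you can simulate $RSS/\sigma^2$ many times and verify that its mean and variance match those of a $\chi^2_{n-p}$, namely $n-p$ and $2(n-p)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                          # observations, parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta, sigma = np.array([1.0, 2.0, -0.5]), 1.5
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat (projection) matrix

rss_over_sigma2 = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    resid = y - H @ y                 # OLS residuals: (I - H) y
    rss_over_sigma2.append(resid @ resid / sigma**2)

# A chi-squared with n - p dof has mean n - p and variance 2(n - p)
print(np.mean(rss_over_sigma2))       # should be close to n - p = 47
print(np.var(rss_over_sigma2))        # should be close to 2(n - p) = 94
```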
Up until this point I have considered everything in matrix/vector notation, but for simplicity let's look at a single coefficient $\hat{\beta}_j$ and use its normal distribution, which will give us:
$$\frac{\hat{\beta}_j - \beta_j}{\sigma \sqrt{S_{jj}}} \sim \mathcal{N}(0, 1),$$
where $S_{jj}$ is the $j$-th diagonal element of $(X^TX)^{-1}$.
Additionally, from the chi-squared distribution of $RSS$ we have that:
$$\frac{RSS}{(n-p)\sigma^2} \sim \frac{\chi^2_{n-p}}{n-p}.$$
This was simply a rearrangement of the first chi-squared expression and is independent of $\hat{\beta}$. Additionally, we define $s^2 = \frac{RSS}{n-p}$, which is an unbiased estimator of $\sigma^2$. By the fact that dividing a standard normal by the square root of an independent chi-squared over its degrees of freedom gives you a $t$-distribution (for the proof see: A normal divided by the $\sqrt{\chi^2(s)/s}$ gives you a t-distribution -- proof) you get that:
$$\frac{\hat{\beta}_j - \beta_j}{s\sqrt{S_{jj}}} = \frac{(\hat{\beta}_j - \beta_j)/(\sigma\sqrt{S_{jj}})}{\sqrt{\frac{RSS}{(n-p)\sigma^2}}} \sim t_{n-p}.$$
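To convince yourself of the final result, you can simulate the statistic $(\hat{\beta}_j - \beta_j)/(s\sqrt{S_{jj}})$ under repeated sampling and compare it against a $t_{n-p}$ (the small sample size and other settings below are illustrative; a small $n$ makes the difference from a standard normal visible):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, j = 30, 2, 1                    # small n so t_{n-p} differs from N(0,1)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 2.0]), 1.0
XtX_inv = np.linalg.inv(X.T @ X)
S_jj = XtX_inv[j, j]

t_stats = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)      # unbiased estimate of sigma^2
    t_stats.append((beta_hat[j] - beta[j]) / np.sqrt(s2 * S_jj))
t_stats = np.array(t_stats)

# The variance of t_{n-p} is (n-p)/(n-p-2) > 1, unlike the standard normal
print(np.var(t_stats))                # should be close to 28/26, about 1.077
# Kolmogorov-Smirnov test against the t distribution with n - p dof
print(stats.kstest(t_stats, stats.t(df=n - p).cdf).pvalue)
```

If you instead (wrongly) compared against a standard normal, the heavier tails of the simulated statistic would show up in the variance and in the KS test.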
Let me know if this makes sense.