To understand why we use the t-distribution, you need to know what is the underlying distribution of βˆ and of the Residual sum of squares (RSS) as these two put together will give you the t-distribution.
The easier part is the distribution of βˆ which is a normal distribution - to see this note that βˆ=(XTX)−1XTY so it is a linear function of Y where Y∼N(Xβ,σ2In). As a result it is also normally distributed, βˆ∼N(β,σ2(XTX)−1) - let me know if you need help deriving the distribution of βˆ.
Additionally, RSS∼σ2χ2n−p, where n is the number of observations and p is the number of parameters used in your regression. The proof of this is a bit more involved, but also straightforward to derive (see proof here Why is RSS distributed chi square times n-p?).
Up until this point I have considered everything in matrix/vector notation, but let's for simplicity use βˆi and use its normal distribution which will give us:
βˆi−βiσ(XTX)−1ii−−−−−−−−√∼N(0,1)
Additionally, from the chi-squared distribution of RSS we have that:
(n−p)s2σ2∼χ2n−p
This was simply a rearrangement of the first chi-squared expression and is independent of the N(0,1). Additionally, we define s2=RSSn−p, which is an unbiased estimator for σ2. By the definition of the tn−p definition that dividing a normal distribution by an independent chi-squared (over its degrees of freedom) gives you a t-distribution (for the proof see: A normal divided by the χ2(s)/s−−−−−−√ gives you a t-distribution -- proof) you get that:
βˆi−βis(XTX)−1ii−−−−−−−−√∼tn−p
Where s(XTX)−1ii−−−−−−−−√=SE(βˆi).
Let me know if it makes sense.