Since the question can be interpreted in multiple ways, I will split this answer into two parts.
- A: distribution families.
- B: location-scale distribution families.
The problem with case A can be easily answered/demonstrated by many families with a shape parameter.
The problem with case B is more difficult, since one and a half parameters seem to be sufficient to specify location and scale (location in $\mathbb{R}$ and scale in $\mathbb{R}_{>0}$), and the question becomes whether two parameters can also encode (multiple) shapes. This is not trivial. We can easily come up with specific two parameter location-scale families and demonstrate that their members do not differ in shape, but that does not prove that this is a fixed rule for every two parameter location-scale family.
A: Can two different distributions from the same 2 parameter distribution family have the same mean and variance?
The answer is yes, and it can already be shown using one of the explicitly mentioned examples: the normalized Gamma distribution.
Family of normalized gamma distributions
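Let $Z = \frac{X-\mu}{\sigma}$ with $X$ a Gamma distributed variable with shape $k$, mean $\mu$ and standard deviation $\sigma$. The scale of $X$ drops out when standardizing, so we can take it to be 1, which gives $\mu = k$ and $\sigma = \sqrt{k}$. The (cumulative) distribution of $Z$ is then:
$$F_Z(z;k) = \begin{cases} 0 & \text{if } z < -\sqrt{k} \\ \frac{1}{\Gamma(k)}\,\gamma\left(k,\, z\sqrt{k} + k\right) & \text{if } z \geq -\sqrt{k} \end{cases}$$
where $\gamma$ is the lower incomplete gamma function.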
So here it is clearly the case that different $Z_1$ and $Z_2$ (distributions from the family of normalized gamma distributions) can have the same mean and variance (namely mean 0 and variance 1) but differ in the parameter $k$ (often called the 'shape' parameter). This is closely linked to the fact that the family of gamma distributions is not a location-scale family.
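As a rough numerical check of this claim, the sketch below integrates the density of $Z$ with a plain midpoint rule for two values of $k$. The function names, the integration limit and the step count are illustrative choices, not part of the argument above; the only assumption is that $X$ has shape $k$ and scale 1.

```python
import math

def standardized_gamma_pdf(z, k):
    """Density of Z = (X - k) / sqrt(k) for X ~ Gamma(shape=k, scale=1)."""
    x = z * math.sqrt(k) + k
    if x <= 0.0:
        return 0.0
    log_pdf_x = (k - 1.0) * math.log(x) - x - math.lgamma(k)
    # Change of variables: f_Z(z) = sqrt(k) * f_X(z * sqrt(k) + k).
    return math.sqrt(k) * math.exp(log_pdf_x)

def standardized_gamma_moments(k, upper=60.0, steps=100_000):
    """Midpoint-rule estimates of the mean, variance and skewness of Z."""
    lower = -math.sqrt(k)  # Z has no mass below -sqrt(k)
    h = (upper - lower) / steps
    m1 = m2 = m3 = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * h
        w = standardized_gamma_pdf(z, k) * h
        m1 += z * w
        m2 += z * z * w
        m3 += z ** 3 * w
    variance = m2 - m1 ** 2
    skewness = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / variance ** 1.5
    return m1, variance, skewness

for k in (2.0, 5.0):
    mean, variance, skewness = standardized_gamma_moments(k)
    print(f"k={k}: mean={mean:.4f} variance={variance:.4f} skewness={skewness:.4f}")
```

Both values of $k$ should give a mean close to 0 and a variance close to 1, while the skewness follows the known Gamma value $2/\sqrt{k}$ and therefore differs between the two members.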
B: Can two different distributions from the same 2 parameter location-scale distribution family have the same mean and variance?
I believe that the answer is no if we consider only smooth families (smooth: a small change in the parameters results in a small change of the distribution/function/curve). But that answer is not trivial, and if we allow more general (non-smooth) families then the answer is yes, although such families exist only in theory and have no practical relevance.
Generating a location-scale family from a single distribution by translation and scaling
From any particular single distribution we can generate a location-scale family by translation and scaling. If f(x) is the probability density function of the single distribution, then the probability density function for a member of the family will be
$$f(x;\mu,\sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$
For a location-scale family that can be generated in this way we have:
- for any two members $f(x;\mu_1,\sigma_1)$ and $f(x;\mu_2,\sigma_2)$, if their means and variances are equal, then $f(x;\mu_1,\sigma_1) = f(x;\mu_2,\sigma_2)$
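The reason can be written out in two lines. As a sketch, assume the generating density $f$ has a finite mean $m_0$ and a finite standard deviation $s_0 > 0$ (these two symbols are introduced here for illustration only), and let $X$ have density $f(x;\mu,\sigma)$ with $\sigma > 0$:

$$\begin{aligned}
\mathrm{E}[X] &= \mu + \sigma m_0, \qquad \mathrm{sd}(X) = \sigma s_0, \\
\sigma_1 s_0 = \sigma_2 s_0 \ \text{ and } \ \mu_1 + \sigma_1 m_0 = \mu_2 + \sigma_2 m_0 \quad &\Longrightarrow \quad \sigma_1 = \sigma_2 \ \text{ and } \ \mu_1 = \mu_2 .
\end{aligned}$$

So within such a family the mean and variance pin down the pair $(\mu,\sigma)$, and with it the member.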
Can the members of every two parameter location-scale family be generated from a single member distribution by translation and scaling?
So translation and scaling can convert a single distribution into a location-scale family. The question is whether the reverse is true: can every two parameter location-scale family (where the parameters $\theta_1$ and $\theta_2$ do not necessarily need to coincide with the location $\mu$ and scale $\sigma$) be described by translation and scaling of a single member of that family?
For particular two parameter location-scale families, like the family of normal distributions, it is not too difficult to show that they can be generated by the process above (scaling and translating a single example member).
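For the normal family this can be checked directly. The sketch below compares the density of a normal distribution with the one built from the standard normal by translation and scaling; `location_scale_pdf` is a hypothetical helper name, and the values of `mu`, `sigma` and the evaluation points are arbitrary.

```python
from statistics import NormalDist

def location_scale_pdf(x, mu, sigma, base=NormalDist(0.0, 1.0)):
    """Density (1 / sigma) * f((x - mu) / sigma) built from the base density f."""
    return base.pdf((x - mu) / sigma) / sigma

mu, sigma = 1.5, 2.0
points = (-3.0, 0.0, 1.5, 4.0)
# Largest gap between the direct density and the generated one.
max_gap = max(
    abs(NormalDist(mu, sigma).pdf(x) - location_scale_pdf(x, mu, sigma))
    for x in points
)
print("largest difference:", max_gap)
```

The two densities should agree up to floating point rounding at every point.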
One may wonder whether every two parameter location-scale family can be generated out of a single member by translation and scaling. Or, posing the opposite question: "Can a two parameter location-scale family contain two different member distributions with the same mean and variance?" For that to happen, the family would have to be a union of multiple subfamilies that are each generated by translation and scaling.
Case 1: Family of generalized Student's t-distributions, parameterized by two variables
A contrived example occurs when we make some mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$ (compare the question on the cardinality of $\mathbb{R}$ and $\mathbb{R}^2$), which gives the freedom to use two parameters $\theta_1$ and $\theta_2$ to describe a union of multiple subfamilies that are each generated by translation and scaling.
Let's use the (three parameter) generalized Student's t-distribution:
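$$f(x;\nu,\mu,\sigma) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma} \left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}$$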
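with the three parameters expressed in terms of $\theta_1$ and $\theta_2$ as follows
$$\begin{aligned} \mu &= \tan(\theta_1) \\ \sigma &= \theta_2 \\ \nu &= \lfloor 0.5 + \theta_1/\pi \rfloor \end{aligned}$$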
then we have
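$$f(x;\theta_1,\theta_2) = \frac{\Gamma\left(\frac{\lfloor 0.5+\theta_1/\pi \rfloor + 1}{2}\right)}{\Gamma\left(\frac{\lfloor 0.5+\theta_1/\pi \rfloor}{2}\right)\sqrt{\pi \lfloor 0.5+\theta_1/\pi \rfloor}\,\theta_2} \left(1 + \frac{1}{\lfloor 0.5+\theta_1/\pi \rfloor}\left(\frac{x-\tan(\theta_1)}{\theta_2}\right)^2\right)^{-\frac{\lfloor 0.5+\theta_1/\pi \rfloor + 1}{2}}$$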
which (for $\theta_1 > \pi/2$ and $\theta_2 > 0$, so that $\nu \geq 1$ and $\sigma > 0$) may be considered a two parameter location-scale family, albeit not a very useful one, that cannot be generated by translation and scaling of only a single member.
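To make this concrete, here is a small sketch that picks two parameter pairs with the same mean and variance but a different $\nu$. It relies on the standard facts that the mean of this distribution is $\mu$ for $\nu > 1$ and the variance is $\sigma^2 \nu/(\nu-2)$ for $\nu > 2$, so the example stays on branches with $\nu \geq 3$. All function names and the chosen values of $\theta_1$, $\theta_2$ are illustrative.

```python
import math

def theta_to_t_params(theta1, theta2):
    """Map (theta1, theta2) to (mu, sigma, nu) as in the formulas above."""
    mu = math.tan(theta1)
    sigma = theta2
    nu = math.floor(0.5 + theta1 / math.pi)
    return mu, sigma, nu

def t_mean_variance(mu, sigma, nu):
    """Mean and variance of the generalized t-distribution (requires nu > 2)."""
    return mu, sigma ** 2 * nu / (nu - 2)

def t_pdf(x, mu, sigma, nu):
    """Density of the generalized Student's t-distribution."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi * nu) - math.log(sigma))
    z = (x - mu) / sigma
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))

# Same tan(theta1), but shifted by a multiple of pi so that nu is 3 and 4.
# theta2 is chosen so that both members have variance 1.
member_a = theta_to_t_params(0.3 + 3 * math.pi, math.sqrt(1 / 3))
member_b = theta_to_t_params(0.3 + 4 * math.pi, math.sqrt(1 / 2))

for member in (member_a, member_b):
    mean, variance = t_mean_variance(*member)
    peak = t_pdf(mean, *member)  # density evaluated at the mean
    print(f"nu={member[2]} mean={mean:.6f} variance={variance:.6f} pdf(mean)={peak:.6f}")
```

The two members share their mean and variance, yet the density at the mean differs, so they are different distributions within the same two parameter family.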
Case 2: Location-scale families generated by negative scaling of a single distribution with nonzero skew
A less contrived example than the one using the tan-function is given by whuber in the comments under Carl's answer. We can have a family $x \mapsto f(x/b + a)$ where flipping the sign of $b$ (adjusting $a$ if needed) keeps the mean and variance unchanged but possibly changes the odd higher moments. This gives, a bit more easily, a two parameter location-scale family where members with the same mean and variance can have different higher order moments. This example from whuber can be split into two subfamilies, each of which can be generated out of a single member by translation and scaling.
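A minimal sketch of this effect, assuming (purely for illustration) that the generating density $f$ is the Exponential(1) density, whose mean, variance and third central moment are 1, 1 and 2. If $Y$ has density $f$, then $X = b\,(Y - a)$ has density $f(x/b + a)/|b|$, so the moments of a member follow in closed form. The name `member_moments` and the parameter values are hypothetical.

```python
# Moments of the base variable Y ~ Exponential(1) (an assumed example density).
BASE_MEAN, BASE_VARIANCE, BASE_THIRD_CENTRAL = 1.0, 1.0, 2.0

def member_moments(a, b):
    """Mean, variance and skewness of X = b * (Y - a), with density f(x/b + a) / |b|."""
    mean = b * (BASE_MEAN - a)
    variance = b * b * BASE_VARIANCE
    # The third central moment scales with b**3, so its sign follows the sign of b.
    skewness = b ** 3 * BASE_THIRD_CENTRAL / variance ** 1.5
    return mean, variance, skewness

positive_member = member_moments(a=0.5, b=2.0)
# Flip the sign of b and reflect a around the base mean to keep the mean fixed.
negative_member = member_moments(a=2 * BASE_MEAN - 0.5, b=-2.0)

print("b > 0:", positive_member)
print("b < 0:", negative_member)
```

Both members have the same mean and variance, while the skewness changes sign, which is the behaviour described above.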
Smooth families
If we try to make a single smooth two parameter distribution family (smooth: a small change in the parameters results in a small change of the distribution/function/curve) by somehow composing two or more families that are each generated by translation and scaling, then we run into the problem of having the two parameters cover the variation of 'mean' and 'variance' as well as a third parameter, 'shape'. A formal proof will have to go along the same lines as the answer to the question: Is there a smooth surjective function $f:\mathbb{R}^2 \to \mathbb{R}^3$? (The answer is no in the case of smooth, i.e. infinitely differentiable, functions, although there are continuous functions that would do the job, such as Peano curves.)
Intuition: Imagine there were parameters $\theta_1$, $\theta_2$ that describe the distributions in some location-scale distribution family, and by which we can change the mean and variance as well as some other moments. Then we should be able to express $\theta_1$, $\theta_2$ in terms of the mean $\mu$ and standard deviation $\sigma$:
$$\begin{aligned} \theta_1 &= f_{\theta_1}(\mu,\sigma) \\ \theta_2 &= f_{\theta_2}(\mu,\sigma) \end{aligned}$$
But these need to be multiple valued functions, and such functions cannot make continuous transitions: the different values of $f_{\theta_1}(\mu,\sigma)$ for a particular $\mu$ and $\sigma$ are not continuously connected, so they cannot model a continuous shape parameter.
I am actually not so sure about this final part. We could possibly use a space-filling curve (such as the Peano curve, if only we knew how to map coordinates on the curve to coordinates of the hypercube) to have a single parameter $\theta_1$ completely model multiple features like mean and variance, without giving up the property that a small change of the parameter $\theta_1$ corresponds to a small change of the function $f(x;\theta_1)$ at every $x$.