Ich googelte, war autodidaktisch und suchte stundenlang nach einer Lösung, aber ohne Glück. Ich habe hier einige ähnliche Fragen gefunden, aber nicht diesen Fall.
Meine Tische:
- Personen (~ 10 Millionen Reihen)
- Attribute (Ort, Alter, ...)
- Verknüpfungen (M: M) zwischen Personen und Attributen (~ 40 Millionen Zeilen)
Situation:
Ich versuche, alle Personen-IDs ( person_id
) von einigen Orten ( location.attribute_value BETWEEN 3000 AND 7000
) auszuwählen , die ein Geschlecht haben ( gender.attribute_value = 1
), in einigen Jahren geboren sind ( bornyear.attribute_value BETWEEN 1980 AND 2000
) und eine Augenfarbe haben (eyecolor.attribute_value IN (2,3)
).
Dies ist meine Frage, die 3 ~ 4 min dauerte . und ich möchte optimieren:
SELECT person_id
FROM person
LEFT JOIN attribute location ON location.attribute_type_id = 1 AND location.person_id = person.person_id
LEFT JOIN attribute gender ON gender.attribute_type_id = 2 AND gender.person_id = person.person_id
LEFT JOIN attribute bornyear ON bornyear.attribute_type_id = 3 AND bornyear.person_id = person.person_id
LEFT JOIN attribute eyecolor ON eyecolor.attribute_type_id = 4 AND eyecolor.person_id = person.person_id
WHERE 1
AND location.attribute_value BETWEEN 3000 AND 7000
AND gender.attribute_value = 1
AND bornyear.attribute_value BETWEEN 1980 AND 2000
AND eyecolor.attribute_value IN (2,3)
LIMIT 100000;
Ergebnis:
+-----------+
| person_id |
+-----------+
| 233 |
| 605 |
| ... |
| 8702599 |
| 8703617 |
+-----------+
100000 rows in set (3 min 42.77 sec)
Erklären Sie erweitert:
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| 1 | SIMPLE | bornyear | range | attribute_type_id,attribute_value,person_id | attribute_value | 5 | NULL | 1265229 | 100.00 | Using where |
| 1 | SIMPLE | location | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | eyecolor | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | gender | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.eyecolor.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | person | eq_ref | PRIMARY | PRIMARY | 4 | test1.location.person_id | 1 | 100.00 | Using where; Using index |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.02 sec)
Profiling:
+------------------------------+-----------+
| Status | Duration |
+------------------------------+-----------+
| Sending data | 3.069452 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.968915 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.042468 |
| Waiting for query cache lock | 0.000043 |
| Sending data | 3.264984 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.823919 |
| Waiting for query cache lock | 0.000038 |
| Sending data | 2.863903 |
| Waiting for query cache lock | 0.000014 |
| Sending data | 2.971079 |
| Waiting for query cache lock | 0.000020 |
| Sending data | 3.053197 |
| Waiting for query cache lock | 0.000087 |
| Sending data | 3.099053 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 3.064186 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.939404 |
| Waiting for query cache lock | 0.000018 |
| Sending data | 3.440288 |
| Waiting for query cache lock | 0.000086 |
| Sending data | 3.115798 |
| Waiting for query cache lock | 0.000068 |
| Sending data | 3.075427 |
| Waiting for query cache lock | 0.000072 |
| Sending data | 3.658319 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.335427 |
| Waiting for query cache lock | 0.000049 |
| Sending data | 3.319430 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.496563 |
| Waiting for query cache lock | 0.000029 |
| Sending data | 3.017041 |
| Waiting for query cache lock | 0.000032 |
| Sending data | 3.132841 |
| Waiting for query cache lock | 0.000050 |
| Sending data | 2.901310 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.107269 |
| Waiting for query cache lock | 0.000062 |
| Sending data | 2.937373 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.097082 |
| Waiting for query cache lock | 0.000261 |
| Sending data | 3.026108 |
| Waiting for query cache lock | 0.000026 |
| Sending data | 3.089760 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 3.012763 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 3.069694 |
| Waiting for query cache lock | 0.000046 |
| Sending data | 3.591908 |
| Waiting for query cache lock | 0.000060 |
| Sending data | 3.526693 |
| Waiting for query cache lock | 0.000076 |
| Sending data | 3.772659 |
| Waiting for query cache lock | 0.000069 |
| Sending data | 3.346089 |
| Waiting for query cache lock | 0.000245 |
| Sending data | 3.300460 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.135361 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.909447 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 3.337561 |
| Waiting for query cache lock | 0.000140 |
| Sending data | 3.138180 |
| Waiting for query cache lock | 0.000090 |
| Sending data | 3.060687 |
| Waiting for query cache lock | 0.000085 |
| Sending data | 2.938677 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 2.977974 |
| Waiting for query cache lock | 0.000872 |
| Sending data | 2.918640 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 2.975842 |
| Waiting for query cache lock | 0.000051 |
| Sending data | 2.918988 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.943810 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.330211 |
| Waiting for query cache lock | 0.000025 |
| Sending data | 3.411236 |
| Waiting for query cache lock | 0.000023 |
| Sending data | 23.339035 |
| end | 0.000807 |
| query end | 0.000023 |
| closing tables | 0.000325 |
| freeing items | 0.001217 |
| logging slow query | 0.000007 |
| logging slow query | 0.000011 |
| cleaning up | 0.000104 |
+------------------------------+-----------+
100 rows in set (0.00 sec)
Tabellenstrukturen:
CREATE TABLE `attribute` (
`attribute_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`attribute_type_id` int(11) unsigned DEFAULT NULL,
`attribute_value` int(6) DEFAULT NULL,
`person_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`attribute_id`),
KEY `attribute_type_id` (`attribute_type_id`),
KEY `attribute_value` (`attribute_value`),
KEY `person_id` (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000001 DEFAULT CHARSET=utf8;
CREATE TABLE `person` (
`person_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`person_name` text CHARACTER SET latin1,
PRIMARY KEY (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=20000001 DEFAULT CHARSET=utf8;
Die Abfrage wurde auf einem virtuellen DigitalOcean-Server mit SSD und 1 GB RAM durchgeführt.
Ich gehe davon aus, dass es Probleme mit dem Datenbankdesign geben könnte. Haben Sie Vorschläge, um diese Situation besser zu gestalten? Oder nur um die Auswahl oben anzupassen?
(attribute_type_id, attribute_value, person_id)
und (attribute_type_id, person_id, attribute_value)
attribute (person_id, attribute_type_id, attribute_value)