Ist es ein Problem, ein Programmierer zu sein, der keine Kenntnisse über die Komplexität von Berechnungen hat?

30

Ich habe eine Übung an meiner Universität bekommen. Ich habe es mit nach Hause genommen und versucht, einen Algorithmus zu programmieren, um es zu lösen. Es handelte sich vermutlich um Grafiken, um das Auffinden verbundener Komponenten.

Dann machte ich das Trivialste, was mir in den Sinn kam, und zeigte es dann meinem Dozenten. Nach einer kurzen Beobachtung stellte er fest, dass die Laufzeitkomplexität meiner Lösung unsicher war, und zeigte etwas Effizienteres. Und es gibt eine Tradition von Programmierern, die keine Ahnung haben, was Rechenkomplexität ist (ich war einer davon). Ist es also ein Problem, wenn ein Programmierer keine Ahnung hat, was Rechenkomplexität ist?

— Billy Rubina
quelle

3

Hinweis für Moderatoren : Bitte verwenden Sie keine Kommentare für längere Diskussionen oder um markige Antworten zu veröffentlichen. Sie können den Chatroom verwenden , um diese Frage zu diskutieren. vorherige Kommentare wurden dorthin verschoben.

— Gilles 'SO- hör auf böse zu sein'

4

Ihr Titel sagt Programmierer, aber Ihre Frage sagt Schüler. Im Allgemeinen impliziert "Programmierer" "professioneller Programmierer" - fragen Sie sich also, ob es ein Problem ist, ein professioneller Programmierer ohne Kenntnis der Komplexität der Berechnungen zu sein? Oder ob es für einen Programmierstudenten in Ordnung ist, dieses Wissen nicht zu haben? Die beiden sind unterschiedliche Fragen, auch wenn sich herausstellt, dass sie die gleiche Antwort haben.

— corsiKa

42

Ja, ich würde sagen, dass es für jeden ernsthaften Programmierer ein Muss ist, etwas über die Komplexität von Berechnungen zu wissen. Solange Sie nicht mit großen Datenmengen zu tun haben, werden Sie die Komplexität nicht kennen, aber wenn Sie ein Programm schreiben möchten, das schwerwiegende Probleme angeht, brauchen Sie es.

In Ihrem speziellen Fall hat sich Ihr Beispiel für die Suche nach verbundenen Komponenten möglicherweise für Diagramme mit bis zu Knoten bewährt. Wenn Sie jedoch einen Graphen mit Knoten probiert hätten, hätte der Algorithmus Ihres Dozenten dies wahrscheinlich in 1 Sekunde geschafft, während Ihr Algorithmus (abhängig von der Komplexität) 1 Stunde, 1 Tag oder vielleicht sogar 1 Ewigkeit gedauert hätte. $100$ $100.000$

Ein häufiger Fehler, den die Teilnehmer in unserem Algorithmus-Kurs machen, besteht darin, ein Array wie das folgende zu durchlaufen:

while array not empty
    examine first element of array
    remove first element from array

Dies ist möglicherweise nicht der schönste Code, aber in einem komplizierten Programm wird möglicherweise so etwas angezeigt, ohne dass sich der Programmierer dessen bewusst ist. Was ist nun das Problem mit diesem Programm?

Angenommen, wir führen es mit einem Datensatz von Elementen aus. Verglichen mit dem folgenden Programm wird das frühere Programm langsamer ausgeführt. $100.000$ $50.000$

while array not empty
    examine last element of array
    remove last element from array

Ich hoffe, Sie stimmen zu, dass das Wissen, Ihr Programm mal schneller laufen zu lassen , für einen Programmierer wahrscheinlich wichtig ist. Um den Unterschied zwischen den beiden Programmen zu verstehen, sind einige Grundkenntnisse in der Komplexitätstheorie und einige Kenntnisse über die Einzelheiten der Sprache, in der Sie programmieren, erforderlich. $50.000$

In meiner Pseudocodesprache verschiebt "Entfernen eines Elements aus einem Array" alle Elemente rechts von dem zu entfernenden Element um eine Position von links. Dies macht das Entfernen des letzten Elements zu einer -Operation, da wir dazu nur mit 1 Element interagieren müssen. Das Entfernen des ersten Elements ist da wir zum Entfernen des ersten Elements alle anderen Elemente ebenfalls um eine Position nach links verschieben müssen. $O(1)$ $O(n)$ $n-1$

Eine sehr grundlegende Übung in Bezug auf die Komplexität besteht darin, zu beweisen, dass das erste Programm erfolgreich ist Operationen, während das zweite Programm nurOperationen verwendet. Wenn Sieeinstecken, werdenSie feststellen, dass ein Programm drastisch effizienter ist als das andere. $\frac{1}{2}n^2$ $n$ $n=100.000$

Dies ist nur ein Spielzeugbeispiel, aber es erfordert bereits ein grundlegendes Verständnis der Komplexität, um den Unterschied zwischen den beiden Programmen zu erkennen. Wenn Sie tatsächlich versuchen, ein komplizierteres Programm zu debuggen / optimieren, das diesen Fehler aufweist, ist ein noch besseres Verständnis erforderlich raus wo der Bug ist. Denn ein Fehler wie das Entfernen eines Elements aus einem Array auf diese Weise kann durch Abstraktionen im Code sehr gut ausgeblendet werden.

Ein gutes Verständnis der Komplexität hilft auch beim Vergleich zweier Lösungsansätze. Angenommen, Sie hätten zwei verschiedene Ansätze gefunden, um das Problem der verbundenen Komponenten selbst zu lösen: Um sich zwischen ihnen zu entscheiden, wäre es sehr nützlich, wenn Sie die Komplexität (schnell) abschätzen und die bessere auswählen könnten.

— Tom van der Zanden
quelle

10

"So long as you are not dealing with huge data sets you will be fine not knowing complexity"Dies ist oft wahr, aber nicht immer so. Beispielsweise ist ein O(n!)Algorithmus selbst für relativ kleine Datensätze nicht brauchbar. Wenn Sie einen O(n!)Algorithmus verwenden, bei dem Sie O(n^2)Ihr Programm hätten verwenden können, dauert die Ausführung bei einer Datengröße von 10 36.288-mal länger . Bei einer Datengröße von 20 betrachten Sie 2,4 Billionen Operationen.

— Reirab

1

Ich denke, das Beispiel von @ reirab sollte in die Antwort aufgenommen werden. Es ist dramatischer und beweist Ihren Standpunkt entschiedener. Und ich persönlich bin von solchen Algorithmen gebissen worden, bevor ich rechnerische Komplexität gelernt habe.

— Siyuan Ren

2

Ich denke, es gibt ein größeres Problem. Wenn Sie einfach nicht wissen, dass Sie sich selbst für Aufgaben entscheiden, bei denen dies nicht erforderlich ist. Sie können also sagen, dass fast alle Fragen, mit denen X endet, nützlich sein könnten. Egal, ob es kritisch ist oder nicht.

— Joojaa

"Um den Unterschied zwischen den beiden Programmen zu verstehen, sind einige Grundkenntnisse über die Komplexitätstheorie erforderlich" - ich denke, für dieses Beispiel ist dies nicht der Fall. Sie könnten es profilieren, beobachten, dass die gesamte Zeit in "Element entfernen" verbracht wird, wissen (ohne die Komplexitätstheorie zu verstehen), dass das Entfernen des letzten Elements schneller ist als das Entfernen des ersten, nehmen Sie die Änderung vor und beschleunigen Sie daher das Programm. Der Vorteil des Verständnisses der Komplexitätstheorie besteht darin, dass Sie solche Probleme lose quantifizieren können, ohne sie zu profilieren, sodass Sie sie "vorzeitig" optimieren können.

— Steve Jessop

.. und im Allgemeinen vermute ich, dass alle oder fast alle praktischen Beispiele einzeln ohne Bezug auf die Komplexitätstheorie gelöst werden können. In diesem Fall ist das Wissen, dass das Kopieren vieler Daten langsamer ist als das Nicht-Kopieren, keine „Komplexitätstheorie“. Aber natürlich ist es für die Programmierung (und für jeden Beruf) immer noch nützlich, ein gutes mentales Modell der Prinzipien zu haben, die häufig auftauchen, da Sie solche Probleme routinemäßig und nicht nacheinander mit Ad-hoc-Mitteln analysieren, diskutieren und lösen können.

— Steve Jessop

26

Dies ist eine Widerlegung von Tom van der Zandens Antwort , die besagt, dass dies ein Muss ist.

Die Sache ist in den meisten Fällen 50.000-mal langsamer ist nicht relevant (es sei denn, Sie arbeiten natürlich bei Google).

Wenn die Operation, die Sie ausführen, eine Mikrosekunde dauert oder wenn Ihr N nie über einem bestimmten Schwellenwert liegt (ein hoher Anteil der heutzutage ausgeführten Codierung), ist dies NIEMALS von Bedeutung. In diesen Fällen verschwenden Sie nur Zeit (und höchstwahrscheinlich Geld), wenn Sie über die Komplexität der Berechnungen nachdenken.

Die Komplexität von Rechnern ist ein Werkzeug, um zu verstehen, warum etwas langsam ist oder schlecht skaliert werden kann und wie man es verbessern kann, aber die meiste Zeit ist es ein völliger Overkill.

Ich bin seit mehr als fünf Jahren ein professioneller Programmierer und habe nie die Notwendigkeit gefunden, über Rechenaufwand nachzudenken, wenn ich in einer Schleife O (M * N) schleife, weil die Operation immer sehr schnell ist oder M und N so sind klein.

There are far more important, generally used, and harder things to understand for anyone doing programming jobs (threading and profiling are good examples in the performance area).

Of course, there are some things that you will never be able to do without understanding computational complexity (for example: finding anagrams on a dictionary), but most of the time you don't need it.

— claudio495h
quelle

3

To expand on your point, there are cases where too much emphasis on computational complexity can lead you astray. For example, there may be situations where "better" algorithm is actually slower for small inputs. The profiler is the ultimate source of truth.

— Kevin Krumwiede

2

@Kevin Krumwiede, I completely agree with you that optimizing a sort for a trivial data set is overkill. But it also illustrates that having at least an understanding of complexity is still important. The understanding is what will lead you to make the decision that a bubble sort is appropriate as opposed to some other, more complex, algorithm.

— Kent A.

4

When you know the data set is small in all cases you can get away with this sort of thing. You have to be very careful of excess complexity in stuff called within loops, though--not long ago I cut a minute runtime to a second this way. I've also encountered a O(n^8) problem once (data validation.) Lots of care got it down to 12 hours.

— Loren Pechtel

7

I've never found the need to think about computational complexity when looping inside a loop O(M * N) because always the operation is really fast or M and N are so small. – Ironically, the argument you give shows that you did think about computational complexity. You decided that it’s not a relevant issue for what you are doing and possibly rightfully so, but you are still aware of the existence of this issue, and if it would ever pose a problem, you could react to it before serious consequences happen on the user level.

— Wrzlprmft

4

Premature optimization is the root of all evil, but premature pessimization is the root of at least a good deal of annoyed users. You may not need to be able to solve a recurrence relation, but if you are, at the very least, not capable of telling the difference between O(1), O(N) and O(N^2), especially when you're nesting loops, someone is going to have to clean up the mess later. Source: the messes I had to clean up later. A factor 50.000 is so big that you had better know if you can still afford that later, when your inputs have grown.

— Jeroen Mostert

14

I've been developing software for about thirty years, working both as a contractor and employee, and I've been pretty successful at it. My first language was BASIC, but I quickly taught myself machine language to get decent speed out of my underpowered box. I have spent a lot of time in profilers over the years and have learned a lot about producing fast, memory efficient optimized code.

Regardless to say, I'm self taught. I never encountered the O notation until I started interviewing a few years ago. It's never come up in my professional work EXCEPT during interviews. So I've had to learn the basics just to handle that question in interviews.

I feel like the jazz musician who can't read sheet music. I can still play just fine. I know about hashtables (heck, I invented hashtables before I learned that they had already been invented) and other important data structures, and I might even know some tricks that they don't teach in school. But I think the truth is that if you want to succeed in this profession, you will either need to go indie or learn the answers to the questions that they will ask during interviews.

Incidentally, I most recently interviewed for a front end web developer role. They asked me a question where the answer required both a knowledge of computational complexity and logarithms. I managed to remember enough math from twenty years ago to answer it more or less correctly, but it was a bit jarring. I've never had to use logarithms in any front end development.

Good luck to you!

— Scott Schafer
quelle

2

So, your answer is "yes"?

— Raphael

6

TL;DR: "yes". However, in my experience you're not going to be talking about computational complexity in most jobs after you're hired. Yes, know your data structures and their performance, but just knowing that an algorithm is O(n) or whatever does not a good programmer make. It's much better to focus on writing good code quickly and then optimizing the hot spots later. Readability and maintainability are usually more important for most code than performance.

— Scott Schafer

3

I think it may happen that complexity comes up in a corporate setting, but the first real concern for companies is shipping: if it works, it's good enough, until there's available budget to improve the app, or a customer comes back to complain about poor performances. In b2b situations for adhoc projects, it's probably quite uncommon. In b2c, or in highly competitive markets (off the shelf products), it would probably come up more often, with the direct effect of raising the entry bar for new hires.

— didierc

4

@didierc "Good enough" is also what breaks things all the time.

— Raphael

1

@didierc 1) Well, people with solid backgrounds in CS do (hopefully) have a good intuition for what is correct and what is not, whereas ad-hoc problem solvers may commit "simple" mistakes. Ensuring that the execution after multiplie compilations is exactly what was specifid is highly non-trivial, and afaik an unsolved problem. 2) No.

— Raphael

9

The question is quite subjective, so I think the answer is it depends.

It doesn't matter that much if you work with small amounts of data. In these cases, it is usually fine to use whatever e.g. the standard library of your language offers.

However, when you deal with large amounts of data, or for some other reason you insist that your program is fast, then you must understand computational complexity. If you don't, how do you know how a problem should be solved, or how quickly it is even possible to solve it? But understanding just theory is not enough to be a really good programmer. To produce extremely fast code, I believe, you also have to understand how e.g. your machine works (caches, memory layout, the instruction set), and what your compiler does (compilers do their best, but are not perfect).

In short, I think understanding complexity clearly makes you a better programmer.

— Juho
quelle

1

I think you generally have right idea, but "subjective" doesn't describe this issue adequately; "circumstantial" would be a better word. Also, one can however write very slow programs that don't operate on a lot of data. I recently answered a question on math.se about polynomial representation/storage. That usually involves a pretty small amount of data e.g. ~1000-term polynomials are typical; yet there are huge real-world differences in performance (hundreds or thousands of seconds vs. a few seconds for a multiplication) depending on the implementation.

— Fizz

4

It is certainly a problem if someone who is developing significant algorithms does not understand algorithm complexity. Users of an algorithm generally rely on a good quality of implementation that has good performance characteristics. While complexity is not the only contributor to performance characteristics of an algorithm, it is a significant one. Someone who does not understand algorithm complexity is less likely to develop algorithms with useful performance characteristics.

It is less of a problem for users of an algorithm, assuming the algorithms available are of good quality. This is true for developers who use languages that have a significant, well-specified, standard library - they just need to know how to pick an algorithm that meets there needs. The problem comes in where their are multiple algorithms of some type (say, sorting) available within a library, because complexity is often one of the criteria for picking between. A developer who does not understand complexity then cannot understand the basis for picking an effective algorithm for their task at hand.

Then there are developers who focus on (for want of a better description) non-algorithmic concerns. For example, they may focus on developing intuitive user interfaces. Such developers will often not need to worry about algorithm complexity although, again, they may rely on libraries or other code being developed to a high quality.

— Rob
quelle

3

It depends, but not on amount of data you're working with, but on kind of work you do, programs you develop.

Let's call programmer that doesn't know about conceptual complexity noobish programmer.

The noobish programmer can do:

develop big data databases - he doesn't have to know how it works inside, all he has to know are rules about developing databases. He knows things like: what should be indexed,... where it is better to make redundancy in data, where it is not...
make games - he just has to study how some game engine works and follow its paradigms, games and computer graphics are quite a big data problems. Consider 1920*1080*32bit = cca 7.9MB for single picture/frame... @60 FPS it's at least 475MB/s. Consider, that just one unnecessary copy of fullscreen picture would waste around 500MB memory throughput per second. But, he doesn't need to care about that, because he only uses engine!

The noobish programmer shouldn't do:

develop very frequently used complex programs no matter of size of data it's working with,... for example, small data won't cause noticeable impact of improper solution during development, because it will be slower than compilation time, etc. So, 0.5sec for one simple program ain't that much from noobish programmer perspective, Well, consider server server, that runs this program twenty times per second. It would require 10cores to be able to sustain that load!
develop programs for embedded devices. Embedded devices work with small data, but they need to be as efficient as it's possible, because redundant operations make unnecessary power consuption

So, noobish programmer is fine, when you want just use technologies. So, when it comes to development of new solutions, custom technologies, etc. Then it's better to hire not noobish programmer.

However, if company doesn't develop new technologies, just uses already made ones. It would be waste of talent to hire skilled and talented programmer. The same applies, if you don't want to work on new technologies and you're fine putting customers ideas into designs and programs using already made frameworks, then it's waste of your time, to learn something you won't ever need, except if it's your hobby and you like logical challenges.

— kravemir
quelle

1

This answer could be improved if it used a more neutral label, or no label at all, much like the other reply that used the term "incompetent programmer."

— Moby Disk

1

I'm not sure what you mean by "conceptual complexity". My experience is that people who don't know enough about trees or hashtables can't make intelligent decisions regarding how to index (parts of) a big database.

— Fizz

3

I'm somewhat hesitant to write an answer here but since I found myself nitpicking on several others' [some of my comments got moved to chat], here's how I see it...

There are levels/degrees of knowledge to a lot of things in computing (and by this term I mean roughly the union of computer science with information technology). Computation complexity surely is a vast field (Do you know what OptP is? Or what the Abiteboul-Vianu theorem says?) and also admits a lot of depth: most people with a CS degree can't produce the expert proofs that go into research publications in computational complexity.

The level of knowledge and skill/competence required in such matters depends a lot on what one works on. Completely clueless O( $n^2$ ) sorting is sometimes said to be a major cause of slow programs^{[citation needed]}, but a 2003 SIGCSE paper noted "Insertion sort is used to sort small (sub) arrays in standard Java and C++ libraries." On the flip side, premature optimization coming from someone who doesn't understand what asymptotic means (computational complexity being such a measure) is sometimes a problem in programming practice. However, the point of knowing at least when computational complexity matters is why you need to have some clue about it, at least at an undergraduate level.

I would honestly dare compare the situation of knowing when to apply computational complexity concepts (and knowing when you can safely ignore them) with the somewhat common practice (outside of Java world) of implementing some performance-sensitive code in C and the performance-insensitive stuff in Python etc. (As an aside, this was called in a Julia talk the "standard compromise".) Knowing when you don't have to think about performance saves you programming time, which is a fairly valuable commodity too.

And one more point is that knowing computational complexity won't automatically make you good at optimizing programs; you need to understand more architecture-related stuff like cache locality, [sometimes] pipelining, and nowadays parallel/multi-core programming too; the latter has both its own complexity theory and practical considerations as well; a taste of the latter from a 2013 SOSP paper "Every locking scheme has its fifteen minutes of fame. None of the nine locking schemes we consider consistently outperforms any other one, on all target architectures or workloads. Strictly speaking, to seek optimality, a lock algorithm should thus be selected based on the hardware platform and the expected workload."

— Fizz
quelle

1

In the long run, developing or finding a better algorithm is usually more beneficial than changing programming language for the performance-sensitive bits. I agree with you that there is a strong association between lack of understanding of complexity and premature optimisation - because they usually target the less performance-sensitive bits for optimisation.

— Rob

1

In practice, (inadvertent) Schlemiel the Painter's algorithms are much more frequent than O(n^2) sorting.

— Peter Mortensen

-1

If you don't know big-O you should learn it. It's not hard, and it's really useful. Start with searching and sorting.

I do notice that a lot of answers and comments recommend profiling, and they almost always mean use a profiling tool.

The trouble is, profiling tools are all over the map in terms of how effective they are for finding what you need to speed up. Here I've listed and explained the misconceptions that profilers suffer from.

The result is that programs, if they are larger than an academic exercise, can contain sleeping giants, that even the best automatic profiler cannot expose. This post shows a few examples of how performance problems can hide from profilers.

But they cannot hide from this technique.

— Mike Dunlavey
quelle

You claim "Big-Oh" is useful but then you advocate a different approach. Also, I don't see how learning "Big-Oh" (mathematics) can "start with searching and sorting" (algorithmis problems).

— Raphael

@Raphael: I do not advocate a different approach - it's orthogonal.Big-O is basic knowledge for understanding algorithms, whereas finding performance problems in non-toy software is something you do after the code is written and run, not before. (Sometimes academics don't know this, so they continue teaching gprof, doing more harm than good.) In so doing, you may or may not find that the problem is use of a O(n*n) algorithm, so you should be able to recognize that. (And big-O is just a mathematically defined property of algorithms, not a different subject.)

— Mike Dunlavey

"And big-O is just a mathematically defined property of algorithms, not a different subject." -- that's wrong, and dangerously so. "Big-Oh" defines classes of functions; per se, it has nothing to do with algorithms at all.

— Raphael

Let us continue this discussion in chat.

— Raphael