updated the docs

--HG-- extra : convert_revision : svn%3Afdd8eb12-d10e-0410-9acb-85c331704f74/trunk%402803

7b5fd381 · Davis King · b0e277f5 · 7b5fd381
Commit 7b5fd381 authored Jan 13, 2009 by Davis King

Showing with 187 additions and 38 deletions

howto_contribute.xml docs/docs/howto_contribute.xml +187 -38

--- a/docs/docs/howto_contribute.xml
+++ b/docs/docs/howto_contribute.xml
@@ -179,7 +179,7 @@
                  it is especially important in dlib since everything is viewable 
                  as pretty-printed HTML.  Tabs show up as 8 characters in most browsers
                  and this results in the HTML version being difficult to read.  So 
-                  don't use tabs.</p>
+                  don't use tabs.  Additionally, please use 4 spaces for each tab level.</p>
            </ul></li>
@@ -232,7 +232,7 @@
               <p>
                  RAII is a trivial technique to use.  All you have to do is not call new and delete yourself and
                  you will never have another memory leak.  Just use the appropriate <a href="containers.html">container</a>
-                  instead.  Finally, if you don't use RAII then your code is almost certainly not exception safe.  
+                  instead.  Finally, if you don't use RAII then your code is almost certainly not exception safe.   
               </p>
               </ul>
            </li>
@@ -291,10 +291,38 @@
               <ul><p>
                  A stack based array, or C style array, is an array declared like this:
                  <blockquote><tt>int array[200];</tt></blockquote>
-                  Most of my criticisms of pointers also apply to stack based arrays.  So you should 
+                  Most of my criticisms of pointers also apply to stack based arrays.  In particular, 
-                  use a container class instead and preferably one with the ability to do range
+                  if you are passing a stack based array to a function then that means you are probably
-                  checking such as the  <a href="containers.html#std_vector_c">std_vector_c</a>.   
+                  using functions similar to the unsafe compute_sum_of_array_elements() example above.
-               </p></ul>
+               </p>
+               <p>
+                  The only time it is OK to use this kind of array is when you use it for simple
+                  tasks and you don't start passing pointers to the array to other parts of your code.  You
+                  should also use a constant to store the array size and use that constant in your loops
+                  rather than hard coding the size in numerous places.   
+               </p>
+               <p>
+                  Even so, you should use a container class instead, preferably one that does range
+                  checking such as the <a href="containers.html#std_vector_c">std_vector_c</a>.   </p>
+                  <p>
+                     Consider the following two bits of code:
+<pre>
+   for (int i = 0; i &lt; array_size; ++i) 
+      my_c_array[i] = 4;
+   for (int i = 0; i &lt; my_std_vector.size(); ++i)
+      my_std_vector[i] = 4;
+</pre>
+                  The second loop clearly doesn't overflow the bounds of my_std_vector.   On the other 
+                  hand, just by looking at the code in the first loop, we cannot tell if it overflows
+                  my_c_array.  We have to assume that array_size is the appropriate constant but we could be wrong.
+               </p>
+               <p>
+                  Buffer overflows are probably the most common kind of bug in C and C++ code.  These bugs also
+                  lead to serious exploitable security holes in software.  So please try to avoid stack based arrays.
+               </p>
+               </ul>
            </li>
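A minimal sketch of the contrast drawn in the hunk above, assuming plain std::vector as a stand-in for a range-checked container (dlib's std_vector_c is not used here, and the helper names fill_all and is_out_of_range are hypothetical). The loop bound comes from the container itself, and an out-of-range access through at() is reported instead of silently overflowing:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fill every element of v with value.  The loop bound is taken from the
// container, so it cannot drift out of sync with the real size.
void fill_all(std::vector<int>& v, int value)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = value;
}

// Returns true if accessing index i is reported as out of range.
// std::vector::at() does the range check and throws std::out_of_range.
bool is_out_of_range(const std::vector<int>& v, std::size_t i)
{
    try
    {
        (void)v.at(i);
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
}
```

With a 200 element vector, index 199 is accepted and index 200 is rejected, which is exactly the check a raw `int array[200]` never performs.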
@@ -302,32 +330,105 @@
        <!--   ****************************  -->
            <anchor>8</anchor>
            <li> <h3> Use exceptions, but don't abuse them. </h3>
-               <ul><p>
+               <ul>
-                  Exceptions are good but should only be used for <i>exceptional</i> conditions.
+                  <p>
-                  This means that in the vast majority of use cases a user shouldn't 
+                   Exceptions are one of the great features of modern programming languages.  Some 
-                  need to deal with the exceptions thrown by a library component near the point
+                   people, however, consider that to be a contentious statement.   But if you accept 
-                  of use.  If that isn't true then whatever condition is triggering your exception
+                   the notion that a software library should be hard to use wrong then it 
-                  isn't exceptional.  Or in other words, if the user would have to put try/catch
+                   becomes difficult to reject exceptions.  
-                  blocks around individual calls to your code then you are almost certainly using 
+                  </p>
-                  exceptions wrong.
+                  <p>
-               </p>
+                   Most of the complaints I hear about exceptions are actually complaints 
+                   about their <i>misuse</i> rather than objections to the basic idea.  
+                   So before I begin to defend the above
+                   paragraph I would like to lay out more clearly when it is appropriate to
+                   use exceptions and when it is not.   
+                  </p>
+                  <p>
+                  There are two basic questions you should ask yourself when deciding whether to 
+                  throw an exception in response to some event.  The first is (1) "should this event
+                  occur in the normal use of my library component?"  The second question is (2) "if this event
+                  were to occur, is it likely that the user will want to place the code for dealing 
+                  with the event near the invocations of my library component?"
+                  </p>
+                  <p>
+                     If your answers to the above two questions are "no" then you should probably
+                     throw an exception in response to the event.  On the other hand, if you answer
+                     "yes" to either of these questions then you should probably <i>not</i> throw an exception.
+                  </p>
               <p>
-                  A good example of an exceptional condition is running out of memory.  It doesn't happen
+                  A good example of an event worth throwing exceptions for is running out of memory.  
-                  very often, and when it does happen it is hardly ever the case that you want to
+                  (1) It doesn't happen very often, and (2) when it does happen it is hardly ever the case that 
-                  deal with the out of memory exception right next to the place where you are 
+                  you want to deal with the out of memory event right next to the place where you are 
                  attempting to allocate memory.  
               </p>
               <p>
-                  Another way of looking at it is that exceptions shouldn't occur in the normal use
+                  Alternatively, an example of an event that shouldn't throw an exception comes to 
-                  cases associated with a library component.  For example, the C++ I/O streams allow
+                  us from the C++ I/O streams.  This part of the standard library allows
-                  you to read the contents of a file on disk and when you hit the end of file they
+                  you to read the contents of a file from disk.  When you hit the end of file they
-                  do not throw an exception.   The difference between hitting EOF and running
+                  do not throw an exception.  This is appropriate because (1) you usually want to
-                  out of memory is that when everything is working properly your application will
+                  read a file in its entirety. So hitting EOF happens all the time.  Additionally, (2)
-                  routinely encounter ends of files but hopefully you do not routinely run out of memory.
+                  when you hit EOF you usually want to break out of the loop you are in
+                  and continue immediately into the next block of code.
+               </p>
+               <p>
+                  Usually when someone tells me they don't like exceptions they give reasons like "they make 
+                  me put try/catch blocks all over the place and it makes the code hard to read."  Or "it makes
+                  it hard to understand the flow of a program with exceptions in it."   Invariably they
+                  have been working with bodies of software that disregard the above rules regarding questions
+                  1 and 2.  Indeed, when exceptions are used for flow control the results are horrifying.  Using
+                  exceptions for events that occur in the normal use of a library component, especially when
+                  the events need to be dealt with near where they happen, results in a spaghetti-like mess
+                  of throw statements and try/catch blocks.  Clearly, exceptions should be used sparingly.  
+                  So please, take my advice regarding questions 1 and 2 to heart. 
+               </p>
+               <p>
+                  Now let's go back to my claim that exceptions are an important part of making
+                  a library that is hard to use wrong.  But first let's be honest about one thing:  
+                  many developers don't think very hard about error handling and they similarly aren't very
+                  careful about checking function return codes.  Moreover, even the most studious of
+                  us can easily forget to check error codes.  It is also easy to forget to add 
+                  appropriate exception catch blocks.
+               </p>
+               <p>
+                  So what is so great about exceptions then?  Well, let's imagine some error just occurred
+                  and it caused an exception to be thrown.   If you forgot to set up catch blocks to deal with
+                  the error then your program will be aborted.  Not exactly a great thing.  But you will
+                  be able to easily find out what exception was thrown.  Additionally, exceptions typically contain an error
+                  message telling you all about the error that caused the exception to be thrown.  Moreover, 
+                  any debugger worth its
+                  salt will be able to show you a stack trace that lets you see exactly where the exception came from.
+                   The exception <i>forces</i> you, the user, to 
+                  be aware of this potential error and to add a catch block to deal with it. 
+                  This is where the "hard to use wrong" comes from. 
+               </p>
+               <p>
+                  Now let's imagine that we are using return codes to communicate errors to the user and the 
+                  same error occurs.  If you forgot to do all your return code checking then you will
+                  simply be unaware of the error.  Maybe your program will crash right away.  But more likely, it
+                  will continue to run for a while before crashing at some random place far away from the source
+                  of the error.  You and your debugger now get to spend a few hours of quality time 
+                  together trying to figure out what went wrong.  
+               </p>
+               <p>
+                  The above considerations are why I maintain that exceptions, used properly, contribute to 
+                  the "hard to use wrong" factor of a library.  There are, however, other reasons to use exceptions.
+                  They free the user from needing to clutter up code with lots of return code checking.  This makes
+                  code easier to read and lets you focus more on the algorithm you are trying to implement and less
+                  on the bookkeeping.  
+               </p>
+               <p>
+                  Finally, it is important to note that there is a place for return codes.  When you answer "no"
+                  to questions 1 and 2 I suggest using exceptions.  However, if you answer "yes" to even one
+                  of them then I would recommend pretty much anything other than throwing an exception.  In this
+                  case error codes are often an excellent idea.
               </p>
               <p>
                  As an aside, it is also important that your exception classes inherit from 
-                  <a href="other.html#error">dlib::error</a>.
+                  <a href="other.html#error">dlib::error</a> to maintain consistency with the rest of the library.
               </p>
               </ul>
            </li>
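A short sketch of how questions 1 and 2 might play out in code, under stated assumptions: input_error, read_next_line, and read_all_lines are hypothetical names, and the exception inherits from std::runtime_error only to keep the example standalone (code submitted to dlib would inherit from dlib::error). Hitting the end of the input is routine, so it is reported through a return value. A stream that is already unusable is not routine, so it throws.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type for this sketch only.
class input_error : public std::runtime_error
{
public:
    explicit input_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Routine event: running out of lines happens on every successful run and
// is handled right at the call site, so it is signalled by the return value.
bool read_next_line(std::istream& in, std::string& line)
{
    return static_cast<bool>(std::getline(in, line));
}

// Rare event: the stream is already in a failed state before reading starts.
// Callers rarely want to handle this next to the call, so it throws.
std::vector<std::string> read_all_lines(std::istream& in)
{
    if (!in.good())
        throw input_error("read_all_lines: input stream is not readable");

    std::vector<std::string> lines;
    std::string line;
    while (read_next_line(in, line))   // end of input simply ends the loop
        lines.push_back(line);
    return lines;
}
```

Callers of read_all_lines need no try/catch around the call itself. A single catch block far up the call stack is enough to report the rare failure.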
@@ -342,26 +443,74 @@
                         If you have been following the prohibition against messing around with
                         pointers then this won't even be an issue for you.  Moreover, just about the only
                         time this should even come up is when you are casting blocks of 
-                         memory into structs or dumping the contents of memory to an I/O channel.
+                         memory into other types or dumping the contents of memory to an I/O channel.
                         All of these things are highly non-portable so don't do them.
                        </p>
                        <p>
                           If you want a portable way to write the state of an object to an
-                           IO channel the I recommend you use the <a href="other.html#serialize">serialization</a>
+                           I/O channel then I recommend you use the <a href="other.html#serialize">serialization</a>
-                           capability in dlib.  If that still doesn't suit your needs then do 
+                           capability in dlib.  If that doesn't suit your needs then do 
-                           something else but whatever you do don't dump the contents of memory.  
+                           something else, but whatever you do don't just dump the contents of memory.  
-                           Convert your data into some portable format first.
+                           Convert your data into some portable format and then output that.
+                        </p>
+                        <p>
+                           As an example of something else you might do, suppose you have a bunch of integers 
+                           you want to write to disk.  Assuming all your integers are positive numbers representable 
+                           using 32 or fewer bits you could store all your numbers in 
+                           <a href="other.html#uint32">dlib::uint32</a> variables and then convert them
+                           into either big or little endian byte order and then write them to an output stream.  
+                           You could do this using code similar to the following:
+                           <pre>
+   dlib::<a href="other.html#byte_orderer">byte_orderer</a>::kernel_1a bo;
+   ...
+   bo.host_to_big(my_uint);
+   my_out_stream.write((char*)&amp;my_uint, sizeof(my_uint));
+   ...
+                           </pre>
+                           <p>
+                           There are three important things to understand about this process.  First, you need
+                           to pick variables that always have the same size on all platforms.  This means you
+                           can't use <i>any</i> of the built in C++ types like int, float, double, long, etc... All 
+                           of these types have different sizes depending on your platform and even compiler settings. 
+                           So you need to use something like dlib::uint32 to obtain a type of a known size.
+                           </p>
+                           <p>
+                           Second, you need to convert each thing you write out into either big or little endian byte order.  
+                           The reason for this is, again, portability.  If you don't explicitly convert to one
+                           of these byte orders then you end up writing data out using whatever the byte order
+                           is on your current machine.  If you do this then only machines that have the same
+                           byte order as yours will be able to read in your data.  If you use the dlib::byte_orderer
+                           object this is easy.  It is very type safe.  In fact, you should have a hard time even getting
+                           it to compile if you use it wrong.
+                           </p>
+                           <p>
+                           The third thing you should understand is that you need to write out each of your
+                           variables one at a time.  You can't write out an entire struct in a  
+                           single ostream.write() statement because the compiler is allowed to put any
+                           kind of padding it feels like between the fields in a struct.  
+                           </p>
+                           <p>
+                           You may be aware that compilers usually provide #pragma directives that allow you 
+                           to explicitly control this padding.  However, if you want to submit code to dlib 
+                           you will not use this feature.  Not all compilers support it in the same way and, 
+                           more importantly, not all CPU architectures are even capable of running code that 
+                           has had the padding messed with.  This is because it can result in the CPU attempting
+                           to perform what is called an "unaligned load" which many CPUs (like the SPARC) are
+                           incapable of doing.
+                           </p>
+                           <p>
+                              So in summary, convert your data into a known type with a fixed size, then convert
+                              into a specific byte order (like big endian), then write out each variable individually.
+                              Or you could just use <a href="other.html#serialize">serialize</a> and not worry about all
+                              this horrible stuff. :)
+                           </p>
                        </p>
                     </ul>
                  </li>
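The three steps in the summary above can be sketched without dlib, assuming std::uint32_t as the fixed-size type and explicit shifts in place of dlib::byte_orderer (the function and struct names below are hypothetical, not part of dlib). Each field is converted to big endian byte order and written out individually, so struct padding and host byte order never reach the output:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Write value as exactly four bytes, most significant byte first.
void write_uint32_big_endian(std::ostream& out, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((value >> shift) & 0xFF));
}

// Read four bytes written by write_uint32_big_endian().
std::uint32_t read_uint32_big_endian(std::istream& in)
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
    {
        char c;
        if (!in.get(c))
            throw std::runtime_error("read_uint32_big_endian: unexpected end of input");
        value = (value << 8) | static_cast<unsigned char>(c);
    }
    return value;
}

// Hypothetical record type.  Its in-memory layout may contain padding, so
// the fields are written one at a time rather than with a single write().
struct sample_record
{
    std::uint32_t id;
    std::uint32_t count;
};

void write_record(std::ostream& out, const sample_record& rec)
{
    write_uint32_big_endian(out, rec.id);
    write_uint32_big_endian(out, rec.count);
}
```

The output is the same byte sequence on every platform, which is the property that dumping raw memory cannot guarantee.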
-                  <li> <b> Don't make assumptions about endianness  </b>
-                     <ul><p>
-                        This is self explanatory.  Some machines are little endian and some are big endian.  
-                        It is just a fact of life.  If you need to convert between the two then 
-                        please use the <a href="other.html#byte_orderer">byte_orderer</a> since it 
-                        can deal with these issues in a type safe way.  
-                     </p></ul>
-                  </li>
                  <li> <b> All code that calls functions that aren't in dlib or the C++
                     standard library must be isolated inside the API wrappers.</b>
                     <ul><p>