# Key idea:
            # Classification
            # (can be) based on
            # data values
          
          
            
              How it works
              
             
            
              
Classification is about putting things in buckets
              For one variable all we do is decide break points
              Think of this as segmenting a one-dimensional space in
              which polygons are located by numbers attached to them
            
          
          
            Add a variable
            
            
          
          
            This idea on a map
            Bivariate choropleths
            
            image from joshuastevens.net, see that page for more
          
          
            
              
‘Letting the data speak’
              Segmenting each variable separately not satisfactory
              Misses interdepencies among variables
              Better to use statistical clustering methods
              These work off differences and similarities among
              data points to find ‘natural’ clusters in data
            
          
          
          
            # Features of k-means
            ## It is non-deterministic
            ## You specify the number of classes (_k_) ahead of time
            ## It will _always_ find _k_ clusters
            ## Taken together: this means analyst interpretation is critical
          
          
            image source stackoverflow.com
            Alternatives to k-means
            Hierarchical clustering
            
            These are all unsupervised methods and primitive examples of machine-learning