O'Reilly & Associates, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (U.S. and Canada)
(707) 827-7000 (international/local)
(707) 829-0104 (fax)
You can also contact O'Reilly by email. To be put on the mailing list or request a catalog, send a message to:
[email protected]
We have a web page for this book, which lists errata, examples, and any additional information. You can access this page at:
http://www.oreilly.com/catalog/ldapsa/
To comment or ask technical questions about this book, send email to:
[email protected]
For more information about O'Reilly books, conferences, Resource Centers, and the O'Reilly Network, see the O'Reilly web site at:
http://www.oreilly.com/
Acknowledgments
At the end of every project, I am acutely aware that I could never have reached the end without the grace provided to me by God through my Savior, Jesus Christ. I hope He is proud of how I have spent my time. I am also very conscious of the patience bestowed upon me by my wife, Kristi, who is always there to listen when I need to talk and laugh when I need a smile. Thank you.
There is a long list of people who have helped make this book possible. I do not claim that this is a complete list. Mike Loukides has shown almost as much patience as my wife waiting on this book to be completed. I am in great debt to the technical reviewers who each provided comments on some version of this manuscript: Robbie Allen, David Blank-Edelman, Æleen Frisch, Robert Haskins, Luke Howard, Scott McDaniel, and Kurt Zeilenga. Thanks to Æleen for convincing me to do this (even if I complained more than once). I must also mention the various coffee shops, particularly the Books-A-Million in Auburn, AL, that have allowed me to consume far more than my fair share of caffeine and electricity.
Finally, a huge amount of recognition must be given to the developers who made various pieces of software available under open source and free software licenses. It is such an enjoyable experience to be able to send and receive feedback on problems, bugs, and solutions. Any other way would just be too painful.
Part I. LDAP Basics
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 1. "Now where did I put that...?", or "What is a directory?"
I have a fairly good memory for numbers, phone numbers in particular. This fact amazes my wife. For those numbers I cannot recall to the exact digit, I have a dozen or so slots in my cell phone. However, as the company I worked for grew, so did the list of people with whom I needed to stay in contact. And I didn't just need phone numbers; I needed email and postal addresses as well. My cell phone's limited capabilities were no longer adequate for maintaining the necessary information.
So I eventually broke down and purchased a PDA. I was then able to store contact information for thousands of people. Still, two or three times a day I found myself searching the company's contact database for someone's number or address. And I still had to go to other databases (phone books, corporate client lists, and so on) when I needed to look up someone who worked for a different company.
Computer systems have exactly the same problem as humans—both require the capability to locate certain types of information easily, efficiently, and quickly. During the early days of the ARPAnet, a listing of the small community of hosts could be maintained by a central authority—SRI's Network Information Center (NIC). As TCP/IP became more widespread and more hosts were added to the ARPAnet, maintaining a centralized list of hosts became a pipe dream. New hosts were added to the network before everyone had even received the last, now outdated, copy of the famous HOSTS.TXT file. The only solution was to distribute the management of the host namespace. Thus began the Domain Name System (DNS), one of the most successful directory services ever implemented on the Internet.[1]
DNS is a good starting point for our overview of directory services. The global DNS shares many characteristics with a directory service. While directory services can take on many different forms, the following five characteristics hold true (at a minimum):
A directory service is highly optimized for reads. While this is not a restriction on the DNS model, for performance reasons many DNS servers cache the entire zone information in memory. Adding, modifying, or deleting an entry forces the server to reparse the zone files. Obviously, this is much more expensive than a simple DNS query.
A directory service implements a distributed model for storing information. DNS is managed by thousands of local administrators and is connected by root name servers managed by the InterNIC.
A directory service can extend the types of information it stores. Recent RFCs, such as RFC 2782, have extended the types of DNS records to include such things as server resource records (RRs).
A directory service has advanced search capabilities. DNS supports searches by any implemented record type (e.g., NS, MX, A, etc.).
A directory service has loosely consistent replication among directory servers. All popular DNS software packages support secondary DNS servers via periodic "zone transfers" that contain the latest copy of the DNS zone information.
The Lightweight Directory Access Protocol
Of course, you didn't buy this book to read about the Domain Name System. And it's not likely that you were looking for a general discussion of directory services. This book is about a particular kind of directory service—namely, a service for directories that implement the Lightweight Directory Access Protocol (LDAP). LDAP has become somewhat of a buzzword in contemporary IT shops. If you are like me, sometimes you just have to ask, "Why all the fuss?" The fuss is not so much about LDAP itself, but about the potential of LDAP to consolidate existing services into a single directory that can be accessed by LDAP clients from various vendors. These clients can be web browsers, email clients, mail servers, or any one of a myriad of other applications.
By consolidating information into a single directory, you are not simply pouring the contents of your multitude of smaller pots into a larger pot. By organizing your information well and thinking carefully about the common information needed by client applications, you can reduce data redundancy in your directories and therefore reduce the administrative overhead needed to maintain that data. Think about all the directory services that run on your network and consider how much information is duplicated. Perhaps hosts on your network use a DHCP server. This server has a certain amount of information about IP addresses, Ethernet addresses, hostnames, network topology, and so forth in its configuration files. Which other applications use the same or similar information and could share it if it were stored in a directory server? DNS comes immediately to mind, as does NIS. If you have networked printers as well, think about the amount of information that's replicated on each client of the printing system (for example, /etc/printcap files).
Now consider the applications that use your user account information. The first ones that probably come to mind are authentication services: users need to type usernames and passwords to log in. Your mail server probably uses the same username information for mail routing, as well as for services such as mailing lists. There may also be online phone books that keep track of names, addresses, and phone numbers, as well as personnel systems that keep track of job classifications and pay scales.
Imagine the administrative savings that would result if all the redundant data on your network could be consolidated in a single location. What would it take to delete a user account? We all know what that takes now: you delete the user from /etc/passwd, remove him by hand from any mailing lists, remove him from the company phone list, and so on. If you're clever, you've probably written a script or two to automate the process, but you're still manipulating the same information that's stored in several different places. What if there was a single directory that was the repository for all this information, and deleti
ng a user was simply a matter of removing some records from this directory? Life would become much simpler. Likewise, what would it take to track host-related information? What would it be worth to you if you could minimize the possibility that machines and users use out-of-date information?
This sounds like a network administrator's utopia. However, I believe that as more and more client applications use LDAP directories, making an investment in setting up an LDAP server will have a huge payoff long-term. Realistically, we're not headed for a utopia. We're going to be responsible for more servers and more services, running on more platforms. The dividends of our LDAP investment come when we significantly reduce the number of directory technologies that we have to understand and administer. That is our goal.
* * *
[1] For more information on the Domain Name System and its roots, see DNS and BIND, by Paul Albitz and Cricket Liu (O'Reilly).
What Is LDAP?
The best place to begin when explaining LDAP is to examine how it got its name. Let's start at the beginning. The latest incarnation of LDAP (Version 3) is defined in a set of nine documents outlined in RFC 3377. This list includes:
RFC 2251-2256
The original core set of LDAPv3 RFCs
RFC 2829
"Authentication Methods for LDAP"
RFC 2830
"Lightweight Directory Access Protocol (v3): Extension for Transport Layer Security"
RFC 3377
"Lightweight Directory Access Protocol (v3): Technical Specification"
Lightweight
Why is LDAP considered lightweight? Lightweight compared to what? (As we look at LDAP in more detail, you'll certainly be asking how something this complex could ever be considered lightweight.) To answer these questions, it is necessary to look at LDAP's origins. The roots of LDAP are closely tied to the X.500 directory service; LDAP was originally designed as a lighter desktop protocol used to gateway requests to X.500 servers. X.500 is actually a set of standards; anything approaching thorough coverage of X.500 is beyond the scope of this book.[2]
X.500 earned the title "heavyweight." It required the client and server to communicate using the Open Systems Interface (OSI) protocol stack. This seven-layered stack was a good academic exercise in designing a network protocol suite, but when compared to the TCP/IP protocol suite, it is akin to traveling the European train system with four fully loaded footlockers.[3]
LDAP is lightweight in comparison because it uses low overhead messages that are mapped directly onto the TCP layer (port 389 is the default) of the TCP/IP protocol stack.[4] Because X.500 was an application layer protocol (in terms of the OSI model), it carried far more baggage, as network headers were wrapped around the packet at each layer before it was finally transmitted on the network (see Figure 1-1).
Figure 1-1. X.500 over OSI versus LDAP over TCP/IP
LDAP is also considered lightweight because it omits many X.500 operations that are rarely used. LDAPv3 has only nine core operations and provides a simpler model for programmers and administrators. Providing a smaller and simpler set of operations allows developers to focus on the semantics of their programs without having to understand rarely used features of the protocol. In this way, LDAP designers hoped to increase adoption by providing easier application development.
Directory
Network directory services are nothing new; we're all familiar with the rise of DNS. However, a directory service is often confused with a database. It is easy to understand why. Directory services and databases share a number of important characteristics, such as fast searches and an extendable schema. They differ in that a directory is designed to be read much more than it is written; in contrast, a database assumes that read and write operations occur with roughly the same frequency. The assumption that a directory is read often but written rarely means that certain features that are essential to a database, such as support for transactions and write locks, are not essential for a directory service such as LDAP.
At this point, it's important to make the distinction between LDAP and the backend used to store the persistent data. Remember that LDAP is just a protocol; we'll discuss what that means shortly, but essentially, it's a set of messages for accessing certain kinds of data. The protocol doesn't say anything about where the data is stored. A software vendor implementing an LDAP server is free to use whatever backend it desires, ranging from flat text files on one extreme to highly scalable, indexed relational databases on the other. So when I say that LDAP doesn't have support for transactions and other features of databases, I mean that the protocol doesn't have the messages that you would need to take advantage of these features (remember, it's lightweight) and doesn't require that the backend data store provide these features.
The point is that the client will never (and should never) see or even know about the backend storage mechanism (see Figure 1-2). For this reason, LDAP-compliant clients written by vendor A should interoperate with an LDAP-compliant server written by vendor Z. Standards can be a wonderful thing when followed.
Figure 1-2. Relationship between an LDAP client, LDAP server, and data storage facility
It has been suggested that an LDAP server could be used as backend storage for a web server. All HTML and graphic files would be stored within the directory and could be queried by mutiple web servers. After all, a web server typically only reads files and sends them to clients; the files themselves change infrequently. While it's certainly possible to implement a web server that uses LDAP to access its backend storage, a special type of directory already exists that is better suited to meet the needs of serving files, namely a filesystem. So, for example, while an LDAP directory might not be a good location for storing spooled files in transit to a printer, using it to store printer configuration settings (e.g., /etc/printcap) shared among clients would be a big win.
This brings up two good points about the intended function of LDAP:
LDAP is not a generalized replacement for specialized directories such as filesystems or DNS.
While storing certain types of binary information (e.g., JPEG photos) in directories can be useful, LDAP is not intended for storing arbitrary "blobs" (Binary Lumps of Bits).
What about storing individual application settings for roaming users on an LDAP server? It is a judgment call whether this is better served by a filesystem or a directory. For example, it is possible to store basic application settings for Netscape Communicator in LDAP. Such things as an address book, a bookmarks file, and personal preference settings are certainly appropriate for storage in a directory. However, using your directory as a location for browser cache files would violate rule #2.
Access Protocol
All of this talk of directory services makes it is easy to forget that LDAP is a protocol. It is not uncommon to hear someone refer to an LDAP server or LDAP tree. I have done so and will continue to do so. LDAP does provide a treelike view of data, and it is this treelike view to which people refer when speaking of an LDAP server.
This introduction won't go into the specifics of the actual protocol. It is enough to think of LDAP as the message-based, client/server protocol defined in RFC 2251. LDAP is asynchronous (although many development kits provide both blocking and nonblocking APIs), meaning that a client may issue multiple requests and that responses to those requests may arrive in an order different from that in which they were issued. Notice in Figure 1-3 that the client sends Requests 1 and 2 prior to receiving a response, and the response to Request 3 is returned before the response to Request 2.
Figure 1-3. LDAP requests and responses
More aspects of programming with LDAP operations will be covered in Chapter 10.
* * *
[2] Understanding X.500—The Directory, by David W. Chadwick, provides a good explanation of X.500 directories. While the book itself is out of print, an HTML version of it can be accessed from http://www.salford.ac.uk/its024/X500.htm.
[3] For a quick, general comparison of the OSI model and th
e TCP/IP protocol stack, see Computer Networks, by Andrew S. Tanenbaum (Prentice Hall).
[4] A connectionless version of LDAP that provided access via UDP was defined by an Internet-Draft produced by the LDAP Extension Working Group of the IETF. However, the current draft expired in November, 2001. You can access the group's web site at http://www.ietf.org/html.charters/ldapext-charter.html.
LDAP Models
LDAP models represent the services provided by a server, as seen by a client. They are abstract models that describe the various facets of an LDAP directory. RFC 2251 divides an LDAP directory into two components: the protocol model and the data model. However, in Understanding and Deploying LDAP Directory Services, by Timothy A. Howes, Mark C. Smith, and Gordon S. Good (MacMillan), four models are defined:
Information model
The information model provides the structures and data types necessary for building an LDAP directory tree. An entry is the basic unit in an LDAP directory. You can visualize an entry as either an interior or exterior node in the Directory Information Tree (DIT). An entry contains information about an instance of one or more objectClasses. These objectClasses have certain required or optional attributes. Attribute types have defined encoding and matching rules that govern such things as the type of data the attribute can hold and how to compare this data during a search. This information model will be covered extensively in the next chapter when we examine LDAP schema.
Naming model
The naming model defines how entries and data in the DIT are uniquely referenced. Each entry has an attribute that is unique among all siblings of a single parent. This unique attribute is called the relative distinguished name (RDN). You can uniquely identify any entry within a directory by following the RDNs of all the entries in the path from the desired node to the root of the tree. This string created by combining RDNs to form a unique name is called the node's distinguished name (DN).
LDAP System Administration Page 2