
Searching the Catalog

by Wyden Silvan last modified 11.01.2010 21:40

Returns all objects of type ATLink in the folder banner:

return context.banner.objectValues('ATLink')


Of course, the biggest question is how to search the catalog and use the results. The first of these tasks depends upon the indexes, so I cover each of the indexes and show how to search them. The second of these tasks involves manipulating the results, so I then show you how to do this.

All of the following examples are in Python because this is the best way to search a catalog: it gives you the most flexibility without having to worry about template syntax. I also show a quick example of how to hook the results into a page template.

In general, you achieve searching by calling the method searchResults on the portal_catalog object and passing through a series of keyword parameters. A couple of reserved keywords exist, but the rest are mapped directly to the indexes of the same name. So if you wanted to search the SearchableText index, you'd pass through to the search method a keyword parameter for SearchableText. The reserved keywords are as follows:

  • sort_on: This is the index to sort the results on, assuming that the index allows sorting (full-text indexes don't allow sorting). Typical values are 'sortable_title' and 'id'; use 'getObjPositionInParent' to sort on the position in the folder.
  • sort_order: Set this to reverse or descending to reverse the order; if not specified, the default is ascending.
  • sort_limit: This is an optimization hint to make sorting a little quicker.
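
The reserved sort keywords can be combined with ordinary index criteria. The following is a minimal sketch, not part of Plone: sorted_query is a hypothetical helper that only assembles the keyword arguments, so it runs without a Zope instance.

```python
def sorted_query(sort_on, reverse=False, limit=None, **criteria):
    """Assemble keyword arguments for portal_catalog.searchResults.

    sorted_query is a hypothetical helper, not a Plone API.
    """
    query = dict(criteria)          # ordinary index criteria, e.g. review_state
    query["sort_on"] = sort_on      # index to sort on
    if reverse:
        query["sort_order"] = "reverse"
    if limit is not None:
        query["sort_limit"] = limit  # only a hint to the catalog
    return query


query = sorted_query("Date", reverse=True, limit=5, review_state="published")

# Inside a Script (Python) you would then call, for example:
#   results = context.portal_catalog.searchResults(**query)[:5]
# The slice is there because sort_limit is a hint, not a hard cap.
```
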

So, a general search for all items that mention Plone and are published in Date order looks something like this:

context.portal_catalog.searchResults(
    review_state = "published",
    SearchableText = "Plone",
    sort_on="Date"
)

The search returns the intersection of the index results, so this finds all items that both mention Plone and are published. You can't do a search that returns the union of results in a single call; you could run several searches and add the results together, but this is a rather unusual case.
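
If you do need a union, one possible approach is to run the searches separately and merge the results, dropping duplicates by path. The sketch below is an illustration only: union_results is a hypothetical helper, and FakeBrain is a stand-in that models nothing but the getPath method of a real catalog brain.

```python
class FakeBrain(object):
    """Stand-in for a catalog brain; only getPath() is modelled."""

    def __init__(self, path):
        self._path = path

    def getPath(self):
        return self._path


def union_results(*result_sets):
    """Merge several result sequences, keeping the first brain seen per path."""
    seen = set()
    merged = []
    for results in result_sets:
        for brain in results:
            path = brain.getPath()
            if path not in seen:
                seen.add(path)
                merged.append(brain)
    return merged


# In Plone these would come from two separate searchResults calls.
published = [FakeBrain("/Plone/a"), FakeBrain("/Plone/b")]
about_plone = [FakeBrain("/Plone/b"), FakeBrain("/Plone/c")]
merged_paths = [b.getPath() for b in union_results(published, about_plone)]
```

Note that the merged list is no longer sorted by the catalog, so any ordering has to be reapplied in Python.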

TIP If you do a search with no values, then the entire contents of the catalog are returned. By default, all searches add values for effective and end dates, ensuring that you see content only between these times, unless the user calling the search has the Access inactive portal content permission.

Searching a Field or Date Index

To search a FieldIndex, pass through the value of the field. Any hits that match will be returned; for example, to search for all the images in a site, use the following:

results = context.portal_catalog.searchResults(
    Type = "Image"
)

A field index can take a range of objects as well, and the index will attempt to find all the values in-between by performing a comparison of the values. This range could be between two dates, two numbers, or two strings; it really depends upon the value of FieldIndex. You do this by passing a dictionary to the index, rather than just a string. The dictionary should contain two values: a list called query, which contains the values to be tested, and a range, which defines a range of the values. The range is a string of one of the following:

  • min: Anything larger than the smallest item
  • max: Anything smaller than the largest item
  • minmax: Anything smaller than the largest and bigger than the smallest

For example, to find all events that have an end time later than now (in other words, events that haven't finished yet), use the following:

from DateTime import DateTime
now = DateTime()
results = context.portal_catalog.searchResults(
    Type = "Event",
    end = { "query": [now,],
            "range": "min" }
)

To search on a range, such as all news items in December, you'd need to calculate the start and end dates for the month. From those dates, you can then construct the following query:

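from DateTime import DateTime
start = DateTime('2004/12/01')
# a bare date means midnight at the start of that day,
# so give the end date an explicit time to include December 31
end = DateTime('2004/12/31 23:59:59')
results = context.portal_catalog.searchResults(
    Type = "News Item",
    created = { "query": [start, end],
                "range": "minmax" }
)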
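
Calculating the start and end of a month by hand is easy to get wrong (month lengths, leap years, and the fact that a bare date resolves to midnight at the start of that day). As a rough sketch, the hypothetical helper month_bounds below uses only the standard library and returns two strings in the year/month/day form used above; passing them to Zope's DateTime is left as a comment so the example runs anywhere.

```python
import calendar


def month_bounds(year, month):
    """Return (start, end) strings covering the whole month.

    month_bounds is a hypothetical helper, not part of Zope or Plone.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    start = "%04d/%02d/01 00:00:00" % (year, month)
    end = "%04d/%02d/%02d 23:59:59" % (year, month, last_day)
    return start, end


start_text, end_text = month_bounds(2004, 12)

# In a Script (Python) you could then build the range query like this:
#   created = {"query": [DateTime(start_text), DateTime(end_text)],
#              "range": "minmax"}
```
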

Date indexes work in the same manner as field indexes, and often you'll see dates placed inside field indexes, which works just fine.

Searching a KeywordIndex

By default, a KeywordIndex returns all the values that match in the keyword index. Subject is the only KeywordIndex; this is the keyword that a user has assigned to an object through the Properties tab of the Plone interface. To search for all items with the keyword Africa, use this:

results = context.portal_catalog.searchResults(
        Subject = "Africa"
)

Similar to a FieldIndex, a KeywordIndex can be passed a more complicated query, with several objects and an and/or operator (or is the default). This would allow you to find all objects that have almost any combination of keywords. To find all objects that have the subject Africa and sun, use the following:

results = context.portal_catalog.searchResults(
        Subject = { "query": ["Africa", "sun"],
                     "operator": "and" }
)
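
To make the difference between the two operators concrete, here is a small in-memory model of the matching rule. It is a simplified illustration, not the KeywordIndex implementation; matches_subject and the sample items are made up for the example.

```python
def matches_subject(subjects, query, operator="or"):
    """Model of keyword matching: 'and' needs every keyword, 'or' needs any."""
    wanted = set(query)
    have = set(subjects)
    if operator == "and":
        return wanted.issubset(have)
    return bool(wanted & have)


# Hypothetical content items and the keywords assigned to them.
items = {
    "safari": ["Africa", "sun"],
    "sahara": ["Africa"],
    "beach": ["sun"],
    "alps": ["snow"],
}

both = sorted(name for name, subjects in items.items()
              if matches_subject(subjects, ["Africa", "sun"], "and"))
either = sorted(name for name, subjects in items.items()
                if matches_subject(subjects, ["Africa", "sun"], "or"))
```

Here both contains only safari, while either also picks up sahara and beach.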

Searching a PathIndex

A path index allows you to search for all objects in a certain path. It will return every object below a current location, so if you ask for all objects in Members, it'll return everything in everybody's home directories. For example, for all objects that have Members in their path, use this:

results = context.portal_catalog.searchResults(
        path = "/Plone/Members"
)

If you want to further restrict this, you can do so by passing through a level parameter that sets where you expect the value to be. The level is a number representing its position in the path, from the left when splitting it up by forward slashes. For example, in the previous code, Plone is level 0, Members is level 1, and so on. Similarly to KeywordIndex, you can pass through an and/or operator. To get all objects in the /Plone/Members/danae folder and the /Plone/testing/danae folder, use the following:

results = context.portal_catalog.searchResults(
        path = { "query": ["danae"],
                "level" : 2 }
)
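
The level parameter is easier to follow with a small model of how a path is split up. The function below is a simplified sketch of the rule described above, not the PathIndex implementation; path_matches is a hypothetical name.

```python
def path_matches(path, query, level=0):
    """Return True if the query's components appear in path starting at level.

    Components are counted from the left after splitting on forward
    slashes, so in /Plone/Members/danae, Plone is level 0.
    """
    parts = [p for p in path.split("/") if p]
    wanted = [p for p in query.split("/") if p]
    return parts[level:level + len(wanted)] == wanted


in_members = path_matches("/Plone/Members/danae", "danae", level=2)
in_testing = path_matches("/Plone/testing/danae/notes", "danae", level=2)
wrong_level = path_matches("/Plone/danae", "danae", level=2)
```
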

Searching a ZCTextIndex

ZCTextIndex is the most complicated of all indexes and takes a whole host of options. Each ZCTextIndex requires a lexicon; fortunately, Plone creates and configures all this out of the box. If you click portal_catalog, select the Contents tab, and click plone_lexicon, you can see the default configuration of the lexicon. Clicking the Query tab will show you all the words that are in the lexicon built out of your Plone site content.

The ZCTextIndex is searched using the format I described in Chapter 3. It takes similar terms to the searching that you can use on Google or other search engines. At its most basic, you can search for any term (note that this is case insensitive), like so:

results = context.portal_catalog.searchResults(
        SearchableText = "space"
)

But you can also search for all of the following:

  • Globbing: Use an asterisk to signify any letters. For example, tues* matches tuesday and tuesdays. You can't use the asterisk at the beginning of a word, though.
  • Single wildcards: Use a question mark to signify one letter. For example, ro?e matches rope, rote, role, and so on. You can't use the question mark at the beginning of a word.
  • And: Using and signifies that the terms on both sides of it must exist. For example, rome and tuesday will return a result only if both of those words are in the content.
  • Or: Using or signifies that either term can exist. For example, rome or tuesday will return a result if either of those words is in the content.
  • Not: Using not returns results where this isn't present (a prefix and is required). For example, welcome and not page would return matches for pages that contained welcome, but not page.
  • Phrases: You can group phrases with double quotes (") and signify several words one after the other. For example: "welcome page" matches This welcome page is used to introduce you to the Plone Content Management System, but not Welcome to the front page of.
  • Not phrase: You can specify a phrase with a minus (-) prefix. For example, welcome -"welcome page" matches all pages with welcome in them, but not ones that match the phrase welcome page.
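
When the query text comes from several form fields, it can help to assemble the string in one place. The following is a minimal sketch using only the syntax listed above; text_query is a hypothetical helper, and it refuses to build a query made solely of excluded words because not needs a preceding positive term.

```python
def text_query(all_of=(), phrase=None, none_of=()):
    """Build a full-text query string from required words, a phrase, and exclusions."""
    positive = list(all_of)
    if phrase:
        positive.append('"%s"' % phrase)  # double quotes group a phrase
    if not positive:
        raise ValueError("at least one positive term is required")
    query = " and ".join(positive)
    for word in none_of:
        query += " and not " + word
    return query


simple = text_query(["rome", "tuesday"])
excluding = text_query(["welcome"], none_of=["page"])
phrased = text_query(phrase="welcome page")

# In a Script (Python):
#   results = context.portal_catalog.searchResults(SearchableText=excluding)
```
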

TIP If you perform a search with no text, then no results are returned.

Using the Results

So you've got some results; now what do you do with them? The first thing a lot of people do is look at the results and assume that it's a list of the objects that were cataloged. Well, it isn't; rather, it's a series of catalog brains. These brains are actually lazy objects that contain the metadata columns defined earlier. You can access any of these columns as if it were an attribute. For example, to print all the IDs of result objects, use the following:

results = context.portal_catalog.searchResults()
for result in results:
    print result.getId
return printed

In this example, getId is the name of a metadata column, so it'll display the value for getId that the catalog had for that object. If you try to just access a value that doesn't exist as a metadata column, then you'll get an AttributeError. The following are a few methods available from a brain that are useful:

  • getPath: This returns the physical path for this object inside Zope.
  • getURL: This returns the URL for this object with virtual hosting applied.
  • getObject: This returns the actual object.
  • getRID: This is a unique ID for the object in the catalog, and it changes each time the object is uncataloged. It's for internal purposes only.

So, if you wanted to get the object for each result, you can do so, as you'll see in the following example. However, there's a reason the catalog doesn't do this: it's expensive (in terms of computation) because it involves waking up an object from the database (and all the objects in-between) and making lots of security checks. If you can try to make your metadata contain the right information, you'll have a much faster application. Obviously, sometimes metadata can't contain everything, but it's worth considering in the design. To get each object, use the following:

results = context.portal_catalog.searchResults()
for result in results:
    object = result.getObject()
    print object
return printed
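
By contrast, when the metadata columns already hold what you need, you can skip getObject entirely. The sketch below assumes Title is available as a metadata column and uses a hypothetical FakeBrain class in place of real catalog brains so that it runs outside Zope; summarize is likewise a made-up name.

```python
class FakeBrain(object):
    """Stand-in for a catalog brain with a Title column and getURL()."""

    def __init__(self, title, url):
        self.Title = title   # metadata column, read as an attribute
        self._url = url

    def getURL(self):
        return self._url


def summarize(brains):
    """Collect title and URL from metadata only; no object is woken up."""
    return [{"title": brain.Title, "url": brain.getURL()} for brain in brains]


brains = [
    FakeBrain("Welcome", "http://example.org/front-page"),
    FakeBrain("News", "http://example.org/news"),
]
summary = summarize(brains)
```
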

Since you have a Python list of these brains, it's now straightforward to manipulate the results in a manner that you see fit. To find out how many results were returned, you can just call the len function on the list, like so:

results = context.portal_catalog.searchResults()
print "Number of results", len(results)
return printed

NOTE: len is a Python function that tells you the length of an item.

To get just the first ten items, use a Python slice, like so:

results = context.portal_catalog.searchResults()
return results[:10]
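
The same slicing idea extends to paging through results. This is a minimal sketch with a hypothetical batch helper; a plain list of numbers stands in for the catalog results.

```python
def batch(results, size, page):
    """Return the slice of results for a zero-based page number."""
    start = page * size
    return results[start:start + size]


items = list(range(25))          # stand-in for 25 catalog brains
first_page = batch(items, 10, 0)
last_page = batch(items, 10, 2)
past_the_end = batch(items, 10, 3)
```
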

To do further filtering, you could manually filter the results (here, just the first ten), like so:

results = context.portal_catalog.searchResults()
for result in results[:10]:
    # Title returns a string so we can use the find method of
    # a string to look for an occurrence of a word
    if result.Title.find("Plone") > -1:
        print result.Title
return printed

To get a random object from the catalog, use the random module, like so:

import random
results = context.portal_catalog.searchResults()
r = random.choice(results)
object = r.getObject()
return object

Tying It All Together: Making a Search Form

In the previous discussion, I showed you how to get some results out of the catalog, and I used Script (Python) objects to demonstrate that. But you're probably asking yourself, how can I do this from a page template?

I'll start at the other end and first assume you have the results from a catalog query and loop through them in a page template using tal:repeat. This is how a lot of portlets are put together: the published and events portlets both just do queries and then show the results. Those portlets embed the query in a page template either by calling it directly:

<div tal:define="results python: here.portal_catalog.searchResults(Type='Event')">

or by calling a separate Script (Python) object that returns the results. For example, in the following, the script is called getCatalogResults:

##parameters=
kw = {}
# enter your query into the kw dictionary
return context.portal_catalog(**kw)

In a page template, you'd get the results in the following manner:

<div tal:define="results here/getCatalogResults">

After doing this, you need to loop through the results using the standard tal:repeat syntax. You can access each metadata column directly in the Template Attribute Language (TAL) by making a path expression to the column. So, given a brain, you could get the title from the metadata by calling result/Title. Listing 11-3 shows an example page that loops through the contents of getCatalogResults and displays each item in a simple unordered list.

Listing 11-3. Looping Through getCatalogResults

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"
      lang="en-US"
      metal:use-macro="here/main_template/macros/master"
      i18n:domain="plone">
<body>
<div metal:fill-slot="main">
<ul tal:define="results here/getCatalogResults">
    <li tal:repeat="result results">
        <a href=""
           tal:attributes="href result/getURL"
           tal:content="result/Title" />
        <span tal:replace="result/Description" />
    </li>
</ul>
</div>
</body>
</html>

One property of the searchResults method is that if you don't pass any parameters to the function, it'll look them up from the incoming request. So if you wanted to allow a form to post parameters to your results, then all you have to do is change the previous results line to the following:

<ul tal:define="results python: here.portal_catalog.searchResults(REQUEST=request)">

Now you can redo your query and append any index to the URL. For example, if you called this page template testResults and appended ?Type=Document to the end of the URL in your browser, only the documents in your site would appear. Since you can pass in almost any request values, you can set up a search form that passes this information through to the results page. This is what the search and advanced search pages do; you'll note that if you go to a Plone site and search for beer in the search box, your URL will now have ?SearchableText=beer.
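
Because almost any request value is accepted, you may prefer to decide yourself which parameters reach the catalog. The sketch below is not how searchResults reads the request; it only illustrates, under that assumption, how a query string maps onto index names. query_from_url and ALLOWED_INDEXES are hypothetical names.

```python
try:
    from urllib.parse import urlparse, parse_qsl  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qsl      # Python 2

# Hypothetical whitelist of indexes a visitor may search on.
ALLOWED_INDEXES = ("Type", "SearchableText", "Subject")


def query_from_url(url, allowed=ALLOWED_INDEXES):
    """Turn a URL's query string into catalog keyword arguments."""
    pairs = parse_qsl(urlparse(url).query)
    return dict((key, value) for key, value in pairs if key in allowed)


by_type = query_from_url("http://example.org/testResults?Type=Document&debug=1")
by_text = query_from_url("http://example.org/search?SearchableText=beer")
```

The resulting dictionary could be passed to the catalog as keyword arguments, in the same way as the kw dictionary in getCatalogResults.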

So, Listing 11-4 shows a form to call your page template.

Listing 11-4. A Form to Call Your Template

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"
      lang="en-US"
      metal:use-macro="here/main_template/macros/master"
      i18n:domain="plone">
<body>
<div metal:fill-slot="main">
  <p>Select a content type to search for</p>
  <form method="post" action="testResults">
    <select name="Type">
      <option
       tal:repeat="value python:here.portal_catalog.uniqueValuesFor('Type')"
       tal:content="value" />
    </select>
    <br />
    <input type="submit" class="context" />
  </form>
</div>
</body>
</html>

This template uses a method called uniqueValuesFor on the catalog, which returns all the unique values that exist for an index. This lets you perform a task such as populating a drop-down box in a form, which is a pretty useful thing to have.

At this point, it becomes an exercise in HTML and page templates to make the pages as complicated as you'd like. Of course, the best place to look for all this is in the actual templates of Plone, which give lines upon lines of great examples. All the portlets you're familiar with in Plone (such as the calendar, events, related, and so on) are all built using catalog queries to determine what to show.

In this chapter, I've provided you with an overview of ways to develop a Plone site and how content types work in your site. I demonstrated how a content type is constructed and then referenced through the catalog. This is a key development methodology in Plone.

In the next chapter, I'll show how to develop a new content type pretty much from scratch. You'll see how you can integrate that new content type with the catalog and register it in the portal_types tool.