Web Gateway and UTM have a central categorisation system that is used to control access to various types of content. When the system is deciding whether to block content (e.g. when the Web Proxy and Filter module is filtering web traffic, or the Mail Server and Filter module is blocking inappropriate email), it first attempts to categorise the content by using the criteria defined for each category.
The system has a selection of predefined categories, and we provide regular updates to their categorisation criteria. You can also create new categories as you see fit, and you can modify the categorisation criteria for both the predefined and user-defined categories yourself by right-clicking the category. When modifying the predefined categories, any criteria that you add yourself will always override the criteria that we manage - you, rather than us, are ultimately in control of your network!
There are several different types of categorisation criteria that you can edit:
- URIs - the Web Proxy and Filter module examines the web address (URI) that the user is accessing. The URI patterns listed in each category determine which categories the address belongs to. These criteria are not used by the Mail Server and Filter module.
- HTTP response headers - these are sent by the web server when a web site is accessed. The Web Proxy and Filter module examines the HTTP response headers for each request. These criteria are not used by the Mail Server and Filter module.
- Content types - these describe what kind of data is being filtered: a text document or a spreadsheet, for example. Both the Mail Server and Filter and the Web Proxy and Filter modules can determine what type of data they are scanning.
- Keywords - the Mail Server and Filter and Web Proxy and Filter modules analyse content, looking for words, sentences and complex expressions.
The final decision about which categories the content belongs to is made by combining the many individual matches found by all of the criteria that make up each category. Each match contributes a "score" - the higher the score, the more likely it is that the content belongs to that category.
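The scoring idea can be pictured with a small sketch. It is illustrative only: it assumes that the scores of all matching criteria are simply added together and compared with a cut-off, and the function name, the example scores and the threshold value are all invented for the example rather than taken from the product.

```python
# Illustrative only: the names, scores and threshold below are assumptions,
# not the product's actual implementation.

def category_score(matched_scores):
    """Add up the score contributed by every criterion that matched."""
    return sum(matched_scores)


# Hypothetical matches for one category: a URI pattern, a response header
# and two keywords each contribute their own score.
matches = {
    "URI pattern": 50,
    "HTTP response header": 10,
    "keyword: poker": 15,
    "keyword: place your bet online": 40,
}

THRESHOLD = 100  # assumed cut-off for deciding that content is in the category

total = category_score(matches.values())
in_category = total >= THRESHOLD
print(total, in_category)
```

In this sketch no single match is enough to reach the cut-off on its own; it is the combination of several matches that places the content in the category.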
URIs
If you want to categorise a website, you can simply add its address to the appropriate categories: right-click the category and then select Edit URIs. You can now click Add URI and define a pattern to match the address. The top of the dialogue box shows a description of what the pattern you have entered will match. You can make the pattern as specific or as general as necessary.
Similarly, if a web site is being consistently miscategorised, you can add its address to that category and tick the Exclude from category box.
Please note that the system does not use "*" as a wildcard - to make partial matches on a URI, please use the drop-down menus provided.
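As a rough picture of how an Exclude from category entry interacts with ordinary URI entries, the sketch below lets any matching exclusion win over any matching inclusion. The rule format (a host name, an "include subdomains" flag and an "exclude" flag) is hypothetical and stands in for the choices made in the drop-down menus; the real set of options is not reproduced here.

```python
from urllib.parse import urlsplit

# Hypothetical rule format: (host, include_subdomains, exclude).
# This is an assumption for illustration, not the product's storage format.


def host_matches(uri, host, include_subdomains):
    """Return True if the URI's host equals `host` or, optionally, is a subdomain of it."""
    actual = (urlsplit(uri).hostname or "").lower()
    if actual == host.lower():
        return True
    return bool(include_subdomains and actual.endswith("." + host.lower()))


def uri_in_category(uri, rules):
    """A matching exclusion always wins; otherwise any matching rule includes the URI."""
    matched = False
    for host, include_subdomains, exclude in rules:
        if host_matches(uri, host, include_subdomains):
            if exclude:
                return False
            matched = True
    return matched


rules = [
    ("example.com", True, False),            # example.com and all its subdomains
    ("intranet.example.com", False, True),   # excluded from the category
]

print(uri_in_category("http://www.example.com/page", rules))
print(uri_in_category("http://intranet.example.com/", rules))
```

Note that no "*" wildcard appears anywhere: the partial match is expressed by the flag, much as the dialogue box expresses it through a menu choice.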
HTTP Response Headers
When an object is fetched from a web server, the server includes a number of headers in the response, which provide some information about the object. These headers can be scanned for certain information and used to help categorisation. Right-click the category and then select Edit HTTP response headers. You can click Add header to define a pattern to match a particular header: specify the HTTP header name, and a keyword to look for in that header. You can specify whether to treat the keyword as the start of a word, the end of a word or a whole word, or tell the system to ignore word boundaries and find that text anywhere. For advanced use, it is possible to specify the keyword as a Perl compatible regular expression. Finally, you can specify how much a matching header will contribute to the score - the higher the score, the more likely it is that the content belongs to that category.
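The four word-boundary options can be thought of as four ways of wrapping the keyword in a regular expression. The sketch below uses Python's re module as a stand-in for a Perl compatible engine; the mode names, the function names and the example header values are assumptions made for illustration.

```python
import re


def keyword_regex(keyword, mode):
    """Build a case-insensitive pattern for one of four (assumed) matching modes."""
    escaped = re.escape(keyword)
    patterns = {
        "start": r"\b" + escaped,           # keyword begins a word
        "end": escaped + r"\b",             # keyword ends a word
        "whole": r"\b" + escaped + r"\b",   # keyword is a complete word
        "anywhere": escaped,                # word boundaries are ignored
    }
    return re.compile(patterns[mode], re.IGNORECASE)


def header_score(headers, name, keyword, mode, score):
    """Return `score` if the named header contains the keyword, otherwise 0."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get(name.lower(), "")
    return score if keyword_regex(keyword, mode).search(value) else 0


# Hypothetical response headers.
headers = {"Server": "StreamingServer/2.1", "Content-Type": "text/html"}

print(header_score(headers, "server", "stream", "start", 20))
print(header_score(headers, "server", "stream", "whole", 20))
```

Here "stream" matches as the start of a word in the Server header, but not as a whole word, because it is immediately followed by further letters.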
Content Types
If you want to block a certain type of file, you can tell the system that all files of that type belong to a certain category. Right-click the category and then select Edit content types. You can now click Add content type and specify the content type to match in the standard MIME content type format ("type/subtype"). If the subtype box is left blank, the rule matches all subtypes; for example, "image/" will match all images.
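The blank-subtype behaviour can be sketched as follows. The function name and the handling of parameters such as "; charset=utf-8" are assumptions; only the "type/subtype" format and the rule that a blank subtype matches every subtype come from the description above.

```python
def content_type_matches(rule_type, rule_subtype, content_type):
    """Return True if a MIME content type satisfies a type/subtype rule.

    An empty rule_subtype matches every subtype of rule_type.
    """
    # Drop any parameters, e.g. "text/html; charset=utf-8" -> "text/html".
    media = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = media.partition("/")
    if main != rule_type.lower():
        return False
    return rule_subtype == "" or sub == rule_subtype.lower()


print(content_type_matches("image", "", "image/png"))
print(content_type_matches("application", "pdf", "application/zip"))
```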
Keywords
You can specify keywords to look for in textual content. This is a powerful feature, but one to be used with care since it is very easy to miscategorise content. A keyword can be a single word (or even part of a single word), a sentence, or a more complex expression. Right-click the category and then select Edit keywords. You can now click Add keyword and specify the keyword to match. You can specify whether to treat the keyword as the start of a word, the end of a word or a whole word, or tell the system to ignore word boundaries and find that text anywhere. For advanced use, it is possible to specify the keyword as a Perl compatible regular expression. Finally, you can specify how much the keyword will contribute to the score if it is found - the higher the score, the more likely it is that the content belongs to that category. As a rule of thumb, single words should not be assigned a high score, as the probability of a false positive is quite high; longer sentences can usually be given a higher score more safely.
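The rule of thumb about single words is easier to see with an example. The sketch below is a simplified model, not the product's algorithm: it assumes each keyword rule is counted at most once per document, and the rule format, the scores and the sample text are invented for illustration.

```python
import re


def keyword_total(text, rules):
    """Sum the scores of the keyword rules that match `text`.

    Each rule is (keyword, whole_word, score) and is counted at most once,
    which is an assumption of this sketch.
    """
    total = 0
    for keyword, whole_word, score in rules:
        pattern = re.escape(keyword)
        if whole_word:
            pattern = r"\b" + pattern + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            total += score
    return total


innocent_page = "Children learn the alphabet by singing the alphabet song."
gambling_page = "Place your bet online today."

# A short keyword, ignoring word boundaries, with a high score.
risky = [("bet", False, 80)]
# The same word as a whole word with a low score, plus a longer phrase.
safer = [("bet", True, 10), ("place your bet online", True, 60)]

print(keyword_total(innocent_page, risky))
print(keyword_total(innocent_page, safer))
print(keyword_total(gambling_page, safer))
```

The risky rule scores the innocent page highly because "bet" appears inside "alphabet", whereas the safer rules give that page nothing and still score the second page through the longer phrase.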
Fixing False Positives
As with all heuristics, whilst we endeavour to make the categorisation system as reliable as possible, it is never 100% accurate and content will sometimes be miscategorised.
If the miscategorisation is an isolated case, such as the Web Proxy and Filter module consistently miscategorising a single web site, then some simple steps can be taken to prevent it: edit the offending category's URIs and add the web site's address with the Exclude from category box ticked. This will ensure that the web site will never be considered to be in that category. If the web site is completely trusted (e.g. your own intranet site), then consider adding an override to the Web Proxy and Filter module to disable filtering for it completely.
If a lot of content is being miscategorised, you might consider lowering the sensitivity of the offending categories. It is also a good idea to check any categorisation criteria that you have added and remove any that are likely to make the categorisation system over-sensitive.