ISIS Base: Searching Concepts

(taken from ISIS Base Help)

 

1. Definition of a Local SBF Search in a Molecule Database

 

A local Search By Form (SBF) search in a molecule database finds molecule records in your local database that contain your text/numeric query exactly as you specify it, or that contain your graphical query as a substructure wholly within a larger structure.

 

Note: If your database contains the exact match of your substructure query, you also retrieve the exact match.

 

In addition, you can combine text/numeric and graphical queries. For example, you might want to find all molecules that contain a specific steroid substructure wholly within them, that have a melting point greater than 220 degrees Centigrade, and that have a compound name ending in "one". (You can enter as many as 25 queries into different cells of the same table.)

 

 

Your text/numeric query can also contain wildcards. A wildcard is a placeholder within a string of text or numeric values that represents unspecified characters. (The wildcard % is a placeholder for at least one unspecified character.) To find the exact match of the data of interest, enter the search operator LIKE and place your target text in double quotes. A search operator is the command that specifies the type of search.

 

Your graphical query can also contain restrictions (called query features) on atoms and bonds. For example, you can use the ISIS bond-query feature Ch to specify that a bond is part of an acyclic structure in the records retrieved. You can also add hydrogens with visible bonds (called explicit hydrogens) to your query to block substituents on specific atoms.

 

A local search is a search over a database that typically exists on your hard disk. To do a local search on your hard disk, you do not need to input network information.

 

Note: To do a local search that is accessed from a shared network, you need to input network information.

 

1.1. Specifying a Text/Numeric String as a Query

 

A text/numeric string as an SBF query is a phrase in a box or column on the form that specifies the retrieval of data. For example, to find structures with a molecular weight greater than 350, enter the text/numeric string > 350 into the Molecular Weight box on the form.

 

Each phrase in a text/numeric string contains:

 

*          A search operator that specifies the type of search. For example, the search operators = (equal sign) and LIKE specify the retrieval of the exact match of your query. Use = when you want to retrieve data with numeric values. Use LIKE when you want to retrieve data with text. For details on search operators, see Section 1.2.

*          A text or numeric target value that you want to find (such as 350 for the molecular weight or ANTIBIOTIC for the activity).

 

Your text/numeric string is case-sensitive solely when you use a relational database or a UNIX platform.

 

Specifying Numeric String Queries

 

Do not include the units, such as a percent sign (%) or the degrees Centigrade (C). For example, to retrieve all compounds with a molecular weight greater than 350, enter:

 

> 350

 

where the search operator is > and the target value is 350.

 

Specifying Text/Numeric String Queries

 

Enclose queries that contain text (with or without numeric values) in double quotes and always use a wildcard, such as %. For example, to retrieve all compounds from the author Smythe, enter:

 

LIKE "%Smythe%"

 

where the search operator is LIKE and the target value is Smythe.
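
The split between search operator and target value can be illustrated with a short Python sketch. It is not part of ISIS/Base; the function name split_query and the regular expression are assumptions made for illustration, and only the relational operators and LIKE are covered.

```python
import re

# Operators named in this document. Longer symbols come first so that
# ">=" is not read as ">" followed by "=".
_QUERY_RE = re.compile(r'^\s*(<=|>=|<>|<|>|=|LIKE\b)\s*(.*?)\s*$', re.IGNORECASE)

def split_query(query):
    """Split a text/numeric string into (search operator, target value)."""
    match = _QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"no search operator found in {query!r}")
    operator, target = match.group(1).upper(), match.group(2)
    # Text targets are enclosed in double quotes; strip the outer pair.
    if len(target) >= 2 and target[0] == '"' and target[-1] == '"':
        target = target[1:-1]
    return operator, target

print(split_query('> 350'))            # ('>', '350')
print(split_query('LIKE "%Smythe%"'))  # ('LIKE', '%Smythe%')
```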

 

A wildcard is a placeholder within a string of text or numeric values that represents unspecified characters. The wildcard % is a placeholder for at least one unspecified character. The wildcard _ is a placeholder for solely one unspecified character. Without a wildcard, you retrieve solely those records that contain exactly the target value that you specified. For example:

Query Without a Wildcard       Examples of Records Retrieved

LIKE "antibiotic"              Antibiotic
                               ANTIBIOTIC

Query With the Wildcard %      Examples of Records Retrieved

LIKE "%antibiotic%"            Antibiotic
                               ANTIBIOTIC
                               Antibiotic activity
                               Resistance to extracellular products such as antibiotics

 

Note: If you want to retrieve data that contains double quotes within it, such as aa"a, you need to enter additional double quotes as follows: LIKE "aa""a" or LIKE "%aa""a%".

 

1.2. The Search Operators

 

<, <=, >, >=, and <>

 

Use <, <=, >, >=, and <> with integers (such as 25 or 100) or real numbers (such as 25.0 or 12.005). With an integer, you retrieve the exact match. With a real number, you must include the exact number of decimal places in your query. If you use zero as a real number, you must enter 0.0 instead of 0. (To use a wildcard with an integer or a real number, you must use the LIKE search operator and enclose the integer or real number in double quotation marks.)

 

Search Operator    Definition                            Examples

=                  equal (numeric)                       = 350
<                  less than (numeric)                   < 350
<=                 less than or equal to (numeric)       <= 350
>                  greater than (numeric)                > 350
>=                 greater than or equal to (numeric)    >= 350
<>                 not equal to (text or numeric)        <> 350
                                                         <> "initiator"
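
As a rough illustration of how the relational operators in this table behave on numeric data, the following Python sketch maps each operator to a comparison. The function name matches_numeric and the sample weights are hypothetical, and the sketch does not model the rule about decimal places for real numbers.

```python
import operator

# Relational search operators from the table, mapped to Python comparisons.
RELATIONAL = {
    '=': operator.eq,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '<>': operator.ne,
}

def matches_numeric(op, target, stored_value):
    """Return True if stored_value satisfies the query '<op> <target>'."""
    return RELATIONAL[op](stored_value, target)

# Hypothetical molecular weights, for illustration only.
weights = [180.2, 350.0, 412.5]
print([w for w in weights if matches_numeric('>', 350, w)])   # [412.5]
print([w for w in weights if matches_numeric('<=', 350, w)])  # [180.2, 350.0]
```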

 

LIKE

 

Use LIKE with text enclosed in double quotation marks. Enter your query as follows:

 

Type LIKE. Add a space. Add a double quote. Type your target text with wildcards, such as % or _ (unless the target text is a partial formula). Add the double quote. For example:

 

Query                    Examples of Records Retrieved

LIKE "methane"           Methane

LIKE "benzoic acid"      Benzoic Acid

LIKE "%acid"             Benzoic Acid
                         Acetic Acid

LIKE "%acid%"            Benzoic Acid Sodium Salt
                         Acetic Acid Sodium Salt

LIKE "%but_ne"           Trans 2-Butene
                         2-Butene
                         N-Butane
                         Butane
                         Butene

LIKE "_ethyl"            Methyl
                         Ethyl

LIKE "antibiotic"        Antibiotic

LIKE "%antibiotic%"      Antibiotic activity
                         Resistance to extracellular products such as antibiotics

 

Note: To retrieve data that contains a percent sign or an underscore that is not a wildcard, type the percent sign or underscore, preceded by the \ (backslash) symbol. To retrieve data that contains a backslash that is not used to change a wildcard to normal text, enter the backslash twice. To retrieve data that contains double quotes within it, such as aa"a, you need to enter additional double quotes as follows: LIKE "aa""a" or LIKE "%aa""a%".
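
The wildcard and escape rules for LIKE can be approximated by translating the target text into a regular expression. The Python sketch below is an illustration, not the ISIS implementation. It makes two assumptions taken from the example tables: matching ignores case, and % can also match no characters at all (LIKE "%antibiotic%" is shown retrieving Antibiotic). The function names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE target (without the outer quotes) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            # A backslash makes the next character (%, _ or \) literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == '"' and pattern[i + 1:i + 2] == '"':
            # A doubled double quote stands for one literal double quote.
            parts.append(re.escape('"'))
            i += 2
            continue
        if ch == '%':
            parts.append('.*')   # assumption: % may also match nothing
        elif ch == '_':
            parts.append('.')    # one unspecified character
        else:
            parts.append(re.escape(ch))
        i += 1
    # Assumption: matching ignores case, as in the example tables.
    return re.compile(''.join(parts), re.IGNORECASE | re.DOTALL)

def like(pattern, value):
    """Return True if the whole stored value matches the LIKE target."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like('%acid', 'Benzoic Acid'))               # True
print(like('%acid', 'Benzoic Acid Sodium Salt'))   # False
print(like('%but_ne', 'Trans 2-Butene'))           # True
```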

 

BETWEEN

 

Use BETWEEN with a range of integers, real numbers, and/or text fields. Enter a single space between the two range values. You retrieve all values that fall between (and include) your two values. Enclose all values that include text in double quotation marks. For example:

 

BETWEEN "XR11" "XZ15"

 

BETWEEN 95 100
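
The inclusive range test can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, unquoted values are read as numbers, and quoted text values are compared with plain string ordering, which may differ from the ordering that ISIS applies.

```python
import re

# Two range values separated by whitespace; each is either quoted text or
# an unquoted number.
_BETWEEN_RE = re.compile(
    r'^\s*BETWEEN\s+("[^"]*"|\S+)\s+("[^"]*"|\S+)\s*$', re.IGNORECASE)

def parse_between(query):
    """Return the (low, high) range values of a BETWEEN query."""
    match = _BETWEEN_RE.match(query)
    if match is None:
        raise ValueError(f"not a BETWEEN query: {query!r}")

    def convert(token):
        # Quoted values are text; unquoted values are treated as numbers.
        return token[1:-1] if token.startswith('"') else float(token)

    return convert(match.group(1)), convert(match.group(2))

def between(query, value):
    """Return True if value falls between (and including) the two values."""
    low, high = parse_between(query)
    return low <= value <= high

print(between('BETWEEN 95 100', 100))              # True
print(between('BETWEEN "XR11" "XZ15"', 'XS02'))    # True
```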

 

EXISTS

 

Use EXISTS to retrieve all records in the database that contain data in a specific database field. For example:

 

EXISTS

 

Note: You can only use the EXISTS search operator for database fields that do not give you an error message.

 

1.3. Specifying a Molecular Formula as a Query

 

To search the data in a Formula box (that is, a box that is connected to a MOLFORMULA field), you can specify the following text/numeric queries:

 

*          An exact formula. Use an exact formula query to find records that contain solely the exact atoms that you specify.

*          A subformula. Use a subformula query to find records that contain the specific atoms that you specify, and any additional atoms.

 

 

Exact Formula Query

 

To specify an exact formula query, specify the target value for the formula. The target value must include all hydrogens, standard atom symbols, and atom counts. You can use a single space to separate atoms, and capital letters are not required. The equals operator (=) and quotes are optional. For example:

 

Exact Formula Query        Examples of Records Retrieved

c6 h6 O                    Phenol
c6h6o

C20 H12 Na2 O4             Sodium salt of phenolphthalein
C20H12Na2O4
= C20 H12 Na2 O4
= C20H12Na2O4
= "C20 H12 Na2 O4"
= "C20H12Na2O4"

 

 

If you enter a single atom within your query you must enter a value of 1 or a space after the symbol. If you do not, ISIS cannot interpret the query correctly. For example:

 

Exact Formula Query        Examples of Records Retrieved

C17 H19 N O3               Piperine
C17H19N O3                 Piperine
C17H19N1O3                 Piperine
C17H19NO3                  No molecules retrieved. ISIS interprets "NO3" as three nobelium (No) atoms.
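
The reason for the single-atom rule becomes visible with a small formula parser. The Python sketch below is an assumption about how such a parser might work, not the ISIS algorithm: it ignores case and always prefers a two-letter symbol over a one-letter symbol. The symbol table is a deliberately small, hypothetical subset.

```python
import re

# Small illustrative subset of element symbols (assumption: a real table
# would list every element).
SYMBOLS = {'c': 'C', 'h': 'H', 'n': 'N', 'o': 'O', 'na': 'Na', 'no': 'No'}

def parse_formula(text):
    """Parse a formula into {symbol: count}, preferring two-letter symbols."""
    counts = {}
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Greedy step: try a two-letter symbol before a one-letter symbol.
        for width in (2, 1):
            key = text[i:i + width].lower()
            if len(key) == width and key in SYMBOLS:
                break
        else:
            raise ValueError(f"unknown symbol at position {i} in {text!r}")
        i += width
        digits = re.match(r'\d*', text[i:]).group()
        i += len(digits)
        symbol = SYMBOLS[key]
        # A symbol without a count stands for one atom.
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
    return counts

print(parse_formula('C17H19N O3'))  # {'C': 17, 'H': 19, 'N': 1, 'O': 3}
print(parse_formula('C17H19NO3'))   # {'C': 17, 'H': 19, 'No': 3}
```

With the space or the explicit 1 after N, the nitrogen and oxygen atoms are read separately; without it, the greedy step reads N and O together as the symbol No.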

 

Subformula Query

 

To specify a subformula query, use the LIKE operator with a target value for the subformula. The target value can include a range of atom counts, and typically excludes the hydrogens. To exclude specific atoms from the molecules that you retrieve, specify zero for that atom. You can use a single space to separate atoms. Capital letters are not required. Quotes are optional.

 

Subformula Query           Examples of Records Retrieved

LIKE Na                    All sodium salts

LIKE c12 Br(1-2) S2        All molecules that contain twelve carbon atoms, one to two bromine atoms, and two sulfur atoms

LIKE "C12Br(1-2)S2"        All molecules that contain twelve carbon atoms, one to two bromine atoms, and two sulfur atoms

LIKE c6 h6                 All molecules that contain six carbon atoms, six hydrogen atoms, and any number of other atoms

LIKE C5 H10 N0             All molecules that contain five carbon atoms, ten hydrogen atoms, and zero nitrogen atoms

LIKE C5H10N0               All molecules that contain five carbon atoms, ten hydrogen atoms, and zero nitrogen atoms

 

 

If you enter a single atom within your query you must enter a value of 1 or a space after the symbol. If you do not, ISIS cannot interpret the query correctly. For example:

 

Subformula Query           Examples of Records Retrieved

LIKE C12 Br S2             All molecules that contain twelve carbon atoms, one bromine atom, and two sulfur atoms

LIKE C12Br1S2              All molecules that contain twelve carbon atoms, one bromine atom, and two sulfur atoms

LIKE C12BrS2               No molecules retrieved. ISIS interprets "BrS" as an atom symbol, and gives an error message.
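
A subformula match with count ranges and excluded atoms can be sketched in Python as follows. This is an illustration only. It assumes that symbols are written with standard capitalization (which sidesteps the symbol ambiguity described above), that a symbol without a count means one atom, and that atoms not named in the query are unrestricted. ISIS may treat a bare symbol such as Na differently. The function names are hypothetical.

```python
import re

# Assumption: symbols use standard capitalization (C, Br, Na). A symbol may
# be followed by a range such as (1-2) or by a single count.
_TOKEN = re.compile(r'([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d+))?')

def parse_subformula(text):
    """Return {symbol: (min_count, max_count)} for a subformula target."""
    limits = {}
    for symbol, low, high, exact in _TOKEN.findall(text.replace(' ', '')):
        if low:
            limits[symbol] = (int(low), int(high))
        else:
            # Assumption: a symbol without a count means one atom.
            count = int(exact) if exact else 1
            limits[symbol] = (count, count)
    return limits

def matches_subformula(query, stored_counts):
    """Check a stored molecule, given as {symbol: count}, against a query."""
    for symbol, (low, high) in parse_subformula(query).items():
        if not low <= stored_counts.get(symbol, 0) <= high:
            return False
    return True  # atoms not named in the query are unrestricted

print(parse_subformula('C12 Br(1-2) S2'))
# {'C': (12, 12), 'Br': (1, 2), 'S': (2, 2)}
print(matches_subformula('C5 H10 N0', {'C': 5, 'H': 10, 'O': 1}))  # True
```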

 

1.4. Specifying a Graphical Substructure as a Query

 

A graphical substructure query retrieves records that contain your graphical query embedded wholly within them. For example:

 

 

You can also increase the power of an SBF search by the addition of specific restrictions (called query features) on atoms and/or bonds. There are a number of query features available for use.

 

In the following example, the ISIS atom-query feature (H0) prohibits the attachment of hydrogens on a specific atom in the records retrieved:

 

 

In addition, the explicit hydrogens in the query block substitutions at the specified positions.

 

To use a graphical query, you must have a Structure box or table cell on the form that connects to the database field MOLSTRUCTURE.

 

Notes: (1) You can enter multiple graphical queries onto the form in ISIS/Base. (2) You do not use a search operator with a graphical query.

 

1.5. Specifying Multiple Queries in a Single SBF Search

 

When you combine two or more queries of any type into a single SBF search, the search results that you obtain depend on both the specifications and accuracy of your query and the way that information is stored in your database. For more information on how to create an adequate query, see Section 1.1 through Section 1.4.

 

Information in a database is stored either in a single level (called a flat database) or in multiple levels (called a hierarchical database). Storage in a hierarchical database is more efficient, but it requires you to enter your queries in the correct location on the form to obtain the search results that you want.

 

For example, the following hierarchical database stores data for multiple decomposition tests that are conducted under different atmospheric conditions:

 

 

(In this database, structures are stored at the top (or root) level, and the percentage of oxygen and the number of peaks are stored at a lower level, under the parent field Decomposition. A parent field is a branch in the hierarchy that contains a set of child fields, such as percent O2 and number of peaks.)

 

You want specific decomposition data from different tests. Your multiple query in a single SBF search specifies that you want all compounds within a single test that contain: Query 1, greater than 65% O2; Query 2, less than three peaks.

 

To obtain search results that match all queries (Query 1 AND Query 2), you can either (a) enter each search query into separate boxes or (b) enter each search query into the same row in a single table. This search gives you all compounds within a single test that meet both requirements in your query:

 

 

Caution: For all versions of ISIS before 2.0, multiple queries in boxes might give different results than multiple queries in tables. In the previous example, the SBF query in the box on the form at the left retrieves records in which at least one decomposition test matches both conditions. The SBF query in the table on the form at the right, however, retrieves additional records in which one decomposition test matches one condition, and another decomposition test matches the other condition. In ISIS 2.0, both queries retrieve only those records in which both conditions are met within the same decomposition test.

 

To obtain search results that match either one query or the other (Query 1 OR Query 2), you must enter each search query into different rows of a single table. You must be certain that the first column in the table is a field at the same database level that you want to search. This search gives you all compounds within a single test that meet either requirement in your query (but not necessarily both requirements):

 

 

When you filter your search results, you see solely those lower-level records that meet your search requirements.

 

Note: A box retrieves data from a single database field. A table retrieves data from multiple database fields. Also, you can enter as many as 25 queries into the same table.
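
The difference between the AND and OR combinations in a hierarchical database, and the pre-2.0 table behavior described in the Caution, can be illustrated with a small Python sketch. The records, field names, and function names are hypothetical and only model the decomposition example.

```python
# Hypothetical hierarchical records: each compound (root level) has one or
# more decomposition tests (lower level).
compounds = {
    'A': [{'pct_o2': 70, 'peaks': 2}],                              # one test meets both conditions
    'B': [{'pct_o2': 70, 'peaks': 5}, {'pct_o2': 20, 'peaks': 1}],  # conditions met in different tests
    'C': [{'pct_o2': 10, 'peaks': 6}],                              # no test meets either condition
}

def high_o2(test):      # Query 1: greater than 65% O2
    return test['pct_o2'] > 65

def few_peaks(test):    # Query 2: less than three peaks
    return test['peaks'] < 3

def and_same_test(tests):
    """Query 1 AND Query 2 within the same decomposition test."""
    return any(high_o2(t) and few_peaks(t) for t in tests)

def and_any_test(tests):
    """Pre-2.0 table behavior: the conditions may be met by different tests."""
    return any(high_o2(t) for t in tests) and any(few_peaks(t) for t in tests)

def or_same_level(tests):
    """Query 1 OR Query 2: at least one test meets either condition."""
    return any(high_o2(t) or few_peaks(t) for t in tests)

def select(predicate):
    return sorted(name for name, tests in compounds.items() if predicate(tests))

print(select(and_same_test))   # ['A']
print(select(and_any_test))    # ['A', 'B']
print(select(or_same_level))   # ['A', 'B']
```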

 

2. Definition of Aromaticity

 

For local databases: A bond is aromatic if it is in a six-membered ring with alternating double and single bonds, such as benzene.

 

Note: Aromaticity is not perceived for fused rings in which the double bonds are arranged in a way that cannot be defined as aromatic.
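
The six-membered-ring rule can be expressed as a simple check on the bond orders around one ring. The Python sketch below is a simplified illustration under the assumption that the ring bonds are already listed in order around the ring. It is not the ISIS perception algorithm and does not address fused rings. The function name is hypothetical.

```python
def is_alternating_six_ring(bond_orders):
    """bond_orders: bond orders (1 or 2) listed in order around one ring."""
    if len(bond_orders) != 6:
        return False
    # Every bond must be followed by a bond of the other type, including
    # the step from the last bond back to the first.
    return all(
        {bond_orders[i], bond_orders[(i + 1) % 6]} == {1, 2}
        for i in range(6)
    )

print(is_alternating_six_ring([1, 2, 1, 2, 1, 2]))  # True  (benzene-like)
print(is_alternating_six_ring([1, 1, 1, 1, 1, 1]))  # False (cyclohexane-like)
```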

 

3. Specifying Query Features on Atoms and Bonds in Structures

 

A query feature on an atom and/or a bond in a molecule is a restriction that specifies the retrieval of certain types of molecule records from an ISIS/Base database.

 

If you do not use query features, the molecules retrieved contain solely your exact substructure query embedded wholly within them. For example:

 

 

3.1 Allowing or Excluding Specific Atoms

 

Use atom query features to allow or exclude the atoms of your choice.

 

Atom Query Feature: Any Atom (A)

 

Specifies any atom except hydrogen, and requires that there be a substituent at A:

 

            A

 

 

 

 

Notes: To retrieve all molecules that have a chiral flag (the absolute stereoconfigurations), create the following query: A[chiral]. To retrieve all molecules that contain a valence of 14, create the following query: A[XIV].

           

Atom Query Feature: Heteroatoms (Q)

 

Specifies any atom except hydrogen or carbon:

 

            Q

 

 

 

Atom Query Feature: List

 

Specifies any atom on a list of your choice. In the following example, only the atoms C, N, and O are allowed at the position specified within the list brackets:

 

 

Atom Query Feature: Not List

 

Specifies any atom except hydrogen and except those on a list of your choice. In the following example, the atoms C, N, O, and H are not allowed at the position specified within the list brackets:

 

 

Note: If you specify Not[C], the molecules that you retrieve contain neither a carbon nor a hydrogen at that position.
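
The four atom query features in this section can be summarized as a single matching rule per feature. The Python sketch below is an illustration only; the function name atom_matches and the feature labels passed as strings are hypothetical, not ISIS identifiers.

```python
def atom_matches(feature, symbol, atoms=()):
    """Return True if an atom with this symbol satisfies the query feature."""
    if feature == 'A':          # any atom except hydrogen
        return symbol != 'H'
    if feature == 'Q':          # any atom except hydrogen or carbon
        return symbol not in ('H', 'C')
    if feature == 'List':       # only atoms on the list
        return symbol in atoms
    if feature == 'NotList':    # no atom on the list, and never hydrogen
        return symbol != 'H' and symbol not in atoms
    raise ValueError(f"unknown atom query feature: {feature!r}")

print(atom_matches('Q', 'N'))                            # True
print(atom_matches('List', 'S', ('C', 'N', 'O')))        # False
print(atom_matches('NotList', 'H', ('C',)))              # False
```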

 

3.2. Prohibiting Hydrogens on an Atom

 

Use an atom query feature to specify the prohibition of the attachment of hydrogens, whether they are implicit (implied but not present) or explicit (present).

 

Atom Query Feature: H0

 

Prohibits the attachment of hydrogens. For example:

 

 

3.3. Allowing a Specific Number of Attachments (Substituents)

 

Use explicit hydrogens (those hydrogens that are present on the structure, and not implied) to block non-hydrogen attachments on atoms or use atom query features to specify the number of non-hydrogen attachments on atoms. A non-hydrogen attachment is also called a substituent.

 

Explicit Hydrogens

 

To block non-hydrogen attachments on atoms, draw hydrogens explicitly. For example:

 

 

3.4. Allowing a Specific Bond Type

 

Use bond query features to specify a specific bond type.

 

Bond Query Feature: Any Bond

 

Specifies any bond type (single, double, triple, or aromatic) at that position:

 

           

 

 

 

Bond Query Feature: Aromatic Bond

 

Specifies solely an aromatic bond at that position:

 

           

 

 

Bond Query Feature: Single/Double Bond

 

Specifies either a single or a double bond at that position:

 

           

 

 

 

3.5. Allowing Solely Chain Bonds

 

Use a bond query feature to specify that the bond is part of a chain.

 

Bond Query Feature: Ch

 

Specifies the bond is part of an acyclic structure. For example:

 

 

3.6. Allowing Solely Ring Bonds

 

Use a bond query feature to specify that the bond is part of a ring.

 

Bond Query Feature: Rn

 

Specifies that the bond is part of a cyclic structure. For example:

 

 

(The dashed lines indicate the ISIS bond query S/A, which specifies that the retrieved molecules contain either single or aromatic bonds at the specified positions.)

 

4. Specifying Stereochemistry

 

For local databases: You can specify stereochemistry in the following ways:

 

Up and Down Bonds

 

An Up bond or a Down bond on the asymmetric tetrahedral center of atoms C, N, O, P, S, or Si allows you to retrieve molecules with matching stereochemistry. For example:

 

 

An SBF search produces the following results:

 

 

An Either bond that is attached to any of these atoms allows you to retrieve records from the database that contain Up and Down bonds (unspecified) at the specified position.

 

Stereo Boxes on Cis or Trans Configurations

 

Stereo boxes on asymmetric double bonds that are attached to atoms C, N, O, P, S, or Si allow you to retrieve molecules with matching cis or trans stereochemistry. (Without the stereo boxes, you retrieve both the cis and trans configuration.) For example:

 

The cis configuration is not retrieved:

 

 

An SBF search produces the following results:

 

 

 

5. Guidelines for Revising the Query

 

If your SBF search retrieved no records or too few records, you might want to:

 

*          Conduct an SBF search on a different set of records with the same query.

*          Check that your search domain is set to the entire database.

*          Check that you did not search at too low a level in the database hierarchy. For more information, see your ISIS Administrator.

*          Make the structural constraints in your query more general, and then search the same set of records. For example, you can remove any explicit bonds or atoms that are not required, change bond types to increase bond variation, use the atom query features List, Q, or A to increase atom variation, or remove stereochemical requirements.

*          Make the text/numeric strings in your query more general, and then search the same set of records. For example, you can apply operators that partially restrict, such as > or BETWEEN instead of =, or you can apply wildcards. (A wildcard is a placeholder within a string of text or numeric values that represents unspecified characters.)

 

If your SBF search retrieved too many records, you might want to:

 

*          Refine the structural constraints in your query, and then search the same set of records. For example, you can limit substitution at specific positions with explicit hydrogens, change bond types to force the retrieval of solely rings or solely chains, increase the number of functional groups, restrict atom or bond variation with explicit atoms or bonds, or add an additional substructure to the query so that it is a multiple-fragment query.

*          Make the text/numeric strings in your query more restrictive, and then search the same set of records. For example, you can use the operator = instead of >.

*          Save a list of the records retrieved, refine your query, and then search the saved list.

*          Save a list of the records retrieved, conduct an additional search for compounds that you want to exclude (and save them), and then subtract the two lists.

*          First conduct a formula search to screen out large and complex molecules that are stored in your database, and then conduct a search on the less complex molecules. If your query is not able to map a stored molecule within a specific length of time, the molecule is added to the search results.

*          Filter your search results so that you view solely the lower-level records. (This assumes that you are viewing both upper- and lower-level records.)
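
The list-based guidelines above (searching a saved list and subtracting two lists) correspond to ordinary set operations. The following Python sketch uses hypothetical record IDs to illustrate the idea; it does not use any ISIS interface.

```python
# Hypothetical record IDs, for illustration only.
retrieved = {101, 102, 103, 104, 105}   # saved list from the first search
to_exclude = {103, 105, 200}            # saved list from the exclusion search
refined_hits = {102, 104, 300}          # hits of a refined query on the whole database

# Subtract the two lists: keep records that are not in the exclusion list.
remaining = retrieved - to_exclude

# Search the saved list: keep only refined hits that were already retrieved.
narrowed = retrieved & refined_hits

print(sorted(remaining))  # [101, 102, 104]
print(sorted(narrowed))   # [102, 104]
```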

 

6. Searching Example 1: Combining Text/Numeric Queries

 

This searching example shows you how to conduct a single SBF search with a query that contains two text/numeric strings.

 

6.1. Problem

 

You want to retrieve all molecules that meet both of the following requirements:

 

*          Have a molecular weight greater than 250

*          Contain two or three oxygen atoms

 

6.2. Approach

 

Enter the numeric string query > 250 into the Molecular Weight box or table cell on the form in ISIS/Base that connects to the database field MOLECULAR.WEIGHT. This string contains the search operator > and the target value 250.

 

Enter the text/numeric string query LIKE "O(2-3)" into the Formula box or table cell on the form that connects to the database field MOL>MOLFORMULA. This string contains the search operator LIKE and the target value o(2-3) in double quotation marks, where o is either in upper- or lower-case characters.

 

Note: If you enter both queries into different rows in a single table, you retrieve molecules that meet either requirement (but not necessarily both).

 

Conduct an SBF search in the local molecule database of your choice, such as the LOCALMX demo database.

 

6.3. Query

 

 

Note: Your form in ISIS/Base might be entirely different from the one in this example.

 

6.4. Search

 

To execute this search, follow the step-by-step instructions. Alternatively, choose Help > Search, and search for SBF, how to/examples.

 

6.5. Results

 

You retrieve records with molecules that have a molecular weight greater than 250 and that also contain two or three oxygens.

 

7. Searching Example 2: Combining Text/Numeric and Graphical Queries

 

This searching example shows you how to conduct a single SBF search with a query that contains a text/numeric string and a graphical query.

 

7.1. Problem

 

You want to retrieve all molecules that meet both of the following requirements:

 

*          Contain a steroid substructure embedded wholly within them

*          Have a melting point that is greater than 220 degrees Centigrade

 

7.2. Approach

 

Enter the numeric string query > 220 into the Temperature box or table cell on the form in ISIS/Base that connects to the database field TEMPERATURE (or to an equivalent field with another name). This string contains the search operator > and the target value 220.

 

Draw the following steroid as your graphical query:

 

 

Enter the steroid query into the Structure box or table cell on the form that connects to the database field MOLSTRUCTURE.

 

Conduct an SBF search in the local molecule database of your choice, such as the LOCALMX demo database.

 

7.3. Query

 

 

Note: Your form in ISIS/Base might be entirely different from the one in this example.

 

7.4. Search

 

To execute this search, follow the step-by-step instructions. Alternatively, choose Help > Search, and search for SBF, how to/examples.

 

7.5. Results

 

You retrieve records with molecules that have a melting point greater than 220 and that also contain the steroid substructure in your query.