API Documentation
RCSB PDB Search API
-
class
rcsbsearchapi.Attr(attribute: str, type: Optional[Union[List[str], str]], description: Optional[Union[List[str], str]] = None)
A search attribute, e.g. “rcsb_entry_container_identifiers.entry_id”.
Terminals can be constructed from Attr objects using either a functional syntax, which mirrors the API operators, or Python operators.
Previously, __bool__ was overloaded to run the exists function. Because __bool__ cannot return a non-boolean value, that overload was removed; call exists() explicitly instead.
When applied to an Attr, Python operators return Terminals rather than their usual bool values.
Pre-instantiated attributes are available from the rcsbsearchapi.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.
The dictionary passed to range() requires the following keys:
“from” -> int
“to” -> int
“include_lower” -> bool
“include_upper” -> bool
-
contains_phrase(value: Union[str, Value[str]]) → AttributeQuery
Match an exact phrase
-
contains_words(value: Union[str, Value[str], List[str], Value[List[str]]]) → AttributeQuery
Match any word within the string.
Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.
-
equals(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute == value
-
exact_match(value: Union[str, Value[str]]) → AttributeQuery
Exact match with the value
-
exists() → rcsbsearchapi.search.AttributeQuery
Attribute is defined for the structure
-
greater(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute > value
-
greater_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute >= value
-
in_(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Value[List[str]], Value[List[int]], Value[List[float]], Value[List[date]], Value[Tuple[str, …]], Value[Tuple[int, …]], Value[Tuple[float, …]], Value[Tuple[date, …]]]) → AttributeQuery
Attribute is contained in the list of values
-
less(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute < value
-
less_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute <= value
-
range(value: Dict[str, Any]) → AttributeQuery
Attribute is within the specified range
- Parameters
value – dictionary with the “from” and “to” bounds plus the “include_lower” and “include_upper” flags, which control whether each bound is inclusive
-
type: Optional[Union[List[str], str]]
Search service type: text for structure attributes, text_chem for chemical attributes
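The operator-overloading idea can be sketched without the library. The class below, SketchAttr, is hypothetical. It only mimics the behaviour described above: comparison operators and the functional methods return terminal query dictionaries in the RCSB Search API JSON shape instead of bool values. The dictionaries rcsbsearchapi itself produces may differ in detail.

```python
from typing import Any, Dict


class SketchAttr:
    """Hypothetical stand-in for Attr: operators build terminal dicts, not bools."""

    def __init__(self, attribute: str, service: str = "text") -> None:
        self.attribute = attribute
        self.service = service  # "text" for structure, "text_chem" for chemical attributes

    def _terminal(self, operator: str, *value: Any) -> Dict[str, Any]:
        params: Dict[str, Any] = {"attribute": self.attribute, "operator": operator}
        if value:  # "exists" is the only operator here that takes no value
            params["value"] = value[0]
        return {"type": "terminal", "service": self.service, "parameters": params}

    # Functional syntax mirroring the API operator names
    def exact_match(self, value: str) -> Dict[str, Any]:
        return self._terminal("exact_match", value)

    def exists(self) -> Dict[str, Any]:
        return self._terminal("exists")

    def range(self, value: Dict[str, Any]) -> Dict[str, Any]:
        return self._terminal("range", value)

    # Python operators: each returns a terminal dict instead of True/False
    def __eq__(self, value: Any) -> Dict[str, Any]:  # type: ignore[override]
        return self._terminal("equals", value)

    def __lt__(self, value: Any) -> Dict[str, Any]:
        return self._terminal("less", value)

    def __ge__(self, value: Any) -> Dict[str, Any]:
        return self._terminal("greater_or_equal", value)


resolution = SketchAttr("rcsb_entry_info.resolution_combined")
high_res = resolution < 2.0
bounded = resolution.range({"from": 1, "to": 2, "include_lower": True, "include_upper": False})
```

Returning plain dictionaries keeps the sketch testable offline. The real classes wrap the same information in AttributeQuery objects.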
-
class
rcsbsearchapi.AttributeQuery(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)
Special case of a Terminal for Structure and Chemical Attribute Searches
An AttributeQuery compares some attribute of a structure to a value.
Examples
>>> AttributeQuery("exptl.method", "exact_match", "X-RAY DIFFRACTION")
>>> AttributeQuery("rcsb_entry_container_identifiers.entry_id", operator="in", value=["4HHB", "2GS2"])
A full list of attributes is available in the schema. Operators are documented in the RCSB PDB Search API documentation.
The Attr class provides a more Pythonic way of constructing AttributeQueries.
-
__init__(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)
Compare an attribute to a value using the given operator. The search service and negation can also be specified.
- Parameters
attribute (Optional[str], optional) – attribute to search (e.g. struct.title, exptl.method, rcsb_id). Defaults to None.
operator (Optional[str], optional) – operation to perform for the search (e.g. “contains_phrase”, “exact_match”). Defaults to None.
value (Optional[TValue], optional) – value to compare the attribute to. Defaults to None.
service (Optional[str], optional) – search service to use (e.g. “text”, “text_chem”). Defaults to None.
negation (Optional[bool], optional) – logical NOT. Defaults to False.
-
-
class
rcsbsearchapi.Group(operator: typing_extensions.Literal[and, or], nodes: Iterable[rcsbsearchapi.search.Query] = ())
AND and OR combinations of queries
-
to_dict()
Get dictionary representing this query
-
-
class
rcsbsearchapi.Query
Base class for all types of queries.
Queries can be combined using set operators:
q1 & q2: Intersection (AND)
q1 | q2: Union (OR)
~q1: Negation (NOT)
q1 - q2: Difference (implemented as q1 & ~q2)
q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))
Note that only AND, OR, and negation of terminals are directly supported by the API, so other operations may be slower.
Queries can be executed by calling them as functions (list(query())) or with the exec() method.
Queries are immutable, and all modifying functions return new instances.
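Difference and XOR are not native API operations. They are rewritten in terms of AND, OR and NOT as listed above. The sketch below models each query as the Python set of IDs it matches within a small hypothetical universe, and checks that the documented rewrites agree with ordinary set difference and symmetric difference. It does not call rcsbsearchapi.

```python
from typing import FrozenSet

Ids = FrozenSet[str]

# Hypothetical universe of entry IDs that every query is evaluated against
UNIVERSE: Ids = frozenset({"1ABC", "2DEF", "3GHI", "4JKL"})


def q_and(a: Ids, b: Ids) -> Ids:
    return a & b


def q_or(a: Ids, b: Ids) -> Ids:
    return a | b


def q_not(a: Ids) -> Ids:
    # NOT is taken relative to the universe of searchable entries
    return UNIVERSE - a


def q_diff(a: Ids, b: Ids) -> Ids:
    # q1 - q2 is rewritten as q1 & ~q2
    return q_and(a, q_not(b))


def q_xor(a: Ids, b: Ids) -> Ids:
    # q1 ^ q2 is rewritten as (q1 & ~q2) | (~q1 & q2)
    return q_or(q_and(a, q_not(b)), q_and(q_not(a), b))


q1: Ids = frozenset({"1ABC", "2DEF"})
q2: Ids = frozenset({"2DEF", "3GHI"})
```

These operators expand into several AND, OR and NOT nodes. The request the API receives is therefore larger than for a plain AND or OR.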
-
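and_(other: Query) → Query
and_(other: Union[str, Attr]) → PartialQuery
Combine this query with another query via an AND. Given an attribute (an Attr or an attribute name string) instead, return a PartialQuery that extends this query with that attribute via an AND.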
-
assign_ids() → Query
Assign node_ids sequentially for all terminal nodes
- Returns
the modified query, with node_ids assigned sequentially from 0
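Sequential ID assignment can be sketched as a tree walk over a query in Search API JSON form. This is a minimal sketch, not the library's implementation. It assumes terminals are numbered depth-first, left to right. It works on a copy so the input stays unchanged, in the spirit of immutable Query objects. The helper _t is hypothetical.

```python
import copy
from typing import Any, Dict


def assign_ids(query: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a query dict with node_id 0, 1, 2, ... on its terminals."""
    result = copy.deepcopy(query)  # keep the input unchanged
    next_id = 0

    def visit(node: Dict[str, Any]) -> None:
        nonlocal next_id
        if node["type"] == "terminal":
            node["node_id"] = next_id
            next_id += 1
        else:  # a "group" node: visit children left to right
            for child in node["nodes"]:
                visit(child)

    visit(result)
    return result


def _t(value: str) -> Dict[str, Any]:
    # Small hypothetical helper for full-text terminals
    return {"type": "terminal", "service": "full_text", "parameters": {"value": value}}


query = {
    "type": "group",
    "logical_operator": "and",
    "nodes": [_t("kinase"), {"type": "group", "logical_operator": "or", "nodes": [_t("human"), _t("mouse")]}],
}
numbered = assign_ids(query)
```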
-
exec(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None) → Union[Session, int]
Evaluate this query and return a Session that iterates over all result IDs (or an int count of results when return_counts is True).
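To show roughly what exec() sends, here is a sketch of one request body assembled from a few exec()-style arguments. Several details are assumptions based on the public Search API JSON format, not taken from this page: the helper name build_request, the "paginate" block, and the mapping of return_content_type to the API key "results_content_type". The library's own serialization may differ, and the HTTP call is not shown.

```python
import json
from typing import Any, Dict, List, Optional


def build_request(
    query: Dict[str, Any],
    return_type: str = "entry",
    start: int = 0,
    rows: int = 10000,
    return_content_type: Optional[List[str]] = None,
    results_verbosity: str = "compact",
    return_counts: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: one page of a Search API request built from exec()-style arguments."""
    options: Dict[str, Any] = {
        "paginate": {"start": start, "rows": rows},
        # Assumed mapping: exec()'s return_content_type -> the API's results_content_type
        "results_content_type": return_content_type or ["experimental"],
        "results_verbosity": results_verbosity,
    }
    if return_counts:
        options["return_counts"] = True  # ask for the number of hits
    return {"query": query, "return_type": return_type, "request_options": options}


terminal = {"type": "terminal", "service": "full_text", "parameters": {"value": "protease"}}
body = build_request(terminal, rows=25, return_content_type=["experimental", "computational"])
payload = json.dumps(body)  # the JSON text that would be sent to the search endpoint
```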
-
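or_(other: Query) → Query
or_(other: Union[str, Attr]) → PartialQuery
Combine this query with another query via an OR. Given an attribute (an Attr or an attribute name string) instead, return a PartialQuery that extends this query with that attribute via an OR.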
-
abstract
to_dict() → Dict
Get dictionary representing this query
-
to_json() → str
Get JSON string of this query
-
class
rcsbsearchapi.Terminal(service: Union[List, str], params: Dict[str, Any], node_id: int = 0)
A terminal query node.
Used for doing various types of searches. Accepts a service type and a dictionary of parameters. The set of parameters differs for different search services.
A Terminal can be built directly from a service and a parameter dictionary, but this is tedious. Typically, it is built by a subclass that represents one type of search, which makes searching more concise.
Examples
>>> Terminal("full_text", {"value": "protease"})
>>> Terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})
-
to_dict()
Get dictionary representing this query
-
-
class
rcsbsearchapi.TextQuery(value: str)
Special case of a Terminal for free-text queries
-
__init__(value: str)
Search for the string value anywhere in the text
- Parameters
value – free-text query
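A Terminal is a service name plus a parameter dictionary. A free-text TextQuery is a terminal on the "full_text" service. The sketch below builds both Terminal examples from above as plain dictionaries. The helpers terminal and text_query are hypothetical, and the outer JSON keys ("type", "parameters", "node_id") are assumed from the public Search API format.

```python
import json
from typing import Any, Dict


def terminal(service: str, params: Dict[str, Any], node_id: int = 0) -> Dict[str, Any]:
    """Hypothetical mirror of Terminal(service, params, node_id) as a JSON-ready dict."""
    return {"type": "terminal", "service": service, "parameters": params, "node_id": node_id}


def text_query(value: str) -> Dict[str, Any]:
    # A free-text search is a terminal on the "full_text" service
    return terminal("full_text", {"value": value})


free_text = text_query("protease")
by_id = terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})
encoded = json.dumps(by_id)  # serialized form, as it would be sent
```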
-
Interact with the [RCSB PDB Search API](https://search.rcsb.org/#search-api).
-
class
rcsbsearchapi.search.Attr(attribute: str, type: Optional[Union[List[str], str]], description: Optional[Union[List[str], str]] = None)
A search attribute, e.g. “rcsb_entry_container_identifiers.entry_id”.
Terminals can be constructed from Attr objects using either a functional syntax, which mirrors the API operators, or Python operators.
Previously, __bool__ was overloaded to run the exists function. Because __bool__ cannot return a non-boolean value, that overload was removed; call exists() explicitly instead.
When applied to an Attr, Python operators return Terminals rather than their usual bool values.
Pre-instantiated attributes are available from the rcsbsearchapi.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.
The dictionary passed to range() requires the following keys:
“from” -> int
“to” -> int
“include_lower” -> bool
“include_upper” -> bool
-
contains_phrase(value: Union[str, Value[str]]) → AttributeQuery
Match an exact phrase
-
contains_words(value: Union[str, Value[str], List[str], Value[List[str]]]) → AttributeQuery
Match any word within the string.
Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.
-
equals(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute == value
-
exact_match(value: Union[str, Value[str]]) → AttributeQuery
Exact match with the value
-
exists() → rcsbsearchapi.search.AttributeQuery
Attribute is defined for the structure
-
greater(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute > value
-
greater_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute >= value
-
in_(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Value[List[str]], Value[List[int]], Value[List[float]], Value[List[date]], Value[Tuple[str, …]], Value[Tuple[int, …]], Value[Tuple[float, …]], Value[Tuple[date, …]]]) → AttributeQuery
Attribute is contained in the list of values
-
less(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute < value
-
less_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]]) → AttributeQuery
Attribute <= value
-
range(value: Dict[str, Any]) → AttributeQuery
Attribute is within the specified range
- Parameters
value – dictionary with the “from” and “to” bounds plus the “include_lower” and “include_upper” flags, which control whether each bound is inclusive
-
type: Optional[Union[List[str], str]]
Search service type: text for structure attributes, text_chem for chemical attributes
-
class
rcsbsearchapi.search.AttributeQuery(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)
Special case of a Terminal for Structure and Chemical Attribute Searches
An AttributeQuery compares some attribute of a structure to a value.
Examples
>>> AttributeQuery("exptl.method", "exact_match", "X-RAY DIFFRACTION")
>>> AttributeQuery("rcsb_entry_container_identifiers.entry_id", operator="in", value=["4HHB", "2GS2"])
A full list of attributes is available in the schema. Operators are documented in the RCSB PDB Search API documentation.
The Attr class provides a more Pythonic way of constructing AttributeQueries.
-
__init__(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)
Compare an attribute to a value using the given operator. The search service and negation can also be specified.
- Parameters
attribute (Optional[str], optional) – attribute to search (e.g. struct.title, exptl.method, rcsb_id). Defaults to None.
operator (Optional[str], optional) – operation to perform for the search (e.g. “contains_phrase”, “exact_match”). Defaults to None.
value (Optional[TValue], optional) – value to compare the attribute to. Defaults to None.
service (Optional[str], optional) – search service to use (e.g. “text”, “text_chem”). Defaults to None.
negation (Optional[bool], optional) – logical NOT. Defaults to False.
-
-
class
rcsbsearchapi.search.ChemSimilarityQuery(value: Optional[str] = None, query_type: typing_extensions.Literal[formula, descriptor] = 'formula', descriptor_type: Optional[typing_extensions.Literal[InChI, SMILES]] = None, match_subset: Optional[bool] = False, match_type: Optional[typing_extensions.Literal[graph-relaxed-stereo, graph-relaxed, fingerprint-similarity, sub-struct-graph-relaxed-stereo, sub-struct-graph-relaxed, graph-exact]] = None)
Special case of Terminal for chemical similarity search queries
-
__init__(value: Optional[str] = None, query_type: typing_extensions.Literal[formula, descriptor] = 'formula', descriptor_type: Optional[typing_extensions.Literal[InChI, SMILES]] = None, match_subset: Optional[bool] = False, match_type: Optional[typing_extensions.Literal[graph-relaxed-stereo, graph-relaxed, fingerprint-similarity, sub-struct-graph-relaxed-stereo, sub-struct-graph-relaxed, graph-exact]] = None)
- Parameters
value (Optional[str], optional) – chemical formula or descriptor (SMILES or InChI). Defaults to None.
query_type (ChemSimType, optional) – “formula” or “descriptor”. Defaults to “formula”.
descriptor_type (Optional[SubsetDescriptorType], optional) – if query_type is “descriptor”, whether the descriptor is “SMILES” or “InChI”. Defaults to None.
match_subset (Optional[bool], optional) – if query_type is “formula”, also return chemical components/structures that contain the formula as a subset. Defaults to False.
match_type (Optional[ChemSimMatchType], optional) – if query_type is “descriptor”, the type of matches to find and return (see the guide below). Defaults to None.
Guide for “match_type” options:
+-----------------------------------+-------------------------------------------+
| match_type                        | Description                               |
+===================================+===========================================+
| "graph-relaxed"                   | Similar Ligands (including Stereoisomers) |
+-----------------------------------+-------------------------------------------+
| "graph-relaxed-stereo"            | Similar Ligands (Stereospecific)          |
+-----------------------------------+-------------------------------------------+
| "fingerprint-similarity"          | Similar Ligands (Quick screen)            |
+-----------------------------------+-------------------------------------------+
| "sub-struct-graph-relaxed-stereo" | Substructure (Stereospecific)             |
+-----------------------------------+-------------------------------------------+
| "sub-struct-graph-relaxed"        | Substructure (including Stereoisomers)    |
+-----------------------------------+-------------------------------------------+
| "graph-exact"                     | Exact match                               |
+-----------------------------------+-------------------------------------------+
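Which parameters apply depends on query_type. The sketch below builds the parameter dictionary for a chemical search under that rule and rejects match_type values outside the guide. The helper chem_params is hypothetical. The output key "type" (holding "formula" or "descriptor") is an assumption based on the public Search API, not something this page states.

```python
from typing import Any, Dict, Optional

MATCH_TYPES = {
    "graph-relaxed",
    "graph-relaxed-stereo",
    "fingerprint-similarity",
    "sub-struct-graph-relaxed-stereo",
    "sub-struct-graph-relaxed",
    "graph-exact",
}


def chem_params(
    value: str,
    query_type: str = "formula",
    descriptor_type: Optional[str] = None,
    match_subset: bool = False,
    match_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: parameters for a chemical similarity terminal."""
    if query_type == "formula":
        return {"value": value, "type": "formula", "match_subset": match_subset}
    if query_type != "descriptor":
        raise ValueError(f"unknown query_type: {query_type!r}")
    if descriptor_type not in ("SMILES", "InChI"):
        raise ValueError("descriptor queries need descriptor_type 'SMILES' or 'InChI'")
    params: Dict[str, Any] = {"value": value, "type": "descriptor", "descriptor_type": descriptor_type}
    if match_type is not None:  # optional, but must be one of the documented options
        if match_type not in MATCH_TYPES:
            raise ValueError(f"unknown match_type: {match_type!r}")
        params["match_type"] = match_type
    return params


by_formula = chem_params("C6 H12 O6", match_subset=True)
by_smiles = chem_params("CCO", "descriptor", "SMILES", match_type="graph-relaxed")
```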
-
-
class
rcsbsearchapi.search.Facet(name: str, aggregation_type: typing_extensions.Literal[terms, histogram, date_histogram, range, date_range, cardinality], attribute: str, interval: Optional[Union[int, str]] = None, ranges: Optional[List[rcsbsearchapi.search.Range]] = None, min_interval_population: Optional[int] = None, max_num_intervals: Optional[int] = None, precision_threshold: Optional[int] = None, nested_facets: Optional[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet, List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]]] = None)
Facet object for use in a faceted query.
- Attributes:
name (str): Specifies the name of the aggregation.
aggregation_type (AggregationType): Specifies the type of the aggregation. Can be “terms”, “histogram”, “date_histogram”, “range”, “date_range”, or “cardinality”.
attribute (str): Specifies the full attribute name to aggregate on.
interval (Optional[Union[int, str]], optional): Size of the intervals into which a given set of values is divided. Required only for use with “histogram” and “date_histogram” aggregation types (defaults to None if not included).
ranges (Optional[List[Range]], optional): A set of ranges, each representing a bucket. Note that this aggregation includes the ‘from’ value and excludes the ‘to’ value for each range. Should be a list of Range objects (leave the “include_lower” and “include_upper” fields empty). Required only for use with “range” and “date_range” aggregation types (defaults to None if not included).
min_interval_population (Optional[int], optional): Minimum number of items (>= 0) in the bin required for the bin to be returned. Only for use with “terms”, “histogram”, and “date_histogram” facets (defaults to 1 for these aggregation types, otherwise defaults to None).
max_num_intervals (Optional[int], optional): Maximum number of intervals (<= 65336) to return for a given facet. Only for use with the “terms” aggregation type (defaults to 65336 for this aggregation type, otherwise defaults to None).
precision_threshold (Optional[int], optional): Trades memory for accuracy by defining a unique count (<= 40000) below which counts are expected to be close to accurate. Only for use with the “cardinality” aggregation type (defaults to 40000 for this aggregation type, otherwise defaults to None).
nested_facets (Optional[Union[Facet, FilterFacet, List[Union[Facet, FilterFacet]]]], optional): Enables multi-dimensional aggregations. Should contain a Facet, a FilterFacet, or a list of them. Can be used with any aggregation type. Defaults to None.
-
to_dict() → dict
Get dictionary representing request option, skips values of None
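The "skips values of None" behaviour of to_dict() means only the options that were actually set reach the request. The dataclass below, SketchFacet, is a hypothetical stand-in with only a subset of the Facet fields. It assumes that nested facets are serialized under a "facets" key, following the public Search API format. It shows a terms facet, a range facet whose buckets include "from" and exclude "to", and a nested combination of the two.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class SketchFacet:
    """Hypothetical stand-in for Facet whose to_dict() skips fields left as None."""

    name: str
    aggregation_type: str
    attribute: str
    interval: Optional[Any] = None
    ranges: Optional[List[Dict[str, Any]]] = None
    min_interval_population: Optional[int] = None
    nested_facets: Optional[List["SketchFacet"]] = None

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue  # omit unset options
            if f.name == "nested_facets":
                out["facets"] = [facet.to_dict() for facet in value]  # assumed API key
            else:
                out[f.name] = value
        return out


methods = SketchFacet("Methods", "terms", "exptl.method")
# Range buckets include "from" and exclude "to"
resolution = SketchFacet(
    "Resolution", "range", "rcsb_entry_info.resolution_combined",
    ranges=[{"to": 2.0}, {"from": 2.0, "to": 3.0}, {"from": 3.0}],
)
methods_by_resolution = SketchFacet("Methods", "terms", "exptl.method", nested_facets=[resolution])
```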
-
class
rcsbsearchapi.search.FilterFacet(filter: Union[rcsbsearchapi.search.TerminalFilter, rcsbsearchapi.search.GroupFilter], facets: Union[rcsbsearchapi.search.Facet, FilterFacet, List[Union[rcsbsearchapi.search.Facet, FilterFacet]]])
Filter results that contribute to bucket count
-
filter
Filter to apply to facets
- Type
Union[TerminalFilter, GroupFilter]
-
-
class
rcsbsearchapi.search.Group(operator: typing_extensions.Literal[and, or], nodes: Iterable[rcsbsearchapi.search.Query] = ())
AND and OR combinations of queries
-
to_dict()
Get dictionary representing this query
-
-
class
rcsbsearchapi.search.GroupBy(aggregation_method: str, similarity_cutoff: Optional[int] = None, ranking_criteria_type: Optional[rcsbsearchapi.search.RankingCriteriaType] = None)
Return results as groups
-
aggregation_method
One of “matching_deposit_group_id”, “sequence_identity”, or “matching_uniprot_accession”.
- Type
str
-
similarity_cutoff
Only for aggregation method “sequence_identity”: the identity threshold for grouping (100, 95, 90, 70, 50, or 30). Defaults to None.
- Type
int, optional
-
ranking_criteria_type
Controls the ordering of results. Defaults to None.
- Type
Optional[RankingCriteriaType], optional
-
to_dict() → Dict
Get dictionary representing request option, skips values of None
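The GroupBy rules above (allowed cutoffs, cutoff only for sequence identity, None values skipped) can be sketched as a small builder for the group_by request option. The helper group_by is hypothetical. The ranking_criteria_type value is written as a plain dictionary with the sort_by and direction fields that RankingCriteriaType documents.

```python
from typing import Any, Dict, Optional

SIMILARITY_CUTOFFS = {100, 95, 90, 70, 50, 30}


def group_by(
    aggregation_method: str,
    similarity_cutoff: Optional[int] = None,
    ranking_criteria_type: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: a group_by request option that skips None values."""
    if similarity_cutoff is not None:
        if aggregation_method != "sequence_identity":
            raise ValueError("similarity_cutoff only applies to sequence_identity grouping")
        if similarity_cutoff not in SIMILARITY_CUTOFFS:
            raise ValueError(f"similarity_cutoff must be one of {sorted(SIMILARITY_CUTOFFS)}")
    option: Dict[str, Any] = {"aggregation_method": aggregation_method}
    if similarity_cutoff is not None:
        option["similarity_cutoff"] = similarity_cutoff
    if ranking_criteria_type is not None:
        option["ranking_criteria_type"] = ranking_criteria_type
    return option


# Cluster at 95% identity, ranking by score in descending order
clusters = group_by("sequence_identity", 95, {"sort_by": "score", "direction": "desc"})
```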
-
-
class
rcsbsearchapi.search.GroupFilter(logical_operator: typing_extensions.Literal[and, or], nodes: List[Union[TerminalFilter, GroupFilter]])
Group filter class for use with FilterFacet queries
-
logical_operator
“and” or “or” logical operator
- Type
TAndOr
-
nodes
List of filters to combine
- Type
List[Union[“TerminalFilter”, “GroupFilter”]]
-
to_dict()
Get dictionary representing request option, skips values of None
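A GroupFilter combines filter nodes with "and" or "or". A FilterFacet pairs such a filter with the facets whose bucket counts it restricts. The helpers below are hypothetical. The terminal filter shape (including "service": "text") is an assumption based on the public Search API format.

```python
from typing import Any, Dict, List


def terminal_filter(attribute: str, operator: str, value: Any) -> Dict[str, Any]:
    """Hypothetical helper: a terminal filter node (assumed to use the "text" service)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group_filter(logical_operator: str, nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: combine filter nodes with "and" or "or"."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": nodes}


def filter_facet(filter_node: Dict[str, Any], facets: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Only results passing the filter contribute to the nested facets' bucket counts
    return {"filter": filter_node, "facets": facets}


xray_recent = group_filter("and", [
    terminal_filter("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
    terminal_filter("rcsb_accession_info.initial_release_date", "greater", "2020-01-01"),
])
facet = filter_facet(xray_recent, [{
    "name": "Resolution",
    "aggregation_type": "histogram",
    "attribute": "rcsb_entry_info.resolution_combined",
    "interval": 0.5,
}])
```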
-
-
class
rcsbsearchapi.search.PartialQuery(query: rcsbsearchapi.search.Query, operator: typing_extensions.Literal[and, or], attr: rcsbsearchapi.search.Attr)
A PartialQuery extends a growing query with an Attr. It is constructed using the fluent syntax with the and_ and or_ methods. It is not usually necessary to create instances of this class directly.
PartialQuery instances behave like Attr instances in most situations.
-
__init__(query: rcsbsearchapi.search.Query, operator: typing_extensions.Literal[and, or], attr: rcsbsearchapi.search.Attr)
Initialize self. See help(type(self)) for accurate signature.
-
-
class
rcsbsearchapi.search.Query
Base class for all types of queries.
Queries can be combined using set operators:
q1 & q2: Intersection (AND)
q1 | q2: Union (OR)
~q1: Negation (NOT)
q1 - q2: Difference (implemented as q1 & ~q2)
q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))
Note that only AND, OR, and negation of terminals are directly supported by the API, so other operations may be slower.
Queries can be executed by calling them as functions (list(query())) or with the exec() method.
Queries are immutable, and all modifying functions return new instances.
-
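and_(other: Query) → Query
and_(other: Union[str, Attr]) → PartialQuery
Combine this query with another query via an AND. Given an attribute (an Attr or an attribute name string) instead, return a PartialQuery that extends this query with that attribute via an AND.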
-
assign_ids() → Query
Assign node_ids sequentially for all terminal nodes
- Returns
the modified query, with node_ids assigned sequentially from 0
-
exec(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None) → Union[Session, int]
Evaluate this query and return a Session that iterates over all result IDs (or an int count of results when return_counts is True).
-
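or_(other: Query) → Query
or_(other: Union[str, Attr]) → PartialQuery
Combine this query with another query via an OR. Given an attribute (an Attr or an attribute name string) instead, return a PartialQuery that extends this query with that attribute via an OR.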
-
abstract
to_dict() → Dict
Get dictionary representing this query
-
to_json() → str
Get JSON string of this query
-
class
rcsbsearchapi.search.Range(start: Optional[Union[str, float]] = None, end: Optional[Union[str, float]] = None, include_lower: Optional[bool] = None, include_upper: Optional[bool] = None)
Primarily for use with “range” and “date_range” aggregations with the Facet class. include_upper and include_lower should not be used with Facet queries.
Either start or end is required to construct a Range.
-
start
- Type
Optional[Union[str, float]]
-
end
- Type
Optional[Union[str, float]]
-
include_lower
Whether to include the start value in the range
- Type
Optional[bool]
-
include_upper
Whether to include the end value in the range
- Type
Optional[bool]
-
class
rcsbsearchapi.search.RankingCriteriaType(sort_by: str, filter: Optional[Union[rcsbsearchapi.search.GroupFilter, rcsbsearchapi.search.TerminalFilter]] = None, direction: Optional[typing_extensions.Literal[asc, desc]] = None)
Request option controlling the order in which results are returned
-
sort_by
“score”, “size”, “count”, or a full attribute name
- Type
str
-
filter
Filter out results
- Type
Optional[Union[GroupFilter, TerminalFilter]], optional
-
direction
The order in which to sort. If unset, defaults to “desc”.
- Type
Optional[Literal[“asc”, “desc”]]
-
-
class
rcsbsearchapi.search.RequestOption
Base class for request options.
Note: return_all_hits and paginate are not implemented as request options; the package handles them automatically.
-
abstract
to_dict() → Dict
Get dictionary representing request option, skips values of None
-
abstract
-
class
rcsbsearchapi.search.SeqMotifQuery(value: str, pattern_type: Optional[typing_extensions.Literal[simple, prosite, regex]] = 'simple', sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')
Special case of a terminal for protein, DNA, or RNA sequence motif queries
-
__init__(value: str, pattern_type: Optional[typing_extensions.Literal[simple, prosite, regex]] = 'simple', sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')
- Parameters
value (str) – motif to search
pattern_type (Optional[SeqMode], optional) – motif syntax (“simple”, “prosite”, “regex”). Defaults to “simple”.
sequence_type (Optional[SequenceType], optional) – type of biological sequence (“protein”, “dna”, “rna”). Defaults to “protein”.
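A sequence motif search is a terminal with a motif string, a pattern syntax and a sequence type, using the defaults listed above. The helper seqmotif_terminal is hypothetical. The "seqmotif" service name matches the scoring_strategy values on this page, but the exact JSON layout is an assumption based on the public Search API. The motif string is only an example of PROSITE-style syntax.

```python
from typing import Any, Dict

PATTERN_TYPES = ("simple", "prosite", "regex")
SEQUENCE_TYPES = ("protein", "dna", "rna")


def seqmotif_terminal(value: str, pattern_type: str = "simple", sequence_type: str = "protein") -> Dict[str, Any]:
    """Hypothetical helper: a sequence motif terminal using the documented defaults."""
    if pattern_type not in PATTERN_TYPES:
        raise ValueError(f"pattern_type must be one of {PATTERN_TYPES}")
    if sequence_type not in SEQUENCE_TYPES:
        raise ValueError(f"sequence_type must be one of {SEQUENCE_TYPES}")
    return {
        "type": "terminal",
        "service": "seqmotif",
        "parameters": {"value": value, "pattern_type": pattern_type, "sequence_type": sequence_type},
    }


motif = seqmotif_terminal("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.", pattern_type="prosite")
```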
-
-
class
rcsbsearchapi.search.SequenceQuery(value: str, evalue_cutoff: Optional[float] = 0.1, identity_cutoff: Optional[float] = 0, sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')
Special case of a terminal for protein, DNA, or RNA sequence queries
-
__init__(value: str, evalue_cutoff: Optional[float] = 0.1, identity_cutoff: Optional[float] = 0, sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')
The string value is a target sequence that is searched
- Parameters
value (str) – protein or nucleotide sequence
evalue_cutoff (Optional[float], optional) – upper cutoff for E-value (lower is more significant). Defaults to 0.1.
identity_cutoff (Optional[float], optional) – lower cutoff for sequence identity, expressed as a fraction from 0 to 1. Defaults to 0.
sequence_type (Optional[SequenceType], optional) – type of biological sequence (“protein”, “dna”, “rna”). Defaults to “protein”.
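identity_cutoff is a fraction, not a percentage. Passing 90 instead of 0.9 is therefore an easy mistake. The sketch below builds a sequence terminal with the documented defaults and rejects out-of-range cutoffs. The helper sequence_terminal is hypothetical, the JSON layout is assumed from the public Search API, and the sequence is an arbitrary example.

```python
from typing import Any, Dict


def sequence_terminal(
    value: str,
    evalue_cutoff: float = 0.1,
    identity_cutoff: float = 0.0,
    sequence_type: str = "protein",
) -> Dict[str, Any]:
    """Hypothetical helper: a sequence similarity terminal with the documented defaults."""
    if not 0.0 <= identity_cutoff <= 1.0:
        raise ValueError("identity_cutoff is a fraction from 0 to 1, not a percentage")
    if sequence_type not in ("protein", "dna", "rna"):
        raise ValueError(f"unknown sequence_type: {sequence_type!r}")
    return {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "value": value,
            "evalue_cutoff": evalue_cutoff,
            "identity_cutoff": identity_cutoff,
            "sequence_type": sequence_type,
        },
    }


query = sequence_terminal("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF", evalue_cutoff=1, identity_cutoff=0.9)
```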
-
-
class
rcsbsearchapi.search.Session(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)
A single query session.
Handles paging the query and parsing results
-
__init__(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)
Initialize self. See help(type(self)) for accurate signature.
-
iquery(limit: Optional[int] = None) → List[str]
Evaluate the query and display an interactive progress bar.
Requires tqdm.
-
static
make_uuid() → str
Create a new UUID to identify a query
-
rcsb_query_builder_url() → str
URL to view this query on the RCSB PDB website query builder
-
rcsb_query_editor_url() → str
URL to edit this query in the RCSB PDB query editor
-
to_dict() → Dict
Return the full JSON response
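Paging can be sketched as a generator that requests one page of rows at a time and stops once total_count identifiers have been seen. This is a minimal sketch, not the Session implementation. Several names and behaviours are assumptions: fetch_page stands in for the HTTP request, fake_fetch serves a fixed hypothetical result list, and compact responses are assumed to list bare identifiers while other verbosities list objects with an "identifier" key.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_ids(fetch_page: Callable[[int, int], Dict[str, Any]], rows: int = 10000) -> Iterator[str]:
    """Yield result identifiers page by page until total_count is reached.

    fetch_page(start, rows) is a hypothetical callable that performs one request
    and returns the parsed JSON response ("total_count" and "result_set" keys).
    """
    start = 0
    while True:
        page = fetch_page(start, rows)
        hits = page.get("result_set", [])
        for hit in hits:
            # Assumed: compact responses list bare IDs; other verbosities list objects
            yield hit if isinstance(hit, str) else hit["identifier"]
        start += len(hits)
        if not hits or start >= page.get("total_count", 0):
            break


ALL_IDS = ["1ABC", "2DEF", "3GHI", "4JKL", "5MNO"]
requested_starts = []


def fake_fetch(start: int, rows: int) -> Dict[str, Any]:
    # Stand-in for the HTTP call: slices a fixed, hypothetical result list
    requested_starts.append(start)
    return {"total_count": len(ALL_IDS), "result_set": ALL_IDS[start:start + rows]}


ids = list(iterate_ids(fake_fetch, rows=2))
```

Injecting the fetch function keeps the paging logic separate from networking, so it can be checked offline.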
-
-
class
rcsbsearchapi.search.Sort(sort_by: str, direction: Optional[str] = None, filter: Optional[Union[rcsbsearchapi.search.GroupFilter, rcsbsearchapi.search.TerminalFilter]] = None)
Control sorting of results
-
sort_by
“score” to sort by relevancy scores, or a full attribute name
- Type
str
-
filter
Filter for results. Defaults to None.
- Type
Optional[Union[GroupFilter, TerminalFilter]], optional
-
direction
“asc” (ascending) or “desc” (descending). Defaults to None.
- Type
str, optional
-
to_dict() → Dict
Get dictionary representing request option, skips values of None
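exec() accepts a list of Sort objects. The sketch below builds the equivalent list of dictionaries, skipping unset fields as to_dict() does. The helper sort_option is hypothetical; its filter_node parameter corresponds to Sort's filter and is renamed to avoid shadowing the Python built-in. Treating later entries as tie-breakers is an assumption.

```python
from typing import Any, Dict, Optional


def sort_option(
    sort_by: str,
    direction: Optional[str] = None,
    filter_node: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: one entry of the sort request option, skipping None values."""
    if direction not in (None, "asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    option: Dict[str, Any] = {"sort_by": sort_by}
    if direction is not None:
        option["direction"] = direction
    if filter_node is not None:
        option["filter"] = filter_node
    return option


# Newest releases first; relevance score as the assumed tie-breaker
sort_options = [sort_option("rcsb_accession_info.initial_release_date", "desc"), sort_option("score")]
```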
-
-
class
rcsbsearchapi.search.StructMotifQuery(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', backbone_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, side_chain_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, angle_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, entry_id: Optional[str] = None, url: Optional[str] = None, file_path: Optional[str] = None, file_extension: Optional[str] = None, residue_ids: Optional[list] = None, rmsd_cutoff: int = 2, atom_pairing_scheme: typing_extensions.Literal[ALL, BACKBONE, SIDE_CHAIN, PSEUDO_ATOMS] = 'SIDE_CHAIN', motif_pruning_strategy: typing_extensions.Literal[NONE, KRUSKAL] = 'KRUSKAL', allowed_structures: Optional[list] = None, excluded_structures: Optional[list] = None, limit: Optional[int] = None)
Special case of a terminal for structure motif queries.
If you provide an entry_id, the other optional parameters can be ignored. If you provide a file URL or a file path, you must also provide a file_extension.
As is standard with Structure Motif Queries, you must include a list of residues.
Positional arguments are strongly discouraged.
-
__init__(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', backbone_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, side_chain_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, angle_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, entry_id: Optional[str] = None, url: Optional[str] = None, file_path: Optional[str] = None, file_extension: Optional[str] = None, residue_ids: Optional[list] = None, rmsd_cutoff: int = 2, atom_pairing_scheme: typing_extensions.Literal[ALL, BACKBONE, SIDE_CHAIN, PSEUDO_ATOMS] = 'SIDE_CHAIN', motif_pruning_strategy: typing_extensions.Literal[NONE, KRUSKAL] = 'KRUSKAL', allowed_structures: Optional[list] = None, excluded_structures: Optional[list] = None, limit: Optional[int] = None)
- Parameters
structure_search_type (StructEntryType, optional) – how to find the given structure (“entry_id”, “file_url”, “file_upload”). Defaults to “entry_id”.
backbone_distance_tolerance (StructMotifTolerance, optional) – tolerance for distance between Cα atoms (in Å). Defaults to 1.
side_chain_distance_tolerance (StructMotifTolerance, optional) – tolerance for distance between Cβ atoms (in Å). Defaults to 1.
angle_tolerance (StructMotifTolerance, optional) – tolerance for the angle between CαCβ vectors (in multiples of 20 degrees). Defaults to 1.
entry_id (Optional[str], optional) – if “entry_id” is specified, the PDB ID or CSM ID. Defaults to None.
url (Optional[str], optional) – if “file_url” is specified, the URL of the file. Defaults to None.
file_path (Optional[str], optional) – if “file_upload” is specified, the path to the file. Defaults to None.
file_extension (Optional[str], optional) – if “file_url” or “file_upload” is specified, the file type (e.g. “cif”). Defaults to None.
residue_ids (Optional[list], optional) – list of StructureMotifResidue objects. Defaults to None.
rmsd_cutoff (int, optional) – upper cutoff for root-mean-square deviation (RMSD) score. Defaults to 2.
atom_pairing_scheme (StructMotifAtomPairing, optional) – which atoms to consider when computing RMSD scores and transformations. Defaults to “SIDE_CHAIN”.
motif_pruning_strategy (StructMotifPruning, optional) – how query motifs are pruned (i.e. simplified). Defaults to “KRUSKAL”.
allowed_structures (Optional[list], optional) – if the list of structure identifiers is specified, the search will only consider those structures. Defaults to None.
excluded_structures (Optional[list], optional) – if the list of structure identifiers is specified, the search will exclude those structures from the search space. Defaults to None.
limit (Optional[int], optional) – stop after accepting this many hits. Defaults to None.
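The argument rules above (a residue list is always required; file-based searches also need a file extension) can be expressed as checks before building a structure motif terminal. The helpers motif_residue and strucmotif_terminal are hypothetical and only build entry_id searches. The residue keys (label_asym_id, struct_oper_id, label_seq_id) and the "strucmotif" JSON layout are assumptions based on the public Search API. The entry ID and residue numbers are placeholders.

```python
from typing import Any, Dict, List, Optional


def motif_residue(chain_id: str, struct_oper_id: str, label_seq_id: int) -> Dict[str, Any]:
    # Assumed JSON keys for one residue of the motif
    return {"label_asym_id": chain_id, "struct_oper_id": struct_oper_id, "label_seq_id": label_seq_id}


def strucmotif_terminal(
    residue_ids: List[Dict[str, Any]],
    structure_search_type: str = "entry_id",
    entry_id: Optional[str] = None,
    file_extension: Optional[str] = None,
    rmsd_cutoff: float = 2,
    atom_pairing_scheme: str = "SIDE_CHAIN",
) -> Dict[str, Any]:
    """Hypothetical helper enforcing the documented argument rules for an entry_id search."""
    if not residue_ids:
        raise ValueError("structure motif queries need a list of residues")
    if structure_search_type in ("file_url", "file_upload") and not file_extension:
        raise ValueError("file-based motif queries also need file_extension")
    if structure_search_type != "entry_id":
        raise NotImplementedError("this sketch only builds entry_id queries")
    if not entry_id:
        raise ValueError("entry_id searches need an entry_id")
    return {
        "type": "terminal",
        "service": "strucmotif",
        "parameters": {
            "value": {"entry_id": entry_id, "residue_ids": residue_ids},
            "rmsd_cutoff": rmsd_cutoff,
            "atom_pairing_scheme": atom_pairing_scheme,
        },
    }


residues = [motif_residue("A", "1", 162), motif_residue("A", "1", 193), motif_residue("A", "1", 219)]
motif = strucmotif_terminal(residues, entry_id="2MNR")
```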
-
-
class
rcsbsearchapi.search.StructSimilarityQuery(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', entry_id: Optional[str] = None, file_url: Optional[str] = None, file_path: Optional[str] = None, structure_input_type: Optional[typing_extensions.Literal[assembly_id, chain_id]] = 'assembly_id', assembly_id: Optional[str] = '1', chain_id: Optional[str] = None, operator: typing_extensions.Literal[strict_shape_match, relaxed_shape_match] = 'strict_shape_match', target_search_space: typing_extensions.Literal[polymer_entity_instance, assembly] = 'assembly', file_format: Optional[str] = None)¶ Special case of a terminal for structure similarity queries
-
__init__(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', entry_id: Optional[str] = None, file_url: Optional[str] = None, file_path: Optional[str] = None, structure_input_type: Optional[typing_extensions.Literal[assembly_id, chain_id]] = 'assembly_id', assembly_id: Optional[str] = '1', chain_id: Optional[str] = None, operator: typing_extensions.Literal[strict_shape_match, relaxed_shape_match] = 'strict_shape_match', target_search_space: typing_extensions.Literal[polymer_entity_instance, assembly] = 'assembly', file_format: Optional[str] = None)¶
- Parameters
structure_search_type (StructEntryType, optional) – how the query structure is specified (“entry_id”, “file_url”, or “file_upload”). Defaults to “entry_id”.
entry_id (Optional[str], optional) – if “entry_id” specified, PDB ID or CSM ID. Defaults to None.
file_url (Optional[str], optional) – if “file_url” specified, URL of the file. Defaults to None.
file_path (Optional[str], optional) – if “file_upload” specified, path to the local file. Defaults to None.
structure_input_type (Optional[StructSimInputType], optional) – whether the structure is identified by assembly (“assembly_id”) or by chain (“chain_id”). Defaults to “assembly_id”.
assembly_id (Optional[str], optional) – if structure_input_type is “assembly_id”, the assembly ID. Defaults to “1”.
chain_id (Optional[str], optional) – if structure_input_type is “chain_id”, the chain ID letter. Defaults to None.
operator (StructSimOperator, optional) – search mode (“strict_shape_match” or “relaxed_shape_match”). Defaults to “strict_shape_match”.
target_search_space (StructSimSearchSpace, optional) – target objects against which the query will be compared for shape similarity. Defaults to “assembly”.
file_format (Optional[str], optional) – if “file_url” specified, type of file linked to (ex: “cif”). Defaults to None.
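As an illustration of how these parameters fit together, the sketch below builds a structure similarity request for a PDB entry as a plain dictionary. It assumes the Search API service name “structure”, that an assembly is referenced by “assembly_id” and a chain by “asym_id” inside value, and a fixed “entry” return type; the helper similarity_request is hypothetical and is not part of rcsbsearchapi.

```python
from typing import Any, Dict, Optional


def similarity_request(
    entry_id: str,
    assembly_id: Optional[str] = "1",
    chain_id: Optional[str] = None,
    operator: str = "strict_shape_match",
    target_search_space: str = "assembly",
) -> Dict[str, Any]:
    """Hypothetical helper: build a structure similarity request for a PDB entry."""
    value: Dict[str, Any] = {"entry_id": entry_id}
    if chain_id is not None:
        # Chain-based input (structure_input_type "chain_id"); assumed key "asym_id".
        value["asym_id"] = chain_id
    else:
        # Assembly-based input (structure_input_type "assembly_id").
        value["assembly_id"] = assembly_id
    return {
        "query": {
            "type": "terminal",
            "service": "structure",
            "parameters": {
                "value": value,
                "operator": operator,
                "target_search_space": target_search_space,
            },
        },
        "return_type": "entry",
    }


by_assembly = similarity_request("4HHB")
by_chain = similarity_request(
    "4HHB", chain_id="A", target_search_space="polymer_entity_instance"
)
```

Switching operator to “relaxed_shape_match” or target_search_space to “polymer_entity_instance” only changes the corresponding string in parameters.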
-
-
class
rcsbsearchapi.search.StructureMotifResidue(chain_id: Optional[str] = None, struct_oper_id: Optional[str] = None, label_seq_id: Optional[str] = None, exchanges: Optional[list] = None)¶ Defines a single residue for use in a structure motif search (see StructMotifQuery).
-
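__init__(chain_id: Optional[str] = None, struct_oper_id: Optional[str] = None, label_seq_id: Optional[str] = None, exchanges: Optional[list] = None)¶ Initialize a residue from its chain ID, structure operator ID (struct_oper_id), and sequence position (label_seq_id). exchanges optionally lists alternative residue types (e.g. [“HIS”, “LYS”]) allowed at this position.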
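To make the residue fields concrete, here is a hypothetical stand-in class (MotifResidue, not the library's StructureMotifResidue) that serializes a residue into the shape the Search API's residue_ids entries are assumed to take: chain_id is emitted as “label_asym_id”, exchanges is included only when given, and unset fields are dropped. This sketch uses an integer label_seq_id, as in the Search API's JSON examples; that, and the field names, are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MotifResidue:
    """Hypothetical stand-in for StructureMotifResidue, for illustration only."""

    chain_id: Optional[str] = None
    struct_oper_id: Optional[str] = None
    label_seq_id: Optional[int] = None
    exchanges: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumed field names from the Search API's residue_ids schema.
        out: Dict[str, Any] = {
            "label_asym_id": self.chain_id,
            "struct_oper_id": self.struct_oper_id,
            "label_seq_id": self.label_seq_id,
        }
        if self.exchanges:
            # Alternative residue types permitted at this position.
            out["exchanges"] = list(self.exchanges)
        # Drop unset fields rather than sending nulls.
        return {k: v for k, v in out.items() if v is not None}


histidine_or_lysine = MotifResidue("A", "1", 42, ["HIS", "LYS"]).to_dict()
plain = MotifResidue("B", "1", 87).to_dict()
```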
-
-
class
rcsbsearchapi.search.Terminal(service: Union[List, str], params: Dict[str, Any], node_id: int = 0)¶ A terminal query node.
Used for the various types of search. It accepts a service type and a dictionary of parameters; the set of parameters differs between search services.
A Terminal can be built directly from a service name and a parameter dictionary, but this is tedious. It is usually built through its child classes, each of which represents one type of search, which makes queries more concise.
Examples
>>> Terminal("full_text", {"value": "protease"})
>>> Terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})
-
to_dict()¶ Get dictionary representing this query
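The following sketch shows the kind of JSON a terminal node corresponds to, using the second example above. It assumes the node is an object with “type”, “service”, and “parameters” keys and that a full request wraps it under “query” with a “return_type”; terminal_dict and search_request are hypothetical helpers, not the library's to_dict.

```python
import json
from typing import Any, Dict


def terminal_dict(service: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: JSON shape of a terminal node in the Search API."""
    return {"type": "terminal", "service": service, "parameters": params}


def search_request(node: Dict[str, Any], return_type: str = "entry") -> Dict[str, Any]:
    """Hypothetical helper: wrap a query node in a top-level request body."""
    return {"query": node, "return_type": return_type}


node = terminal_dict(
    "text",
    {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]},
)
# Serialize to the JSON text that would be sent to the search service.
body = json.dumps(search_request(node))
```

Note that Python's False becomes JSON false once serialized.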
-
-
class
rcsbsearchapi.search.TerminalFilter(attribute: str, operator: typing_extensions.Literal[equals, greater, greater_or_equal, less, less_or_equal, range, exact_match, in, exists], value: Optional[Union[str, int, float, bool, rcsbsearchapi.search.Range, List[str], List[int], List[float]]] = None, negation: bool = False, case_sensitive: bool = False)¶ A filter based on a single Terminal node. Can be combined into GroupFilters
- Attributes
attribute (str) – attribute to filter on (e.g. struct.title, exptl.method, rcsb_id).
operator (Literal[“equals”, “greater”, “greater_or_equal”, “less”, “less_or_equal”, “range”, “exact_match”, “in”, “exists”]) – operation to apply (e.g. “exact_match”, “in”).
value (Optional[Union[str, int, float, bool, Range, List[str], List[int], List[float]]], optional) – the search term(s): one or more words, numbers, dates, date math expressions, or ranges. Defaults to None.
negation (bool, optional) – logical NOT. Defaults to False.
case_sensitive (bool, optional) – whether to match value case-sensitively. Defaults to False.
-
to_dict()¶ Get the dictionary representing this request option, omitting any field whose value is None.
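A minimal sketch of the “skip None” rule, using a hypothetical FilterSketch dataclass in place of TerminalFilter. It assumes the filter serializes as a “text” service terminal; the range value uses the from/to/include_lower/include_upper keys described for range dictionaries. Fields set to False are kept, because only None is skipped.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class FilterSketch:
    """Hypothetical stand-in for TerminalFilter, showing the 'skip None' rule."""

    attribute: str
    operator: str
    value: Optional[Any] = None
    negation: bool = False
    case_sensitive: bool = False

    def to_dict(self) -> Dict[str, Any]:
        # Keep every field except those that are None (False is kept).
        params = {k: v for k, v in asdict(self).items() if v is not None}
        return {"type": "terminal", "service": "text", "parameters": params}


# An "exists" filter needs no value, so "value" is omitted from the output.
exists = FilterSketch("rcsb_entry_info.resolution_combined", "exists").to_dict()

# A "range" filter takes a dict with from/to bounds and inclusivity flags.
resolution = FilterSketch(
    "rcsb_entry_info.resolution_combined",
    "range",
    {"from": 0, "to": 2, "include_lower": True, "include_upper": False},
).to_dict()
```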
-
class
rcsbsearchapi.search.TextQuery(value: str)¶ Special case of a Terminal for free-text queries
-
__init__(value: str)¶ Search for the string value anywhere in the text
- Parameters
value – free-text query
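For a sense of what happens on the wire, the sketch below wraps a free-text (“full_text”) terminal in a request body and prepares, but does not send, an HTTP POST with the standard library. The endpoint URL and the body layout are assumptions based on the public RCSB Search API; rcsbsearchapi performs this step itself, so the sketch only illustrates the shape of the exchange.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB Search API (v2); check the official docs before use.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

body = {
    "query": {"type": "terminal", "service": "full_text", "parameters": {"value": "protease"}},
    "return_type": "entry",
}

# Build (but do not send) a POST request carrying the query as JSON.
req = urllib.request.Request(
    SEARCH_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending it would be: urllib.request.urlopen(req)
```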
-
-
class
rcsbsearchapi.search.Value(value: T)¶ Represents a value in a query.
In most cases a Value wrapper is unnecessary; the plain Python value can be used directly.
Value is also useful when the Attr object appears on the right-hand side of an operator:
Value("4HHB") == Attr("rcsb_entry_container_identifiers.entry_id")
-
rcsbsearchapi.search.fileUpload(filepath: str, fmt: str = 'cif') → str¶ Upload the file at filepath and return the URL that refers to it in a structure search. Pass this URL as part of the value parameter, together with the file format (fmt).
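The sketch below shows where a URL returned by fileUpload would go. It assumes that, for a structure given by URL, the Search API's value object holds “url” and “format” keys; file_url_value is a hypothetical helper, and uploaded_url is a placeholder string rather than the result of a real upload.

```python
from typing import Any, Dict


def file_url_value(url: str, fmt: str = "cif") -> Dict[str, Any]:
    """Hypothetical helper: the 'value' object for a structure given by URL."""
    return {"url": url, "format": fmt}


# Placeholder standing in for the string fileUpload(...) would return.
uploaded_url = "https://example.org/placeholder.cif"

# Structure similarity parameters that point at the uploaded file.
params = {
    "value": file_url_value(uploaded_url, "cif"),
    "operator": "strict_shape_match",
    "target_search_space": "assembly",
}
```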