# ProcessDocumentJobRequest

## Properties
Name | Type | Description | Notes
---|---|---|---|
document_id | str | String id of the document to create a summary from. | 
summary_id | str | The requested identifier of the output document summary. | [optional]
system_prompt | str | System prompt. | [optional]
pre_prompt_summary | str | Prompt that goes before each large piece of text to summarize. | [optional]
prompt_summary | str | Prompt that goes after each large piece of text to summarize. | [optional]
image_batch_image_prompt | str | Prompt for each image batch for vision models. | [optional]
image_batch_final_prompt | str | Prompt to reduce all answers from each image batch for vision models. | [optional]
llm | str | LLM to use. | [optional]
llm_args | Dict[str, object] | A map of arguments sent to the LLM with the query. * `temperature` (type=double, default=0.0) - A value used to modulate the next token probabilities. 0 is the most deterministic and 1 is the most creative. * `top_k` (type=integer, default=1) - The number of highest-probability vocabulary tokens to keep for top-k filtering. * `top_p` (type=double, default=0.0) - If set to a value < 1, only the smallest set of the most probable tokens with probabilities that add up to top_p or higher are kept for generation. * `seed` (type=integer, default=0) - A seed for the random number generator when sampling during generation (if temperature>0 or top_k>1 or top_p<1); seed=0 picks a random seed. * `repetition_penalty` (type=double, default=1.07) - A parameter for repetition penalty; 1.0 means no penalty. * `max_new_tokens` (type=double, default=1024) - The maximum number of new tokens to generate. This limit applies to each (map+reduce) step during summarization and each (map) step during extraction. * `min_max_new_tokens` (type=integer, default=512) - The minimum value for max_new_tokens when auto-adjusting for the content of the prompt, docs, etc. * `response_format` (type=enum[text, json_object, json_code], default=text) - The output type of the LLM. * `guided_json` (type=map) - If specified, the output will follow the JSON schema. * `guided_regex` (type=string) - If specified, the output will follow the regex pattern. Only for models that support guided generation. * `guided_choice` (type=array[string]) - If specified, the output will be exactly one of the choices. Only for models that support guided generation. * `guided_grammar` (type=string) - If specified, the output will follow the context-free grammar. Only for models that support guided generation. * `guided_whitespace_pattern` (type=string) - If specified, overrides the default whitespace pattern for guided JSON decoding. Only for models that support guided generation. * `enable_vision` (type=enum[on, off, auto], default=auto) - Controls vision mode: send images to the LLM in addition to text chunks. * `visible_vision_models` (type=array[string], default=[auto]) - Controls which vision model to use when processing images. Must provide exactly one model; [auto] for automatic. * `images_num_max` (type=integer, default=None) - Maximum number of images to process. * `json_preserve_system_prompt` (type=boolean, default=None) - Whether to preserve the system prompt in a JSON response. * `client_metadata` (type=string, default=None) - Additional metadata to send with the request. * `min_chars_per_yield` (type=integer, default=1) - Minimum characters to yield in a streaming response. * `cost_controls` (type=map) - A map of cost control settings: * `max_cost` (type=double) - Sets the maximum allowed cost in USD per LLM call when doing automatic model routing. If the estimated cost based on input and output token counts is higher than this limit, the request will fail as early as possible. * `max_cost_per_million_tokens` (type=double) - Only consider models that cost less than this value in USD per million tokens when doing automatic routing, using the max of input and output cost. * `model` (type=array[string]) - Optional subset of models to consider when doing automatic routing. If not specified, all models are considered. * `willingness_to_pay` (type=double) - Controls the willingness to pay extra for a more accurate model for every LLM call when doing automatic routing, in units of USD per +10% increase in accuracy. We start with the least accurate model. For each more accurate model, we accept it if the increase in estimated cost divided by the increase in estimated accuracy is no more than this value divided by 10%, up to the upper limit specified above. Lower values will try to keep the cost as low as possible; higher values will approach the cost limit to increase accuracy. 0 means unlimited. * `willingness_to_wait` (type=double) - Controls the willingness to wait longer for a more accurate model for every LLM call when doing automatic routing, in units of seconds per +10% increase in accuracy. We start with the least accurate model. For each more accurate model, we accept it if the increase in estimated time divided by the increase in estimated accuracy is no more than this value divided by 10%. Lower values will try to keep the time as low as possible; higher values will take longer to increase accuracy. 0 means unlimited. * `use_agent` (type=boolean, default=False) - If True, use the AI agent (with access to tools) to generate the response. * `agent_accuracy` (type=string, default="standard") - Effort level of the agent. Only if use_agent=True. One of ["quick", "basic", "standard", "maximum"]. * `agent_max_turns` (type=union[string, integer], default="auto") - Optional maximum number of back-and-forth turns with the agent. Only if use_agent=True. Either "auto" or an integer. * `agent_tools` (type=union[string, array[string]], default="auto") - Either "auto", "all", "any" to enable all available tools, or a specific list of tools to use. Only if use_agent=True. * `agent_type` (type=string, default="auto") - Type of agent to use for task processing. * `agent_original_files` (type=array[string], default=None) - List of file paths for the agent to process. * `agent_timeout` (type=integer, default=None) - Timeout in seconds for each agent turn. * `agent_total_timeout` (type=integer, default=3600) - Total timeout in seconds for all agent processing. * `agent_code_writer_system_message` (type=string, default=None) - System message for the agent code writer. * `agent_num_executable_code_blocks_limit` (type=integer, default=1) - Maximum number of executable code blocks. * `agent_system_site_packages` (type=boolean, default=True) - Whether the agent has access to system site packages. * `agent_main_model` (type=string, default=None) - Main model to use for the agent. * `agent_max_stream_length` (type=integer, default=None) - Maximum stream length for the agent response. * `agent_max_memory_usage` (type=integer, default=16*1024**3) - Maximum memory usage for the agent in bytes (16 GB by default). * `agent_main_reasoning_effort` (type=integer, default=None) - Effort level for main reasoning. * `agent_advanced_reasoning_effort` (type=integer, default=None) - Effort level for advanced reasoning. * `agent_max_confidence_level` (type=integer, default=None) - Maximum confidence level for agent responses. * `agent_planning_forced_mode` (type=boolean, default=None) - Whether to force planning mode for the agent. * `agent_too_soon_forced_mode` (type=boolean, default=None) - Whether to force "too soon" mode for the agent. * `agent_critique_forced_mode` (type=integer, default=None) - Whether to force critique mode for the agent. * `agent_stream_files` (type=boolean, default=True) - Whether to stream files from the agent. A minimal sketch of an llm_args map appears after this table. | [optional]
max_num_chunks | int | Max limit of chunks to send to the summarizer. | [optional]
sampling_strategy | str | How to sample if the document has more chunks than max_num_chunks. Options are "auto", "uniform", "first", "first+last"; default is "auto" (a hybrid of them all). | [optional] [default to 'auto']
pages | List[int] | List of specific pages (of the ingested document in PDF form) to use from the document. 1-based indexing. | [optional]
var_schema | object | Optional JSON schema to use for guided JSON generation. | [optional]
keep_intermediate_results | bool | Whether to keep intermediate results. If false, further LLM calls are applied to the intermediate results until one global summary is obtained: map+reduce (i.e., summary). If true, the results' content will be a list of strings (the results of applying the LLM to different pieces of document context): map (i.e., extract). | [optional] [default to False]
guardrails_settings | GuardrailsSettings |  | [optional]
timeout | int | Amount of time in seconds to allow the request to run. The default is 86400 seconds (24 hours). | [optional] [default to 86400]
meta_data_to_include | Dict[str, bool] | A map with flags that indicate whether each piece of document metadata is to be included as part of the context for a chat with a collection. * `name` (type: boolean, default=True) * `text` (type: boolean, default=True) * `page` (type: boolean, default=True) * `captions` (type: boolean, default=True) * `uri` (type: boolean, default=False) * `connector` (type: boolean, default=False) * `original_mtime` (type: boolean, default=False) * `age` (type: boolean, default=False) * `score` (type: boolean, default=False) | [optional]
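As referenced in the `llm_args` row above, the following is a minimal sketch of such a map. The keys come from the property table; the specific values are illustrative placeholders, not recommended settings.

```python
# A sketch of an llm_args map built from keys documented in the table above.
# The values are illustrative only.
llm_args = {
    "temperature": 0.0,                # deterministic next-token sampling
    "top_k": 1,                        # keep only the single most likely token
    "max_new_tokens": 1024,            # cap for each map/reduce step
    "response_format": "json_object",  # ask the LLM to emit JSON
    "guided_json": {                   # schema the JSON output must follow
        "type": "object",
        "properties": {"summary": {"type": "string"}},
        "required": ["summary"],
    },
    "cost_controls": {                 # limits for automatic model routing
        "max_cost": 0.05,              # max USD per LLM call
        "willingness_to_pay": 0.01,    # USD per +10% accuracy gain
    },
}
```

This dict is passed as the `llm_args` field of a ProcessDocumentJobRequest; unspecified keys fall back to the defaults listed in the table.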
## Example

```python
from h2ogpte.rest_sync.models.process_document_job_request import ProcessDocumentJobRequest

# a minimal JSON string; per the table above, document_id is the only required field
json = '{"document_id": "my-document-id"}'
# create an instance of ProcessDocumentJobRequest from a JSON string
process_document_job_request_instance = ProcessDocumentJobRequest.from_json(json)
# print the JSON string representation of the object
print(process_document_job_request_instance.to_json())
# convert the object into a dict
process_document_job_request_dict = process_document_job_request_instance.to_dict()
# create an instance of ProcessDocumentJobRequest from a dict
process_document_job_request_from_dict = ProcessDocumentJobRequest.from_dict(process_document_job_request_dict)
```
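Besides the JSON round trip above, the model can also be constructed directly from keyword arguments. This is a minimal sketch: the field names follow the property table, and the document id shown is a hypothetical placeholder.

```python
from h2ogpte.rest_sync.models.process_document_job_request import ProcessDocumentJobRequest

# Direct construction; "my-document-id" is a hypothetical placeholder.
request = ProcessDocumentJobRequest(
    document_id="my-document-id",     # required: the document to summarize
    max_num_chunks=64,                # cap on chunks sent to the summarizer
    sampling_strategy="auto",         # hybrid chunk sampling when over the cap
    keep_intermediate_results=False,  # map+reduce down to one global summary
    timeout=3600,                     # allow the job to run for one hour
)
print(request.to_json())
```

Setting keep_intermediate_results=True would instead return the per-chunk (map) results as a list of strings, as described in the table above.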