Input¶

From any input¶

Analyzes the input and calls the appropriate input module.

tabledataextractor.input.from_any.create_table(name_key, table_number=1)[source]¶

Checks the input and calls the appropriate modules for conversion. Returns a numpy array with the raw table.

Parameters:	name_key (str | list) – Path to an .html or .csv file, a URL, or a Python list that is used as input
	table_number (int) – Number of the table to read if there are several at the given address/path
Returns:	table as numpy.array

tabledataextractor.input.from_any.csv(name)[source]¶

Returns True if the input is a .csv file.

Parameters:	name (str) – Input string

tabledataextractor.input.from_any.html(name)[source]¶

Returns True if the input is an .html file.

Parameters:	name (str) – Input string

tabledataextractor.input.from_any.url(name)[source]¶

Returns True if the input is a URL. Uses django.core.validators.URLValidator.

Parameters:	name (str) – Input string
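
The checks above suggest a simple dispatch on the type and form of the input. The following is a minimal sketch of that idea, not the library's implementation: `classify_input` is a hypothetical helper, URL detection uses `urllib.parse` from the standard library as a stand-in for Django's `URLValidator`, and matching files by extension is an assumption.

```python
from urllib.parse import urlparse


def classify_input(name_key):
    """Return 'list', 'url', 'html' or 'csv' for a supported input (hypothetical helper)."""
    if isinstance(name_key, list):
        return "list"
    if not isinstance(name_key, str):
        raise TypeError("input must be a str or a list")

    # Check for a URL first, because a URL may itself end in '.html' or '.csv'.
    # Stand-in for django.core.validators.URLValidator: only http(s) is accepted.
    parsed = urlparse(name_key)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"

    # Assumption: local files are recognised by their extension.
    lowered = name_key.lower()
    if lowered.endswith((".html", ".htm")):
        return "html"
    if lowered.endswith(".csv"):
        return "csv"
    raise ValueError(f"unsupported input: {name_key!r}")
```

A caller could then route each kind of input to the matching reader, for example a `'csv'` result to the .csv reader and a `'url'` result to the URL reader.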

From .csv file¶

Reads a CSV-formatted table from a file. The file must be UTF-8 encoded.

tabledataextractor.input.from_csv.read(file_path)[source]¶

Parameters:	file_path (str) – Path to .csv input file
Returns:	numpy.ndarray
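
As a rough illustration of the UTF-8 requirement, the sketch below reads a .csv file into a list of rows with the standard `csv` module. `read_csv_rows` is a hypothetical name, and returning a plain list of lists (rather than a numpy array) is a simplification made here so the example has no third-party dependencies.

```python
import csv


def read_csv_rows(file_path):
    """Read a UTF-8 encoded CSV file into a list of rows (hypothetical helper)."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    # A file in another encoding may raise UnicodeDecodeError here.
    with open(file_path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle)]
```

If every row has the same length, the result can be passed to `numpy.array` to obtain a two-dimensional array of strings.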

From .html file¶

Reads an HTML-formatted table.

tabledataextractor.input.from_html.configure_selenium(browser='Firefox')[source]¶

Configures Selenium. Sets the path to geckodriver.exe.

Parameters:	browser (str) – Which browser to use
Returns:	Selenium driver

tabledataextractor.input.from_html.makearray(html_table)[source]¶

Creates a numpy array from an .html file, taking rowspan and colspan into account.

Modified from: John Ricco, "Using Python to scrape HTML tables with merged cells", https://johnricco.github.io/2017/04/04/python-html/

Adds duplication of cell content for cells with rowspan/colspan. The table has to be rectangular, \(n \times m\), with the same number of columns in every row.
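
The duplication of content across merged cells can be sketched as follows. This is an illustrative reimplementation using only the standard library's `html.parser`, not the code from `makearray`; the names `_CellCollector` and `expand_spans` are hypothetical, the input is assumed to be a single well-formed, non-empty table, and the result is a list of lists rather than a numpy array.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collects (text, rowspan, colspan) for every cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False
        self._text = []
        self._spans = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            attrs = dict(attrs)
            self._spans = (int(attrs.get("rowspan", 1)), int(attrs.get("colspan", 1)))
            self._in_cell = True
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1].append(("".join(self._text).strip(), *self._spans))
            self._in_cell = False


def expand_spans(html_table):
    """Return a rectangular list of rows with merged-cell content duplicated."""
    parser = _CellCollector()
    parser.feed(html_table)
    grid = {}  # maps (row, column) to cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Slots that no cell covers are filled with an empty string.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

For a header cell with `rowspan="2"` in the first column, the same text appears in the first column of both of the rows it spans, which keeps every row the same length.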

tabledataextractor.input.from_html.read_file(file_path, table_number=1)[source]¶

Reads an .html file and returns a numpy array.

tabledataextractor.input.from_html.read_url(url, table_number=1)[source]¶

Reads a table from a URL and returns a numpy array. Requests is tried first; if it does not succeed, Selenium is used.

Parameters:	url (str) – URL of the page where the table is located
	table_number (int) – Number of the table on the web page
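
The try-first, fall-back-second behaviour described for read_url can be sketched generically. In this sketch `fetch_with_fallback`, `primary` and `fallback` are all hypothetical: the two fetchers are passed in as plain callables so that the example does not depend on Requests or Selenium, and treating an empty result as a failure is an assumption.

```python
def fetch_with_fallback(url, primary, fallback):
    """Return primary(url), or fallback(url) if the primary fetcher fails.

    `primary` and `fallback` are hypothetical callables that take a URL and
    return the page source as a string.
    """
    try:
        page = primary(url)
        if page:  # assumption: an empty result counts as a failure
            return page
    except Exception:
        pass  # fall through to the slower fetcher
    return fallback(url)
```

In practice the primary fetcher might wrap a plain HTTP request and the fallback a browser-driven fetch, which is slower but can render pages that build their tables with JavaScript.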

From Python list¶

Reads a table from a Python list object.

tabledataextractor.input.from_list.read(plist)[source]¶

Creates a numpy array from a Python list. Works even if the rows have different lengths.

Parameters:	plist (list) – Input list
Returns:	numpy.ndarray
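
One way to handle rows of different lengths is to pad the shorter rows before building the array. The sketch below shows that approach under stated assumptions: `pad_rows` is a hypothetical helper, padding with empty strings is a choice made here, and the source does not say how the library itself treats short rows.

```python
def pad_rows(plist, fill=""):
    """Pad every row to the length of the longest row (hypothetical helper)."""
    width = max((len(row) for row in plist), default=0)
    # Build new lists so the caller's rows are left unchanged.
    return [list(row) + [fill] * (width - len(row)) for row in plist]
```

The padded result is rectangular, so it can be passed to `numpy.array` to obtain a regular two-dimensional array.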