Input

From any input

Analyzes the input and calls the appropriate input module.

tabledataextractor.input.from_any.create_table(name_key, table_number=1)

Checks the input and calls the appropriate modules for conversion. Returns a numpy array with the raw table.

Parameters:
  • name_key (str | list) – Path to an .html or .csv file, a URL, or a Python list that is used as input
  • table_number (int) – Number of the table to read if there are several at the given address/path
Returns:table as numpy.ndarray

tabledataextractor.input.from_any.csv(name)

Returns True if input is csv file.

Parameters:name (str) – Input string

tabledataextractor.input.from_any.html(name)

Returns True if input is html file.

Parameters:name (str) – Input string

tabledataextractor.input.from_any.url(name)

Returns True if input is URL. Uses django.core.validators.URLValidator.

Parameters:name (str) – Input string
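The three checks above, and the way create_table might combine them, can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names (looks_like_url, looks_like_html, looks_like_csv, classify_input) are hypothetical, the URL test uses urllib.parse as a stand-in for django.core.validators.URLValidator, and both the extension-based file checks and the order of the checks are assumptions.

```python
from urllib.parse import urlparse


def looks_like_url(name):
    """True if `name` parses as an http(s) URL with a host (stand-in for URLValidator)."""
    parsed = urlparse(name)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def looks_like_html(name):
    """True if `name` ends in .html (case-insensitive); an assumed check."""
    return name.lower().endswith(".html")


def looks_like_csv(name):
    """True if `name` ends in .csv (case-insensitive); an assumed check."""
    return name.lower().endswith(".csv")


def classify_input(name_key):
    """Name the reader an input would be routed to; the check order is an assumption."""
    if isinstance(name_key, list):
        return "list"
    if not isinstance(name_key, str):
        raise TypeError("name_key must be a str or a list")
    if looks_like_url(name_key):
        return "url"
    if looks_like_html(name_key):
        return "html"
    if looks_like_csv(name_key):
        return "csv"
    raise ValueError(f"unsupported input: {name_key!r}")
```

Checking for a URL before checking the file extension means that an address ending in .html is fetched over the network rather than opened as a local file.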

From .csv file

Reads a CSV-formatted table from a file. The file must be UTF-8 encoded.

tabledataextractor.input.from_csv.read(file_path)

Parameters:file_path (str) – Path to .csv input file
Returns:numpy.ndarray
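As a rough illustration of the UTF-8 requirement, the sketch below reads a CSV file into a list of rows using only the standard library. The function name read_csv_rows is hypothetical, and the sketch returns plain lists rather than the numpy.ndarray that the library function returns.

```python
import csv


def read_csv_rows(file_path):
    """Read a UTF-8 encoded CSV file into a list of rows (each a list of str)."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(file_path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle)]
```

Opening the file with an explicit encoding avoids depending on the platform default, which is not UTF-8 on every system.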

From .html file

Reads an HTML-formatted table.

tabledataextractor.input.from_html.configure_selenium(browser='Firefox')

Configures Selenium and sets the path to geckodriver.exe.

Parameters:browser (str) – Which browser to use
Returns:Selenium driver

tabledataextractor.input.from_html.makearray(html_table)

Creates a numpy array from an .html file, taking rowspan and colspan into account.

Modified from: John Ricco, "Using Python to scrape HTML tables with merged cells", https://johnricco.github.io/2017/04/04/python-html/

Adds duplication of cell content for cells with rowspan/colspan. The table must be rectangular (\(n \times m\)), with the same number of columns in every row.
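The duplication of spanned cells can be sketched as follows. This is a standard-library illustration of the general technique, not the code of makearray or of the cited article: the names _CellCollector and expand_spans are hypothetical, the sketch returns a list of lists instead of a numpy array, and it assumes a single table without nested tables.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            attr_map = dict(attrs)
            rowspan = int(attr_map.get("rowspan") or 1)
            colspan = int(attr_map.get("colspan") or 1)
            self._cell = [[], rowspan, colspan]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_spans(html_table):
    """Return a rectangular list of lists with spanned cell text duplicated."""
    parser = _CellCollector()
    parser.feed(html_table)
    parser.close()
    filled = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in filled:  # skip slots taken by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    filled[(r + dr, c + dc)] = text
            c += colspan
    if not filled:
        return []
    n_rows = max(r for r, _ in filled) + 1
    n_cols = max(c for _, c in filled) + 1
    return [[filled.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

A header cell with rowspan="2" ends up in two consecutive rows of the result, and a cell with colspan="2" ends up in two adjacent columns, so every row has the same number of entries.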

tabledataextractor.input.from_html.read_file(file_path, table_number=1)

Reads an .html file and returns a numpy array.

tabledataextractor.input.from_html.read_url(url, table_number=1)

Reads a table from a URL and returns a numpy array. Tries Requests first; if that does not succeed, Selenium is used.

Parameters:
  • url (str) – URL of the page where the table is located
  • table_number (int) – Number of the table on the web page
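The "try one fetcher, then fall back to another" pattern can be sketched generically. The function fetch_with_fallback and its arguments are hypothetical; in practice primary might wrap requests.get(url).text and fallback might wrap a Selenium driver's get(url) followed by page_source. What counts as "not succeeding" is not specified here, so treating an exception or an empty result as failure is an assumption.

```python
def fetch_with_fallback(url, primary, fallback):
    """Return primary(url); if it raises or returns nothing, return fallback(url)."""
    try:
        result = primary(url)
    except Exception:
        # Assumption: any error from the first fetcher triggers the fallback.
        result = None
    if result:
        return result
    return fallback(url)
```

Passing the fetchers in as callables keeps the fallback logic testable without network access or a browser driver.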

From Python List

Reads input from a Python list object.

tabledataextractor.input.from_list.read(plist)

Creates a numpy array from a Python list. Also works if rows have different lengths.

Parameters:plist (list) – Input List
Returns:numpy.ndarray
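One way to handle rows of different lengths is to pad the shorter rows before building the array. The sketch below is an illustration only: the function name pad_rows is hypothetical, and padding with empty strings is an assumption, since the fill behaviour of the library function is not described here.

```python
def pad_rows(plist, fill=""):
    """Pad shorter rows with `fill` so every row has the same length."""
    width = max((len(row) for row in plist), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in plist]
```

The padded result is rectangular, so it could be passed to numpy.array to obtain a two-dimensional array.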