Input
From any input
Analyzes the input and calls the appropriate input module.
tabledataextractor.input.from_any.create_table(name_key, table_number=1)
Checks the input and calls the appropriate module for conversion. Returns a numpy array with the raw table.
Parameters: - name_key (str | list) – Path to a .html or .csv file, a URL, or a Python list used as input
- table_number (int) – Number of the table to read when several are present at the given address or path
Returns: table as numpy.array
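As a rough illustration of the kind of check described above, the sketch below classifies an input as a Python list, a URL, a .csv path, or a .html path. The function name `classify_input` and its return labels are hypothetical and not part of tabledataextractor; the library's own logic may differ.

```python
from urllib.parse import urlparse


def classify_input(name_key):
    """Return a label for the kind of table input (hypothetical helper)."""
    if isinstance(name_key, list):
        return "list"
    if not isinstance(name_key, str):
        raise TypeError("name_key must be a str or a list")
    # Check for a URL first, so that a URL ending in .html
    # is not mistaken for a local file path.
    if urlparse(name_key).scheme in ("http", "https"):
        return "url"
    lowered = name_key.lower()
    if lowered.endswith(".csv"):
        return "csv"
    if lowered.endswith((".html", ".htm")):
        return "html"
    raise ValueError(f"Unrecognised input: {name_key!r}")
```

A dispatcher built on such a check could then hand the input to the matching reader, for example a csv reader for the `"csv"` label.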
tabledataextractor.input.from_any.csv(name)
Returns True if the input is a csv file.
Parameters: name (str) – Input string
From .csv file
Reads a csv-formatted table from a file. The file has to be UTF-8 encoded.
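A minimal sketch of reading a UTF-8 encoded csv file into rows, using only the standard library. The helper name `read_csv_rows` is hypothetical, and this is not the module's actual implementation.

```python
import csv


def read_csv_rows(file_path):
    """Read a UTF-8 encoded csv file into a list of rows (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(file_path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle)]
```

Passing the resulting rows to `numpy.array` would give an array of strings, provided every row has the same length.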
From .html file
Reads an HTML-formatted table.
tabledataextractor.input.from_html.configure_selenium(browser='Firefox')
Configuration for Selenium. Sets the path to geckodriver.exe.
Parameters: browser (str) – Which browser to use
Returns: Selenium driver
tabledataextractor.input.from_html.makearray(html_table)
Creates a numpy array from an .html file, taking rowspan and colspan into account.
Modified from: John Ricco, "Using Python to scrape HTML tables with merged cells", https://johnricco.github.io/2017/04/04/python-html/
Added functionality for duplicating cell content for cells with rowspan/colspan. The table has to be rectangular, \(n \times m\), with the same number of columns in every row.
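The duplication of cell content across rowspan and colspan can be sketched as follows. To stay self-contained, this example assumes the cells have already been parsed into `(text, rowspan, colspan)` tuples. That input format and the name `expand_spans` are hypothetical and do not reflect how `makearray` itself receives the table.

```python
def expand_spans(rows):
    """Expand (text, rowspan, colspan) cells into a rectangular grid of strings."""
    grid = {}  # maps (row, column) -> cell text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from a row above.
            while (r, c) in grid:
                c += 1
            # Copy the cell text into every position the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions not covered by any cell are filled with an empty string.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


table = expand_spans([
    [("A", 2, 1), ("B", 1, 2)],
    [("C", 1, 1), ("D", 1, 1)],
])
# table == [["A", "B", "B"], ["A", "C", "D"]]
```

Here "A" spans two rows and "B" spans two columns, so both appear twice in the expanded grid.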
tabledataextractor.input.from_html.read_file(file_path, table_number=1)
Reads an .html file and returns a numpy array.