Rewriting an HTML Table into SQL INSERT Code
Let us assume you have defined an SQL table and want to produce code to insert records from an HTML table. (Each td
element might contain HTML entities besides text.) You do this by applying a chain of transformations. First we shall look at all our issues or concerns here:
- HTML entities: we might want to undo them
- commas: perhaps we might remove a trailing comma such as
(..., 'TEXT',)
or even(..., 'TEXT'),;
- quoting fields
- SQL values consisting of HTML: you want to keep their properties etc.
- specifically, tables inside
td
(orth
) elements
Should we use XSLT? XSLT can handle some of these challeges beautifully.
A possible sequence of transformations:
- Remove all properties inside top tags
- Escape all apostrophes:
'
->\'
- Undo all entities for readability (optional)
- Remove trailing commas inside records (
(..., 'TEXT',)
) - Remove trailing comma at end of
INSERT
command ((..., 'TEXT',),;
) - Transform
<tr>
opening tags into opening parantheses ((
) - Transform
</tr>
closing tags into closing parantheses (),
) - Transform
<td>
opening tags into opening parantheses ((
) - Transform
</td>
closing tags into closing parantheses ()
) - Add semicolon (
;
) after everything else - Remove comma before closing semicolon (
,;
into just;
)