All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
How to import .SQL Database dump consisting of .txts?
For an assignment at my University, we need to make use of a Patent Database. The problem is that this database is messed up and not ready to use. It has been delivered to us as several .txt files and one .sql file. The txt files are tables of the database and contain data seperated by TABs. The .sql file contains the query to create the appropriate tables with columns. I assume this is a MySQL dump.
Other students tried to open it in Access but it was a big mess with lots of import errors and non existing references.
So, I setup a local MySQL server, ran the SQL query in PHPMyAdmin and am now struggling to import the txt files.
In PHPMyAdmin I selected 'Import' and then CSV and selected the .txt file. That sadly doesn't work, it gives me the following error: Invalid column count in CSV input on line 1.
If I select to import it as 'CSV using LOAD Data' with '\t' as 'Columns terminated by', it does import only 1 row....
I contacted our professor but haven't got a reply from him yet. I assume he is abroad.
Does anyone have an idea how to sucessfully import this database?
Here's the SQL query: http://pastebin.com/1kVECmYU
Here's some sample data from the txt files: http://pastebin.com/fm8SiRzQ
Thank you very much!!
Comments
Why don't you write a small java programm that converts the .txts to .sqls?
Open .txt file in OpenOffice calc, save the file as comma separated (CSV), select , as a separator and import it from phpMyAdmin
Quickest way I can think of would be importing the DB from the .sql, then using php or perl to split up the .txt files.
1 - replace all repeating instances of tab with a single tab or character (such as |)
2 - explode/split each line based on the delimiter
3 - INSERT the data based on the array from the explode/split
Edit: vedran wins, his method is simpler :P
man mysqlimport
There are options depending on how is your text formatted.
Or you can try the other suggestions.
Looks tab delimited to me... That is something that mySQL already handles quite well.
Thanks for the fast replies
@gsrdgrdghd I don't have the knowledge to do so and I think it will take a long time before I get a working program which does that
@Aldryic the .sql file only contains the query which makes empty tables with coloums, so there isn't much to import?
@yomero Text is formatted by TABs
@vedran Using , is a seperator is sadly not an option. The data already contains some , I could use _ though.
@speckl That's correct, it is tab delimited.
Yup, but importing that .sql file creates the structure for your DBs and tables (and is necessary before inserting the data). After that, you only have to parse the .txt files and INSERT the data into the already-existing tables.
Then separate the data with " and use , as field separator http://pastebin.com/VamubCU5
First run the SQL to create the tables:
mysql -u PatentDBUser -pLongLiveLET patent_db < some_file.sql
Or
mysql -u PatentDBUser -pLongLiveLET patent_db mysql> source some_file.sql
Then you could use the LOAD DATA INFILE command
mysql> LOAD DATA INFILE 'somefile.dat' INTO TABLE t1;
Lather, rinse, repeat for the other talbes. I'm assuming you get one file per table.
I think tab-delimited is the default. If not, you can play with the FIELDS TERMINATED BY clause:
http://dev.mysql.com/doc/refman/5.6/en/load-data.html
I sincerely hope I'm not helping a patent troll.
@vedran I downloaded and installed OpenOffice Calc, but the file is too large to be loaded in completely (too many rows).
@raindog308 Thanks for the reply. Using LOAD DATA INFILE I get this error:
1062 - Duplicate entry '3970526-FIRMA HARTUNG KUHN & CO GMBH' for key 'PRIMARY'
Which is strange, as the table is totally empty...
PS: No you aren't helping a patent troll It's just for an assignment
Jeesh this is harder than I thought
You'll get a 1062 error if your key is greater the allowed value of the field. '3970526-FIRMA HARTUNG KUHN & CO GMBH' is not really a valid key, unless your key field was set to some kind of TEXT field. And then it wouldn't really be a very good key.
You should split your field so that the key would be '3970526' and 'FIRMA HARTUNG KUHN & CO GMBH' should go into another field. The concept of key will work properly, then.
http://www.softwareprojects.com/resources/programming/t-how-to-fix-mysql-duplicate-entry-for-key-primary-o-1844.html has more info
The first answer of http://stackoverflow.com/questions/3845014/what-is-mysql-key-efficiency is a good bit of text about why keys are important.
Probably something in your file is messed up
The fields are split. I assume that one row is just messed up. Problem is, this file contains about 4 millions rows. Checking them by hand is going to take a long time.
I just mailed the prof. again. Let's see what he has to say about it.
Others 'claim' to have the database working. In Excel. That's not what I call working. 6 different excel sheets, for each table one; Useless.
just check that line, with notepad++, change, then reimport them.
Too large to import in Notepad++, already tried
Use sed!
Try UltraEdit
use vi and type :line number
like :537537
and you should be able to see it