Channel: Recent Questions - Stack Overflow

How can I parse JSON from a website into Python?

I am trying to scrape the news headlines from this page. The headlines appear to be contained in a JSON object named App inside a pair of script tags. If you're reading this in the future, you can assume it looked something like this:

    string = '''{"page":{"lang":"en","error":{"state":false,"type":null}},"system":{"referrer":null,"cookie":[],"params":{"get":[],"post":[]}},"components":{"search-fast-links":[{"name":"FY 2022 preliminary financial results","link":"\/en\/investors-and-media\/news\/press-releases\/08-02-2023\/","detail":""},{"name":"Re-domiciliation Q&A","link":"\/en\/investors-and-media\/shareholder-centre\/current-qa\/","detail":""}],"press-release":{"items":[{"name":"Q4 and FY 2023 production results","date":1706648400,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/31-01-2024\/?","theme":["Production results"],"files":[[{"name":"2024_01_31_Q4_Production_results_eng","type":"pdf","size":"402.15 Kb","link":"\/upload\/ib\/1\/24-01-31\/2024_01_31_Q4_Production_results_eng.pdf"}]]},{"name":"Notice regarding a change of a major shareholder","date":1706475600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/29-01-2024\/?","theme":["Regulatory disclosures","Shareholder information"],"files":[[{"name":"2024_01_29_Notice regarding_a_change_of_a_major_shareholder_eng","type":"pdf","size":"279.34 Kb","link":"\/upload\/ib\/1\/24-01-29\/2024_01_29_Notice regarding_a_change_of_a_major_shareholder_eng.pdf"}]]},{"name":"Nominated brokers for the purpose of the Exchange Offer","date":1705525200,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/18-01-2024\/?","theme":["Shareholder information"],"files":[[{"name":"2024_01_18_Nominated_brokers_eng","type":"pdf","size":"202.73 Kb","link":"\/upload\/ib\/1\/24-01-18\/2024_01_18_Nominated_brokers_eng.pdf"}]]},{"name":"Total Voting Rights as at 29 December 2023 
","date":1703797200,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/29-12-2023\/?","theme":["Regulatory disclosures"],"files":[[{"name":"2023_12_29_TVR_eng","type":"pdf","size":"114.49 Kb","link":"\/upload\/ib\/1\/23-12-29\/2023_12_29_TVR_eng.pdf"}]]},{"name":"Receives the most prestigious corporate social responsibility award in the Republic of Kazakhstan","date":1702414800,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/13-12-2023\/?","theme":["ESG","Other"],"files":[[{"name":"2023_12_13_Paryz_award_eng","type":"pdf","size":"200.72 Kb","link":"\/upload\/ib\/1\/23-12-12\/2023_12_13_Paryz_award_eng.pdf"}]]},{"name":"Results of GM","date":1702242000,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/11-12-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_12_11_GM_results_eng","type":"pdf","size":"215.86 Kb","link":"\/upload\/ib\/1\/23-12-10\/2023_12_11_GM_results_eng.pdf"}]]},{"name":"Offer to exchange certain shares currently affected by the EU asset freeze on NSD and Notice of General Meeting","date":1700686800,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/23-11-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_11_23_Exchange_offer_GM_eng","type":"pdf","size":"241.23 Kb","link":"\/upload\/ib\/1\/23-11-23\/2023_11_23_Exchange_offer_GM_eng.pdf"}]]},{"name":"Q3 2023 production results","date":1698699600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/31-10-2023\/?","theme":["Production results"],"files":[[{"name":"2023_10_31_Q3_Production_results_eng","type":"pdf","size":"404.46 
Kb","link":"\/upload\/ib\/1\/23-10-31\/2023_10_31_Q3_Production_results_eng.pdf"}]]},{"name":"Results of new share issues ","date":1696971600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/11-10-2023-c\/?","theme":["Other","Regulatory disclosures","Shareholder information"],"files":[[{"name":"2023_10_11_Results_of_new_share_issues_eng","type":"pdf","size":"124.73 Kb","link":"\/upload\/ib\/1\/23-10-11\/2023_10_11_Results_of_new_share_issues_eng.pdf"}]]},{"name":"Completion of Exchange Offer","date":1696971600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/11-10-2023-b\/?","theme":["Other","Shareholder information"],"files":[[{"name":"2023_10_11_Results_of_Exchange_Offer_eng","type":"pdf","size":"215.34 Kb","link":"\/upload\/ib\/1\/23-10-11\/2023_10_11_Results_of_Exchange_Offer_eng.pdf"}]]},{"name":"Director\/PDMR Shareholding","date":1696971600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/11-10-2023-a\/?","theme":["Regulatory disclosures"],"files":[[{"name":"2023_10_11_PDMR_Notification_eng","type":"pdf","size":"131.9 Kb","link":"\/upload\/ib\/1\/23-10-11\/2023_10_11_PDMR_Notification_eng.pdf"}]]},{"name":"Half-year report for the six month ended 30 June 2023","date":1695589200,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/25-09-2023\/?","theme":["Financial results"],"files":[[{"name":"2023_09_25_POLY_1H_2023_Half_yearly_report_eng","type":"pdf","size":"2.07 Mb","link":"\/upload\/ib\/1\/23-09-25\/2023_09_25_POLY_1H_2023_Half_yearly_report_eng.pdf"}]]},{"name":"London De-listing 
","date":1693281960,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/29-08-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_08_29_London_De-listing_eng","type":"pdf","size":"121.65 Kb","link":"\/upload\/ib\/1\/23-08-29\/2023_08_29_London_De-listing_eng.pdf"}]]},{"name":"Resumption of trading on AIX","date":1691647200,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/10-08-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_08_10_AIX_trading_resumption_eng","type":"pdf","size":"219.65 Kb","link":"\/upload\/ib\/1\/23-08-10\/2023_08_10_AIX_trading_resumption_eng.pdf"}]]},{"name":"Q2 2023 production results","date":1691528400,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/09-08-2023\/?","theme":["Production results"],"files":[[{"name":"2023_08_09_Q2_Production_eng","type":"pdf","size":"405.7 Kb","link":"\/upload\/ib\/1\/23-08-09\/2023_08_09_Q2_Production_eng.pdf"}]]},{"name":"Re-Domiciliation to AIFC Completed","date":1691442000,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/08-08-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_08_08_Re-domiciliation_eng","type":"pdf","size":"199.45 Kb","link":"\/upload\/ib\/1\/23-08-08\/2023_08_08_Re-domiciliation_eng.pdf"}]]},{"name":"Suspension of Trading on the London Stock Exchange","date":1690862400,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/01-08-2023\/?","theme":["Other","Shareholder information"],"files":[[{"name":"2023_08_01_London_Suspension_eng","type":"pdf","size":"221.29 
Kb","link":"\/upload\/ib\/1\/23-08-02\/2023_08_01_London_Suspension_eng.pdf"}]]},{"name":"Results of GM","date":1690516800,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/28-07-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_07_28_GM_results_eng","type":"pdf","size":"214.79 Kb","link":"\/upload\/ib\/1\/23-07-28\/2023_07_28_GM_results_eng.pdf"}]]},{"name":"Results of AGM","date":1690257600,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/25-07-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_07_25_AGM_results_eng","type":"pdf","size":"244.42 Kb","link":"\/upload\/ib\/1\/23-07-25\/2023_07_25_AGM_results_eng.pdf"}]]},{"name":"Update to the timetable of the Re-domiciliation ","date":1689912000,"type":"\u041f\u0440\u0435\u0441\u0441-\u0440\u0435\u043b\u0438\u0437\u044b","link":"\/en\/investors-and-media\/news\/press-releases\/21-07-2023\/?","theme":["Shareholder information"],"files":[[{"name":"2023_07_21_Update_to_Re-domiciliaton_Timetable_eng","type":"pdf","size":"234.32 Kb","link":"\/upload\/ib\/1\/23-07-21\/2023_07_21_Update_to_Re-domiciliaton_Timetable_eng.pdf"}]]}],"nav":{"count":883,"total":45,"current":1},"filters":{"theme":[{"text":"Assets","id":540,"disabled":false,"selected":false},{"text":"Corporate governance","id":532,"disabled":false,"selected":false},{"text":"Dividends","id":531,"disabled":false,"selected":false},{"text":"ESG","id":539,"disabled":false,"selected":false},{"text":"Exploration","id":533,"disabled":false,"selected":false},{"text":"Financial results","id":530,"disabled":false,"selected":false},{"text":"Indexes and ratings","id":537,"disabled":false,"selected":false},{"text":"JV","id":534,"disabled":false,"selected":false},{"text":"Other","id":541,"disabled":false,"selected":false},{"text":"Production 
results","id":529,"disabled":false,"selected":false},{"text":"Regulatory disclosures","id":536,"disabled":false,"selected":false},{"text":"Reports","id":535,"disabled":false,"selected":false},{"text":"Shareholder information","id":538,"disabled":false,"selected":false}],"years":[{"text":"2024","id":2024,"disabled":false,"selected":false},{"text":"2023","id":2023,"disabled":false,"selected":false},{"text":"2022","id":2022,"disabled":false,"selected":false},{"text":"2021","id":2021,"disabled":false,"selected":false},{"text":"2020","id":2020,"disabled":false,"selected":false},{"text":"2019","id":2019,"disabled":false,"selected":false},{"text":"2018","id":2018,"disabled":false,"selected":false},{"text":"2017","id":2017,"disabled":false,"selected":false},{"text":"2016","id":2016,"disabled":false,"selected":false},{"text":"2015","id":2015,"disabled":false,"selected":false},{"text":"2014","id":2014,"disabled":false,"selected":false},{"text":"2013","id":2013,"disabled":false,"selected":false},{"text":"2012","id":2012,"disabled":false,"selected":false},{"text":"2011","id":2011,"disabled":false,"selected":false},{"text":"2010","id":2010,"disabled":false,"selected":false},{"text":"2009","id":2009,"disabled":false,"selected":false},{"text":"2008","id":2008,"disabled":false,"selected":false},{"text":"2007","id":2007,"disabled":false,"selected":false}]}},"footer":{"documents":[{"link":"\/upload\/ib\/88\/23-06-07\/Polymetal_General_Privacy_Notice_eng.pdf","name":"Privacy notice","fileInfo":"PDF (156.13 Kb)"},{"link":"\/upload\/ib\/62\/23-06-29\/2022_Polymetal_Modern_Slavery_Statement.pdf","name":"Modern Slavery Act Transparency Statement 2022","fileInfo":"PDF (435.06 Kb)"}],"links":[{"name":"Glossary ","link":"\/en\/glossary\/"},{"name":"Sitemap","link":"\/en\/sitemap\/"}],"tune":[{"name":"Contacts","link":"\/en\/contacts\/"},{"name":"Hotline","link":"\/en\/contacts\/hotline\/"}],"danger":"<div class=\"footer__info--text\">\r\n    <span>Please note that <a 
href=\"https:\/\/www.polymetalinternational.com\/\" class=\"link link--inline\">https:\/\/www.polymetalinternational.com\/<\/a> is the only official URL of&nbsp;Polymetal International plc.  <a href=\"https:\/\/www.polymetal.ru\/\" class=\"link link--inline\">https:\/\/www.polymetal.ru\/<\/a> is related to JSC Polymetal.<\/span>\r\n<\/div>\r\n<div class=\"footer__info--text\">\r\n    <span>Other websites even if&nbsp;they resemble the official ones and\/or contain full or&nbsp;a&nbsp;part of&nbsp;the Company&rsquo;s name in&nbsp;their URL do&nbsp;not relate to&nbsp;Polymetal International plc or&nbsp;its subsidiaries.<\/span>\r\n<\/div>\r\n<div class=\"footer__info--text\">\r\n    <span>Polymetal International plc does not have any official accounts in social media except of <a href=\"https:\/\/www.youtube.com\/channel\/UCddB8YqIjZnak6mlmTcpr3w\" class=\"link link--inline\">Youtube<\/a> and <a href=\"https:\/\/www.linkedin.com\/company\/polymetal\" class=\"link link--inline\">LinkedIn<\/a>. Any statements purportedly provided on behalf of a company is deliberate misrepresentation.<\/span>\r\n<\/div>"}}}'''
    import json
    json.loads(string)
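
For reference, here is a minimal sketch of how the object could be pulled straight out of the page source instead of being pasted into a string literal. It assumes the script contains an assignment of the form `App = {...};`. The `html` sample and the `extract_app` helper are hypothetical stand-ins, not the real page or an existing API. `json.JSONDecoder().raw_decode` parses one JSON value from a given index and ignores what follows, so the trailing `;` does not need to be stripped by hand.

```python
import json
import re

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = r'''<html><body>
<script>
    App = {"page": {"lang": "en", "error": {"state": false, "type": null}},
           "components": {"press-release": {"items": [
               {"name": "Q4 and FY 2023 production results", "date": 1706648400,
                "link": "\/en\/investors-and-media\/news\/press-releases\/31-01-2024\/?"}
           ]}}};
</script>
</body></html>'''


def extract_app(page_source: str) -> dict:
    """Return the object assigned to App, decoded as JSON (hypothetical helper)."""
    match = re.search(r"\bApp\s*=\s*", page_source)
    if match is None:
        raise ValueError("no 'App = ...' assignment found")
    # raw_decode parses a single JSON value starting at the given index and
    # ignores the rest (here the trailing ';' and the closing script tag).
    obj, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return obj


app = extract_app(html)
headlines = [item["name"] for item in app["components"]["press-release"]["items"]]
print(headlines)
```

This only works if the assigned value is strict JSON; a JavaScript object literal with unquoted keys or single-quoted strings would still need a different approach.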

My question is the following: what is the best way to parse this into something that Python will recognize as JSON?

  1. I had a look at js2py, but I couldn't find anything in it that does what I want.
  2. I also tried string.replace. After replacing the JavaScript booleans and null with their Python equivalents, I was able to put the result through json.load. However, I'm wary of blindly replacing every occurrence of 'false' with 'False' and 'null' with 'None': the data might change in the future so that 'false' or 'null' appears inside some other string that is not a boolean, and the replacement would then alter the contents in unpredictable ways.
  3. I also had a look at this question, which at first glance looks like the same question, but the answer given there was specific to the JSON data that the OP provided. I would prefer an answer that is independent of the actual content and works for any JSON.
  4. I removed App = and ; from the script contents, put the JSON into the variable called string, and passed it to json.loads, but I get the following error:
    >>> json.loads(string)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
        obj, end = self.scan_once(s, idx)
    json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2 column 10378 (char 10378)
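
One possible explanation for the traceback in item 4, offered as an assumption rather than a confirmed diagnosis: in a regular triple-quoted literal, Python processes backslash escapes before json.loads sees the text, so `\"` inside the "danger" field becomes a bare `"` and ends the JSON string early. The sketch below reproduces that on a shortened, made-up fragment and compares it with a raw literal (`r'''...'''`). It also shows that json.loads already maps `false` and `null` to `False` and `None`, so the replacements from item 2 should not be needed.

```python
import json

# Regular literal: Python turns \" into a bare " before json.loads runs,
# which terminates the JSON string right after class=
plain = '''{"danger": "<div class=\"footer__info--text\">", "state": false, "type": null}'''

# Raw literal: backslashes reach the JSON parser untouched.
raw = r'''{"danger": "<div class=\"footer__info--text\">", "state": false, "type": null}'''

try:
    json.loads(plain)
    plain_error = None
except json.JSONDecodeError as exc:
    plain_error = exc.msg

data = json.loads(raw)
print(plain_error)    # the same kind of message as in the traceback
print(data["state"], data["type"])
```

When the text comes from a file or an HTTP response rather than from a literal typed into the interpreter, no escape processing happens on the Python side, so the raw prefix only matters for pasted samples like the one above.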
